Stacks on Stacks

The Serverless Ecosystem Blog by Stackery.

Posts on Performance

Prototyping Serverless Applications
Anna Yovandich | March 08, 2018

When starting a prototype, it’s easy to get lost in the weeds before anything is built. What helps me before writing code is to outline a build plan that clarifies: What is the simplest approach to build an initial concept? What valuable features reach beyond the basics? How can they be progressively implemented? What architectural resources will be required for each stage?

For instance, I’m building a prototype for a browser-based multiplayer game that tracks player connections, turns, and scores in realtime. To initialize the game, a url will be generated by the “host” player, which will open a socket connection scoped to the url’s unique path. The url serves as the entry point for other players to join the game. A socket connection will enable bi-directional messages to be sent and received between client and server when a new player joins, a player takes their turn, or the game ends. I scoped three build strategies, from feature-light to most robust – using Stackery to prototype, simplify, and expedite the heavy lifting.

The first, most feature-light, approach can be achieved using only javascript on the client and server with Express (a node.js application framework) and socket.io (a library for sending and receiving messages in realtime). When a player creates a new game, a unique game url path will be provided to Express as the endpoint to open a scoped socket connection. The game client will send and receive messages as players join, take turns, and score/win/lose. For lightweight data persistence, localStorage can be used to store game and player data so a game can be rejoined and resumed after a broken connection by reloading the url. At this point, it would be helpful to test the game on a remote domain. To do this, I’ll create a simple stack with an ObjectStore and a CDN, which will provide access to a stackery-stacks.io domain.
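
As a rough sketch of what this first approach could look like on the server, here is a minimal Express + socket.io setup. The path scheme, event names, and port are illustrative assumptions, not part of the actual prototype.

```javascript
// Minimal sketch: an Express server that scopes a socket.io namespace to each game url.
// The /game/:id path and the 'join'/'turn'/'state' event names are illustrative.
const express = require('express');
const http = require('http');
const crypto = require('crypto');

const app = express();
const server = http.createServer(app);
const io = require('socket.io')(server);

// The "host" player requests a new game and receives a unique game url path.
app.post('/games', (req, res) => {
  const gameId = crypto.randomBytes(4).toString('hex');
  res.json({ path: `/game/${gameId}` });
});

// Each game url maps to its own socket.io namespace, keeping messages scoped to one game.
io.of(/^\/game\/\w+$/).on('connection', (socket) => {
  const game = socket.nsp; // the namespace for this particular game url

  socket.on('join', (player) => game.emit('state', { event: 'joined', player }));
  socket.on('turn', (move) => game.emit('state', { event: 'turn', move }));
  socket.on('disconnect', () => game.emit('state', { event: 'left' }));
});

server.listen(3000);
```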

The next strategy adds data persistence beyond what localStorage can offer: user data (profiles), joinable game urls (lobby), and game scores (leaderboards). To quickly prototype these features without much overhead (especially for a frontender like me), it’s Stackery to the rescue. It’s quick to spin up a Rest Api that will receive user and game data, then send it to a Function node that will pipe it into a Table.
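
A minimal sketch of the Function node in that pipeline might look like the following, assuming a Node.js Lambda-style handler and a DynamoDB-backed Table; the table name, environment variable, and record shape are illustrative.

```javascript
// Sketch of a Function node that receives game/user data from the Rest Api
// and writes it into a Table. Names and record shape are illustrative.
const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const body = JSON.parse(event.body || '{}');

  await dynamo.put({
    TableName: process.env.GAMES_TABLE, // assumed environment variable
    Item: {
      id: body.gameId,
      players: body.players,
      scores: body.scores,
      updatedAt: Date.now()
    }
  }).promise();

  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```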

The third, and most robust, implementation adds another Function node to the pipeline above to enable a wide range of user notifications. When a Table connects its output to a Function, changes in state can be detected by the Function using the transaction events it receives from the Table. The Function can then notify users accordingly, in various ways (a sketch of such a handler follows the list below):

  • Email an invite for another player to join a game
  • Notify a player when it’s their turn
  • Email a player when their highest score is defeated on the leaderboard
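
To make that concrete, here is a sketch of the kind of notification Function that could sit downstream of the Table. It assumes DynamoDB Streams-style transaction records and SES for email; the record shape, field names, and addresses are illustrative assumptions, not the prototype’s actual design.

```javascript
// Sketch of a Function that receives the Table's transaction events and emails
// the previous leader when their high score is beaten. Field names are illustrative.
const AWS = require('aws-sdk');
const ses = new AWS.SES();

exports.handler = async (event) => {
  for (const record of event.Records || []) {
    if (record.eventName !== 'MODIFY') continue;

    const before = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.OldImage);
    const after = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);

    if (after.topScore > before.topScore && after.leader !== before.leader) {
      await ses.sendEmail({
        Source: 'game@example.com',
        Destination: { ToAddresses: [before.leaderEmail] },
        Message: {
          Subject: { Data: 'Your high score was beaten!' },
          Body: { Text: { Data: `${after.leader} just scored ${after.topScore}.` } }
        }
      }).promise();
    }
  }
};
```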

A solid starting point is the first approach – relying solely on javascript and the browser for a simple and usable multiplayer experience. From there, advanced features can be prototyped and implemented without too much architectural sweat. Depending on desired behavior (e.g. varying responses to state changes), the Function code will require a range of effort, but that’s what’s great about Stackery – when architectural complexity becomes trivial, building behavior becomes central.

Serverless Health Status Dashboard
Sam Goldstein | February 08, 2018

Stackery’s Operations Console is the place DevOps teams go to manage their serverless infrastructure and applications. This week we’re announcing the general availability of Serverless Health Dashboards, which surface realtime health status data for deployed serverless applications. As early adopters of microservice and serverless architectures, we’ve experienced firsthand how complexity shifts away from monolithic codebases toward integrating (and reasoning about) many distributed components. That’s why we designed Serverless Health Dashboards to provide visibility into the realtime status of serverless applications, surfacing the key data needed to identify production problems and understand application health.

Once you’ve set up a Stackery account, you’ll see a list of all the CloudFormation stacks you’ve deployed within your AWS account. When you drill into a stack, we display a visual representation that shows the stack’s provisioned resources and architectural relationships. I personally love this aspect of the console, since it’s challenging to track the many moving parts of a microservices architecture. Having an always-up-to-date visualization of how all the pieces fit together is incredibly valuable for keeping a team coordinated and up to speed on the systems they manage.

Within the stack visualization we surface key health metrics for each node. This enables you to assess the operational health of the stack at a glance and quickly drill down on the parts of the stack experiencing errors or other problems. When you need to dig deeper to understand complex interactions between different stack components, you can access detailed logs, historical metrics, and X-Ray transaction traces through the node’s properties panel.

Getting access to Stackery’s Serverless Health Dashboards requires creating a free Stackery account. You’ll immediately be able to see health status for any application that’s been deployed via AWS CloudFormation, Serverless Framework, or Stackery Deployment Pipeline. We hope you’ll try it out and enjoy the increased visibility into the health and status of your serverless infrastructure.

The Economics of Serverless for IoT
Nate Taggart | January 11, 2018

It’s no surprise that the rise of connected devices and the Internet of Things is coinciding with the movement toward Functions-as-a-Service and serverless computing. Serverless, and its near-cousin “edge computing,” are both paradigms that pair compute with event triggers, and IoT opens the door for a whole new breed of event triggers.

In the classic model of the internet (as an aside: have we reached the point where there is now a “classic” internet?), resources were delivered to a user upon the event of a user request. These predominantly static resources were slowly replaced by dynamically computed resources, and over time user requests were augmented with APIs and similar machine-generated requests. For decades, though, this model was fundamentally simple and worked well. Importantly, because it was driven primarily by human requests it had the advantage of being reasonably predictable, and that made managing costs an achievable human task.

At small scale, a service could simply be deployed on a single machine. While not necessarily efficient, the cost of a single server has (at least in recent years) become very accessible, and efficiency at this scale is not typically of paramount importance. Larger deployments required more complex infrastructure - typically in the form of a load balancer distributing traffic across a pool of compute resources. Still, this predominantly human-requested compute load followed certain patterns. There are predictable peak traffic times, and equally predictable troughs, in human-centric events.

Earlier in my career, I led New Relic’s Browser performance monitoring product. We monitored billions of page loads each day across tens of thousands of sites. At that scale, traffic becomes increasingly predictable. Small fluctuations in any given application are washed out in the aggregate of the world’s internet traffic. With enough scale, infrastructure planning becomes fairly straightforward – there are no spikes at scale, only gently rolling curves.

In the human-centric event paradigm, events are triggered upon human request. While any individual person may be difficult to predict (or maybe not), in aggregate, people are predictable. They typically sleep at night. Most work during the day. Many watch the Super Bowl, and most who don’t watch the World Cup. You get the idea.

However, a major shift is underway, driven by the rapidly growing number of internet-connected devices generating a proliferation of new event triggers that do not follow human patterns.

The new breed of events

The Internet of Things is still in its infancy, but it’s not hard to see where this trajectory leads. At the very least, it has the potential to increase internet traffic by an order of magnitude, but that’s potentially the easy part in comparison to how it will change events.

While the old event model was human-centric, the new model is device-centric – and the requests these devices make will, in many cases, be triggered by events the device senses in its environment. And this is the heart of the problem: the environment is exponentially more unpredictable than people. The old infrastructure model is a poor fit for the dynamism of the events that IoT gives rise to.

If you need an example of how unpredictable environment-centric events are, just think about airline flight delays. In an industry with razor-thin margins dependent on efficient utilization of multi-hundred-million-dollar capital assets, I doubt you’ll find a more motivated group of forecasters. KLM currently ranks best globally for on-time arrivals, and they get it wrong 11.5% of the time. (At the bottom of the rankings are Hainan Airlines, Korean Air, and Air China with a greater than 67% delay rate.) Sure, people play a part in this unpredictability, but weather, natural disasters, government intervention, software glitches, and the complexity of the global airline network all compound the problem. Predicting environmental events is very, very challenging.

If the best of the best are unable to hit 90% accuracy, how can we ever achieve meaningful infrastructure utilization rates under the current model?

Serverless Economics

One of the overlooked advantages of serverless computing, as offered by AWS Lambda and Azure Functions, is the power of aggregation. Remember, there are no spikes at scale. And it’s hard to envision a greater scale in the immediate future than that of aggregating the world’s compute into public FaaS cloud offerings. By pooling your individual unpredictability with that of all of their other customers, AWS and Azure are able to hedge away much of the environmental-event risk. In doing so, they enable customers to run very highly utilized infrastructure, and this will enable dramatic growth and efficiency for the vastly less predictable infrastructure needs of IoT.

What’s more, these public clouds are able to provide not just predictable performance and delivery for connected device manufacturers, but they’re able to provide predictable costs through their pay-per-use model. Why is that? It’s all about timing.

If you’re running servers, the capacity you require is largely dependent on when the traffic hits. In order to handle large spikes, you might bulk up your capacity and thus run at a relatively lower utilization rate – paying for unused capacity over time. This is a reasonable way to decrease failure risk, but it’s an expensive solution.

But serverless infrastructure is time-independent. You pay the same fractional rate per transaction whether all of your traffic hits at once or is perfectly smooth and consistent over time, and this price predictability will accelerate IoT adoption.
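
A toy calculation makes the contrast clear. The prices below are made up; the point is only that the serverless cost depends on total volume, while the server cost depends on peak capacity.

```javascript
// Toy cost comparison with made-up prices; only the shape of the math matters.
const requestsPerMonth = 10000000;

// Serverless: cost depends only on how many requests arrive, not when.
const perRequestRate = 0.0000005; // hypothetical $/request
const serverlessCost = requestsPerMonth * perRequestRate;

// Servers: cost depends on the capacity needed to absorb the peak.
const hourlyServerRate = 0.10; // hypothetical $/server-hour
const hoursPerMonth = 730;
const serversForSmoothTraffic = 2;  // same total traffic, spread evenly
const serversForSpikyTraffic = 10;  // same total traffic, concentrated in bursts

console.log('serverless (smooth or spiky):', serverlessCost);
console.log('servers, smooth traffic:', serversForSmoothTraffic * hourlyServerRate * hoursPerMonth);
console.log('servers, spiky traffic:', serversForSpikyTraffic * hourlyServerRate * hoursPerMonth);
```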

A quick refresher on Economics 101: price is determined by supply and demand, but economies of scale drive cost down as volume increases. Since connected device manufacturers are motivated by profit (the difference between price and cost), they’re incentivized to increase volume (and lower their unit costs) while filling the market demand. And this is where things get interesting.

A big component of the cost of a connected device is supporting the service (by operating the infrastructure) over the device’s lifespan. In the traditional model, those costs were very difficult to predict, so you’d probably want to estimate conservatively. This would raise the cost, which would potentially raise the device price, which would presumably decrease the demand. As you can see, this is a vicious cycle: with less demand you lose economies of scale, which further raises the price, and so on.

Serverless’ price predictability transforms this into a virtuous cycle of lower (and predictable) costs, which allows for lower prices, which can then increase demand. And, to close the loop, this increase in IoT success paves the way for more serverless usage and better predictability from the increased aggregate traffic volume. Win-win-win.

Automatic Image Processing Made Easy
Apurva Jantrania | October 06, 2017

Handling images as they are uploaded by your users is a process that lends itself well to serverless. However, setting up buckets, functions, and the necessary permissions can become a surprisingly daunting task. Stackery can help by managing all of this complexity and letting you focus on just the image processing development.

In our new guide, you’ll learn how easy it is to trigger a Function that will automatically generate a thumbnail when an image is uploaded to an ObjectStore. You’ll also get a glimpse into how to work with Stackery while still developing functions with your editor of choice.
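
The guide walks through the details, but as a rough sketch of the shape such a Function can take (assuming a Node.js handler, an S3-style ObjectStore event, and the sharp image library; bucket layout and sizes are illustrative):

```javascript
// Sketch of a thumbnail Function triggered by an ObjectStore upload.
// Assumes an S3-style event and the 'sharp' library; names and sizes are illustrative.
const AWS = require('aws-sdk');
const sharp = require('sharp');

const s3 = new AWS.S3();

exports.handler = async (event) => {
  for (const record of event.Records || []) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Skip thumbnails we've already written to avoid re-triggering ourselves.
    if (key.startsWith('thumbnails/')) continue;

    const original = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    const thumbnail = await sharp(original.Body).resize(200, 200, { fit: 'inside' }).toBuffer();

    await s3.putObject({
      Bucket: bucket,
      Key: `thumbnails/${key}`,
      Body: thumbnail,
      ContentType: original.ContentType
    }).promise();
  }
};
```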

Check out our guide here!

Bastion Nodes For Your Virtual Network
Apurva Jantrania | July 28, 2017

So you’ve got a Virtual Network set up to secure your resources, fantastic! But sometimes, your users or developers will need access to those secured resources from outside the Virtual Network. Maybe they need to make a quick update to a database, or an unexpected debug session requires a peek into your tables. That’s exactly what a Bastion node is there to do.

The Bastion node allows you to easily grant specific users SSH access to a server inside the Virtual Network that will then let them access your private resources. We also make it easy for you to manage which users have access - all you need is their SSH public key and username, and we will do all the work to create an account on the Bastion server and grant them SSH access. No pesky passwords needed. Users and their keys can even be specified in Configuration Stores, making it easy to manage access from a central location.

We are releasing the new Bastion node today. If you need easy access to your Databases and Docker Services hosted inside a Virtual Network, be sure to check it out!

Error Handling In a Serverless World
Chase Douglas | March 10, 2017

Error handling is tricky. The easiest thing you can do with an error, and some argue the best thing to do, is to let the error do its thing and take your application down with it. There is merit to this approach, especially for unhandled exceptions, because by definition the application is in an unknown state.

But errors happen. When they happen, no matter how we handle the error in the moment, we need to learn from them to ensure the error does not happen again. We need a way to report the errors to developers who can fix them.

Error Aggregating Services

There are products that provide libraries to catch uncaught exceptions and report them back to an error aggregating service. One example is the New Relic Browser product, which I had the fortune of serving as technical lead for, from concept to GA. New Relic Browser was extra helpful in that it not only caught and reported exceptions occurring within browsers, it was also good at correlating errors across browser vendors and versions. As a developer, you then had a wealth of knowledge to help you determine which errors were occurring most frequently and how to fix them.

The Challenge Of Serverless Error Reporting

Error aggregating services provide a ton of value to developers. But serverless functions present a challenge. Almost all error aggregating services will catch errors and queue them up to be reported periodically. For example, the New Relic Application Monitoring agents batch errors and report them once per minute. In contrast, many serverless functions are written to fail when an error occurs, never to be run again because a fresh function instance can take its place. Delaying the reporting of errors would prevent the errors from ever being reported at all.

Stackery To The Rescue!

At Stackery, we recognize the importance of proper error handling. There must be a way to aggregate errors from serverless functions, ideally one that allows for flexible handling of each error instance. This is why we built the Errors node. As functions run, all error conditions, including timeouts, are captured by Stackery. When an error occurs due to a synchronous invocation from another function, the erroring function will respond with a proper Error object. Further, if an Errors node exists in the stack, all errors will be emitted from the Errors node.

One powerful use case for the Errors node is to send errors to an error aggregating service, like Rollbar. These errors can then be analyzed to determine how to resolve them.
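
A minimal sketch of such a Function, assuming the Rollbar Node.js library and an illustrative error event shape (the exact format of Errors node messages is defined by Stackery, not shown here):

```javascript
// Sketch of a Function subscribed to the Errors node that forwards each error
// to Rollbar. The errorEvent fields below are assumptions for illustration.
const Rollbar = require('rollbar');

const rollbar = new Rollbar({
  accessToken: process.env.ROLLBAR_ACCESS_TOKEN
});

exports.handler = async (errorEvent) => {
  // Attach whatever context the event carries so errors can be grouped and triaged.
  rollbar.error(errorEvent.message || 'Unknown serverless error', {
    functionName: errorEvent.functionName,
    stack: errorEvent.stack
  });

  // Flush the queue before the function instance is frozen or reaped.
  await new Promise((resolve) => rollbar.wait(resolve));
};
```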

At Stackery we are always working to help our customers build better apps. We hope our error handling features help you reduce mean time to resolution for issues in your apps. Stay tuned for even easier integrations with error aggregating services in the near future!

SQL Databases Rock
Chase Douglas | March 03, 2017

Relational SQL databases have a bad rap these days. Go ask all the startups you know what database technology they are using. Really, go ask.

Ok, how many of them said MongoDB? All of them?!

What’s wrong with SQL databases?

As awesome as SQL databases are, there are not-so-awesome challenges you face when you start to use them. Let’s list the major ones:

  • Requires up-front knowledge, especially since you can’t store any data until you set up your database schema
  • No out-of-the-box indexing beyond primary keys
  • Easy to shoot yourself in the foot with issues like N+1 queries or poor indexing
  • Even for those who plan ahead, high availability is not easy to attain

However, all of these issues can be overcome with a bit of research and experience. It’s just not as simple as running a MongoDB server and dumping your arbitrary JSON documents into it.

What do I get with SQL?

SQL setup can be more challenging. But what you get in return is long-term flexibility with reliable performance.

The vast majority of data we store in databases is relational. Product requirements often require locating data based on criteria other than the primary key of the resource. For example, you may need to query for all users between age 18 and 25, which means that unless you have an index on users’ age you will need to scan the entire users table to find the users who meet the criteria. With SQL databases it is trivial to add an index on a column (or even multiple columns). However, NoSQL databases tend to lack flexible indexing support. Even the best NoSQL databases offer a limited number of secondary indexes for each table, and they are often limited in the coverage of data that can be included in the index, such as multi-column support.
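
For instance, with node-postgres adding that index and running the range query are each a one-liner (table and column names are illustrative):

```javascript
// Illustrative example using node-postgres: add a secondary index on age,
// then query a range without scanning the whole users table.
const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the standard PG* env vars

async function findUsersByAgeRange(min, max) {
  // One-time schema change; afterwards the query planner can use the index.
  await pool.query('CREATE INDEX IF NOT EXISTS users_age_idx ON users (age)');

  const { rows } = await pool.query(
    'SELECT id, name, age FROM users WHERE age BETWEEN $1 AND $2',
    [min, max]
  );
  return rows;
}

// e.g. findUsersByAgeRange(18, 25).then(console.log);
```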

SQL databases also tend to have backup solutions that encompass the entire set of data, making it quicker and easier to restore functionality after an outage. NoSQL solutions often rely on their high-availability implementation details to allow for limited numbers of nodes to fail without affecting the integrity of the cluster and its data. But when a datacenter-wide power outage occurs it can take a long time to rebuild the cluster of nodes from backups.

SQL sounds pretty awesome, where do I sign?

Check yo self! As cool as SQL databases are, they are not perfect for every workload. NoSQL solutions like Cassandra and DynamoDB do have their place (though I would hesitate to say the same for MongoDB). They are great at storing high volumes of non-relational data. Timestamped metric data and logs are great use cases for horizontally scalable NoSQL solutions. SQL databases have a hard time with huge amounts of non-relational data because the data can’t be randomly sharded across a cluster of database nodes without impairing support for relational querying. The only real mechanism for horizontal scalability involves manual sharding based on some kind of over-arching key, such as a customer account identifier. But this form of non-random sharding can lead to hot spots, where some shards end up with a much larger than average amount of data or queries.

That said, it is definitely possible to scale SQL databases. One notable “big data” company is built on MySQL to this day, and their architecture hasn’t changed much since this article from 2011: New Relic Architecture - Collecting 20+ Billion Metrics A Day.

The next time you start a project and reach for a database, think hard about what kind of database it should be. Most data is highly relational, and the barriers to using SQL databases are more about up-front challenges than they are about long-term challenges (while the opposite is often true for simple NoSQL databases). More often than not, your next database choice should be a SQL database!

Shifting Left with Security
Nate Taggart | February 13, 2017

There’s been a lot of talk lately on Shift Left Testing, which is undoubtedly useful as a DevOps practice. There seems to be less talk on Shift Left Security, which surprises me.

Shifting Left is a methodology of bringing historically late-stage processes earlier in the development cycle, like shifting left on a Gantt chart in an old waterfall-style release. Shift Left Testing fits well into the mantra, “test early, test often.”

Shifting left with security, therefore, means bringing security reviews, security planning, and security testing earlier in the development and product release cycle. As a component of a healthy DevOps strategy, I believe that Shift Left Security would dramatically improve application security and reduce vulnerabilities.

An obvious question to ask is: why? Why would doing security reviews sooner improve the overall security, particularly if the code is still in flux?

I see two answers to this question. The first is simply that conducting security reviews sooner provides more time to correct any security flaws before they’re exposed. A review at the eleventh hour, when the project is already over scope and over budget, is too easy to ignore. At that stage, it’s tempting to log security issues as bugs to be prioritized in the backlog and addressed at a later date. Obviously, this increases the risk that security flaws will be shipped to production, even over the reservations of security engineers.

But secondly, and I think more importantly, we’re conditioning engineers to treat security as an afterthought. By conducting security reviews as a final checkoff item as the release is going out the door, we’re subconsciously communicating that security is a task rather than an attribute. In other words, we’re saying that secure apps are “secured” instead of saying that secure apps are “built securely.”

This subtle distinction has, to my mind, a big impact. First, it means that engineers aren’t learning about security and fixing issues in the moment, where they’re most likely to build habits around secure coding. When we treat security as an after-effect, we’re adding it in as a feature on top of work that the engineer previously considered finished.

Additionally, we’re losing the opportunity to fundamentally and structurally build our apps with a security-first mindset. And this, I think, is the real flaw. Secure applications are planned, not happenstance, and they’re fundamentally built on the whiteboard, not with a late-stage checklist.
