Stacks on Stacks

A serverless/FaaS technology blog by Stackery

The Economics of Serverless for IoT

Author | Nate Taggart

It’s no surprise that the rise of connected devices and the Internet of Things is coinciding with the movement toward Functions-as-a-Service and serverless computing. Serverless and its near-cousin, “edge computing,” are both paradigms that pair compute with event triggers, and IoT opens the door to a whole new breed of event trigger.

In the classic model of the internet (as an aside: have we reached the point where there is now a “classic” internet?), resources were delivered to a user upon the event of a user request. These predominantly static resources were slowly replaced by dynamically computed resources, and over time user requests were augmented with APIs and similar machine-generated requests. For decades, though, this model was fundamentally simple and worked well. Importantly, because it was driven primarily by human requests, it had the advantage of being reasonably predictable, and that made managing costs an achievable human task.

At small scale, a service could simply be deployed on a single machine. While not necessarily efficient, the cost of a single server has (at least in recent years) become very accessible, and efficiency at this scale is not typically of paramount importance. Larger deployments required more complex infrastructure - typically in the form of a load balancer distributing traffic across a pool of compute resources. Still, this predominantly human-requested compute load followed certain patterns. There are predictable peak traffic times, and equally predictable troughs, in human-centric events.

Earlier in my career, I led New Relic’s Browser performance monitoring product. We monitored billions of page loads each day across tens of thousands of sites. At that scale, traffic becomes increasingly predictable. Small fluctuations in any given application are washed out in the aggregate of the world’s internet traffic. With enough scale, infrastructure planning becomes fairly straightforward – there are no spikes at scale, only gently rolling curves.

In the human-centric event paradigm, events are triggered upon human request. While any individual person may be difficult to predict (or maybe not), in aggregate, people are predictable. They typically sleep at night. Most work during the day. Many watch the Super Bowl, and most who don’t watch the World Cup. You get the idea.

However, a major shift is underway, driven by the rapidly growing number of internet-connected devices generating a proliferation of new event triggers that do not follow human patterns.

The new breed of events

The Internet of Things is still in its infancy, but it’s not hard to see where this trajectory leads. At the very least, it has the potential to increase internet traffic by an order of magnitude, but that’s potentially the easy part in comparison to how it will change events.

While the old event model was human-centric, the new model is device-centric, and the requests these devices make will in many cases be triggered by events they sense in their environment. And this is the heart of the problem: the environment is exponentially more unpredictable than people. The old infrastructure model is a poor fit for the dynamism of the events that IoT gives rise to.

If you need an example of how unpredictable environment-centric events are, just think about airline flight delays. In an industry with razor-thin margins dependent on efficient utilization of multi-hundred-million dollar capital assets, I doubt you’ll find a more motivated group of forecasters. KLM currently ranks best globally for on-time arrivals, and they get it wrong 11.5% of the time. (At the bottom of the rankings are Hainan Airlines, Korean Air, and Air China with a greater than 67% delay rate.) Sure, people play a part in this unpredictability, but weather, natural disasters, government intervention, software glitches, and the complexity of the global airline network all compound the problem. Predicting environmental events is very, very challenging.

If the best of the best are unable to hit 90% accuracy, how can we ever achieve meaningful infrastructure utilization rates under the current model?

Serverless Economics

One of the overlooked advantages of serverless computing, as offered by AWS Lambda and Azure Functions, is the power of aggregation. Remember, there are no spikes at scale. And it’s hard to envision a greater scale in the immediate future than that of aggregating the world’s compute into public FaaS cloud offerings. By pooling your individual unpredictability with that of all their other customers, AWS and Azure are able to hedge away much of the environmental-event risk. In doing so, they enable customers to run very highly utilized infrastructure, and this will enable dramatic growth and efficiency for the vastly less predictable infrastructure needs of IoT.

What’s more, these public clouds are able to provide not just predictable performance and delivery for connected device manufacturers, but they’re able to provide predictable costs through their pay-per-use model. Why is that? It’s all about timing.

If you’re running servers, the capacity you require is largely dependent on when the traffic hits. In order to handle large spikes, you might bulk up your capacity and thus run at a relatively lower utilization rate – paying for unused capacity over time. This is a reasonable way to decrease failure risk, but it’s an expensive solution.

But serverless infrastructure is time-independent. You pay the same fractional rate per transaction whether all of your traffic hits at once or is perfectly smooth and consistent over time, and this price predictability will accelerate IoT adoption.

A quick refresher on Economics 101: price is determined by supply and demand, but economies of scale drive cost down as volume increases. Since connected device manufacturers are motivated by profit (the difference between price and cost), they’re incentivized to increase volume (and lower their unit costs) while filling the market demand. And this is where things get interesting.

A big component of the cost of a connected device is supporting the service (by operating the infrastructure) over the device’s lifespan. In the traditional model, those costs were very difficult to predict, so you’d probably want to estimate conservatively. This would raise the cost, which would potentially raise the device price, which would presumably decrease demand. This is a vicious cycle: with less demand you lose economies of scale, which further raises the price, and so on and so forth.

Serverless’ price predictability transforms this into a virtuous cycle of lower (and predictable) costs, which allows for lower prices, which can then increase demand. And, to close the loop, this increase in IoT success paves the way for more serverless usage and better predictability from the increased aggregate traffic volume. Win-win-win.

How Does Docker Fit In A Serverless World?

Author | Chase Douglas @txase

The debut of AWS Lambda in 2014 spawned a debate: serverless vs Docker. There have been countless articles comparing cost efficiency, performance, constraints, and vendor lock-in between the two technologies.

Thankfully, the second half of 2017 has shown this all to be a bit beside the point. With recent product announcements from Azure and AWS, it is more clear than ever that serverless and Docker are not opposing technologies. In this article, we’re going to take a look at how Docker fits into a serverless world.

Docker Isn’t Serverless (circa 2016)

“Serverless” has about a dozen definitions, but a few are particularly important when we talk about Docker:

  • On-demand resource provisioning
  • Pay-per-use billing (vs up-front provisioning based on peak utilization)
  • Elimination of underlying server management

Until recently, Docker-based applications deployed to the major cloud providers were not “serverless” according to these three attributes. In the beginning, Docker containers were deployed to servers directly, and Ops teams had to build custom solutions to provision and manage pools of these servers. In 2014, both Kubernetes and AWS EC2 Container Service (since renamed Elastic Container Service) enabled easier provisioning and management of these server pools.

Kubernetes and AWS ECS provided two key benefits. On the technical side, they provided templates for provisioning pools of servers to run Docker containers, making it easier for devops teams to get started and to maintain their clusters. On the business side, they provided proof that Docker as a technology was mature enough for production workloads. Partly because of these tools, in the past few years Docker became an increasingly common choice for hosting services.

And yet, with all that Kubernetes and AWS ECS provided, we were still left with the tasks of optimizing resource utilization and maintaining the underlying servers that make up the Docker cluster resource pools.

Docker Is Serverless (2017)

Two new services have brought Docker into the serverless realm: Azure Container Instances and AWS Fargate. These services enable running a Docker container on-demand without up-front provisioning of underlying server resources. By extension, this also means there is no management of the underlying server, either.

According to our definition above, Docker is now “serverless”. Now it starts to make sense to compare Docker and Functions-as-a-Service (FaaS), like AWS Lambda. In one sense, we’ve come full circle back to our familiar comparisons between Docker and “serverless”. Except the goal has shifted from the less useful question of which technology is “better” to the more interesting question of when you should use Docker vs when you should use FaaS.

FaaS vs Docker

Going back to the dozen definitions of “serverless”, there are a few definitions that are now clearly misplaced. They are instead definitions of FaaS:

  • Low latency scaling (on the order of a second or less to invoke computation)
  • Managed runtime (Node.js, Java, Python, etc.)
  • Short-lived executions (5 minutes or less)

The new Docker invocation mechanisms now show how these are not applicable to all forms of “serverless” computing. Serverless Docker has the following characteristics instead:

  • Medium latency scaling (on the order of minutes or less to invoke computation)
  • Complete control of runtime environment
  • Unlimited execution duration

These differences help us determine where Docker fits in the serverless world.

How Does Docker Fit In Serverless?

Now that we have seen how Docker can be serverless and also how it differs from FaaS, we can make some generalizations about where to use FaaS and Docker in serverless applications:

Use Cases For Functions-as-a-Service

  • Low-latency, highly volatile (e.g. API services, database side-effect computation, generic event handling)
  • Short-lived computations (FaaS is cheaper because of faster startup, which is reflected in the per-invocation costs)
  • Where the provided runtimes work (if the runtimes work for your application, let the service provider deal with maintaining them)

Use Cases For Docker

  • Background jobs (where invocation latency is not an issue)
  • Long-lived computations (execution duration is unlimited)
  • Custom runtime requirements

This categorization papers over a few cases and leaves a lot of gray area. For example, a serverless Docker application could still back a low-latency API service by spinning up multiple containers and load balancing across them. But having these gray areas is also helpful because it means that we now have two tools we can choose from to optimize for other concerns like cost.

Taking the low-latency API service example again, a decision could be made between a FaaS and a Docker backend based on the cost difference between the two. One could even imagine a future where base load for a highly volatile service is handled by a Docker-based backend, but peak demand is handled by a FaaS backend.

2018 Will Be Exciting For Serverless In All Forms

Given that it’s the beginning of the new year, it’s hard not to look forward and be excited about what this next year will bring in the serverless space. A little over three years since AWS Lambda was announced it has become clear that building applications without worrying about servers is empowering. With Docker joining the fold, even more exciting possibilities open up for serverless.

AWS Lambda Cost Optimization

Author | Sam Goldstein

Serverless application architectures put a heavy emphasis on pay-per-use billing models. In this post I’ll look at the characteristics of pay-per-use vs. other billing models and discuss how to optimize your AWS Lambda usage for the right cost/performance tradeoffs.

How Do You Want To Pay For That?

There are basically three ways to pay for your infrastructure.

  1. Purchase hardware up front. You install it in a datacenter and use it until it breaks or you replace it with newer hardware. This is the oldest method for managing capacity and the least flexible. IT procurement and provisioning cycles are generally measured in weeks, if not months, and as a result it’s necessary to provision capacity well ahead of actual need. It’s common for servers provisioned into these environments to use 15% of their capacity or less, meaning most capacity is sitting idle most of the time.
  2. Pay-to-provision. You provision infrastructure using a cloud provider’s pay-to-provision Infrastructure as a Service (IaaS). This approach eliminates the long procurement and provisioning cycles since new servers can be spun up at the push of a button. However it’s still necessary to provision enough capacity to handle peak load, meaning it’s typical to have an (often large) buffer of capacity sitting idle, waiting for the next traffic spike. It’s common to see infrastructure provisioned with this approach with an average utilization in the 30-60% range.
  3. Pay-per-use. This is the most recent infrastructure billing model and it’s closely tied to the rise of serverless architectures. Functions as a Service (FaaS) compute services such as AWS Lambda and Azure Functions bill you only for the time your code is running and scale automatically to handle incoming traffic. As a result it’s possible to build systems that handle large spikes in load, without having a buffer of idle capacity. This billing model is gaining popularity since it aligns costs closely with usage and it’s being applied to an increasing variety of services like databases (both SQL and NoSQL) and Docker-based services.
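To make the difference concrete, here is a back-of-the-envelope comparison sketch comparing an always-on server against pay-per-use compute. Both rates below are illustrative assumptions, not quoted provider prices:

```javascript
// Compare an always-on, pay-to-provision server against pay-per-use FaaS.
// Both rates are assumptions for illustration only.
const HOURS_PER_MONTH = 730;
const SERVER_PRICE_PER_HOUR = 0.10;          // assumed IaaS rate
const FAAS_PRICE_PER_GB_SECOND = 0.00001667; // assumed FaaS rate

function monthlyServerCost () {
  // Billed for every hour, regardless of utilization
  return SERVER_PRICE_PER_HOUR * HOURS_PER_MONTH;
}

function monthlyFaasCost (invocations, billedMs, memoryGB) {
  // Billed only for the time the code actually runs
  return invocations * (billedMs / 1000) * memoryGB * FAAS_PRICE_PER_GB_SECOND;
}

// One million 200ms invocations per month at 512MB:
console.log(monthlyFaasCost(1000000, 200, 0.5).toFixed(2)); // "1.67"
console.log(monthlyServerCost().toFixed(2));                // "73.00"
```

The gap narrows as utilization rises; a service that is busy around the clock can make the always-on server the cheaper option, which is why the utilization percentages above matter.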

Approaching AWS Cost Optimization

There are a few things that are important to note before we get into how to optimize your AWS Lambda costs.

  1. AWS Lambda allows you to choose the amount of memory you want for your function from 128MB to 3GB.
  2. Based on the memory setting you choose, a proportional amount of CPU and other resources are allocated.
  3. Billing is based on GB-seconds consumed, meaning a 256MB function invocation that runs for 100ms will cost twice as much as a 128MB function invocation that runs for 100ms.
  4. For billing purposes the function duration is rounded up to the nearest 100ms. A 128MB function that runs for 50ms will cost the same amount as one that runs for 100ms.
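These four rules can be captured in a tiny cost helper. This is a sketch; the per-GB-second price is an assumed placeholder, not a quoted AWS rate:

```javascript
const PRICE_PER_GB_SECOND = 0.00001667; // assumed rate, for illustration only

function invocationCost (memoryMB, durationMs) {
  const billedMs = Math.ceil(durationMs / 100) * 100;      // rule 4: round up to 100ms
  const gbSeconds = (memoryMB / 1024) * (billedMs / 1000); // rule 3: GB-seconds
  return gbSeconds * PRICE_PER_GB_SECOND;
}

// Doubling memory doubles the cost for the same duration:
console.log(invocationCost(256, 100) === 2 * invocationCost(128, 100)); // true
// A 50ms run bills the same as a 100ms run:
console.log(invocationCost(128, 50) === invocationCost(128, 100));      // true
```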

There are also a few questions you should ask yourself before diving into Lambda cost optimization:

  1. What percentage of my total infrastructure costs is AWS Lambda? In nearly every serverless application FaaS components integrate with resources like databases, queueing systems, and/or virtual networks, and often are a fraction of the overall costs. It may not be worth spending cycles optimizing Lambda costs if they’re a small percentage of your total.
  2. What are the performance requirements of my system? Changing your function’s memory settings can have a significant impact on cold start time and overall run time. If parts of your system have low latency requirements you’ll want to avoid changes that degrade performance in favor of lower costs.
  3. Which functions are run most frequently? Since the cost of a single Lambda invocation is insanely low, it makes sense to focus cost optimization on functions with monthly invocation counts in the hundreds of thousands or millions.

AWS Lambda Cost Optimization Metrics

Now let’s look at the two primary metrics you’ll use when optimizing Lambda cost.

Allocated Memory Utilization

Each time a Lambda function is invoked, two memory-related values are printed to CloudWatch Logs. These are labeled Memory Size and Max Memory Used. Memory Size is the function’s memory setting (which also controls allocation of CPU resources). Max Memory Used is how much memory was actually used during the invocation. It may make sense to write a Lambda function that parses these values out of CloudWatch Logs and calculates the percentage of allocated memory used. By watching this metric, you can decrease memory allocation on functions that are overprovisioned, and watch for increasing memory use that may indicate functions becoming underallocated.
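As a sketch, parsing those values from a REPORT log line might look like the following. The line format shown is typical of what Lambda writes to CloudWatch Logs, but verify the field names against your own log output:

```javascript
// Extract Memory Size and Max Memory Used from a Lambda REPORT log line
// and compute the allocated-memory utilization percentage.
function parseMemoryUtilization (reportLine) {
  const size = /Memory Size: (\d+) MB/.exec(reportLine);
  const used = /Max Memory Used: (\d+) MB/.exec(reportLine);
  if (!size || !used) return null;
  return {
    memorySizeMB: Number(size[1]),
    maxMemoryUsedMB: Number(used[1]),
    utilization: Number(used[1]) / Number(size[1])
  };
}

const line = 'REPORT RequestId: 3b9af9d6 Duration: 102.25 ms ' +
  'Billed Duration: 200 ms Memory Size: 512 MB Max Memory Used: 128 MB';
console.log(parseMemoryUtilization(line).utilization); // 0.25
```

A function sitting consistently at 25% utilization is a candidate for a smaller memory setting, subject to the performance impact noted above (memory also controls CPU).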

Billed Duration Utilization

It’s important to remember that AWS Lambda usage is billed in 100ms intervals. Like memory usage, Duration and Billed Duration are logged to CloudWatch after each function invocation, and these can be used to calculate a metric representing the percentage of billed time for which your functions were actually running. While 100ms billing intervals are granular compared to most pay-to-provision services, there can still be major cost implications to watch out for. Take, for example, a 1GB function that generally runs in 10ms. Each invocation of this function will be billed as if it took 100ms, a 10x difference in cost! In this case it may make sense to decrease the memory setting of this function, so its runtime is closer to 100ms and its cost significantly lower. An alternative approach is to rewrite the function to perform more work per invocation (in use cases where this is possible), for example processing multiple items from a queue instead of one, to increase Billed Duration Utilization.

Conversely, there are cases where increasing the memory setting can result in lower costs and better performance. Take as an example a 1GB function that runs in 110ms. This will be billed as 200ms. Increasing the memory setting (which also controls CPU resources) slightly may allow the function to execute in under 100ms, which will decrease the billed duration by 50% and result in lower costs.
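Both effects can be checked with a small helper that applies the 100ms round-up. Cost is expressed in GB-seconds, so no price assumption is needed:

```javascript
// Billed cost of one invocation in GB-seconds, after the 100ms round-up.
function billedGBSeconds (memoryMB, durationMs) {
  const billedMs = Math.ceil(durationMs / 100) * 100; // round up to nearest 100ms
  // Multiply before dividing so these examples stay exact
  return (memoryMB * billedMs) / 1024 / 1000;
}

// 1GB running 110ms bills as 200ms:
console.log(billedGBSeconds(1024, 110)); // 0.2
// At 1536MB, if the extra CPU gets the same work done in under 100ms:
console.log(billedGBSeconds(1536, 90));  // 0.15
```

More memory can mean a lower bill, provided the speedup actually materializes; that is workload-dependent and worth measuring.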

The New Cost Optimization

The pay-per-use billing model significantly changes the relationship between application code and infrastructure costs, and in many ways enforces a DevOps approach to managing these concerns. Instead of provisioning for peak load, plus a buffer, infrastructure is provisioned on demand and billed based on application performance characteristics. In general this dramatically simplifies the process of tracking utilization and optimizing costs, but it also transforms this concern. Instead of monitoring resource utilization across a pool of servers, it becomes necessary to track application-level metrics like invocation duration and memory utilization in order to fully understand and optimize costs. Traditional application performance metrics like response time, batch size, and memory utilization now have direct cost implications and can be used as levers to control infrastructure costs. This is yet another example of serverless technologies driving the convergence of development and operational concerns. In the serverless world, infrastructure costs and application performance and behavior become highly coupled.


Dealing with the AWS Lambda invocation payload limits

Author | Apurva Jantrania

If you’ve dealt with Lambda functions you may have run across the RequestEntityTooLargeException - * byte payload is too large for the Event invocation type (limit 131072 bytes) exception that AWS Lambda throws when a function is invoked with too large a payload. Current AWS Lambda limits are set at 6 MB for synchronous/RequestResponse invocations and 128 KB for asynchronous/Event invocations. I have the feeling that most people don’t give a lot of thought to these invocation limits until something starts failing.

While rare, there are certain scenarios that are more likely to hit these limits - such as invoking an async function from a sync function or dealing with larger than expected results from a DB query. Depending on the situation, there are many ways to deal with this issue.

The easiest solution is to check the message length prior to the function invocation and then drop or clip it if it’s too large. This is really only viable if the dropped content is non-critical or if all the critical elements are always at the beginning of the message. While somewhat brittle, this is relatively easy to implement since the code change is isolated to the invoking function.
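A minimal sketch of that check-and-clip approach might look like this. It assumes the critical elements live under a `critical` key, which is a hypothetical convention for illustration, not part of any Stackery or AWS API:

```javascript
const ASYNC_PAYLOAD_LIMIT = 131072; // async (Event) invocation limit in bytes

function clipForAsyncInvoke (message) {
  // Note: string length approximates byte size; multi-byte characters
  // would need Buffer.byteLength for an exact check.
  if (JSON.stringify(message).length <= ASYNC_PAYLOAD_LIMIT) {
    return message;
  }
  console.log('Payload too large; dropping non-critical fields');
  // Hypothetical convention: keep only the `critical` field and flag the clip
  return { critical: message.critical, truncated: true };
}
```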

A more robust solution is to use an Object Store or S3 bucket as your message repository and just pass the message key to the invoked function. This ensures you never run into the invocation limit, but it requires changes in both the invoking and invoked functions while also adding latency to every function call, since you need to both store and fetch the message. Deletions can be handled via Object Expiration so as to avoid incurring even more latency.

The third solution is a hybrid, which has the same robustness and code impact as using an Object Store, but only incurs the latency penalties when messages are actually over the limit. Using two wrapper functions, let’s call them hydrate and dehydrate, messages are ‘dehydrated’ before invoking a function and ‘hydrated’ inside the invoked function prior to consumption.

For example, in the above Stackery stack, I have Sync Function invoking Async Function asynchronously. I’ve added a port from both functions to a shared Object Store that will be used as the message store if needed. My two wrapper functions are defined as follows:

function dehydrate (message) {
  if (JSON.stringify(message).length > 131071) {
    console.log('Dehydrating message');
    let key = `${Date.now()}-${Math.floor(Math.random() * 100)}`;
    let params = {
      action: 'put',
      key: key,
      body: JSON.stringify(message)
    };

    return stackery.output(params, { port: 1 })
      .then(() => {
        return { _messageKey: key };
      });
  } else {
    return Promise.resolve(message);
  }
}

function hydrate (message) {
  if (typeof message === 'object' && Object.keys(message).length === 1 && '_messageKey' in message) {
    console.log('Hydrating message');
    let params = {
      action: 'get',
      key: message._messageKey
    };

    return stackery.output(params)
      .then((response) => {
        return JSON.parse(response[0].body.toString());
      });
  } else {
    return Promise.resolve(message);
  }
}
Note that dehydrate is currently written only with the asynchronous limit in mind, but could easily be expanded to deal with both. Sync Function now needs to invoke dehydrate and would look like this:

const stackery = require('stackery');

module.exports = function handler (message) {
  // Do Stuff
  let response = {};

  return dehydrate({ message: message.message })
    .then((dehydratedMessage) => {
      return stackery.output(dehydratedMessage, { waitFor: 'TRANSMISSION' });
    })
    .then(() => {
      return response;
    })
    .catch((error) => {
      console.log(`Sync Function: Error: ${error}`);
    });
};

And likewise, Async Function needs to hydrate the message before consuming it:

const stackery = require('stackery');

module.exports = function handler (message) {
  return hydrate(message)
    .then(handleMessage)
    .catch((error) => {
      console.log(`asyncFunction: Error - ${error}`);
    });
};

function handleMessage (message) {
  // Do Stuff
  return {};
}
With these two functions (and with Stackery taking care of the tedious setup such as the IAM policies that allow the functions to access the S3 bucket), it’s relatively straightforward to implement a robust solution that gets around the AWS Lambda payload limitations.

There is no serverless "lock in"

Author | Nate Taggart

If you’ve spent any time researching serverless infrastructure like AWS Lambda and Azure Functions, you’ve probably heard about serverless lock in. The argument goes something like this: because your code is tied to infrastructure, and even to a specific datacenter, on hardware that you don’t control and can’t access, you operate at the mercy of the cloud provider and are thus completely locked in.

This is, in a word, nonsense.

Technology Lock In

Fundamentally, all technology choices are a tradeoff between perceived risks and benefits. When you’re responsible for steering an organization’s technology choices it’s critical to consider risks, like being locked into a declining technology, open source community, or vendor. These are balanced against the benefits a technology offers, such as development speed, cost to scale, and ease of operations. This is why lock in is an important consideration even when evaluating open source solutions. Before deciding to build on top of an open source project, any experienced technical leader will ask how actively maintained and supported that project is. Nobody wants to get led into a technological dead end.

The risk of lock in exists when you build against any standard. For example, Terraform’s proprietary syntax is a form of lock in. Building GPU applications on CUDA is too. So is relying on third-party APIs like Auth0, or standardizing on a CI/CD tool like Jenkins for your team.

So does using serverless technologies create forms of lock in? Yeah, just like every other tech. ¯\_(ツ)_/¯

It’s true that some of the ways you write and architect your application will be Lambda-specific. Of course, the same would be the case with Kubernetes, or EC2, or VMware, or Heroku, or whatever else you pick to run your application on. This is the nature of making choices.

On the other hand, the most significant forms of lock in aren’t related to serverless (or any stateless compute component). It’s primarily data that locks you in. If you have significant data gravity in AWS, you’ll choose Lambda, not because Lambda has any significant benefits over Azure Functions or any other FaaS option, but because Lambda is a stateless compute component, and it’s an order of magnitude easier to move your compute component to your data than to move your data to your compute component.

FaaS offerings abstract away the underlying infrastructure to such a level as to make the compute environment almost generic. As a result, Lambda’s APIs don’t create significant lock in; this is one of the easiest parts of your infrastructure to change. In counterpoint to the claim that your code is tied to infrastructure, I might point out that the infrastructure is completely invisible to you and that’s not an accident. Can you change it? No. Do you have control? No. Are you better at infrastructure management than AWS? No. Giving up “control” to an expert service provider isn’t new, and it isn’t a drawback, and in comparison to migrating data and event sources between cloud providers, moving stateless functions is a minor concern.

Avoiding Lock In

So let’s get honest here. The alarmist arguments about serverless lock in come from vendors that want you locked in to their specific technology. If your product is a container-specific variant of Linux, then of course you’ll say technologies that abstract away that detail of your runtime environment are a “bad” form of lock in. But that’s just marketing smoke and mirrors.

The right way to evaluate lock in is to look hardest at the parts of your system that are hardest to move. In most cases you’ll want to focus on using the cloud your data is in, and where your event sources are. Evaluate which cloud provider you’re most comfortable deploying into and managing systems within. If you’re just moving into the cloud, focus lock in decisions on data gravity, event sources, and operational integrations.

Oh, and guess what? If you eschew AWS Lambda or serverless in general, you’ll still be locked in to whatever you settle on. And in the meantime, you’ll miss out on the benefits that have driven hundreds of thousands of customers to Lambda over the last couple of years.

Previewable Pull Requests

Author | Anna Yovandich

Reviewing changes in a UI as the result of a pull request is a common occurrence on a development team. This typically involves switching the local working branch to the PR branch, compiling a build, viewing it on localhost, then giving functional/behavioral/visual feedback. There are certainly many solutions to alleviate this context and code switching. One we have built and adopted recently uses Stackery as a CI tool to clone, compile, and preview a pull request.

Check out our guide that details how we built it with step-by-step instruction and sample code.

