Stacks on Stacks

The Serverless Ecosystem Blog by Stackery.

Serverless in 2019: From 'Hello World' to 'Hello Production'
Nate Taggart | January 04, 2019

A Look Ahead

As the CEO of Stackery, I have had a unique, inside view of serverless since we launched in 2016. I get to work alongside the world’s leading serverless experts, our customers, and our partners and learn from their discoveries. It’s a new year: the perfect time to take stock of professional progress, accomplishments, and goals. The Stackery team has been in this mindset for months, focusing on what 2019 means for this market. After two-and-a-half years of building serverless applications, speaking at serverless conferences, and running the world’s leading serverless company, I have a few ideas of what’s in store for this technology.

1) Serverless will be “managed cloud services,” not “FaaS”

As recently as a year ago, every serverless conference talk had an obligatory “what is serverless” slide. Everyone seemed to have a different understanding of what it all meant. There were new concepts, like FaaS and “events,” and a lot of confusion on the side. By now, this perplexity has been quelled and the verdict is in: serverless is all about composing software systems from a collection of cloud services. With serverless, you can lean on off-the-shelf cloud services for your application architecture and focus on business logic and application needs, while (mostly) ignoring infrastructure capacity and management.

In 2019, this understanding will reach the mainstream. Sure, some will continue to fixate on functions-as-a-service while ignoring all the other services needed to operate an application. Others will attempt to slap the name onto whatever they are pitching to developers. But, for the most part, people will realize that serverless is more than functions because applications are more than code.

I predict that the winners in serverless will continue to be the users capturing velocity gains to build great applications. By eschewing the burden of self-managed infrastructure and instead empowering their engineers to pull ready-to-use services off the shelf, software leaders will quickly stand up production-grade infrastructure. They’ll come to realize that this exciting movement is not really “serverless” so much as it is “service-full” - as in applications full of building blocks as a service. Alas, we’re probably stuck with the name. Misnomers happen when a shift is born out of necessity, without time to be fine-tuned by marketing copywriters. I’ll take it.

2) The IT Industrial Complex will throw shade

The IT Industrial Complex has billions of dollars and tens of thousands of jobs reliant on the old server model. And while these vendors are busy cloud-washing their businesses, the move to serverless leaves them much less excited about cloud-native disruption.

So get ready for even more fear, uncertainty, and doubt that the infrastructure old-guard is going to bring. It won’t be subtle. You’ll hear about the limitations of serverless (“you can’t run long-lived jobs!”), the difficulty in adoption (“there’s no lift-and-shift!”), and the use cases that don’t fit (“with that latency, you can’t do high-frequency trading!”). They’ll shout about vendor lock-in — of course they’d be much happier if you were still locked-in with their physical boxes. They’ll rail against costs (“At 100% utilization, it’s cheaper to run our hardware”), and they’ll scream about how dumb the name “serverless” is (you’ve probably gathered that I actually agree with this one).

The reality? The offerings and capabilities of the serverless ecosystem are improving at a velocity unlike anything the IT infrastructure market has ever delivered. By the end of 2019, we’ll have more languages, more memory, longer run times, lower latency, and better developer ergonomics. The old guard will ignore the operational cost of actually running servers — and patching, and scaling, and load-balancing, and orchestrating, and deploying, and… the list goes on! Crucially, they’ll ignore the fact that every company invested in serverless is able to do more things faster and with less. Serverless means lower spend, less hassle, more productive and focused engineers, apps with business value, and more fun. I’d rather write software than patch infrastructure any day.

Recognize these objections for what they are: the death throes of an out-of-touch generation of technology dinosaurs. And, as much as I like dinosaurs, I don’t take engineering advice from them.

3) Executives will accelerate pioneering serverless heroes

Depending on how far your desk is from the CEO of your company, this will be more or less obvious to you, but: your company doesn’t want to invest in technology because it’s interesting. Good technology investments are fundamentally business investments, designed to drive profits by cutting costs, driving innovation, or both.

Serverless delivers on both cost efficiency and innovation. Its pay-per-use model is substantially cheaper than the alternatives and its dramatically improved velocity means more business value delivery and less time toiling on thankless tasks. The people who bring this to your organization will be heroes.

So far, most organizations have been adopting serverless from the bottom-up. Individual developers and small teams have brought serverless in to solve a problem and it worked. But in 2019 a shift will happen. Project milestones will start getting hit early, developers will be more connected to customer and business needs, and IT spend will come in a little lower than budgeted… And the executive team is going to try to find out why, so they can do more of it.

So my prediction is that in 2019, serverless adoption will begin to win executive buy-in and be targeted as a core technology initiative. Serverless expertise will be a very good look for your team in 2019.

4) The great monolith to serverless refactoring begins

While greenfield apps led the way in serverless development, this year, word will get out that serverless is the fastest path to refactoring monoliths into microservices. In fact, because serverless teams obtain significant velocity from relying largely on standard infrastructure services, many will experience a cultural reset around what it means to refactor a monolith. It’s easier than ever before.

While “you can’t lift and shift to serverless” was a knock in 2018, 2019 will show the enterprise that it’s faster to refactor in serverless than migrate. They will see how refactoring in serverless takes a fraction of the time we thought it would take for a growing number of applications. Check out the Strangler Pattern to see how our customers are doing this today. When you combine this method with Lambda Layers and the rapid march of service innovations, the options for evolving legacy applications and code continue to broaden the realm of where serverless shines.

5) Serverless-only apps will transition to serverless-first apps

“Hello World” applications in tutorials are good fun, and their initial functions deliver real value quickly without an operations team. They are great wins for serverless.

However, when it comes to building serverless business applications, every software team will need to incorporate existing resources into their applications: production databases and tables, networks, containers, EC2 instances, DNS services, and more. Today, complex YAML combined with the art of managing parameters across dev, test, staging, and production environments holds many teams back from effectively building on what already exists. A note: Stackery makes using existing resources across multiple environments easy.

In 2019, we’ll see enormous growth in applications that are serverless-first, but not serverless only. The “best service for the job” mantra is already driving teams under pressure to deliver results to serverless. We believe teams who want to move fast will turn to serverless for most of what they need, but won’t live in a serverless silo.

To conclude: In 2019, serverless will serve you more.

All of these predictions add up to one obvious conclusion from my perspective: Serverless is finally mainstream and it’s here to stay. Stackery already helps serverless teams accelerate delivery from “Hello World” to “Hello Production”. We’d love to help your team, too.

Conquering a Double-Barrel Webpack Upgrade
Anna Yovandich | December 20, 2018

Over the last couple of weeks, we’ve prioritized some sustaining product goals to polish the codebase and update some big-ticket dependencies. Among those updates were React, Redux, and Webpack - the biggies. The first two were pretty painless and inspired the confidence to approach updating Webpack from v2 to v4 like maybe no big deal! Though my confidence was high, I felt a slight chill and a twinge of doubt at the prospect of making changes to our build configs.

Enter Webpack 4

The latest version of Webpack has the lowest barrier to entry of any version. Its new mode parameter comes with default environment configs and enables built-in optimizations. This “no config” option is ideal for a new project and/or a newcomer to Webpack who wants to get started quickly. Migrating an existing config is a little trickier, but following the migration guide got our development environment in pretty good shape. I was pleasantly shocked by the Webpack documentation. It’s thorough, well organized, and has improved significantly from the early days of v1.

Development Mode

To begin migrating our development config, I added the new mode property, removed some deprecated plugins, and replaced autoprefixer with postcss-preset-env in the postcss-loader plugin config. Starting the dev server (npm start) at this point led to the first snag: this.htmlWebpackPlugin.getHooks is not a function. Hunting that error landed me in an issue thread suggesting a fix - which did the trick. Development mode: good to go. Confidence mode: strong.
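For illustration, here’s a minimal sketch of the development-config changes described above (the loader chain is representative, not our exact config):

module.exports = {
 mode: 'development', // new in Webpack 4: applies sensible development defaults
 module: {
  rules: [
   {
    test: /\.css$/,
    use: [
     'style-loader',
     'css-loader',
     {
      loader: 'postcss-loader',
      options: {
       // postcss-preset-env takes over the work autoprefixer used to do
       plugins: () => [require('postcss-preset-env')()]
      }
     }
    ]
   }
  ]
 }
};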

Production Mode

Continuing migration with the production config was a similar process. We have a fairly standard setup to compile the static build directory: transpile (ES6 and JSX) and minify JS; transform, externalize, and minify CSS; then generate an index.html file to tie it all together. However, running the production build (npm run build) was a different story.

FATAL ERROR

The first issue was harsh: FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory. Ooof! Lots of searching and skimming repeatedly offered the same suggestion: pass the --max_old_space_size=<value> argument to the node process, which increases the heap memory allocation. It felt like slapping some tape on a shiny new toy, but it enabled the build process to complete successfully.
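If you hit the same wall, the flag goes on the node process that runs Webpack. As a rough example (the 4096MB value and config path are illustrative, not our exact setup):

node --max_old_space_size=4096 node_modules/.bin/webpack --config webpack.config.prod.js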

Feeling unsatisfied with band-aiding an ominous failure, I investigated why the build was consistently choking on source map generation, and here is where I discovered a two-alarm fire:

  1. Our main (and only) bundle is 1.6MB
  2. That one giant bundle is accompanied by a behemoth source map…19MB to be exact. 😱 Not ok.

Code Splitting

First, the bundle needs to be split by configuring optimization.splitChunks. Then, the vendor source maps need to be excluded via the SourceMapDevToolPlugin exclude option. An important step when using SourceMapDevToolPlugin is setting devtool: false. Otherwise, its configuration (with exclude rules) will get trampled by Webpack’s devtool operation and output another monster source map (mapping the entire build again).

devtool: false, // hand full control of source maps to SourceMapDevToolPlugin
optimization: {
 splitChunks: {
  chunks: 'all',
  name: true,
  cacheGroups: {
   // split node_modules JS into a separate vendors bundle
   vendors: {
    test: /[\\/]node_modules[\\/].*\.js$/,
    filename: 'static/js/vendors.[chunkhash:8].js',
    priority: -10
   },
   // share modules that are used by two or more chunks
   default: {
    minChunks: 2,
    priority: -20,
    reuseExistingChunk: true
   }
  }
 }
}
...
plugins: [
 // generate source maps for app code only, excluding the vendors bundle
 new webpack.SourceMapDevToolPlugin({
  filename: 'static/js/[name].[chunkhash:8].js.map',
  exclude: /static\/js\/vendors*(.+?).js/
 })
]

With the build output in much better shape (though the vendors bundle should be further split into smaller chunks), I tried removing the node argument band-aid and re-running the build command (sans gargantuan source map). Success! The fatal error was almost exclusively due to source mapping one enormous bundle.

Minify CSS

Now the build succeeds and I’m cookin with gas. However, the CSS file is much bigger than it used to be…it’s no longer minified. One of the changes in this upgrade was replacing ExtractTextPlugin with MiniCssExtractPlugin (which extracts all CSS modules into a separate file). However, MiniCssExtractPlugin does not handle minification (see https://github.com/webpack-contrib/mini-css-extract-plugin#minimizing-for-production) like ExtractTextPlugin did. To minify CSS, the OptimizeCSSAssetsWebpackPlugin (aka OCAWP) is necessary.
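For context, a rough sketch of the extraction setup after that swap (filenames and loader chain are illustrative):

const MiniCssExtractPlugin = require('mini-css-extract-plugin');

module.exports = {
 module: {
  rules: [
   {
    test: /\.css$/,
    // MiniCssExtractPlugin.loader replaces the old ExtractTextPlugin.extract() wrapper
    use: [MiniCssExtractPlugin.loader, 'css-loader', 'postcss-loader']
   }
  ]
 },
 plugins: [
  new MiniCssExtractPlugin({
   filename: 'static/css/[name].[contenthash:8].css'
  })
 ]
};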

To include OCAWP, add an optimization.minimizer configuration to the production config:

optimization: {
 minimizer: [
  new OptimizeCSSAssetsWebpackPlugin({
   cssProcessorOptions: {
    // parse defensively so malformed CSS doesn't kill the build
    parser: require('postcss-safe-parser'),
    map: {
     // emit external .map files instead of inline source maps
     inline: false,
     annotation: true
    }
   },
   cssProcessorPluginOptions: {
    preset: ['default', {
     discardComments: {
      removeAll: true
     }
    }]
   }
  })
 ]
}

Now, CSS is minified but…JavaScript is not. 😑 Hoo boy.

Minify JS

By default, Webpack uses UglifyJs to minify JavaScript. When optimization.minimizer is customized (in this case, for CSS minification), JS minification needs to be explicitly handled as well. Now the optimization.minimizer config contains OCAWP and UglifyJs, but the build script fails again - citing an Unexpected token: keyword (const) error from UglifyJs. Siiigh.

It turns out uglify-js (the parser used by UglifyJsWebpackPlugin) does not support ES6 uglification. The maintainer of UglifyJsWebpackPlugin, as well as the Webpack docs, urge the adoption of TerserWebpackPlugin instead. This works out great, since the next version of Webpack will use Terser as its default minifier. Thank you, next!

optimization: {
 minimizer: [
  new OptimizeCSSAssetsWebpackPlugin({...}),
  new TerserWebpackPlugin({
   sourceMap: true,
   parallel: true, // run minification across multiple processes
   terserOptions: {
    parse: {
     ecma: 8 // accept modern (ES2017) syntax as input
    },
    compress: {
     ecma: 5, // keep transforms ES5-compatible
     warnings: false,
     comparisons: false,
     inline: 2
    },
    output: {
     ecma: 5,
     comments: false,
     ascii_only: true
    }
   }
  })
 ]
}

The production build is finally compiling as expected. There are still improvements to be made but I will rest easier knowing that this configuration isn’t exploding CPUs and that I have a better grip on optimizations going forward.

It’s been a tough and humbling week. Configuring Webpack’s loaders and plugins correctly can feel overwhelming - there are countless options and optimizations to understand. If you or someone you love is going through frontend dependency hardships, just know: it gets better and you are not alone. Hang in there!

Webhooks Made Easy with Stackery
Anna Spysz | November 08, 2018

Webhooks are about as close as you can get to the perfect serverless use case. They are event-driven and generally perform just one (stateless) function. So of course we wanted to show how easy it is to implement webhooks using Stackery.

Our newest tutorial, the Serverless Webhooks Tutorial, teaches you to create a GitHub webhook and connect it to a Lambda function through an API Gateway. Or, to put it in simple terms: when your GitHub repository does a thing, your function does another thing. What that second thing does is completely up to you.

Here are some possible use cases of a GitHub webhook:

  • Connect your webhook to the Slack API and have Slack ping your team members when someone has opened a PR
  • Have your function deploy another stack when its master branch is updated
  • Expanding on that, you can even have your function deploy to multiple environments depending on which branch has been updated
  • Write an Alexa Skill that plays a certain song when your repository has been starred - the possibilities are endless!

The best part is, GitHub allows you to be very specific in what events you subscribe to, and you can further narrow down events in your function logic.

So for example, do you want to be notified by text message every time Jim pushes a change to the master branch of your repository, because Jim has been known to push buggy code? You can set that up using webhooks and Stackery, and never have master go down again (sorry, Jim).
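To make that concrete, here’s a minimal sketch of the Lambda side (ref, pusher, and after are real fields of GitHub’s push event payload; notifyByText is a hypothetical helper wrapping whatever SMS service you use):

module.exports.handler = async event => {
  // API Gateway delivers the webhook payload as a JSON string in the body
  const payload = JSON.parse(event.body);

  // Only react when Jim pushes to the master branch
  if (payload.ref === 'refs/heads/master' && payload.pusher && payload.pusher.name === 'jim') {
    await notifyByText(`Jim pushed to master again (commit ${payload.after})`); // hypothetical helper
  }

  return { statusCode: 200, body: 'OK' };
};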

Check out the tutorial to see what else you can build!

Building a Single-Page App With Stackery & React
Jun Fritz | October 30, 2018

After completing this tutorial, you’ll have a serverless single-page app built using Stackery and React. Stackery will be used to configure, deploy, and host the application, which will be built using the React library.

The newest tutorial on our documentation site guides you through the process of building a Serverless Single-Page App using Stackery and React.

You’ll be using Stackery to set up the cloud resources needed to deploy, host, and distribute your single-page application. You’ll configure a Lambda function, an S3 Bucket, and a CloudFront CDN in this tutorial with the goal of keeping this application within AWS Free Tier limits.

By the end of this tutorial, you’ll have a fully-scalable backend and an organized React front-end to build on as you grow your application. Watch part one of the tutorial below to see what we’re building, or follow along with the plain-text version here.

Stay tuned for more serverless tutorials from Stackery!

Building Serverless Applications with AWS Amplify
Danielle Heberling | October 24, 2018

So you want to use AWS Cognito to authenticate users, and you have your user pool, identity pool, and app client all set up in the AWS console. …The next question is: how can you connect this with your React-based frontend? While there are a few ways to go about doing this, this post is going to give you a brief overview of how to do it via a library called AWS-Amplify.

AWS-Amplify is an open source project managed by AWS described as “a declarative JavaScript library for application development using cloud services.” I liked this particular library, because it has a client first approach and abstracts away some of the setup required in the JavaScript SDK.

My favorite features of Amplify are: Authentication (via Cognito), API (via API Gateway), and Storage (via S3), but this library has a lot more to offer than just those features. This post will focus on how to authenticate users from a React based frontend…more specifically user signup that has an email address verification step.

The Setup

First, you’ll need to set up a config file in your /src folder to reference your already-created AWS resources (in this case the user pool, identity pool, and app client id). The file will look something like this:

src/config.js

export default {
  cognito: {
    REGION: 'YOUR_COGNITO_REGION',
    USER_POOL_ID: 'YOUR_USER_POOL_ID',
    APP_CLIENT_ID: 'YOUR_APP_CLIENT_ID',
    IDENTITY_POOL_ID: 'YOUR_IDENTITY_POOL_ID'
  }
};

Then in your index.js file where you set up your React app, you’ll need to configure AWS Amplify. It’ll look similar to this:

src/index.js

import React from 'react';
import ReactDOM from 'react-dom';
import Amplify from 'aws-amplify';
import { BrowserRouter as Router } from 'react-router-dom'; // needed for the <Router> below

import config from './config';
import App from './App';

Amplify.configure({
  Auth: {
    mandatorySignIn: true,
    region: config.cognito.REGION,
    userPoolId: config.cognito.USER_POOL_ID,
    identityPoolId: config.cognito.IDENTITY_POOL_ID,
    userPoolWebClientId: config.cognito.APP_CLIENT_ID
  }
});

ReactDOM.render(
  <Router>
    <App />
  </Router>,
  document.getElementById('root')
);

The mandatorySignIn property is optional, but is a good idea if you are using other AWS resources via Amplify and want to enforce user authentication before accessing those resources.

Also note that for now, having a separate config file might seem a bit overkill, but once you add in multiple resources (e.g. Storage, API, PubSub, etc.) you’ll want that extra config file to keep things easy to manage.

Implementation Overview

The signup flow will look like this:

  1. The user submits what they’ll use for login credentials (in this case email and password) via a signup form, and a second form to type in a confirmation code will appear.
  2. Behind the scenes the Amplify library will sign the user up in Cognito.
  3. Cognito will send a confirmation code email to the user’s signup email address to verify that the email address is real.
  4. The user will check their email > get the code > type the code into the confirmation form.
  5. On submit, Amplify will send the information to Cognito which then confirms the signup. On successful confirmation, Amplify will sign the user into the application.

Implementation Part 1

First in your signup form component, you’ll need to import Auth from the Amplify library like this:

import { Auth } from 'aws-amplify';

As you create your form, I’d suggest using local component state to store the form data. It’ll look like your typical form, with the difference being that you use the Amplify methods in your handleSubmit function whenever the user submits the form. The handleSubmit function will look like this:

 handleSubmit = async event => {
    event.preventDefault();

    try {
      // Sign the user up in Cognito; on success a user object is returned
      const newUser = await Auth.signUp({
        username: this.state.email,
        password: this.state.password
      });
      this.setState({
        newUser
      });
    } catch (event) {
      if (event.code === 'UsernameExistsException') {
        // The email already exists in Cognito, so resend the confirmation
        // code and let the user pick up where they left off
        const tryAgain = await Auth.resendSignUp(this.state.email);
        this.setState({
          newUser: tryAgain
        });
      } else {
        alert(event.message);
      }
    }
  }

On success, Amplify returns a user object after the signUp method is called, so I’ve decided to store this object in my component local state so the component knows which form to render (the signup or the confirmation).
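In practice, the render logic can simply branch on that stored object. A rough sketch (renderSignupForm and renderConfirmationForm are hypothetical helpers, and newUser is assumed to start out null):

render() {
  // Show the signup form until Auth.signUp has returned a user object
  return this.state.newUser === null
    ? this.renderSignupForm()
    : this.renderConfirmationForm();
}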

Before we continue let’s go over a quick edge case. So if our user refreshes the page when on the confirmation form and then tries to sign up again with the same email address, they’ll receive an error that the user already exists and will need to signup with a different email address. The catch block demonstrates one way of handling that possibility by resending the signup code to the user if that email is already present in Cognito. This will allow the user to continue using the same email address should they refresh the page or leave the site before entering the confirmation code.

Implementation Part 2

So now the user is looking at the confirmation form and has their confirmation code to type in. We’ll need to render the confirmation form. Similar to the signup form, it’ll look like a typical form; the exception is the function called whenever the user submits it. The handleSubmit function for the confirmation form will look similar to this when using Amplify:

 handleConfirmationSubmit = async event => {
    event.preventDefault();

    try {
      // Confirm the code, then immediately sign the user in
      await Auth.confirmSignUp(this.state.email, this.state.confirmationCode);
      await Auth.signIn(this.state.email, this.state.password);

      this.props.isAuthenticated(true);
      this.props.history.push("/");
    } catch (event) {
      alert(event.message);
    }
  }

So it takes in the form data, uses Amplify to confirm the user’s email address via the confirmation code, and signs in the user if successful. You can then verify whether a user is signed in via props at the route level if you’d like. In this case, I arbitrarily named it isAuthenticated and redirected the user to the root path.

The complete docs for using the Auth feature of Amplify can be found here. We’ve only scratched the surface in this post, so go forth and explore all of the different features that Amplify has to offer. I’ve found it has a very nice declarative syntax and is very readable for folks who are new to a codebase. For building further on your React-based serverless applications, I highly recommend Stackery for managing all of your serverless infrastructure, backed up by seamless, git-based version control.

The '8 Fallacies of Distributed Computing' Aren't Fallacies Anymore
Apurva Jantrania | October 23, 2018

In the mid-90’s, centralized ‘mainframe’ systems were in direct competition with microcomputing for dominance of the technology marketplace and developers’ time. Peter Deutsch, a Sun Microsystems engineer who was a ‘thought leader’ before we had the term, wrote down seven fallacies, assumptions many developers made about distributed computing, to which James Gosling added one more to make the famous list of The 8 Fallacies of Distributed Computing.

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn’t change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous

Microcomputing would win that debate in the 90’s, with development shifting to languages and platforms that could run on a single desktop machine. Twenty years later, we’re still seeing these arguments used against distributed computing, especially against the most rarefied version of distributed computing: serverless. Recently, more than one person has replied to these fallacies by saying they ‘no longer apply’ or ‘aren’t as critical’ as they once were, but the truth is none of these are fallacies anymore.

How can these fallacies be true? There’s still latency and network vulnerabilities.

Before about 2000, the implied comparison with local computing didn’t need to be stated: “The network is more reliable than your local machine…” was obviously a fallacy. But now that assumption is the one under examination. Networks still have latency, but the superiority of local machines over networks is now either in doubt or absent.

1. The Network is Reliable

This is a pretty good example of how the list of fallacies ‘begs the question.’ What qualifies as ‘reliable?’ Now that we have realistic stats about hard drive failure rates, even a RAID-compliant local cluster has some failure rate.

2. Latency is Zero

Latency is how much time it takes for data to move from one place to another (versus bandwidth, which is how much data can be transferred during that time). The reference here is to mainframe ‘thin client’ systems where every few keystrokes had to round-trip to a server over an unreliable, laggy network.

While our networking technology is a lot better, another major change has been effective AJAX and async tools that check in when needed and show responsive interfaces all the time. On top of a network request structure that hasn’t been much updated since the 90’s, and a browser whose memory needs seem to double annually, we still manage to run cloud IDE’s that perform pretty well.

3. Bandwidth is Infinite

Bandwidth still costs something, but beyond the network improvements mentioned above, the cost of bandwidth has become extremely tiny. Bills for bandwidth do exist, and I’ve even seen some teams optimize to try and save on their bandwidth costs!

In general this costs way more in developer wages than it saves, which brings us to the key point. Bandwidth still costs money, but the limited resource is not technology but people. You can buy a better network connection at 2AM on a Saturday, but you cannot hire a SQL expert who can halve your number of DB queries.

4. The Network is Secure

As Bloomberg struggles to back up its reports of a massive hardware bugging attack against server hardware, many people want to return to a time when networks were inherently untrustworthy. More accurately, since few developers can do their jobs without constant network access for at least GitHub and NPM, untrustworthy networks are an easy scapegoat for poor operational security practices that almost everyone commits.

The Dimmie attack, which peaked in 2017, targeted the actual spot where most developers are vulnerable: their laptops. With enough access to load in-memory modules on your home machine, attackers can corrupt your code repos with malicious content. In a well-run dev shop, it’s the private computing resources that tend to be entry points for malicious attacks. The laptops that we take home with us for personal use should be the least trustworthy component.

5. Topology Doesn’t Change

With the virtualization options available in something serverless like AWS’s Relational Database Service (RDS), it’s likely that topology never has to change from the point of view of the client. In a local or highly controlled environment, there are setups where no DB architectures, interfaces, or request structures have changed in years. This is called ‘Massive Tech Debt’.

6. There is One Administrator

If this isn’t totally irrelevant (no one works on code that has a single human trust source anymore, and if they do, that’s… real bad; get a second GitHub admin, please), it might still be a reason to use serverless and not roll your own network config.

For people just dipping a toe into managed services, there are still horror stories about the one single AWS admin leaving for 6 weeks of vacation, deciding to join a monastery, and leaving the dev team unable to make changes. In those situations, where there wasn’t much consideration of the ‘bus factor’ on your team, there still is just one administrator: the service provider. And as long as you’re the one paying for the service, you can wrest back control.

7. Transport Cost is Zero

Yes, transport cost is zero. This one is just out of date.

8. The Network is Homogeneous

Early networked systems had real issues with this. I am reminded of the college that reported they could only send emails to places within 500 miles: there are ‘special places’ in a network that can confound your tests and your general understanding of the network.

This fallacy isn’t so much true as the awkward parts of a network are now clearly labelled as such. CI/CD is explicitly testing in a controlled environment, and even AWS, which does its darndest to present you with a smooth homogeneous system, intentionally makes you aware of geographic zones.

Conclusions

We’ve all seen people on Twitter pointing out an AWS outage, shouting about how this means we should ‘not trust someone else’s computer,’ but I’ve never seen an AWS-hosted service have 1/10th the outages of a self-hosted service. Next time someone shares a report of a 2-hour outage in a single Asian AWS region, ask to see their red team logs from the last 6 months.

At Stackery, we have made it our mission to make modern cloud infrastructure as accessible and useful as possible. Get your engineering team the best solution for building, managing and scaling serverless applications with Stackery today.

What Successful Serverless Teams Know
Nate Taggart | October 10, 2018

Shipping serverless applications feels good. And it should! Serverless lets us focus on our software and ignore the tedium of managing servers. You download a framework, write a little code, and deploy your first Lambda function. Congrats! You’re a serverless developer!

But, as you run through that first “Hello, world” serverless tutorial, you might notice that you’re cutting a few corners that you can’t really cut in a professional setting. Security? Permissions? Secrets management? Dev environments? Testing? CI/CD? Version control? And the other two hundred little details that matter when you’re doing professional software development with a team.

On the one hand, these are solvable problems. On the other hand, though, if you have to re-invent the wheel for the development and operations cycle, maybe you won’t get to focus on the code as much as you thought…

Successful Serverless Teams

Successful serverless teams use software tools to solve these challenges. They deliver projects on time and reliably by automating the manual, error-prone parts of serverless development. While we could write a book on all of the best team ergonomics for serverless, let’s focus on the big three areas where you’ll want a tool: configuration, release automation, and visibility.

Configuration

Regardless of which framework you choose, once you get past your first “Hello, World” function, you’re going to have to start writing configuration code. Congrats! You’re a serverless YAML developer!

You (and everyone else on your team) will need to learn to configure every single cloud resource you want to use down to the smallest details. Event streams, VPCs, API gateways, datastores, etc, etc. And I mean really down in the weeds here – like, the be-ready-to-map-your-IP-Routing-Tables kind-of-in-the-weeds…

The right tooling can automate this configuration for you and let you pull pre-configured resources off the shelf and into your framework automatically. That’s trickier than it sounds! Most “resources” are actually a collection of services. It’s not enough just to say “I need an API” - you’ll be configuring IAM roles as part of the assembly process, unless you have professional tooling.

Oh, and um, this is awkward… everyone on your team is going to have their own configuration file. Each developer will need to sandbox their own resource instances with scoped IAM roles and namespace their resources so you don’t overwrite each other with collisions. Even with master-level git-fu, this is really hard. That’s coming from me, and I came to Stackery from GitHub.

Release Automation

Once you’ve got your application built and your infrastructure configured, you’re ready to deploy. For your first app, that probably meant giving your framework God-like privileges in your personal AWS account. Yeah, ok, no, we’re not going to do that at work, in production. Right?

For serverless release automation, we’re going to need to figure out how to solve a few specific challenges: defining deployment stages, managing permissions, and integrating into a central CI/CD pipeline.

Managing deployment stages is a very similar problem to juggling your multiple configuration files. In fact, you could just define each stage in that one file… except that now when you make a configuration change, you have to remember to make it in every environment. I’m not pointing fingers here, but it’ll probably get messed up by someone at some point. And that will suck. Plus, these environments each have their own secrets and environment parameters which you’ll want to keep out of version control (and out of your config file) but available to the newly provisioned resources.

We’ll also want to create limited access roles for provisioning which, unfortunately, some frameworks just don’t support. This is why Stackery’s CLI leverages your existing user roles to enforce your access policy, rather than requiring admin rights to your AWS account like other tools.

Finally, while you could write your own scripts, scripting up serverless deployments can be tricky and brittle. With the right CLI tool, you can simply drop it into your CI/CD pipeline and have it automatically support your deployment stages and environment parameters.

Serverless Visibility

When you’re developing an application to run on static infrastructure (you know, the old way with servers), it’s pretty easy to visualize the architecture in your head. There’s an app; it’s on a server. If someone makes a change, the architecture remains stable. If there’s an error, it’ll show up in the server logs. Need metrics? Dropping a library or agent in one place will do the trick. Pretty straightforward.

With serverless, visibility suddenly becomes way more important. The dynamic architecture changes as your team builds more functions. Errors and performance bottlenecks can get distributed to other services. Logs and metrics collection need to be in place in advance – once that function instance dies, it and its data are gone forever.

It may not be obvious in advance, but the day will come when having a place to quickly glance and see a real-time view of your application architecture and performance will save you. Plan accordingly.

Get Back to Development

While we focused on three big challenges, the truth is that there are a lot more: centralized build processes, dependency management, standardized instrumentation, error monitoring, and on and on. Pioneering teams have solved most of the above; for the rest of you, we’re making sure you can do all of the above without having to build it yourself.

The leading serverless teams today spent the last two or three years solving these challenges. Again, they are solvable. But if you’re trying to deliver your application and meet your deadlines (and not create a bunch of extra risk for your organization in the process), you have three choices:

  1. Give up the velocity advantages of serverless and go back to legacy software development.
  2. Delay the velocity advantages of serverless and spend the next several sprints trying to invent your own patterns (and then the subsequent ones refining them and training everyone on how to do it your way) and roll your own tooling scripts.
  3. Embrace the velocity advantages of serverless and plug in a software tool to manage these challenges and get back to development.

And really, that’s a pretty easy choice. Smart companies will always stand on the shoulders of giants and focus their efforts on solving the problems unique to their business. Try Stackery today and get back to development.

Disaster Recovery in a Serverless World - Part 2
Apurva Jantrania | September 17, 2018

This is part two of a multi-part blog series. In the previous post, we covered Disaster Recovery planning when building serverless applications. In this post, we’ll discuss the systems engineering needed for an automated solution in the AWS cloud.

As I started looking into implementing Stackery’s automated backup solution, my goal was simple: in order to support a disaster recovery plan, we needed a system that automatically creates backups of our database to a different account and a different region. This seemed like a straightforward task, but I was surprised to find there was no documentation on how to do this in an automated, scalable way - all the existing documentation I could find only discussed partial solutions, and everything was done manually via the AWS Console. Yuck.

I hope that this post will help fill that void and help you understand how to implement an automated solution for your own disaster recovery plan. This post does get a bit long, so if that’s not your thing, see the tl;dr.

The Initial Plan

AWS RDS has automated backups, which seemed like the perfect platform to base this automation upon. Furthermore, RDS even emits events that seem ideal for kicking off a lambda function that will then copy the snapshot to the disaster recovery account.

Discoveries

The first issue I discovered was that AWS does not allow you to share automated snapshots - AWS requires that you first make a manual copy of the snapshot before you can share it with another account. I initially thought that this wouldn’t be a major issue - I could easily make my lambda function kick off a manual copy first. According to the RDS Events documentation, there is an event, RDS-EVENT-0042, that fires when a manual snapshot is created. I could then use that event to share the newly created manual snapshot to the disaster recovery account.

This led to the second issue - while RDS will emit events for snapshots that are created manually, it does not emit events for snapshots that are copied manually. The AWS docs aren’t clear about this, and it’s an unfortunate feature gap. This means I had to fall back to a timer-based lambda function that searches for and shares the latest available snapshot.

Final Implementation Details

While this ended up more complicated than initially envisioned, Stackery still makes it easy to add all the needed pieces for fully automated backups. My implementation ended up looking like this:

The DB Event Subscription resource is a CloudFormation Resource which contains a small snippet of CloudFormation that subscribes the DB Events topic to the RDS database.
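As a sketch, the CloudFormation in that resource looks roughly like this (an approximation of the shape, not our exact template; DBEventsTopic is a placeholder logical ID for the SNS topic):

DBEventSubscription:
  Type: AWS::RDS::EventSubscription
  Properties:
    SnsTopicArn: !Ref DBEventsTopic # the DB Events topic the functions subscribe to
    SourceType: db-snapshot # snapshot-level events, e.g. 'Automated snapshot created'
    EventCategories:
      - creation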

Function 1 - dbBackupHandler

This function will receive the events from the RDS database via the DB Events topic. It then creates a copy of the snapshot with an ID that identifies the snapshot as an automated disaster recovery snapshot.

const AWS = require('aws-sdk');
const rds = new AWS.RDS();

const DR_KEY = 'dr-snapshot';
const ENV = process.env.ENV;

module.exports = async message => {
  // Only run DB Backups on Production and Staging
  if (!['production', 'staging'].includes(ENV)) {
    return {};
  }

  let records = message.Records;
  for (let i = 0; i < records.length; i++) {
    let record = records[i];

    if (record.EventSource === 'aws:sns') {
      let msg = JSON.parse(record.Sns.Message);
      if (msg['Event Source'] === 'db-snapshot' && msg['Event Message'] === 'Automated snapshot created') {
        let snapshotId = msg['Source ID'];
        let targetSnapshotId = `${snapshotId}-${DR_KEY}`.replace('rds:', '');

        let params = {
          SourceDBSnapshotIdentifier: snapshotId,
          TargetDBSnapshotIdentifier: targetSnapshotId
        };

        try {
          await rds.copyDBSnapshot(params).promise();
        } catch (error) {
          if (error.code === 'DBSnapshotAlreadyExists') {
            console.log(`Manual copy ${targetSnapshotId} already exists`);
          } else {
            throw error;
          }
        }
      }
    }
  }

  return {};
};

A couple of things to note:

  • I’m leveraging Stackery Environments in this function - I have used Stackery to define process.env.ENV based on the environment the stack is deployed to
  • Automatic RDS snapshots have an id that begins with ‘rds:’. However, snapshots created by the user cannot have a ‘:’ in the ID.
  • To make future steps easier, I append dr-snapshot to the id of the snapshot that is created

Function 2 - shareDatabaseSnapshot

This function runs every few minutes and shares any disaster recovery snapshots to the disaster recovery account.

const AWS = require('aws-sdk');
const rds = new AWS.RDS();

const DR_KEY = 'dr-snapshot';
const DR_ACCOUNT_ID = process.env.DR_ACCOUNT_ID;
const ENV = process.env.ENV;

module.exports = async message => {
  // Only run on Production and Staging
  if (!['production', 'staging'].includes(ENV)) {
    return {};
  }

  // Get latest snapshot
  let snapshot = await getLatestManualSnapshot();

  if (!snapshot) {
    return {};
  }

  // See if snapshot is already shared with the Disaster Recovery Account
  let data = await rds.describeDBSnapshotAttributes({ DBSnapshotIdentifier: snapshot.DBSnapshotIdentifier }).promise();
  let attributes = data.DBSnapshotAttributesResult.DBSnapshotAttributes;

  let isShared = attributes.find(attribute => {
    return attribute.AttributeName === 'restore' && attribute.AttributeValues.includes(DR_ACCOUNT_ID);
  });

  if (!isShared) {
    // Share Snapshot with Disaster Recovery Account
    let params = {
      DBSnapshotIdentifier: snapshot.DBSnapshotIdentifier,
      AttributeName: 'restore',
      ValuesToAdd: [DR_ACCOUNT_ID]
    };
    await rds.modifyDBSnapshotAttribute(params).promise();
  }

  return {};
};

async function getLatestManualSnapshot (latest = undefined, marker = undefined) {
  let result = await rds.describeDBSnapshots({ Marker: marker }).promise();

  result.DBSnapshots.forEach(snapshot => {
    if (snapshot.SnapshotType === 'manual' && snapshot.Status === 'available' && snapshot.DBSnapshotIdentifier.includes(DR_KEY)) {
      if (!latest || new Date(snapshot.SnapshotCreateTime) > new Date(latest.SnapshotCreateTime)) {
        latest = snapshot;
      }
    }
  });

  if (result.Marker) {
    return getLatestManualSnapshot(latest, result.Marker);
  }

  return latest;
}

  • Once again, I’m leveraging Stackery Environments to populate the ENV and DR_ACCOUNT_ID environment variables.
  • When sharing a snapshot with another AWS account, the AttributeName should be set to restore (see the AWS RDS SDK).

Function 3 - copyDatabaseSnapshot

This function will run in the Disaster Recovery account and is responsible for detecting snapshots that are shared with it and making a local copy in the correct region - in this example, it will make a copy in us-east-1.

const AWS = require('aws-sdk');

// Snapshots are shared from us-west-2; copies are made in us-east-1
const sourceRDS = new AWS.RDS({ region: 'us-west-2' });
const targetRDS = new AWS.RDS({ region: 'us-east-1' });

const DR_KEY = 'dr-snapshot';
const ENV = process.env.ENV;

module.exports = async message => {
  // Only Production_DR and Staging_DR are Disaster Recovery Targets
  if (!['production_dr', 'staging_dr'].includes(ENV)) {
    return {};
  }

  let [shared, local] = await Promise.all([getSourceSnapshots(), getTargetSnapshots()]);

  for (let i = 0; i < shared.length; i++) {
    let snapshot = shared[i];
    let fullSnapshotId = snapshot.DBSnapshotIdentifier;
    let snapshotId = getCleanSnapshotId(fullSnapshotId);
    if (!snapshotExists(local, snapshotId)) {
      let targetId = snapshotId;

      let params = {
        SourceDBSnapshotIdentifier: fullSnapshotId,
        TargetDBSnapshotIdentifier: targetId
      };
      await targetRDS.copyDBSnapshot(params).promise(); // the copy runs via the target-region client
    }
  }

  return {};
};

// Get snapshots that are shared to this account
async function getSourceSnapshots () {
  return getSnapshots(sourceRDS, 'shared');
}

// Get snapshots that have already been created in this account
async function getTargetSnapshots () {
  return getSnapshots(targetRDS, 'manual');
}

async function getSnapshots (rds, typeFilter, snapshots = [], marker = undefined) {
  let params = {
    IncludeShared: true,
    Marker: marker
  };

  let result = await rds.describeDBSnapshots(params).promise();

  result.DBSnapshots.forEach(snapshot => {
    if (snapshot.SnapshotType === typeFilter && snapshot.DBSnapshotIdentifier.includes(DR_KEY)) {
      snapshots.push(snapshot);
    }
  });

  if (result.Marker) {
    return getSnapshots(rds, typeFilter, snapshots, result.Marker);
  }

  return snapshots;
}

// Check to see if the snapshot `snapshotId` is in the list of `snapshots`
function snapshotExists (snapshots, snapshotId) {
  for (let i = 0; i < snapshots.length; i++) {
    let snapshot = snapshots[i];
    if (getCleanSnapshotId(snapshot.DBSnapshotIdentifier) === snapshotId) {
      return true;
    }
  }
  return false;
}

// Cleanup the IDs from automatic backups that are prepended with `rds:`
function getCleanSnapshotId (snapshotId) {
  let result = snapshotId.match(/:([a-zA-Z0-9-]+)$/);

  if (!result) {
    return snapshotId;
  } else {
    return result[1];
  }
}

  • Once again, leveraging Stackery Environments to populate ENV, I ensure this function only runs in the Disaster Recovery accounts.

TL;DR - How Automated Backups Should Be Done

  1. Have a function that manually creates an RDS snapshot using a timer and a lambda function (see the sketch after this list). Use a timer that makes sense for your use case.
    • Don’t bother trying to leverage the daily automated snapshot provided by AWS RDS.
  2. Have a second function that monitors for the successful creation of the snapshot from the first function and shares it to your disaster recovery account.

  3. Have a third function that operates in your disaster recovery account. It monitors for snapshots shared to the account, then creates a copy of the snapshot, owned by the disaster recovery account and in the correct region.
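A minimal sketch of what the timer-triggered function in step 1 might look like (DB_INSTANCE_ID is an assumed environment variable; any ID format works as long as it avoids ‘:’):

const AWS = require('aws-sdk');
const rds = new AWS.RDS();

const DB_INSTANCE_ID = process.env.DB_INSTANCE_ID; // assumed environment variable
const DR_KEY = 'dr-snapshot';

module.exports = async message => {
  // Manual snapshot IDs cannot contain ':', so use a simple timestamp suffix
  let snapshotId = `${DB_INSTANCE_ID}-${DR_KEY}-${Date.now()}`;

  await rds.createDBSnapshot({
    DBInstanceIdentifier: DB_INSTANCE_ID,
    DBSnapshotIdentifier: snapshotId
  }).promise();

  return {};
};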
