I spent last week at DevOps Enterprise Summit in Las Vegas where I had the opportunity to talk with many people from the world's largest companies about DevOps, serverless, and the ways they are delivering software faster with better stability. We were encouraged to hear of teams using serverless from cron jobs to core bets on accelerating digital transformation initiatives.
Lots of folks had questions about what we've learned running the serverless engineering team at Stackery, how to ensure innovative serverless projects can coexist with enterprise standards, and most frequently, how serverless changes DevOps workflows. Since I now have experience building developer enablement software out of virtual machines, container infrastructures, and serverless services I thought I'd share some of the key differences with you in this post.
At its core, serverless development is all about combining managed services in the cloud to create applications and microservices. The serverless approach has major benefits. You can move incredibly fast, outsourcing tons of infrastructure friction and orchestration complexity.
However, because your app consists of managed services, you can't run it on your laptop. You can't run the cloud on your laptop.
Let's pause here to consider the implications of this. With VMs and containers, deploying to the cloud is part of the release process. New features get developed locally on laptops and deployed when they're ready. With serverless, deploying to the cloud becomes part of the development process. Engineers need to deploy as part of their daily workflow developing and testing functionality. Automated testing generally needs to happen against a deployed environment, where the managed service integrations can be fully exercised and validated.
This means the environment management needs of a serverless team shift significantly. You need to get good at managing a multitude of AWS accounts, developer specific environments, avoiding namespace collisions, injecting environment specific configuration, and promoting code versions from cloud-side development environments towards production.
Note: While there are tools like SAM CLI and localstack that enable developers to invoke functions and mimic some managed services locally, they tend to have gaps and behave differently than a cloud-side environment.
The serverless approach focuses on leveraging the cloud provider do more of the undifferentiated heavy lifting of scaling the IT infrastructure, freeing your team to maintain laser focus on the unique problems which your organization solves.
To repeat what I wrote a few paragraphs ago, serverless teams build applications by combining managed services that have the most desirable scaling, cost, and durability characteristics. However, here's another big shift. Developers now need familiarity with a hefty catalog of services. They need to understand their pros and cons, when to use each service, and how to configure each service correctly.
A big part of solving this problem is to leverage Infrastructure as Code (IaC) to define your serverless infrastructure. For serverless teams this commonly takes the form of an AWS Serverless Application Model (SAM) template, a serverless.yml, or a CloudFormation template. Infrastructure as Code provides the mechanism to declare the configuration and relationships between the managed services that compose your serverless app. However, because serverless apps typically involve coordinating many small components (Lambda functions, IAM permissions, API & GraphQL gateways, datastores, etc.) the YAML files containing the IaC definition tend to balloon to hundreds (or sometimes thousands) of lines, making it tedious to modify and hard to keep consistent with good hygiene. Multiply the size and complexity of a microservice IaC template by your dev, test, and prod environments, engineers on the team, and microservices; you quickly get to a place where you will want to carefully consider how they'll manage the IaC layer and avoid being sucked into YAML hell.
Serverless is now an option for both new applications and refactoring monoliths into microservices. We've seen teams deliver highly scalable, fault-tolerant services in days instead of months to replace functionality in monoliths and legacy systems. We recently saw a team employ the serverless strangler pattern to transition a monolith to GraphQL serverless microservices, delivering a production ready proof of concept in just under a week. We've written about the Serverless Strangler Pattern before on the Stackery blog, and I'd highly recommend you consider this approach to technical transformation.
A key difference with serverless is the potential to eliminate infrastructure and platform provisioning cycles completely from the project timeline. By choosing managed services, you're intentionally limiting yourself to a menu of services with built-in orchestration, fault tolerance, scalability, and defined security models. Building scalable distributed systems is now focused exclusively on the configuration management of your infrastructure as code (see above). Just whisper the magic incantation (in 500-1000 lines of YAML) and microservices spring to life, configured to scale on demand, rather than being brought online through cycles of infrastructure provisioning.
Regardless of platform, enforcing cross-cutting operational concerns when the number of services increases is a (frequently underestimated) challenge. With microservices it's easy to keep the pieces of your system simple, but it's hard to keep them all consistent as the number of pieces grows.
What cross-cutting concerns need to be kept in sync? It's things like:
Addressing cross-cutting concerns is an area many serverless teams struggle, sometimes getting bogged down in a web of inconsistent tooling, processes, and visibility. However the serverless teams that do master cross-cutting effectively are able to deliver on microservice transformation initiatives much faster than those using other technologies.
Just like serverless teams, the serverless ecosystem is moving fast. Cloud providers are pushing out new services and features every day. Serverless patterns and best practices are undergoing rapid, iterative evolution. There are multiple AWS product and feature announcements every day. It's challenging to stay current on the ever expanding menu of cloud managed services, let alone best practices.
Our team at Stackery is obsessed with tracking changes in the serverless ecosystem, identifying best practices, and sharing these with the serverless community. AWS Secrets Manager, easy authorization hooks for REST APIs in AWS SAM, 15 minute Lambda timeouts, and AWS Fargate Containers are just a few examples of recent serverless ecosystem changes our team is using. Only a serverless team can keep up with a serverless team. We have learned a lot of lessons, some of them the hard way, about how to do serverless right. We'll keep refining our serverless approach and can honestly say we're moving faster and with more focus than we'd ever thought possible.
Raise your hand if the productivity of your team ever ground to a halt because you needed to fight fires or were blocked waiting for infrastructure to be provisioned. High profile security vulnerabilities are being discovered all the time. The week Heartbleed was announced a lot of engineers dropped what they had been working on to patch operating systems and reboot servers. Serverless teams intentionally don't manage OS's. There's less surface area for them to patch, and as a result they're less likely to get distracted by freshly discovered vulnerabilities. This doesn't completely remove a serverless team's need to track vulnerabilities in their dependencies, but it does significantly scope them down.
Capacity constraints are a similar story. Since serverless systems scale on demand, it's not necessary to plan capacity in a traditional sense, managing a buffer of (often slow to provision) capacity to avoid hitting a ceiling in production. However serverless teams do need to watch for a wide variety AWS resource limits and request increases before they are hit. It is important to understand how your architecture scales and how that will effect your infrastructure costs. Instead of your system breaking, it might just send you a larger bill so understanding the relationship between scale, reliability, and cost is critical.
As a community we need to keep pushing the serverless envelope and guiding more teams in the techniques to break out of technical debt, overcome infrastructure inertia, embrace a serverless mindset, and start showing results they never knew they could achieve.