Stacks on Stacks

A serverless/FaaS technology blog by Stackery


Self Healing Serverless Applications - Part 2


Author | Nate Taggart

This blog post is based on a presentation I gave at Glue Conference 2018. The original slides: Self-Healing Serverless Applications – GlueCon 2018

This is part two of a three-part blog series. In the last post we covered some of the common failure scenarios you may face when building serverless applications. In this post we’ll introduce the principles behind self-healing application design. In the next post, we’ll apply these principles with solutions to real-world scenarios.

Learning to Fail

Before we dive into the solution-space, it’s worth introducing the defining principles for creating self-healing systems: plan for failure, standardize, and fail gracefully.

Plan for Failure

As an industry, we have a tendency to put a lot of time into planning for success and relatively little into planning for failure. Think about it. How long ago did you first hear about Load Testing? Load testing is a great way to prepare for massive success! Ok, now when did you first hear about Chaos Engineering? Chaos Engineering is a great example of planning for failure. Or, at least, a great example of intentionally failing and learning from it. In any event, if you want to build resilient systems, you have to start planning to fail.

Planning for failure is not just a lofty ideal, there are easy, tangible steps that you can begin to take today:

  • Identify Service Limits: Remember the Lambda concurrency limits we covered in Part 1? You should know what yours are (see the sketch after this list). You’re also going to want to know the limits on the other service tiers, like any databases, API Gateway, and event streams you have in your architecture. Where are you most likely to bottleneck first?
  • Use Self-Throttling: By the time you’re being throttled by Lambda, you’ve already ceded control of your application to AWS. You should handle capacity limits in your own application, while you still have options. I’ll show you how we do it when we get to the solutions.
  • Consider Alternative Resource Types: I’m about as big a serverless fan as there is, but let’s be serious: it’s not the only tool in the tool chest. If you’re struggling with certain workloads, take a look at alternative AWS services. Fargate can be good for long-running tasks, and I’m always surprised how poorly understood the differences are between Kinesis vs. SQS vs. SNS. Choose wisely.
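
To make “Identify Service Limits” concrete, here’s a minimal sketch (not from the original post) that asks the Lambda API for your account’s concurrency ceiling using the AWS SDK’s getAccountSettings call. You could run something like this on a schedule and alarm when traffic approaches the limit:

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

// Log the account-wide Lambda concurrency limits so you know your ceiling.
exports.handler = async () => {
  const settings = await lambda.getAccountSettings().promise();
  const { ConcurrentExecutions, UnreservedConcurrentExecutions } = settings.AccountLimit;

  console.log(`Account concurrency limit: ${ConcurrentExecutions}`);
  console.log(`Unreserved concurrency available to non-reserved functions: ${UnreservedConcurrentExecutions}`);

  return settings.AccountLimit;
};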

Standardize

One of the key advantages of serverless is the dramatic velocity it can enable for engineering orgs. In principle, this velocity gain comes from outsourcing the “undifferentiated heavy lifting” of infrastructure to the cloud vendor, and while that’s certainly true, a lot of the velocity in practice comes from individual teams self-serving their infrastructure needs and abandoning the central planning and controls that an expert operations team provides.

The good news is, it doesn’t need to be either-or. You can get the engineering velocity of serverless while retaining consistency across your application. To do this, you’ll need a centralized mechanism for building and delivering each release, managing multiple AWS accounts, multiple deployment targets, and all of the secrets management to do this securely. This means that your open source framework, which was an easy way to get started, just isn’t going to cut it anymore.

Whether you choose to plug in software like Stackery or roll your own internal tooling, you’re going to need to build a level of standardization across your serverless architecture. You’ll need standardized instrumentation so that you know that every function released by every engineer on every team has the same level of visibility into errors, metrics, and event diagnostics. You’re also going to want a centralized dashboard that can surface performance bottlenecks to the entire organization – which is more important than ever before since many serverless functions will distribute failures to other areas of the application. Once these basics are covered, you’ll probably want to review your IAM provisioning policies and make sure you have consistent tagging enforcement for inventory management and cost tracking.

Now, admittedly, this standardization need isn’t the fun part of serverless development. That’s why many enterprises are choosing to use a solution like Stackery to manage their serverless program. But even if standardization isn’t exciting, it’s critically important. If you want to build serverless into your company’s standard tool chest, you’re going to need it to be successful. To that end, you’ll want to know that there’s a single, consistent way to release or roll back your serverless applications. You’ll want to ensure that you always have meaningful log and diagnostic data and that everyone is sending it to the same place, so that in a crisis you’ll know exactly what to do. Standardization will make your serverless projects reliable and resilient.

Fail Gracefully

We plan for failure and standardize serverless implementations so that when failure does happen we can handle it gracefully. This is where “self-healing” gets exciting. Our standardized instrumentation will help us identify bottlenecks automatically and we can automate our response in many instances.

One of the core ideas in failing gracefully is that small failures are preferable to large ones. With this in mind, we can often control our failures to minimize their impact, and we do this by rerouting and unblocking from failures. Say, for example, you have a Lambda reading off of a Kinesis stream and failing on a message. That failure is now holding up the entire Kinesis shard, so instead of having one failed transaction you’re now failing to process a significant workload. Instead, we can allow that one blocking transaction to fail: remove it from the stream and, rather than processing it normally, simply log it as a failed transaction and get it out of the way. That’s a small failure, but in most cases it’s better than a complete system failure.
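
Here’s a minimal sketch of that unblocking pattern (the processMessage helper is a placeholder, not from the original post). Instead of throwing on one bad record, which blocks the whole shard, the handler catches the failure, logs the record, and lets the rest of the batch succeed:

exports.handler = async (event) => {
  for (const record of event.Records) {
    const payload = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
    try {
      await processMessage(JSON.parse(payload)); // your real business logic goes here
    } catch (err) {
      // Log the failed transaction and move on so the shard keeps draining.
      console.error('Failed transaction, skipping record', {
        sequenceNumber: record.kinesis.sequenceNumber,
        payload,
        error: err.message
      });
    }
  }
};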

Finally, while automating solutions is always the goal with self-healing serverless development, we can’t ignore the human element. Whenever you’re taking an automated action (like moving that failing Kinesis message), you should be notifying a human. Intelligent notification and visibility is great, but it’s even better when that notification comes with all of the diagnostic details including the actual event that failed. This allows your team to quickly reproduce and debug the issue and turn around a fix as quickly as possible.
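
As a sketch of what that notification might look like (the topic ARN environment variable and payload shape are assumptions for illustration, not Stackery’s implementation), the self-healing action can publish the full failed event to SNS so a human gets everything needed to reproduce the issue:

const AWS = require('aws-sdk');
const sns = new AWS.SNS();

async function notifyFailure(failedEvent, error) {
  await sns.publish({
    TopicArn: process.env.FAILURE_TOPIC_ARN, // assumed environment variable
    Subject: 'Self-healing action taken: failed event rerouted',
    Message: JSON.stringify({ error: error.message, event: failedEvent }, null, 2)
  }).promise();
}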

See it in action

In our final post we’ll talk through five real-world use cases and show how these principles apply to common failure patterns in serverless architectures.

We’ll solve:

  • Uncaught Exceptions
  • Upstream Bottlenecks
  • Timeouts
  • Stuck stream processing
  • Downstream Bottlenecks

If you want to start playing with self-healing serverless concepts right away, go start a free 60 day evaluation of Stackery. You’ll get all of these best practices built-in.

Self Healing Serverless Applications - Part 1


Author | Nate Taggart

This blog post is based on a presentation I gave at Glue Conference 2018. The original slides: Self-Healing Serverless Applications – GlueCon 2018

This is part one of a multi-part blog series. In this post we’ll discuss some of the common failure scenarios you may face when building serverless applications. In future posts we’ll highlight solutions based on real-world scenarios.

What to expect when you’re not expecting

If you’ve been swept up in the serverless hype, you’re not alone. Frankly, serverless applications have a lot of advantages, and my bet is that (despite the stupid name) “serverless” is the next major wave of cloud infrastructure and a technology you should be betting heavily on.

That said, like all new technologies, serverless doesn’t always live up to the hype, and separating fact from fiction is important. At first blush, serverless seems to promise infinite and immediate scaling, high availability, and little-to-no configuration. The truth is that while serverless does offload a lot of the “undifferentiated heavy lifting” of infrastructure management, there are still challenges with managing application health and architecting for scale. So, let’s take a look at some of the common failure modes for AWS Lambda-based applications.

Runtime failures

The first major category of failures is what I classify as “runtime” or application errors. I assume it’s no surprise to you that if you introduce bugs into your application, Lambda doesn’t solve that problem. That said, you may be very surprised to learn that when you throw an uncaught exception, Lambda will behave differently depending on how you architect your application.

Let’s briefly touch on three common runtime failures:

  • Uncaught Exceptions: any unhandled exception (error) in your application.
  • Timeouts: your code doesn’t complete within the maximum execution time.
  • Bad State: a malformed message or improperly provided state causes unexpected behavior.
Uncaught Exceptions

When Lambda is running synchronously (like in a request-response loop, for example), the Lambda will return an error to the caller and will log an error message and stack trace to CloudWatch, which is probably what you would expect. It’s different, though, when a Lambda is called asynchronously, as might be the case with a background task. In that event, when throwing an error, Lambda will retry up to three times. In fact, it will even retry indefinitely when reading off of a stream, like with Kinesis. In any case, when a Lambda fails in an asynchronous architecture, the caller is unaware of the error – although there is still a log record sent to CloudWatch with the error message and stack trace.
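
Since the caller never sees these asynchronous failures, it helps to make the CloudWatch record as useful as possible. Here’s a minimal sketch (not from the original post; doWork is a placeholder for your real handler logic) that logs the triggering event alongside the error before rethrowing:

exports.handler = async (event, context) => {
  try {
    return await doWork(event); // placeholder for your real handler logic
  } catch (err) {
    // Log the event that caused the failure so the log record alone can reproduce it.
    console.error('Uncaught exception', {
      requestId: context.awsRequestId,
      event,
      error: err.stack
    });
    throw err; // rethrow so Lambda still records the invocation as failed (and retries)
  }
};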

Timeouts

Sometimes Lambda will fail to complete within the configured maximum execution time, which by default is 3 seconds. In this case, it will behave like an uncaught exception, with the caveat that you won’t get a stack trace out in the logs and the error message will be for the timeout and not for the potentially underlying application issue, if there is one. Using only the default behavior, it can be tricky to diagnose why Lambdas are timing out unexpectedly.
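
One way to get more signal out of a timeout (a sketch, not from the original post; doWork is again a placeholder) is to schedule a warning log just before the configured limit using context.getRemainingTimeInMillis(), so a timed-out invocation still leaves a diagnostic trail:

exports.handler = async (event, context) => {
  const warningBufferMs = 500; // log this long before the hard timeout
  const timer = setTimeout(() => {
    console.error('About to time out', {
      requestId: context.awsRequestId,
      remainingMs: context.getRemainingTimeInMillis(),
      event
    });
  }, context.getRemainingTimeInMillis() - warningBufferMs);

  try {
    return await doWork(event); // placeholder for your real handler logic
  } finally {
    clearTimeout(timer); // don't leave the timer pending if we finish in time
  }
};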

Bad State

Since serverless function invocations are stateless, state must be supplied on or after invocation. This means that you may be passing state data through input messages or by connecting to a database to retrieve state when the function starts. In either scenario, it’s possible to invoke a function without properly supplying the state it needs to execute. The trick here is that these “bad state” situations can either fail fairly noisily (as an uncaught exception) or fail silently without raising any alarms. When these errors occur silently, it can be nearly impossible to diagnose the problem, or sometimes even to notice that you have one. The major risk is that by the time you notice, the events which triggered these functions have expired, and since state may not have been stored correctly, you may have permanent data loss.
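
A cheap defense (a sketch, not from the original post; the field names and processOrder helper are made up for illustration) is to validate the state your function depends on at the top of the handler, so bad state fails loudly instead of silently:

exports.handler = async (event) => {
  const { orderId, customerId } = event; // fields assumed for illustration

  if (!orderId || !customerId) {
    // Fail noisily with the offending event attached, rather than half-processing it.
    throw new Error(`Malformed event, missing required state: ${JSON.stringify(event)}`);
  }

  return processOrder(orderId, customerId); // placeholder for your real logic
};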

Scaling failures

The other class of failures worth discussing are scaling failures. If we think of runtime errors as application-layer problems, we can think of scaling failures as infrastructure-layer problems.

The three common scaling failures are:

  • Concurrency Limits: when Lambda can’t scale high enough.
  • Spawn Limits: when Lambda can’t scale fast enough.
  • Bottlenecking: when the rest of your architecture doesn’t scale as well as Lambda.
Concurrency Limits

You can understand why Amazon doesn’t exactly advertise their concurrency limits, but there are, in fact, some real advantages to having limits in place. Concurrency limits are account-level limits that determine how many simultaneously running instances of your Lambda functions you can have. These limits can really save you in scenarios where you accidentally trigger an unexpected and massive workload. Sure, you may quickly run up a tidy little bill, but at least you’ll hit a cap and have time to react before things get truly out of hand. Of course, the flipside to this is that your application won’t really scale “infinitely,” and if you’re not careful you could hit your limits and throttle your real traffic. No bueno.

For synchronous architectures, these Lambdas will simply fail to be invoked, without any retry. If you’re invoking your Lambdas asynchronously, like reading off of a stream, the Lambdas will fail to invoke initially, but will resume once your concurrency drops below the limit. You may experience some performance bottlenecks in this case, but eventually the workload should catch up. It’s worth noting that you can usually get AWS to raise these limits if you contact them.
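
One related lever worth knowing about (a sketch, not from the original post; the function name and value are placeholders) is reserved concurrency, which both guarantees a critical function some headroom and caps how much of the account limit it can consume:

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

async function reserveConcurrency() {
  await lambda.putFunctionConcurrency({
    FunctionName: 'checkout-processor',  // placeholder function name
    ReservedConcurrentExecutions: 100    // both a guaranteed floor and a hard ceiling
  }).promise();
}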

Spawn Limits

While most developers with production Lambda workloads have probably heard of concurrency limits, in my experience very few know about spawn limits. Spawn limits are account limits on the rate at which new Lambda instances can be spawned. This can be tricky to identify because if you glance at your throughput metrics you may not even be close to your concurrency limit, but you could still be throttling traffic.

The default behavior for spawn limits matches concurrency limits, but again, it may be especially challenging to identify and diagnose spawn limit throttling. Spawn limits are also very poorly documented and, to my knowledge, it’s not possible to have these limits raised in any way.

Bottlenecking

The final scaling challenge involves managing your overall architecture. Even when Lambda scales perfectly (which, in fairness, is most of the time!), you must design your other service tiers to scale as well, or you may experience upstream or downstream bottlenecks. In an upstream bottleneck, like when you hit throughput limits in API Gateway, your Lambdas may fail to invoke. In this case, you won’t get any Lambda logs (the functions really didn’t invoke), so you’ll have to be paying attention to other metrics to detect this. It’s also possible to create downstream bottlenecks. One way this can happen is when your Lambdas scale up but deplete the connection pool of a downstream database. These kinds of problems can behave like an uncaught exception, lead to timeouts, or distribute failures to other functions and services.
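
A common mitigation for the database case (a sketch, assuming a MySQL backend and the mysql2 client, neither of which is from the original post) is to create the connection outside the handler so warm containers reuse it, and to cap each container at a single connection so a burst of concurrent Lambdas can’t exhaust the database’s connection limit:

const mysql = require('mysql2/promise');

// Created once per container, outside the handler, and reused across invocations.
const pool = mysql.createPool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
  connectionLimit: 1 // one connection per concurrent container
});

exports.handler = async (event) => {
  const [rows] = await pool.query('SELECT id, status FROM orders WHERE id = ?', [event.orderId]);
  return rows[0];
};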

Introducing Self-Healing Serverless Applications

The solution to all of this is to build resiliency with “Self-Healing” Serverless Applications. This is an approach for architecting applications for high-resiliency and for automated error resolution.

In our next post, we’ll dig into the three design principles for self-healing systems:

  • Plan for Failure
  • Standardize
  • Fail Gracefully

We’ll also learn to apply these principles to real-world scenarios that you’re likely to encounter as you embrace serverless architecture patterns. Be sure to watch for the next post!

Custom CloudFormation Resources: Real Ultimate Power


Author | Chase Douglas @txase


Lately, I’ve found CloudFormation custom resources to be supremely helpful for many use cases. I actually wanted to write a post mimicking Real Ultimate Power:

Hi, this post is all about CloudFormation custom resources, REAL CUSTOM RESOURCES. This post is awesome. My name is Chase and I can’t stop thinking about custom resources. These things are cool; and by cool, I mean totally sweet.

Trust me, it would have been hilarious, but rather than spend a whole post on a meme that’s past its prime let’s take a look at the real reasons why custom resources are so powerful!


What Are Custom Resources?

Custom resources are virtual CloudFormation resources that can invoke AWS Lambda functions. Inside the Lambda function you have access to the properties of the custom resource (which can include information about other resources in the same CloudFormation stack by way of Ref and Fn::GetAtt functions). The function can then do anything in the world as long as it (or another resource it invokes) reports success or failure back to CloudFormation within one hour. In the response to CloudFormation, the custom resource can provide data that can be referenced from other resources within the same stack.
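
To make the mechanics concrete, here’s a minimal sketch (not from the original post) of the handshake a custom resource’s Lambda function performs: it does its work, then PUTs a success or failure document to the pre-signed ResponseURL that CloudFormation includes in the event. The Data field is what other resources in the stack can read back via Fn::GetAtt:

const https = require('https');
const url = require('url');

exports.handler = async (event, context) => {
  let status = 'SUCCESS';
  let data = {};

  try {
    // event.RequestType is 'Create', 'Update', or 'Delete';
    // event.ResourceProperties holds the properties declared on the custom resource.
    data = { Message: `Handled ${event.RequestType}` };
  } catch (err) {
    status = 'FAILED';
  }

  const body = JSON.stringify({
    Status: status,
    Reason: `See CloudWatch log stream: ${context.logStreamName}`,
    PhysicalResourceId: event.PhysicalResourceId || context.logStreamName,
    StackId: event.StackId,
    RequestId: event.RequestId,
    LogicalResourceId: event.LogicalResourceId,
    Data: data
  });

  const { hostname, path } = url.parse(event.ResponseURL); // pre-signed S3 URL from CloudFormation
  await new Promise((resolve, reject) => {
    const req = https.request(
      { hostname, path, method: 'PUT', headers: { 'content-type': '', 'content-length': body.length } },
      resolve
    );
    req.on('error', reject);
    req.write(body);
    req.end();
  });
};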


What Can I Do With Custom Resources?

Custom resources are such a fundamental building block that it isn’t obvious at first glance how many use cases they enable. Because a custom resource can be invoked once or on every deployment, it’s a powerful mechanism for lifecycle management of many kinds of resources. Here are a few examples:

You could even use custom resources to enable post-provisioning smoke/verification testing:

  1. A custom resource is “updated” as the last resource of a deployment (this is achieved by adding every other resource in the stack to its DependsOn property)
  2. The Lambda function backing the custom resource triggers smoke tests to run, then returns success or failure to CloudFormation
  3. If a failure occurs, CloudFormation automatically rolls back the deployment

Honestly, while I have begun using custom resources for many use cases, I discover new use cases all the time. I feel like I have hardly scratched the surface of what’s possible through custom resources.

And that’s what I call REAL Ultimate Power!!!!!!!!!!!!!!!!!!


Stackery 2018 Product Updates


Author | Sam Goldstein

Our product engineering team ships every single day.

That means Stackery’s product gets better every single day. Stackery engineers commit code into git which marches into our continuous delivery pipeline. We promote each version of our microservices, frontend, and CLI through multiple testing environments, rolling shiny new features into production or notifying the team of failures. This is the best way we know to develop modern software and explains why our team is able to ship so much functionality so rapidly.

However, because we’re constantly shipping, it means we need to pause periodically to take note of new features and improvements. In this post I’ll summarize some of the most significant features and changes from our product team over the past few months. For a more detailed list of changes, you can read and/or follow Stackery’s Release Notes.

Referenced Resource

One of the best things about microservice architecture is the degree to which you can encapsulate and reuse functionality. For example, if you need to check whether a user is authorized to perform a certain action, there’s no need to scatter permissioning code throughout your services. Put it all in one place (an AuthorizationService, perhaps), and call out to that in each service that needs to check permissions.

Stackery’s Referenced Resource nodes let you reference existing infrastructure resources (be they Lambda functions, S3 buckets, VPCs, you name it) by their AWS ARN and seamlessly integrate them into your other services.

One of the best uses I’ve seen for Referenced Resources is as the mechanism for centralized error reporting across serverless architectures. Write one central Lambda function that forwards exceptions into your primary error reporting and alerting tool. Configure every other stack to send error events to this central handler. Voila! Complete visibility into all serverless application errors.
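
That central handler can be quite small. Here’s a minimal sketch (not Stackery’s implementation; the webhook environment variable and event shape are assumptions for illustration) of a forwarder that pushes error events into an alerting webhook:

const https = require('https');
const url = require('url');

exports.handler = async (errorEvent) => {
  // errorEvent is assumed to carry { service, functionName, error, originalEvent }
  const body = JSON.stringify({
    text: `Error in ${errorEvent.service}/${errorEvent.functionName}: ${errorEvent.error}`,
    details: errorEvent.originalEvent
  });

  const { hostname, path } = url.parse(process.env.ALERT_WEBHOOK_URL); // placeholder env var
  await new Promise((resolve, reject) => {
    const req = https.request(
      { hostname, path, method: 'POST', headers: { 'content-type': 'application/json', 'content-length': Buffer.byteLength(body) } },
      resolve
    );
    req.on('error', reject);
    req.write(body);
    req.end();
  });
};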

Support for Multiple AWS Accounts

Every company we work with uses multiple AWS accounts. Sometimes there’s one for production and one for everything else. In Stackery’s engineering team each engineer has multiple accounts for development and testing, as well as shared access to accounts for integration testing, staging, and production. Splitting your infrastructure across multiple accounts has major benefits. You can isolate permissions and account-wide limits, minimizing risk to critical accounts (e.g. production).

However, managing deployment of serverless architectures across multiple accounts is often a major PITA. This is why working across multiple accounts is now treated as a first-class concern across all of Stackery’s functionality. Multiple AWS accounts can be registered within a Stackery account using our CLI tool. Stackery environments are tied to an AWS account, which maps flexibly onto the vast majority of AWS account usage patterns.

Managing multiple AWS accounts is a key part of most organizations’ cloud security strategy. Stackery supports this by relying on your existing AWS IAM policies and roles when executing changes. If the individual executing the change doesn’t have permission in that AWS account, the action will fail. This makes it straightforward to set up workflows where engineers have full control to make changes in development and testing environments, but can only propose changes in the production account, which are then reviewed and executed by an authorized individual or automation tool.

You can read more in our knowledge base article, Working with multiple AWS accounts in Stackery.

CloudFormation Resource Nodes

Sometimes you need to do something a little different, which is why we built custom CloudFormation Resource nodes. You can use these to provision any AWS resource and take advantage of the full power and flexibility of CloudFormation, for situations when that’s required or desirable.

What’s been coolest about rolling this feature out is the variety of creative ways we’ve seen it used. For example, you can use CloudFormation Resource nodes to automatically configure and seed a database the first time you deploy to a new environment. You can also use them to automatically deploy an HTML front end to CloudFront each time you deploy your backend serverless app. The possibilities are endless.
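
As an illustration of the database-seeding case (a sketch, not from the original post; seedDatabase is a placeholder), the Lambda function behind such a custom resource can key off the request type CloudFormation passes in, so the seed runs exactly once per environment:

exports.handler = async (event) => {
  if (event.RequestType === 'Create') {
    // Runs only when the stack is first created in an environment.
    await seedDatabase(event.ResourceProperties); // placeholder for your seeding logic
  }
  // 'Update' and 'Delete' requests fall through; success is reported back to
  // CloudFormation with the same ResponseURL handshake sketched in the custom resources post above.
};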

AWS Resource Tagging

Resource Tagging may not be the most glamorous of features, but it’s a critical part of most organizations’ strategies for tracking cost, compliance, and ownership across their infrastructure. Stackery now boasts first-class support for tagging provisioned resources. We also provide the ability to require specific tags prior to deployment, making it orders of magnitude easier to get everyone on the same page on how to correctly tag resources.

Always Shipping

Our goal is to always be shipping. We aim to push out valuable changes every day. Customers gain more control and visibility over their serverless applications each day, so they can ship faster and more frequently too. Look out for more great changes rolling out each day in the product, and watch this blog for regular announcements summarizing our progress. We also love to hear what you think, so if you have wants or needs managing your serverless infrastructure, don’t hesitate to let us know.

Five Steps to a Successful Product Launch


Author | Susan Little

Many organizations are discovering how serverless technology enables the creation of new features or applications in a matter of hours and days instead of weeks or months. With this accelerated development timeframe, value can be delivered to customers faster.

So, how do you ensure you’re introducing the most innovative products that truly meet your customers’ needs?

At Stackery we often ask ourselves this very question.

While our focus and expertise are in providing serverless cloud governance and DevOps tools to organizations that use our technology to achieve compliance, security, and auditability with serverless applications, we’ve found that one of the keys to delivering products is having a New Product Introduction (NPI) plan. This plan guides your go-to-market initiatives and includes having a keen understanding of your customers’ needs and wants, knowing how your product compares to your competitors’, and describing your product in a way that is compelling, unique, and different.

We’ve found that following these five steps increases the success rate of product adoption, and customers will certainly appreciate having their needs considered.

1. Develop a competitive comparison - Analyze competitor strengths and weaknesses; identify and understand key competitors - and how your product is unique, better and different.

2. Define the target audience - What are their problems, and how does your product meet their needs? Paint a picture of the target audience by developing a buyer persona where you capture what is important to them. A detailed buyer persona will help guide product development, increase marketing effectiveness, and allow for alignment across your organization.

3. Create a market-facing product/feature description - This is the most critical element of the NPI, as it needs to accurately describe the new product/feature. The product description also needs to be compelling and make the product/feature attractive to a prospective buyer or industry influencer, such as a publication that wants to provide value to its readers.

Product descriptions can make or break a sale. While they can be easy to overlook, choosing the right set of words can compel even the most skeptical of customers to make a purchase.

4. Build a Creative Brief - The Creative Brief integrates the market, competitor, and product knowledge into differentiated statements of product value that form the foundation of the outbound marketing plan. It will simplify and expedite content creation by building from an agreed-upon messaging framework, and includes:

  • Background/Market Overview
  • Competitive Landscape
  • Target Audience
  • Pain Points
  • Product Description
  • Features and Benefits
  • Key Messages

5. Create a Go-To-Market (GTM) Plan - The GTM plan creatively applies marketing methods that complement demand generation and sales enablement to achieve the desired new product/feature introduction results. This includes setting metrics for attaining leads and other success factors.

Following these five steps has proved highly successful for many companies launching products. While serverless offers many benefits in bringing products to market faster and Stackery is paving the way for enterprises in their adoption of this new technology, spending time upfront to architect your New Product Introduction will pay dividends down the road.

If you are interested in learning more about Stackery’s approach to introducing products, feel free to reach out to me at: susan@stackery.io.

Alexa, tell Stackery to deploy


Author | Apurva Jantrania

We have a couple of Amazon Dots around the office, and one day Nate wondered if we could use Alexa to deploy a stack. That sounded like a fun side project, although I’d never created an Alexa skill before. So this week, I’m going to write a bit about the proof-of-concept I made and some of the things I learned along the way.

To learn about Alexa skills, I used two guides:

  1. Steps to Build a Custom Skill to guide me through building the custom Alexa Skill
  2. Developing Your First Skill to understand how custom Skill Handlers are written in NodeJS

Creating the Alexa Skill

Designing and building the Alexa Skill following the first guide was surprisingly straightforward. I decided I wanted to build my skill to enable deploying a stack into a specific environment. For the purposes of this POC, I decided that adding which branch to use for the deployment was starting to make the utterance/dialog too long. My ideal phrasing was to be able to say “Alexa, tell stackery to deploy $STACK_NAME into $ENVIRONMENT_NAME”.

The first issue I came across was the skill invocation name. I wanted to just use stackery, but there is a very large dialog box that lists requirements, and at the top of that list is that the invocation name should be two or more words. That seemed incredibly unwieldy and I wasn’t sure what I’d go with. This requirement also seemed to go against some of the examples I’d seen in Amazon’s own guides. I decided that I really did want stackery as my invocation name, and I got lucky when I tried it - turns out that Amazon’s definition of “requirement” here is synonymous with “guideline.”

I then created a new intent that I called deployIntent and populated the sample utterances with a couple of phrases:

deploy
deploy {stackName}
deploy {stackName} to {env}

Here, {stackName} and {env} are slots; I was able to dive into their Edit Dialog settings to tell Alexa that both slots are required and how to prompt for them if the user doesn’t provide them.

I’ve got to say, the Alexa Skills UI/UX really made this easy for me as a first-time developer. It felt slick.

With this, I was pretty much done creating the skill, and now I needed to create the handler that would actually do the deployment.


Creating the Alexa Skill Handler

I created a new stack in Stackery called alexaDeployments. As an Alexa skill can directly invoke an AWS Lambda function, I deleted all of the existing resources and started with a fresh function, which I called alexaHandler. I updated the timeout to be 300 seconds. Note that deployments can easily take more than 5 minutes; to really be robust, the stack deployment should be handled by a Docker Task resource instead, but since this was just a POC, I was willing to accept this limitation to speed things up.

I then saved the stack in the UI and cloned the repo locally to start developing the handler. Following the second guide quickly gave me the skeleton of my alexaHandler Lambda function. It’s a lot of relatively repetitive code that’s well outlined in the guide, so I’m not going to add it here. What I needed to do now was code my DeployIntentHandler and add the Stackery CLI to the function.

When Stackery packages a function for Lambda, it includes everything in the function directory, so taking advantage of that, I downloaded the Linux variant of the Stackery CLI into the /Stackery/functions/alexaHandler folder in my repo. The Stackery CLI requires a few things in place to be able to deploy:

  • A .stackery.toml file that is created by running through the stackery login command
  • AWS credentials provided either via the command line (--access-key-id and --secret-access-key) or via a profile in the ~/.aws/credentials file

To make things easier, I took my .stackery.toml file and added that to the function folder so I could skip the stackery login step on each invocation. As for my AWS Credentials, I will get them from environment variables set via Stackery’s Environment Configurations.

With that, my DeployIntentHandler looked like this

const childProcess = require('child-process-promise'); // promisified child_process, used for execFile below

const DeployIntentHandler = {
  canHandle (handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && handlerInput.requestEnvelope.request.intent.name === 'DeployIntent';
  },
  handle (handlerInput) {
    console.log('DeployIntent Invoked');
    console.dir(handlerInput);

    const request = handlerInput.requestEnvelope.request;

    if (request.dialogState !== 'COMPLETED') {
      return {
        directives: [{"type": "Dialog.Delegate"}]
      };
    }

    const stackName = request.intent.slots.stackName.value;
    const env = request.intent.slots.env.value;

    let args = ['deploy', stackName, env, 'master',
                '--config', './.stackery.toml',
                '--access-key-id', process.env.accessKeyId,
                '--secret-access-key', process.env.secretAccessKey];

    return childProcess.execFile('./stackery', args)
      .then(result => {
        console.log(`stackery returned: stdout: ${result.stdout}`);
        console.log(`stackery returned: stderr: ${result.stderr}`);
      })
      .catch(error => {
        console.log(`ChildProcess errored with ${JSON.stringify(error)}`);
        if (error.stdout) {
          console.log(error.stdout);
          console.log(error.stderr);
        }
      })
      .then(() => {
        const speechText = `Starting deployment of ${stackName} into ${env}`;

        return handlerInput.responseBuilder
          .speak(speechText)
          .getResponse();
      })
  }
};

I committed my changes and deployed my alexaDeployments stack. Once deployed, I was able to go into the Deployed Stack Dashboard and click on the alexaHandler resource to get the Lambda ARN, which let me finish the last step in setting up my Alexa Skill - connecting the Alexa Skill to the Lambda function.

Function Permission Errors

However, when I tried to add the ARN of the Lambda function to the Alexa skill, I got an error: The trigger setting for the Lambda arn:aws:lambda:us-west-2:<account>:function:<functionName> is invalid. Error code: SkillManifestError - Friday, Apr 27, 2018, 1:43 PM. Whoops - I forgot to give Alexa permission to access the Lambda function. Stackery usually takes care of all the permissions needed, but since it didn’t know about the Alexa Skill, I was going to have to manually add the needed permission. Luckily, Stackery makes this easy with Custom CloudFormation Resources. I added a custom resource to my stack with the following CloudFormation:

{
  "Resources": {
    "alexaSkillPolicy": {
      "Type": "AWS::Lambda::Permission",
      "Properties": {
        "Action": "lambda:InvokeFunction",
        "FunctionName": "stackery-85928785027043-development-33181332-alexaHandler",
        "Principal": "alexa-appkit.amazon.com"
      }
    }
  }
}

This lets alexa-appkit.amazon.com invoke my function. After re-deploying my stack with this change, I was able to finish linking my Alexa Skill to my handler, and it was time to test!

Timeouts and Retry Errors

Initial testing looked good - Alexa was able to run my skill, my handler was getting invoked, and I could see my test stack (a simple hello world stack) was getting re-deployed. However, when I looked into the CloudWatch logs for my alexaHandler function, I noticed errors printed from the Stackery CLI: Failed to prepare deployment: \nStackery API responded with status code: 409\nYou probably already have a deployment in progress\n. With some inspection, I realized that since the handler took the time to actually deploy before responding to Alexa, Alexa was seemingly timing out and retrying after about 30 seconds. So this error was from the re-invocation of the Stackery CLI.

Ideally, I’d be able to provide intermittent status updates via Alexa, but unfortunately you are only allowed to respond once. To handle this issue, I refactored my alexaHandler function to asynchronously invoke another lambda function stackeryWrapper.

So now, my DeployIntentHandler looked like this:

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda(); // used to asynchronously invoke the wrapper function below
// stackeryWrapper (referenced below) is assumed to come from Stackery's generated configuration for the connected function

const DeployIntentHandler = {
  canHandle (handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && handlerInput.requestEnvelope.request.intent.name === 'DeployIntent';
  },
  handle (handlerInput) {
    console.log('DeployIntent Invoked');
    console.dir(handlerInput);

    const request = handlerInput.requestEnvelope.request;

    if (request.dialogState !== 'COMPLETED') {
      return {
        directives: [{ "type": "Dialog.Delegate" }]
      };
    }

    const stackName = request.intent.slots.stackName.value.replace(/ /g, ''); // strip all spaces from the slot value
    const env = request.intent.slots.env.value.replace(/ /g, '');
    let message = { stackName, env };
    const Payload = JSON.stringify(message, null, 2);

    return lambda.invoke({
      FunctionName: stackeryWrapper.functionName,
      InvocationType: 'Event',
      Payload
    }).promise()
      .then(() => {
        const speechText = `Starting deployment of ${stackName} into ${env}!`;

        return handlerInput.responseBuilder
          .speak(speechText)
          .getResponse();
      })
  }
};

And my new stackeryWrapper function looks like this:

const childProcess = require('child-process-promise');

module.exports = async message => {
  console.dir(message);

  const stackName = message.stackName;
  const env = message.env;

  return childProcess.execFile('./stackery', ['deploy', stackName, env, 'master', '--config', './.stackery.toml', '--access-key-id', process.env.accessKeyId, '--secret-access-key', process.env.secretAccessKey])
    .then(result => {
      console.log(`stackery returned: stdout: ${result.stdout}`);
      console.log(`stackery returned: stderr: ${result.stderr}`);
    })
    .catch(error => {
      console.log(`ChildProcess errored with ${error}`);
      if (error.stdout) {
        console.log(error.stdout);
        console.log(error.stderr);
      }
    });
}

And my stack looks like this:


Final Thoughts

While this project is far from being usable by anyone else as it stands, I found it interesting and honestly exciting to be able to get Stackery deployments working via Alexa. Ramping up on Alexa was relatively painless, although Amazon does have some contradictory documentation that can muddy the waters. And with Stackery, it was painless to add the CLI and handle the refactoring that I needed. There’s a lot that could still be done on this project, such as authorization, authentication, status updates, etc., but that will have to wait for another day.

