IAM Policy Basics and Best Practices

And how to use stack.new to build resilient, secure policies

Chase Douglas | March 23, 2021 | 6 min read

One of the most powerful aspects of AWS is its Identity and Access Management (IAM) service. The obvious source of its power is that it controls who can do what with all the resources inside your AWS account. The less obvious one is how configurable it is. You can encode permissions so finely grained that a Lambda Function could, for example, be given just enough permission to read one attribute from one record belonging to the current user in a DynamoDB Table. The downside, however, is that IAM policies are very hard to implement correctly. To achieve the aforementioned DynamoDB example, the policy might look like:

{
  "Effect": "Allow",

  // Only allow reading single records, as opposed to querying for many records
  "Action": "dynamodb:GetItem",

  // Only allow access to the StatRecords DynamoDB Table in the us-west-2 region
  "Resource": "arn:aws:dynamodb:us-west-2:012345678901:table/StatRecords",

  "Condition": {
    "ForAllValues:StringEquals": {

      // Only allow reading the stat_value attribute of records
      "dynamodb:Attributes": [
        "stat_value"
      ],

      // Only allow reading records where the partition key is
      // the user id from 'Login with Amazon ID'
      "dynamodb:LeadingKeys": [
        "${www.amazon.com:user_id}"
      ]
    },
    "StringEquals": {

      // Only allow reading specific attributes instead of all attributes
      "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
    }
  }
}

Whew! Oftentimes when you read unfamiliar code, you can follow along and figure out how it works.

But if you're new to IAM you likely have questions like:

"What's the format for Resource?"
"What does ForAllValues:StringEquals mean?"
"Where does ${www.amazon.com:user_id} come from?"

IAM truly is very complex!

In this guide we'll take a look at the basics of IAM policies, just enough to understand best practices, and then look at some of the tools available to help us validate that our permissions follow best practices to secure our resources.

IAM Policy Basics

Now that we've seen a complex policy example, let's look at a different example:

{
  "Effect": "Allow",
  "Action": "s3:*",
  "Resource": "*"
}

Here we see the three common properties of an IAM policy:

Effect: Whether this policy Allows or Denies access to resources
Action: The type of interaction for the policy, which can also be specified as a list of actions
Resource: Which resources in AWS this policy affects, specified as Amazon Resource Names (ARNs)

These are just the three most common properties of an IAM policy. If you want all the nitty-gritty details, you can read the full IAM JSON policy reference in the AWS documentation.
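Strictly speaking, the snippet above is a single policy statement. In a complete policy document, statements sit inside a Statement list next to a Version field. A minimal sketch of the full document wrapping the same statement:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}
```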

In plain English, the above policy grants permission to perform any interaction with Amazon Simple Storage Service (S3) on any resource in this AWS account. S3 stores files in buckets, and you can find examples where these permissions are granted so that a Lambda Function or EC2 instance can upload files to or download files from a bucket.

While this policy looks very simple, it opens up all kinds of potential issues.

The policy allows any action in S3, including creating and deleting entire buckets or modifying the permissions of buckets or files so they could be read or written by the public.
The policy allows these actions on all buckets and files in this AWS account, rather than a specific set of files within a specific bucket.

This policy is a great example of an overly broad permission set that can lead to data manipulation and/or exfiltration, both highly concerning security issues.
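For contrast, here is a sketch of a narrower statement for a function that only needs to upload and download files. The bucket name example-uploads-bucket and the uploads/ prefix are hypothetical placeholders, not values from a real application:

```json
{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:PutObject"
  ],
  "Resource": "arn:aws:s3:::example-uploads-bucket/uploads/*"
}
```

This version cannot delete buckets or change bucket permissions, and it only reaches objects under one prefix of one bucket. Scoping every policy this carefully by hand, however, takes effort and discipline.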

What can we do about this? We can train everyone who writes policies to follow all applicable best practices and scope permissions so they pose less risk, but this can be very difficult to achieve across an entire organization of developers. Another approach, which is easier to apply at scale, is to check automatically whether IAM best practices are being followed.

Auditing Policies in Infrastructure-as-Code

One of the best practices for web application development is to provision resources, including IAM policies, using Infrastructure-as-Code (IaC). We won't dive into all the concepts behind IaC here, but it's important to know that IaC gives us a consistent mechanism to review IAM policies by analyzing those written inside IaC templates.

IaC templates are written in a declarative syntax that is easy for computers to analyze. This has given rise to tools that evaluate templates for best practices. At stack.new, we use Stelligent's cfn_nag, which looks for issues like overly broad permissions granted via * actions and resources.

Audit results for the aws-samples/happy-path backend template

Here we see some problematic audit results from an AWS example showing how to build an API that manages state park information. Let's take a look at the first two issues.

The lone FAILURE result is due to the unscoped action in the permission policy on line 182. This gives the ProcessDynamoDBStream function the ability to perform any AWS IoT action, such as creating an Over-the-Air (OTA) update to IoT devices using the iot:CreateOTAUpdate action. If we were malicious, we might be able to send an update to all connected IoT devices that causes them to malfunction, or to send their data somewhere other than the secure application it was intended to reach.

The first WARNING exacerbates the prior FAILURE. The WARNING concerns the use of * in the Resource statement on line 183. If the Resource statement instead only allowed actions on an IoT Topic resource, like arn:aws:iot:us-east-1:012345678901:topic/MyTopic, it would block the function from calling the iot:CreateOTAUpdate action, because the resource for a CreateOTAUpdate call has a different ARN format and would not match. Because the Resource is *, the policy matches an AWS IoT OTA Update ARN and the function is allowed to create an OTA update.
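Taken together, the two findings describe a statement shaped roughly like the one below. This is a reconstruction from the audit results described above, not a copy of the template, so the exact layout in the happy-path template may differ:

```json
{
  "Effect": "Allow",
  "Action": "iot:*",
  "Resource": "*"
}
```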

Fixing IAM Audit Issues

Identify the problematic code

We now know there is an IAM policy that should be scoped better. It's time to dive into the code to figure out how to fix this!

We can take a look at the code for this function in streams/ddb/app.js. We see it creates an iotdata object to make requests to the IoT service on line 20.


We then see the iotdata.publish() action invoked on line 59.

This is the only AWS SDK action invoked in this Lambda Function. That means we can fix the FAILURE result by updating the Action in the policy on line 182 of our template to iot:Publish.

Limit scope (with help of AWS docs)

Looking at the Actions, resources, and condition keys for AWS IoT page (most AWS services have a documentation page like this) we can find the Publish action to see what kind of resource is allowed for scoping the permission. We see the Publish action allows an IoT Topic ARN to be specified.
The Topic ARN format, arn:${Partition}:iot:${Region}:${Account}:topic/${TopicName}, has four variables we need to supply.

The first three, Partition, Region, and Account, are specific to where we deploy our app and can be substituted in by AWS CloudFormation, which we'll see in a moment. The fourth, TopicName, is the name of the topic we are publishing to. If we go back to our Function code and trace the logic, we'll see the Topic we publish events to is based on the Place ID of state parks in the app's DynamoDB Table. This means there is no small, fixed set of Topic names we can encode into the policy, so we should simply put a wildcard * in for the Topic name.

We can now update the Resource property of our policy on line 183 of our template to be !Sub arn:${AWS::Partition}:iot:${AWS::Region}:${AWS::AccountId}:topic/*. The !Sub syntax asks CloudFormation to substitute in the pseudo parameters for the AWS partition, region, and account we are deploying into.
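Combining both changes, the scoped statement might look like the following sketch. It is written in CloudFormation's JSON form to match the policy examples earlier in this post, where Fn::Sub is the long form of the !Sub shorthand. The surrounding layout of the actual template is assumed and may differ:

```json
{
  "Effect": "Allow",
  "Action": "iot:Publish",
  "Resource": {
    "Fn::Sub": "arn:${AWS::Partition}:iot:${AWS::Region}:${AWS::AccountId}:topic/*"
  }
}
```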

With this update we are doubly protected against malicious actions being invoked by our Function: actions other than iot:Publish are no longer allowed, and even if they were, they would target a resource ARN that does not match the policy's Resource specification.

Conclusion

We hope that through stack.new folks can learn all about the architecture of applications that, until now, have been shared only as hard-to-follow CloudFormation templates, and learn best practices from the output of auditing tools like cfn_nag. We'd love to hear what you think or whether you learned anything! Give us a shout @stackeryio with your thoughts!
