About halfway through our development cycle, in a meeting with our AWS Solutions Architect, we received what we affectionately refer to as the “AWS Bomb”.
Up until that point we had been developing our platform with the idea that all the micro services, and the resources required to run them, should exist within a single AWS account. Just about all of the AWS documentation and guidance up until then revolved around a single account; however, AWS had just come up with a new multi-account architecture to help keep cloud platforms secure and minimize the impact (or blast radius) of a security breach. If you’re interested in reading more about what this architecture looks like, or why you should use it, check out the Multiple Account Security Strategy Solution Brief from AWS.
We immediately started working towards adopting this new architecture, which came with a whole other host of headaches that we slowly worked through and solved on a micro service by micro service basis. This was fairly easy for the micro services, as they generally only communicate within their own account, and when they need to reach another micro service they can simply use its external API routes.
Where we ran into some interesting challenges was in our CICD process. Luckily, we were just starting to design and develop it, so we didn’t have to start over once this new architecture came along, but it did mean we had to pioneer some things that AWS hadn’t covered in its built-in CICD tooling.
Our main problem was how to handle cross account permissions in the platform. How do you have a resource give itself permissions to utilize a resource it doesn’t own?
In our specific use case, we use Lambda as a Custom Authorizer in API Gateway, and the Lambda responsible for generating and validating tokens lives in another account. We need to grant API Gateway permission to invoke the token Lambda in the other account, and we need to tell the token Lambda that it can be invoked only by API Gateways that we trust.
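To make the "tell the token Lambda who may call it" half concrete, here is a minimal sketch of the resource-based policy statement involved. It assumes the standard mechanism for cross-account Lambda authorizers: a statement on the token Lambda granting `lambda:InvokeFunction` to the `apigateway.amazonaws.com` service principal, scoped by `SourceArn` to one authorizer on one API. The helper names and the statement ID format are hypothetical; the returned dictionary is shaped as keyword arguments for boto3's Lambda `add_permission` call.

```python
def authorizer_source_arn(region: str, api_account_id: str, api_id: str, authorizer_id: str) -> str:
    # API Gateway invokes the authorizer as a service principal, so the only
    # way to say "this API, and no other" is to pin the authorizer's ARN.
    return f"arn:aws:execute-api:{region}:{api_account_id}:{api_id}/authorizers/{authorizer_id}"


def authorizer_permission_kwargs(
    token_function_arn: str,
    region: str,
    api_account_id: str,
    api_id: str,
    authorizer_id: str,
) -> dict:
    """Keyword arguments for boto3's lambda ``add_permission`` (hypothetical helper)."""
    return {
        "FunctionName": token_function_arn,
        # StatementId must be unique within the token Lambda's policy;
        # this naming scheme is an illustrative assumption.
        "StatementId": f"apigw-{api_account_id}-{api_id}-{authorizer_id}",
        "Action": "lambda:InvokeFunction",
        "Principal": "apigateway.amazonaws.com",
        "SourceArn": authorizer_source_arn(region, api_account_id, api_id, authorizer_id),
    }
```

Run in the token account, something like `boto3.client("lambda").add_permission(**authorizer_permission_kwargs(...))` would attach the statement. Scoping to a single authorizer ARN, rather than to a whole account, keeps every other API Gateway out.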
There were a few approaches we could take:
- The micro service team that owns the target resource would be required to add the appropriate access permissions to its CloudFormation templates, and the team accessing that resource would have to do the same in theirs.
- The resource that needs to access a resource in another account is given permission to modify the security policies in the target account so it can grant itself access.
- We implement central tooling to help orchestrate permissions. This is the approach we took.
Let’s break down why we didn’t go with the first two approaches.
For approach #1, we didn’t want to slow down development and introduce dependencies on other micro service teams to complete work. If you have to put an item in another team’s backlog, you are now stuck waiting for it to be implemented before you can continue. In the siloed world of micro services, this really hampers development.
In approach #2, we considered adding a Lambda-backed custom resource that would allow Account A to configure permissions for resources in Account B. In the end, we didn’t like this model because Account B first had to grant Account A the permission to modify its permissions, which essentially brings us back to operating as we would in approach #1: any new micro service would have to wait for the target account to add permissions before it could continue. We also didn’t like the blast radius: if any one of those numerous accounts were compromised, it could give itself permissions all over the place.
Finally, we come to the way we decided to operate. Enter the AWS Super Glue: Lambda. Similar to approach #2, we used a Lambda that acts as a central coordinator for tasks we didn’t want the individual micro services to do directly. We call this tool the Deploy Helper. Using a little bit of information gleaned from the micro service’s CloudFormation stack, we submit a request to the Deploy Helper Lambda to set up the appropriate permissions on the token Lambda. This lets me directly control who is allowed to use the Deploy Helper (it has some magic inside to make sure we trust who’s calling it), and ensure that only the Deploy Helper has permission to modify permissions around the cloud. Then, when a deployment runs, our CICD framework automagically calls the Deploy Helper and the appropriate permissions are applied.
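As a rough illustration of the coordinator pattern described above, here is a minimal sketch of what a Deploy Helper handler could look like. Everything here is an assumption rather than the author's code: the event field names, the role and function ARNs, and the idempotency rule (a duplicate `StatementId` is assumed to surface as `ResourceConflictException`, which botocore exposes on `ClientError.response["Error"]["Code"]`). The caller-trust "magic" is deliberately left out. boto3 is imported lazily inside the client factory, so the handler logic can be exercised with a fake client.

```python
# Hypothetical placeholders: the role the helper assumes in the Token Service
# Account and the token Lambda whose resource policy it manages.
TOKEN_ROLE_ARN = "arn:aws:iam::111111111111:role/deploy-helper-target"
TOKEN_FUNCTION_ARN = "arn:aws:lambda:us-east-1:111111111111:function:token-service"

REQUIRED_FIELDS = ("region", "api_account_id", "api_id", "authorizer_id")


def assume_role_lambda_client(role_arn):
    """Assume the target-account role; return a Lambda client using its temporary credentials."""
    import boto3  # imported lazily so the handler can be tested without AWS

    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName="deploy-helper"
    )["Credentials"]
    return boto3.client(
        "lambda",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


def handler(event, context, make_lambda_client=assume_role_lambda_client):
    # Reject malformed requests before touching any other account.
    missing = [f for f in REQUIRED_FIELDS if not event.get(f)]
    if missing:
        return {"ok": False, "error": f"missing fields: {', '.join(missing)}"}

    statement_id = f"apigw-{event['api_account_id']}-{event['api_id']}-{event['authorizer_id']}"
    source_arn = (
        f"arn:aws:execute-api:{event['region']}:{event['api_account_id']}:"
        f"{event['api_id']}/authorizers/{event['authorizer_id']}"
    )

    client = make_lambda_client(TOKEN_ROLE_ARN)
    try:
        client.add_permission(
            FunctionName=TOKEN_FUNCTION_ARN,
            StatementId=statement_id,
            Action="lambda:InvokeFunction",
            Principal="apigateway.amazonaws.com",
            SourceArn=source_arn,
        )
    except Exception as exc:
        # A re-run deployment sends the same request again; assuming an
        # existing statement ID raises ResourceConflictException, treat it
        # as already applied and re-raise anything else.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code != "ResourceConflictException":
            raise
    return {"ok": True, "statement_id": statement_id}
```

Because every deployment goes through this one function, widening it to new use cases means changing one code path instead of every team's templates.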
This works for me on multiple levels:
- A central location makes it easy for me to update the code and extend it as needed for various use cases.
- The ability to call the Deploy Helper is limited to trusted calls from our CICD accounts, and it has some special sauce built in to verify that the call really came from our own framework.
- I can restrict the ability to modify permissions to the role the Deploy Helper uses, and I can tightly control cross-account calling permissions.
- Nobody needs to add, maintain, or understand additional permissions in their CloudFormation templates.
- Everything happens automagically, without developers needing to understand why something is done or how it is done.
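The "only the Deploy Helper can modify permissions" point mostly comes down to two IAM documents on the role that the helper assumes in a target account. The sketch below is an assumption about how that role could be locked down, not the author's actual policies: a trust policy that lets only the Deploy Helper's execution role call `sts:AssumeRole`, and a permissions policy that allows editing only the token Lambda's resource policy. The function names and ARNs are hypothetical.

```python
import json


def target_role_trust_policy(deploy_helper_role_arn: str) -> dict:
    """Trust policy for the role in the target account: only the Deploy
    Helper's execution role may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": deploy_helper_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def target_role_permissions_policy(token_function_arn: str) -> dict:
    """Identity policy for that role: it may only read and edit the resource
    policy of one function, nothing else in the account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lambda:AddPermission", "lambda:RemovePermission", "lambda:GetPolicy"],
                "Resource": token_function_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARNs, printed as JSON ready to paste into a template.
    print(json.dumps(target_role_trust_policy("arn:aws:iam::333333333333:role/deploy-helper"), indent=2))
    print(json.dumps(target_role_permissions_policy(
        "arn:aws:lambda:us-east-1:111111111111:function:token-service"), indent=2))
```

If any micro service account is compromised, it holds neither of these permissions, which is the blast-radius property that approach #2 lacked.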
To give you an idea of how this works and what the process looks like, here’s a little diagram:
(1) The CICD Framework deploys the CloudFormation template to the cloud account, (2) which includes the permissions for the API Gateway Custom Authorizer to call the token Lambda service. (3) The CICD Framework then makes a request to the Deploy Helper to create permissions for the newly deployed API. The Deploy Helper assumes a role in the Token Service Account (4) and sets the appropriate Lambda permission on the token Lambda service.
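Step (3), seen from the CICD side, might look roughly like the sketch below: read the freshly deployed stack's outputs, build a request, and invoke the Deploy Helper synchronously. The output keys `ApiId` and `AuthorizerId`, the payload field names, and the `deploy-helper` function name are all hypothetical; the boto3 calls used are CloudFormation `describe_stacks` and Lambda `invoke`. The clients are passed in, so the logic can be run against fakes.

```python
import json


def outputs_to_dict(describe_stacks_response: dict) -> dict:
    """Flatten CloudFormation stack outputs into {OutputKey: OutputValue}."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def build_request(outputs: dict, region: str, api_account_id: str) -> dict:
    # Assumes the micro service template exports these (hypothetical) outputs.
    return {
        "region": region,
        "api_account_id": api_account_id,
        "api_id": outputs["ApiId"],
        "authorizer_id": outputs["AuthorizerId"],
    }


def request_permissions(cfn_client, lambda_client, stack_name, region, api_account_id,
                        helper_function="deploy-helper"):
    """Ask the Deploy Helper to wire up permissions for a just-deployed stack."""
    outputs = outputs_to_dict(cfn_client.describe_stacks(StackName=stack_name))
    payload = build_request(outputs, region, api_account_id)
    resp = lambda_client.invoke(
        FunctionName=helper_function,  # a full ARN also works across accounts
        InvocationType="RequestResponse",  # wait, so the pipeline fails loudly
        Payload=json.dumps(payload).encode("utf-8"),
    )
    result = json.loads(resp["Payload"].read())
    # FunctionError is present when the helper itself raised an exception.
    if "FunctionError" in resp or not result.get("ok"):
        raise RuntimeError(f"Deploy Helper failed: {result}")
    return result
```

Invoking synchronously is a design choice here: a failed permission grant stops the deployment instead of leaving an API whose authorizer silently cannot reach the token Lambda.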
Lambda really came to the rescue for us here, as the multi-account strategy requires real thought and effort to implement and maintain permissions, since (rightfully so) by default nothing can communicate outside its own account. As with many things we do, it did require custom code and development time, but that’s why the first half of DevOps stands for Development, right?
The Deploy Helper does a number of other tasks that I’ll cover in future posts, but I think that’s enough to wrap your brain around for one day.
Until next time,