Safety Nets: CloudFormation Custom Resources for Regional API Gateways

Up until this week we’ve been using edge-optimized custom domains with our API Gateways. They have been really easy to set up using CloudFormation and have been a great way for us to tightly control the URLs used to access our REST platform.

In order to support a more global expansion of the platform on AWS, we’re ditching the edge-optimized custom domains and moving to the regional custom domains that AWS launched at re:Invent 2017.

This caused me one major headache – the AWS::ApiGateway::DomainName resource in CloudFormation currently has no way to look up the underlying URL, which I need in order to add the appropriate Route53 record.
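One way around that limitation is a custom resource that does the lookup itself. Here's a minimal sketch of the idea, assuming a Python Lambda backs the custom resource (the property name `DomainName` and the handler shape are illustrative, not the full post's exact code):

```python
def extract_regional_endpoint(domain_info):
    """Pull out the two fields a Route53 alias record needs from an
    API Gateway get_domain_name response."""
    return {
        "RegionalDomainName": domain_info.get("regionalDomainName"),
        "RegionalHostedZoneId": domain_info.get("regionalHostedZoneId"),
    }

def handler(event, context):
    # Only look anything up on Create/Update; Delete has nothing to clean up.
    if event["RequestType"] in ("Create", "Update"):
        import boto3
        client = boto3.client("apigateway")
        info = client.get_domain_name(
            domainName=event["ResourceProperties"]["DomainName"])
        data = extract_regional_endpoint(info)
    else:
        data = {}
    # A real custom resource must signal CloudFormation by PUTting this
    # result to the pre-signed ResponseURL in the event (e.g. via the
    # cfnresponse helper); that plumbing is elided here.
    return {"Status": "SUCCESS", "Data": data}
```

The stack can then `Fn::GetAtt` the `RegionalDomainName` off the custom resource and feed it straight into a Route53 alias record.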

Before I get into the sample code and solution, let’s do a quick introduction to the difference between edge-optimized and regional API Gateway endpoints.

Continue reading “Safety Nets: CloudFormation Custom Resources for Regional API Gateways”

Playing in the SANdbox: Subject Alternate Names on ACM Certificates in CloudFormation

Depending on your view, the speed at which AWS updates and changes can either be a complete nightmare or something that keeps you coming into work every day. Luckily for me, I see it as the latter.

Constant AWS updates mean that every time I come back to revisit a problem, or I’m surfing the CloudFormation documentation, there’s always a new way to solve something or a different and better way to do things.

When I first started tackling ACM certificates in CloudFormation you couldn’t specify Subject Alternative Names as part of the request – which left you creating a separate certificate for every subdomain you needed a certificate to cover. Luckily, back in the ’70s Larry Tesler invented copy and paste, so it wasn’t too big a deal…at least until you ran into the maximum number of certificates you can request in a year (which thankfully has been increased since I first started doing ACM requests in CloudFormation, so you shouldn’t realistically hit that limit).

Now that Subject Alternative Names (SANs) are supported, my CloudFormation is simplified quite a bit, but the certificate approvals get a bit tricky: since I’m going to be adding subdomains, I need the approval emails to go to a domain that actually exists.
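To make that concrete, here's a sketch (not our production template) of what an ACM certificate with SANs can look like, expressed as the Python dict you'd serialize into a CloudFormation template. The trick is the `ValidationDomain` in each `DomainValidationOptions` entry, which routes every approval email to the apex domain, where a real mailbox exists:

```python
def acm_certificate(apex, subdomains):
    """Build an AWS::CertificateManager::Certificate resource covering
    the apex domain plus the given subdomains as SANs."""
    sans = ["%s.%s" % (s, apex) for s in subdomains]
    return {
        "Type": "AWS::CertificateManager::Certificate",
        "Properties": {
            "DomainName": apex,
            "SubjectAlternativeNames": sans,
            # Send every validation email to the apex domain instead of
            # the (possibly non-existent) subdomain mailboxes.
            "DomainValidationOptions": [
                {"DomainName": d, "ValidationDomain": apex}
                for d in [apex] + sans
            ],
        },
    }
```

With `acm_certificate("example.com", ["api", "www"])` you get one certificate covering all three names, and all three approval emails land at example.com.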

Continue reading “Playing in the SANdbox: Subject Alternate Names on ACM Certificates in CloudFormation”

Carefully Poking Holes: Using Cross Account Custom Authorizers in API Gateway

First off, apologies for the brief hiatus. I hit a bit of a busy period with work and fell off the posting wagon.

AWS recently introduced support for API Gateway to use a cross-account Lambda custom authorizer. Previously the Lambda custom authorizer had to exist in the same AWS account as the API Gateway, which caused problems in our architecture since we want to use a single token service for REST APIs across all accounts.

We originally solved this problem with what we dubbed the Auth Proxy. The Auth Proxy Lambda lives in an S3 bucket in a shared account and can be deployed with CloudFormation by referencing its location. The bucket policy for the bucket in S3 is configured to allow CloudFormation to get the zip package during a deployment. Finally, when we run the CloudFormation stack we look up the ARN of the token Lambda, store it as an environment variable for the Auth Proxy, and then add permissions for the Auth Proxy in that account to invoke the token Lambda.

Phew, that’s a lot of steps for something that would be so much easier if we could just point API Gateway at the Lambda in the other account. Now that API Gateway supports exactly that, the delicate process of opening up cross-account permissions needed to be tackled.
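The core of the cross-account opening is a resource policy on the shared token Lambda that lets API Gateway in the consumer account invoke it, scoped to one specific authorizer. A hedged sketch using boto3 (the ARNs and names are placeholders, not our real accounts):

```python
def authorizer_source_arn(region, api_account, api_id, authorizer_id):
    """Build the execute-api ARN that identifies a single API Gateway
    authorizer in the consumer account."""
    return "arn:aws:execute-api:%s:%s:%s/authorizers/%s" % (
        region, api_account, api_id, authorizer_id)

def grant_invoke(lambda_client, function_name, source_arn, statement_id):
    # Add a statement to the token Lambda's resource policy allowing
    # API Gateway to invoke it, but only from this one authorizer.
    return lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=statement_id,
        Action="lambda:InvokeFunction",
        Principal="apigateway.amazonaws.com",
        SourceArn=source_arn,
    )
```

Scoping `SourceArn` down to the authorizer (rather than granting all of `apigateway.amazonaws.com`) is what keeps this a carefully poked hole instead of a wide-open one.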

Continue reading “Carefully Poking Holes: Using Cross Account Custom Authorizers in API Gateway”

Proactively Plugging Leaks: Securing CloudFormation Stacks in AWS CodePipeline

In a recent chat with our AWS Solutions Architect, he pointed me in the direction of some really cool open source DevOps tools from the guys over at Stelligent. They have a bunch of neat utilities and frameworks on their public GitHub, and they do a bunch of very interesting DevOps podcasts that are worth taking some time to listen to.

Anyways, one of the utilities I really like is cfn-nag, which is designed to scan CloudFormation templates for known vulnerabilities and bad practices. It fills a gap in my DevOps portfolio: I know most best practices and what I should or shouldn’t do from an infrastructure standpoint (most of the time), but my developers shouldn’t have to be AWS security experts. Since stacks are maintained by developers after I hand them off, I need a way to ensure they don’t accidentally introduce security leaks into the platform. And since one of my mantras is to automate everything, I can’t expect developers to remember to scan their stacks before they get deployed.
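As a sketch of how that automation can look inside a build step (this is illustrative, not our exact pipeline): run `cfn_nag_scan` over every template and fail the build on any violation, relying on cfn-nag's non-zero exit code. The `templates/*.yml` pattern is a placeholder:

```python
import glob
import subprocess
import sys

def scan_templates(pattern="templates/*.yml"):
    """Run cfn_nag_scan over every matching template; return the list
    of templates that failed the scan."""
    failed = []
    for path in glob.glob(pattern):
        # cfn_nag_scan exits non-zero when it finds failing violations.
        result = subprocess.run(["cfn_nag_scan", "--input-path", path])
        if result.returncode != 0:
            failed.append(path)
    return failed

if __name__ == "__main__":
    bad = scan_templates()
    if bad:
        print("cfn-nag failures in: %s" % ", ".join(bad))
        sys.exit(1)  # a non-zero exit fails the CodeBuild phase
```

Dropping this into a pre-deploy phase means a developer never has to remember to run the scan; the pipeline simply refuses to ship a template that fails it.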

Continue reading “Proactively Plugging Leaks: Securing CloudFormation Stacks in AWS CodePipeline”

Making Snowflakes: Avoiding Collisions in CloudFormation Stack Outputs

Generally speaking I’m pretty lucky because all our microservices operate in their own accounts (check out AWS Super Glue: The Lambda Function for why we do this), so I don’t have to worry too much about the uniqueness of AWS resources. There are a few exceptions, of course, for resources that must be unique across the entire AWS platform, like S3 bucket names and CloudFront CNAMEs. Because I’ve been so lucky, I hadn’t had to solve the problem of stack outputs and the requirement for them to be unique within an AWS account. Our microservices are fairly lightweight, so I can get away with all my infrastructure existing within a single CloudFormation template. In my recent web app Blue/Green deployment, however, I started running into stack outputs colliding.
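The collision comes from exported output names needing to be unique per account and region, so two copies of the same stack can't both export `ApiUrl`. One sketch of a fix (illustrative; in a real template you'd likely build the name with `Fn::Sub` and something like the stack name or an environment parameter) is to namespace every export:

```python
def namespaced_output(logical_id, value, env):
    """Build a CloudFormation Outputs entry whose export name is
    prefixed with the environment, so blue and green stacks coexist."""
    return {
        logical_id: {
            "Value": value,
            # Export names must be unique per account+region, so bake
            # the environment (e.g. "blue"/"green") into the name.
            "Export": {"Name": "%s-%s" % (env, logical_id)},
        }
    }
```

Now the blue stack exports `blue-ApiUrl` and the green stack exports `green-ApiUrl`, and consumers import whichever environment they're wired to.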

Before we get too far into this, let’s take a step back and bring the CloudFormation rookies up to speed. If you just want to check out my solution, go ahead and skip down.

Continue reading “Making Snowflakes: Avoiding Collisions in CloudFormation Stack Outputs”

Crash Test Dummy: Building, and Testing Angular Apps in AWS CodeBuild

As part of an upcoming post on how we achieved Blue/Green functionality within AWS, I wanted to cover off a technical hurdle we overcame this week: how to build and test a web app in AWS CodeBuild.

So what’s the big deal? AWS CodeBuild lets you use a whole bunch of curated containers that have all kinds of frameworks and tools built in. If that’s not enough for you out of the box, the buildspec gives you excellent control over running scripts, including installing packages in Ubuntu. And if none of that works for you, you can curate your own Docker image and publish it to the Elastic Container Registry (ECR) (though more on the limitations of ECR in CI/CD in another post) or Docker Hub. With all these tools and approaches at my disposal, there’s no way I can’t have an AWS CodeBuild environment that meets my needs. But, spoiler alert, the environment isn’t my problem – it’s the build and test framework my developers have implemented.
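For orientation, here's a guess at what a minimal Angular buildspec might look like, expressed as the Python dict you'd dump to `buildspec.yml` (the phases and commands are assumptions for illustration, not our exact pipeline; the headless-Chrome flag is the usual workaround for a display-less container):

```python
# Minimal buildspec sketch: install deps, run unit tests headlessly,
# then produce a production build as the artifact.
BUILDSPEC = {
    "version": 0.2,
    "phases": {
        "install": {"commands": ["npm install"]},
        "pre_build": {
            # Karma needs a headless browser since the container has no display.
            "commands": ["npm test -- --watch=false --browsers=ChromeHeadless"],
        },
        "build": {"commands": ["npm run build -- --prod"]},
    },
    "artifacts": {"files": ["dist/**/*"]},
}
```

The interesting failures tend to live in `pre_build`: the tests run fine on a developer's laptop with a real Chrome, then fall over in the container until the test framework is taught to run headless.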

Continue reading “Crash Test Dummy: Building, and Testing Angular Apps in AWS CodeBuild”

AWS Super Glue: The Lambda Function

About halfway through our development cycle, in a meeting with our AWS Solutions Architect we received what we affectionately refer to as the “AWS Bomb”.

Up until that point we had been developing our platform with the idea that all the microservices, and the resources required to run them, should exist within a single AWS account. Just about all of the AWS documentation to that point, and guidance from AWS, revolved around the idea of a single account. However, AWS had just come up with a new architecture to help keep cloud platforms secure and minimize the impact (or blast radius) of a security breach. If you’re interested in reading more about what this architecture looks like, or why you should use it, check out the Multiple Account Security Strategy Solution Brief from AWS.

We immediately started working towards adopting this new architecture, which came with a whole other host of headaches that we slowly worked through and solved on a microservice-by-microservice basis. This was fairly easy for the microservices, as they generally only communicate within their own account, and when they have to reach another microservice they can do so via the external API routes.

Where we ran into some interesting challenges was our CI/CD process. Luckily we were just starting to design and develop it, so we didn’t have to start over with the new architecture in mind, but it did mean we had to pioneer some things that AWS hadn’t covered in their built-in CI/CD tooling.

Continue reading “AWS Super Glue: The Lambda Function”

This is How We DevOps

I realized that in writing this blog, I assumed that everyone knew why the DevOps model is so important, and what CI/CD means, and why we do it.

That’s probably a bad assumption. As a new(ish) development ethos, DevOps isn’t widely adopted, and it can be difficult to sell DevOps as a revolution in development to an organization. I’m going to take a bit of time to explain why we do DevOps here, and what our guiding principles are whenever we develop our CI/CD processes.

Why we DevOps

Okay, I know some people are going to say “hey, James…DevOps isn’t a verb!” but it’s a made up word and a contraction of Development and Operations anyways so I’m gonna use it in weird ways simply because I can.

The core tenet of DevOps is exactly that: amalgamating Development and Operations teams. For us, this allows our development teams to develop and push to AWS from anywhere, anytime. It’s achievable because we automate just about everything, and the automation allows us to put in all kinds of security blankets for builds, testing, and security to help developers avoid pushing bad code directly to the public. This is a huge improvement over waterfall and really empowers our developers to drive change in the product at a rapid pace. If someone is excited about a new feature and works until 4AM getting it working, why wait another year before actually releasing it? Here’s the best part: when developers know that the code they commit is going up to the cloud platform right away, they’re more likely to check and recheck for good measure before it goes (even with the automated safety nets).

My team is all about removing barriers for Developers, while maintaining safety of the platform for our customers. My job is done right if a Developer doesn’t even notice all the checks and balances that go on…that is unless they make a mistake and we catch it.

Finally we cover off the Operations side of things. With all this automation done, the demand for monitoring and notification is high.

If a build fails in the cloud, and nobody is around to see it, did it really fail? -Me, just now

We utilize CloudWatch metrics extensively through DataDog to handle the vast majority of our day-to-day cloud operations. Everything that goes on shows up on a fancy dashboard I’ve built, and I can see the health of the platform at a glance. Beyond that, we have a number of monitors and alarms that send emails and messages to our operations Slack channel whenever something has gone wrong. We make extensive use of outlier detection to determine what kinds of behavior are out of the norm.

In addition, we have a number of services and checks that we designed and implemented to watch the health of our actual service. It’s hard to use third-party tools out of the box when you’re developing a first-party product, so this one went hand in hand between the DevOps team and the engineers working on the microservices. They put in API routes to query health, and we put in automated checks and alarms that leverage those routes. There’s a handful of other Lambdas that check the status of things like our CodeBuild projects to notify teams when builds and deployments fail, or notify the DevOps and media teams when the STUN services are experiencing issues.
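The build-notification Lambdas boil down to something like this sketch: receive a CodeBuild state-change event (the `detail` fields here follow the CloudWatch Events shape), decide whether it's worth waking anyone up, and format a message for Slack. The project names and statuses are illustrative:

```python
def should_notify(event):
    """Only alert on builds that ended badly; quiet success is fine."""
    return event["detail"]["build-status"] in ("FAILED", "STOPPED")

def build_message(event):
    """Turn a CodeBuild state-change event into a one-line alert."""
    detail = event["detail"]
    return "Build %s for project %s" % (
        detail["build-status"], detail["project-name"])

def handler(event, context):
    if should_notify(event):
        # A real handler would POST build_message(event) to a Slack
        # webhook or SNS topic here; that plumbing is omitted.
        return build_message(event)
    return None
```

Wiring this to a CloudWatch Events rule on CodeBuild state changes means nobody has to watch the console: the failures come to the team.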

This proactive monitoring and alarming lets us address and fix issues at the development level before a customer ever contacts our support team. We typically identify a fix, and at a minimum backlog it, before our customers ever notice there was a problem. This makes for a really great experience for everyone: developers, operations, support, managers, and most importantly our customers.

The Tenets of our DevOps Practice

Now, with all of that introduction out of the way covering what DevOps is and why it’s so important and awesome, let’s get into the meat and potatoes of my DevOps practice. These are sort of my commandments when it comes to designing, developing, and implementing DevOps and CI/CD processes.

  1. Automate everything. Everything that can be automated, should be. Even if it starts as a manual process to figure out how to do it, we always immediately turn that into code and make it run automatically. If we didn’t, DevOps just wouldn’t scale with our platform.
  2. Remove barriers. The last thing I want to be is the guy holding the master key for everything, because I don’t want my phone ringing in the middle of the night because an emergency code fix has to be pushed into the cloud. If I followed commandment #1, then there’s no reason to call me. Just go ahead and push the code.
  3. Implement safety nets. Nobody is perfect, and we all make mistakes. As soon as you admit that, and allow developers to make mistakes, they’re going to be much faster at iterating on code and producing some neat innovations. Let them make mistakes, just catch them before they become a problem for the public.
  4. Never stop improving. The speed of cloud development demands that we are never fully done with our work. AWS releases new services, improves security, and gives us new features and functionality. We are always dedicated to going back and re-working code and processes to use the latest and greatest in services and features. The more I can push off to AWS to handle instead of my custom code, the better.
  5. Always be teaching. Especially in a company where DevOps isn’t a thing yet, it’s important to always be teaching others how to do their own DevOps work. Not only does this make the concept of DevOps proliferate through your organization, it also shares the burden of work across the entire development team. It takes time, but I promise you will reap the rewards when developers maintain their own CI/CD pipelines, leaving you free to stay focused on commandment #4.
  6. Secure everything. It’s worth the effort to start with the least possible permissions across the board up front, rather than having to go back and dial things in later. Dialing in later causes outages, downtime, and unintended behavior. Spend the extra bit of time in the cycle up front and make sure you’re not leaving things wide open (especially S3 buckets).

Generally these are sort of my top level drivers for everything. There will be “sub” commandments within each of those to drive me down the right path, but so long as I keep these in mind with every top level decision I make – I know I’m moving in the right direction.

Got a good one I missed? Send me an email; I’d love to hear what some of your guiding practices are.

Until next time.