Carefully Poking Holes: Using Cross Account Custom Authorizers in API Gateway

First off, apologies for the brief hiatus. I hit a bit of a busy period with work and fell off the posting wagon.

AWS recently introduced support for API Gateway to use a Lambda custom authorizer that lives in a different AWS account. Previously the Lambda custom authorizer had to exist in the same AWS account as the API Gateway, which caused problems in our architecture since we want to use a single token service for REST APIs across all accounts.

We originally solved this problem with what we dubbed the Auth Proxy. The Auth Proxy Lambda package lives in an S3 bucket in a shared account, so any account can deploy it with CloudFormation by referencing its location. The bucket policy is configured to allow CloudFormation to get the zip package during a deployment. Finally, when we run the CloudFormation stack we look up the ARN of the Token Lambda, store it as an environment variable for the Auth Proxy, and then add permissions so the Auth Proxy in that account can invoke the Token Lambda.

Phew, that’s a lot of steps for something that would be so much easier if we could just point API Gateway at the Lambda in the other account. Now that API Gateway supports exactly that, I needed to tackle the delicate process of opening up permissions across accounts.

I’ve mentioned in previous posts that we have a helper function that handles a bunch of different operations, and in this particular case there are two operations that we care about:

  1. Looking up the ARN for the token service for the particular cloud we’re deploying to (the development cloud, the production cloud, or some other siloed cloud deployment)
  2. Adding permissions to the Token service Lambda to allow us to invoke it from another account

As with all things, I test to answer technical questions before I wade knee deep into changing my deployment processes.

For the first operation, I found everything was good. My existing lookup of the ARN for the cloud meant that I already had the information in CloudFormation, so I just changed the CloudFormation stack output that I reference when building the authorizer in the API Gateway swagger file, and ran a deploy.
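To make the swagger change concrete, here is a minimal sketch in Python of how an authorizer definition might be built from the looked-up ARN. The function name, region, account ID, and Lambda name are placeholders invented for illustration; only the `x-amazon-apigateway-authorizer` extension and the `authorizerUri` layout come from API Gateway itself.

```python
import json


def build_authorizer_definition(token_lambda_arn: str, region: str) -> dict:
    """Build a Swagger securityDefinitions entry for a Lambda token authorizer.

    The Lambda ARN may belong to another account; API Gateway only needs
    the full ARN embedded in the authorizer URI.
    """
    authorizer_uri = (
        f"arn:aws:apigateway:{region}:lambda:path/2015-03-31/functions/"
        f"{token_lambda_arn}/invocations"
    )
    return {
        "type": "apiKey",
        "name": "Authorization",  # header carrying the token
        "in": "header",
        "x-amazon-apigateway-authtype": "custom",
        "x-amazon-apigateway-authorizer": {
            "type": "token",
            "authorizerUri": authorizer_uri,
        },
    }


if __name__ == "__main__":
    # Hypothetical ARN standing in for the CloudFormation stack output.
    arn = "arn:aws:lambda:us-east-1:111111111111:function:token-service"
    print(json.dumps(build_authorizer_definition(arn, "us-east-1"), indent=2))
```

Pointing the URI at the other account's ARN is all the swagger side needs; the permission on the target Lambda is a separate concern.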

Everything deployed great, and I thought I’d cracked it with a super simple fix…until I actually tried to use the REST API. Authorization was completely broken; I just wasn’t getting a response back from the Token service. How could this be? Digging into the API Gateway logs, it looked like I was getting an access denied. The deployment helper Lambda took care of my permissions on the Token service, so everything should have been fine.

So what’s the issue?

API Gateway performing a Lambda invocation requires a different format of the permissions on the target Lambda.

In my original implementation the permissions were being applied to the Token Lambda for another Lambda to invoke it. Now that the API Gateway is invoking the Lambda directly, its permission set looks completely different.

Previously when I used the AWS SDK to set permissions, my parameters looked like this:
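A minimal sketch of parameters along these lines, assuming Python and boto3's Lambda `add_permission` call. The function name, statement ID format, ARN, and account ID are hypothetical placeholders, not the values from our deployment.

```python
def build_account_permission_params(token_lambda_arn: str, caller_account_id: str) -> dict:
    """Parameters granting a whole AWS account permission to invoke the Lambda."""
    return {
        "FunctionName": token_lambda_arn,
        # Hypothetical naming scheme; any unique ID of letters, digits, - and _ works.
        "StatementId": f"auth-proxy-{caller_account_id}",
        "Action": "lambda:InvokeFunction",
        # Passing a bare account ID grants access to principals in that account.
        "Principal": caller_account_id,
    }


if __name__ == "__main__":
    params = build_account_permission_params(
        "arn:aws:lambda:us-east-1:111111111111:function:token-service",
        "222222222222",
    )
    # With real credentials this would be applied as:
    #   boto3.client("lambda").add_permission(**params)
    print(params)
```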

This would result in a function policy block that looks like this
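As an illustration, a statement of roughly this shape is what Lambda stores when an account ID is used as the principal. The sketch below builds it as a Python dict; the helper name and all IDs are invented placeholders.

```python
import json


def account_policy_statement(token_lambda_arn: str, caller_account_id: str, statement_id: str) -> dict:
    """Shape of the function policy statement for an account-level grant."""
    return {
        "Sid": statement_id,
        "Effect": "Allow",
        # Lambda expands a bare account ID into the account's root ARN.
        "Principal": {"AWS": f"arn:aws:iam::{caller_account_id}:root"},
        "Action": "lambda:InvokeFunction",
        "Resource": token_lambda_arn,
    }


if __name__ == "__main__":
    statement = account_policy_statement(
        "arn:aws:lambda:us-east-1:111111111111:function:token-service",
        "222222222222",
        "auth-proxy-222222222222",
    )
    print(json.dumps(statement, indent=2))
```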

We can see from the policy that we’re allowing a lambda:InvokeFunction call from the other account on this Lambda.

Now that I need to allow API Gateway to invoke directly, we need to apply permissions a bit differently. To keep things secure, we need to make sure we’re only allowing executions from our API, limiting it to our specific REST API ID.
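A hedged sketch of what the API Gateway flavour of those parameters could look like, again assuming boto3's `add_permission`. The helper name, statement ID, and IDs are placeholders; scoping the source ARN to `authorizers/*` under one REST API ID is one reasonable choice rather than the only one, and in this sketch the calling account is carried inside the source ARN.

```python
def build_apigateway_permission_params(
    token_lambda_arn: str,
    region: str,
    api_account_id: str,
    rest_api_id: str,
    statement_id: str,
) -> dict:
    """Parameters letting one specific REST API's authorizers invoke the Lambda."""
    # execute-api ARN for any authorizer on this one REST API in the calling account.
    source_arn = (
        f"arn:aws:execute-api:{region}:{api_account_id}:{rest_api_id}/authorizers/*"
    )
    return {
        "FunctionName": token_lambda_arn,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        # The caller is now the API Gateway service, not an AWS account.
        "Principal": "apigateway.amazonaws.com",
        "SourceArn": source_arn,
    }


if __name__ == "__main__":
    print(build_apigateway_permission_params(
        "arn:aws:lambda:us-east-1:111111111111:function:token-service",
        "us-east-1", "222222222222", "abc123defg", "apigw-abc123defg",
    ))
```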

This gives us a resulting policy that looks very different:
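For comparison, a statement of roughly the following shape is what a service principal plus a source ARN produces in the function policy. The helper name and every ID below are illustrative placeholders.

```python
import json


def apigateway_policy_statement(token_lambda_arn: str, source_arn: str, statement_id: str) -> dict:
    """Shape of the function policy statement for an API Gateway grant."""
    return {
        "Sid": statement_id,
        "Effect": "Allow",
        "Principal": {"Service": "apigateway.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": token_lambda_arn,
        # The source ARN becomes a condition restricting which API may call in.
        "Condition": {"ArnLike": {"AWS:SourceArn": source_arn}},
    }


if __name__ == "__main__":
    print(json.dumps(apigateway_policy_statement(
        "arn:aws:lambda:us-east-1:111111111111:function:token-service",
        "arn:aws:execute-api:us-east-1:222222222222:abc123defg/authorizers/*",
        "apigw-abc123defg",
    ), indent=2))
```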

As you can see, there’s a lot more going on here. Now we need a principal of apigateway.amazonaws.com, the account the call is coming from, and a source ARN that points at our API Gateway. Now that I’m creating the permissions properly, authentication is running through just fine!

If you’re curious how this all fits together, here’s the full function code. By using a deterministic statement ID I can ensure that we’re not re-applying permissions that already exist and jamming up the function policy with redundant permission sets.
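The idea of a deterministic statement ID can be sketched as follows. This is an illustrative Python version, not the helper function itself: the hashing scheme and function names are assumptions, and `lambda_client` is expected to be a boto3 Lambda client (or anything with the same `add_permission` method and `exceptions.ResourceConflictException`).

```python
import hashlib


def statement_id_for(source_arn: str) -> str:
    """Derive a stable statement ID from the source ARN (hypothetical scheme)."""
    digest = hashlib.sha256(source_arn.encode("utf-8")).hexdigest()[:16]
    return f"apigw-authorizer-{digest}"


def ensure_authorizer_permission(lambda_client, token_lambda_arn: str, source_arn: str) -> bool:
    """Add the API Gateway invoke permission once.

    Returns True if the permission was added, False if a statement with the
    same deterministic ID was already present.
    """
    try:
        lambda_client.add_permission(
            FunctionName=token_lambda_arn,
            StatementId=statement_id_for(source_arn),
            Action="lambda:InvokeFunction",
            Principal="apigateway.amazonaws.com",
            SourceArn=source_arn,
        )
    except lambda_client.exceptions.ResourceConflictException:
        # Lambda rejects a duplicate statement ID, so the grant already exists.
        return False
    return True
```

Because the ID is derived from the source ARN, redeploying the same API produces the same ID and the second call becomes a no-op instead of a new policy entry. With real credentials, `lambda_client` would be `boto3.client("lambda")`.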

And that’s it! In the interim, until I move all the APIs over, I will continue to set permissions for both the Auth Proxy and the new direct API Gateway method, then drop the code and permissions for the Auth Proxy when it gets fully retired.

Dropping the Auth Proxy gives us, at the very least, less complexity to deal with in our solution, and in the best case actually speeds up our authentication a bit by removing a few hops through various Lambda functions.

That’s it for today, enjoy your week!