This is How We DevOps - The Joy of DevOps

I realized that in writing this blog, I assumed that everyone knew why the DevOps model is so important, and what CI/CD means, and why we do it.

That’s probably a bad assumption. As a new(ish) development ethos, DevOps isn’t widely adopted, and it can be difficult to sell DevOps to an organization as a revolution in development. I’m going to take a bit of time to explain why we do DevOps here, and what our guiding principles are whenever we develop our CI/CD processes.

Why we DevOps

Okay, I know some people are going to say “hey, James…DevOps isn’t a verb!” but it’s a made up word and a contraction of Development and Operations anyways so I’m gonna use it in weird ways simply because I can.

The core tenet of DevOps is exactly that: amalgamating Development and Operations teams. We do this because it allows our development teams to develop and push to AWS from anywhere, at any time. This is achievable because we automate just about everything, and the automation lets us put in all kinds of security blankets around builds, testing, and security to help developers avoid mistakes and keep bad code from going directly to the public. This is a huge improvement over waterfall and really empowers our developers to drive change in the product at a rapid pace. If someone is excited about a new feature and works until 4AM getting it working, why wait another year before actually releasing it? Here’s the best part: when developers know that the code they commit is going up to the cloud platform right away, they’re more likely to check and recheck it for good measure before it goes (even with the automated safety nets).

My team is all about removing barriers for developers while maintaining the safety of the platform for our customers. My job is done right if a developer doesn’t even notice all the checks and balances that go on…that is, unless they make a mistake and we catch it.

Finally we cover off the Operations side of things. With all this automation done, the demand for monitoring and notification is high.

If a build fails in the cloud, and nobody is around to see it, did it really fail? -Me, just now

We use CloudWatch metrics extensively through Datadog to handle the vast majority of our day-to-day cloud operations. Everything that goes on shows up on a fancy dashboard I’ve built, so I can see the health of the platform at a glance. Beyond that, we have a number of monitors and alarms that send emails and messages to our operations Slack channel whenever something has gone wrong. We make extensive use of outlier detection to determine which behaviors are out of the norm.
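
If outlier detection sounds like magic, here is a rough idea of what it boils down to. This is a minimal sketch of one common technique (a modified z-score based on the median absolute deviation), not a description of what Datadog does internally. The function name, the 3.5 threshold, and the sample latencies are all assumptions made up for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Hypothetical helper for illustration; the threshold is an assumed default.
    """
    if not values:
        return []
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # More than half of the points sit exactly on the median,
        # so flag anything that differs from it at all.
        return [i for i, d in enumerate(deviations) if d > 0]
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Made-up request latencies in milliseconds; one of them is clearly off.
latencies_ms = [102, 98, 101, 99, 100, 97, 480, 103]
print(mad_outliers(latencies_ms))  # [6]
```

The median is used instead of the mean so that the one bad data point doesn't drag the baseline toward itself and hide its own weirdness.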

In addition, we have a number of services and checks that we designed and implemented to watch the health of our actual service. It’s hard to use third-party tools out of the box when you’re developing a first-party product, so this one was a hand-in-hand effort between the DevOps team and the engineers working on the microservices. They put in API routes to query health, and we put in automated checks and alarms that leverage those routes. There’s a handful of other Lambdas that check the status of things like our CodeBuild projects to notify teams when builds and deployments fail, or notify the DevOps and media teams when the STUN services are experiencing issues.
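
The pattern of "health routes plus automated checks" can be sketched in a few lines. This is only an illustration under assumptions: the URLs, the service names, and the `{"status": "ok"}` response shape are hypothetical, and a real check would hand the failures to whatever does your alerting. The `fetch` parameter is swappable so the logic can be exercised without touching the network.

```python
import json
import urllib.request


def fetch_health(url, timeout=5):
    """Fetch a health route and return (status_code, parsed_json_body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def check_services(routes, fetch=fetch_health):
    """Return {service_name: reason} for every service that looks unhealthy."""
    failures = {}
    for name, url in routes.items():
        try:
            status, body = fetch(url)
        except Exception as exc:  # timeouts, HTTP errors, bad JSON, and so on
            failures[name] = f"unreachable: {exc}"
            continue
        # Assumed contract: a healthy service answers 200 with {"status": "ok"}.
        reported = body.get("status") if isinstance(body, dict) else None
        if status != 200:
            failures[name] = f"HTTP {status}"
        elif reported != "ok":
            failures[name] = f"reported status {reported!r}"
    return failures


# Stand-in for the network so the example runs anywhere (hypothetical URLs).
def stub_fetch(url):
    canned = {
        "https://api.example.com/signaling/health": (200, {"status": "ok"}),
        "https://api.example.com/media/health": (503, {"status": "down"}),
    }
    return canned[url]


routes = {
    "signaling": "https://api.example.com/signaling/health",
    "media": "https://api.example.com/media/health",
}
print(check_services(routes, fetch=stub_fetch))  # {'media': 'HTTP 503'}
```

Run on a schedule, an empty result means all is well, and anything else is a message for the operations channel.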

This proactive monitoring and alarming lets us address and fix issues at the development level before a customer ever contacts our support team. We typically identify a problem, and at a minimum backlog a fix for it, before our customers ever notice anything was wrong. This makes for a really great experience for everyone: developers, operations, support, managers, and most importantly our customers.

The Tenets of our DevOps Practice

Now that the introduction to what DevOps is, and why it’s so important and awesome, is out of the way…let’s get into the meat and potatoes of my DevOps practice. These are sort of my commandments when it comes to developing, designing, and implementing DevOps and CI/CD processes.

1. Automate everything. Everything that can be automated, should be. Even if it starts as a manual process to figure out how to do it, we always immediately turn that into code and make it run automatically. If we didn’t, DevOps just wouldn’t scale with our platform.
2. Remove barriers. The last thing I want to be is the guy holding the master key for everything, because I don’t want my phone ringing in the middle of the night because an emergency code fix has to be pushed into the cloud. If I followed commandment #1, then there’s no reason to call me. Just go ahead and push the code.
3. Implement safety nets. Nobody is perfect, and we all make mistakes. As soon as you admit that, and allow developers to make mistakes, they’re going to be much faster at iterating on code and producing some neat innovations. Let them make mistakes, just catch them before they become a problem for the public.
4. Never stop improving. The speed of cloud development demands that we are never fully done with our work. AWS releases new services, improves security, and gives us new features and functionality. We are always dedicated to going back and reworking code and processes to use the latest and greatest in services and features. The more I can push off to AWS to handle instead of my custom code, the better.
5. Always be teaching. Especially in a company where DevOps isn’t a thing yet, it’s important to always be teaching others how to do their own DevOps work. Not only does this help the concept of DevOps spread through your organization, but it also shares the burden of work across the entire development team. It takes time, but I promise you will reap the rewards when developers maintain their own CI/CD pipelines, leaving you free to focus on commandment #4.
6. Secure everything. It’s better to put in the effort to start with the least possible permissions across the board up front than to have to go back and dial them in later. Dialing them in later causes outages, downtime, and unintended behavior. Spend the extra bit of time in the cycle up front and make sure you’re not leaving things wide open (especially S3 buckets).
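
Automation, safety nets, and security can all meet in a single small check. As one hedged example, here is a sketch of a safety net that scans an IAM-style policy document for statements that are wider open than they probably should be. The function names and the three rules are assumptions picked for illustration, and the bucket name is made up. A finding is a prompt for a human to review, not proof of a problem, since some buckets are public on purpose.

```python
def _as_list(value):
    """IAM allows a single string or a list in most fields; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_overly_broad_statements(policy):
    """Return (statement_index, reason) pairs for Allow statements worth a review."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements only narrow access
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        principal = stmt.get("Principal")
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append((index, "wildcard action"))
        if "*" in resources:
            findings.append((index, "wildcard resource"))
        if principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS"))
        ):
            findings.append((index, "public principal"))
    return findings


# A made-up policy: statement 0 is world-readable, statement 1 grants all of S3.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}
print(find_overly_broad_statements(policy))
# [(0, 'public principal'), (1, 'wildcard action'), (1, 'wildcard resource')]
```

Wired into a build, a check like this fails fast when someone commits a wide-open policy, which is a lot cheaper than finding out after the bucket is already public.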

Generally, these are my top-level drivers for everything. There will be “sub” commandments within each of those to drive me down the right path, but so long as I keep these in mind with every top-level decision I make, I know I’m moving in the right direction.

Got a good one I missed? Send me an email; I’d love to hear what some of your guiding practices are.

Until next time.

James.