Trusting DevOps and Preventing Compromise
Anirban Banerjee
DevOps is king, long live DevOps! This article looks at how enterprises are finding out that all is not well in the rush to automate everything under the sun through their DevOps teams. More importantly, it discusses how to create a chain of trust, and barriers, that prevent system-wide compromises of the kind that lead to data breaches of Equifax-level severity. The goal is to have one’s cake and eat it too: how can an organization embrace DevOps and its favorite automation tools while keeping a reasonable degree of assurance that the toolchains cannot easily be misused? This article is a follow-up to one we published a few months ago: https://www.onionid.com/blog/bridging-autoliance-gap-pci-sox-soc2-nist-nerc-compliance/
What is this DevOps thing?
For many years, the job of creating software and the job of making sure that it runs and provides good service to customers have been done by two different teams in most enterprises: Development and Operations. Development would write code, test it and ship the product, say an online tool that generates policy quotes for life insurance. The Operations team would then be responsible for making sure this policy quote tool keeps running as it should, responds quickly to customers and does not break. There is a fundamental issue with this model: it’s like asking someone to babysit a newborn when they have no real clue about this specific child’s idiosyncrasies. DevOps is, simply put, a mash-up of Development and Operations. It’s not as loose and haphazard as the name might suggest. It’s actually quite an efficient way to bridge the gap between developing code and products and keeping them working all the time. A good description of DevOps is found here: https://en.wikipedia.org/wiki/DevOps
Caring for Infants makes the case for DevOps! Huh?
I would argue yes! I know this is a bit of a stretch, but bear with me. As a parent you quickly realize that your children, though they come from the same gene pool, are quite different. Even if they are the same gender, they respond differently to the same stimuli. Many of us choose, or are required by circumstances, to go back to work after having a child. In such cases you may have to leave your child in the care of another person: a grandmother, your sister, a nurse or someone else you can trust. Consider a simple example: the child’s grandmother might be good at taking care of young children, but she will still struggle at times to console the infant when he or she cries. It will not be immediately clear to the caregiver whether the child is hungry, needs a diaper change, or wants to be walked, sung to, or shown lights and moving objects. You name it. With time the caregiver learns what the child responds to and how best to soothe him or her. Keep in mind that the caregiver may have experience with other children, but that does not mean 100% of that experience transfers smoothly to this infant. Ops teams are in the same position: they have a good amount of experience handling various products, but since they are not the folks who built the product in the first place, they have to bang their heads against a wall to learn its idiosyncrasies. DevOps is a good thing. It makes sense to have people straddle both sides of the divide, albeit with some good judgment: too many times DevOps teams are stretched thin to the point of breaking.
Automation Automation Automation – The DevOps way
Yes, most certainly! Let’s automate everything and squeeze out as much efficiency as we can. And lo and behold, the gods of open source software have given us a bounty of good-quality tools that we can use to create our custom pipelines of DevOps automation. Consider the bevy of tools for configuration management, continuous integration, continuous deployment, serverless-lambda, container launch/manage and so much more. Here is a great list of the top 30 DevOps tools that teams love to use in enterprises: https://www.onionid.com/blog/30-must-have-tools-to-support-devops/ . The great news is that DevOps teams today can literally craft pipelines, built with open source tools, where developers check in code for their product, bug fixes and many other things, and this code gets automatically tested, launched into production, managed, updated and patched, all with zero manual intervention.
So, it’s all milk and honey then, right?
Ah! Not quite. Would I be writing this if that were so? Nope. If you have the time, you might want to read our previous article, “The Autoliance Gap”.
Let’s get to the point: no, there are some major issues here. All these DevOps toolchains are built on a source of trust. They employ various pieces of software, and each piece performs a specific task in the toolchain. Under the hood of these shiny, brand-spanking-new pieces of software lurks something that is actually quite old-school. What might it be? SSH!
Most of the tools that are used to quickly automate code deployments to servers use SSH under the hood to establish secure connectivity. This in itself is not the problem. The problem lies in the way that trust is embedded into each tool in the toolchain. Here is a simple example: you have a configuration manager like Salt/Ansible/Chef, and you have other pieces of software for testing and continuous deployment, like Jenkins, Bamboo and more. Each one of these tools needs to talk to the various servers in your datacenter or AWS clusters. Each tool is going to use something called an SSH key (god forbid that you are nailing root passwords into scripts!). A good explanation of what an SSH key is can be found here: https://www.onionid.com/blog/a-better-way-to-manage-ssh-keys/ . The point is that these tools use an identifier to access your servers and are authorized to make changes to the software running on them. The compromise of that single identifier, the account which controls the SSH key, is disastrous. Someone who compromises one single tool in the chain and gains access to the SSH key can now break your entire toolchain and, worse, gain total control of your infrastructure.
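To make that blast radius concrete, here is a minimal Python sketch. The tool-to-key mapping, key labels and host names are all hypothetical, and the "scoped" variant is included only for contrast; it is not something prescribed here. The sketch simply models which hosts an attacker can reach after stealing the key held by one tool.

```python
def reachable_hosts(compromised_tool, tool_keys, key_hosts):
    """Return the set of hosts reachable with the key held by one compromised tool."""
    key = tool_keys[compromised_tool]
    return set(key_hosts.get(key, ()))


# Hypothetical inventory, for illustration only.
ALL_HOSTS = {"web-1", "web-2", "db-1", "build-1"}

# The pattern described above: every tool registers the same all-powerful key.
shared_tool_keys = {"ansible": "master-key", "jenkins": "master-key", "bamboo": "master-key"}
shared_key_hosts = {"master-key": ALL_HOSTS}

# Contrast case (an assumption, not from the article): one narrowly trusted key per tool.
scoped_tool_keys = {"ansible": "ansible-key", "jenkins": "jenkins-key", "bamboo": "bamboo-key"}
scoped_key_hosts = {
    "ansible-key": {"web-1", "web-2", "db-1"},
    "jenkins-key": {"build-1"},
    "bamboo-key": {"build-1", "web-1"},
}

if __name__ == "__main__":
    # Compromising Jenkins alone exposes every host when the key is shared.
    print(sorted(reachable_hosts("jenkins", shared_tool_keys, shared_key_hosts)))
    print(sorted(reachable_hosts("jenkins", scoped_tool_keys, scoped_key_hosts)))
```

With the shared key, breaking into any one tool yields the same result as breaking into all of them, which is exactly the concern raised above.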
So, SSH is bad?
No. In fact I love SSH. I use it every day. I love agents, X11 forwarding, all the flags. I am a fan of SSH! However, registering one single mother account with god-like power across all the tools in the toolchain is very risky. Think of it as using your social security number to identify yourself at your FedEx pickup location. You should not use a super-sensitive piece of information everywhere. Of course there are many strategies, like YubiKey usage or air-gapping production from test/QA, but let’s be realistic: how many enterprises afford their DevOps teams the time and space to do this? Not many. Another quick argument: let’s use Duo 2FA! Yeah, but who is going to click Yes on a mobile device 2000 times a day when automated pieces of software are accessing your servers? You had better have lightning-fast fingers.
C’mon, is there no hope?
Of course there is. And it’s actually a relatively simple strategy. Coming up with a grand plan to reorganize and re-architect toolchains and pipelines is not going to work for 90% of enterprises. There is always time pressure to push out products, fix bugs and whatnot. Hence a relatively simple two-step strategy can address the concerns expressed here and, at the same time, make DevOps automation simpler to live with from a security and compromise-risk perspective.
- Do not store/register SSH keys in the tools that are part of your toolchain. Instead, use a dynamic secret vault that provides the value to the tool in real time and, if needed, changes the value at regular intervals. This prevents the case where a hacker breaks into a tool, obtains an SSH key and then uses it to access all the systems. Here is an example: https://youtu.be/xqhiNjedOSM
- Use inline filtering or endpoint filtering for SSH commands. For example, a simple transparent SSH proxy (not necessarily a bastion) can help you limit the commands that a tool can execute on a host. You can also try to limit the commands that a tool can run through an account on the endpoint, though this approach is harder to manage and takes more effort. Here is an example: https://youtu.be/3qwpfBcr4Lw
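The two steps above can be sketched roughly as follows in Python. This is a toy model under stated assumptions, not any particular product’s API: `EphemeralVault`, `ALLOWED_COMMANDS` and `is_command_allowed` are hypothetical names, the vault lives in memory, and the allowlist checks only the executable name.

```python
import secrets
import shlex
import time


class EphemeralVault:
    """Toy in-memory vault handing out short-lived secrets (hypothetical, not a real API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._leases = {}  # tool name -> (secret, expiry time)

    def lease(self, tool):
        """Return the tool's current secret, minting a fresh one once the old one expires."""
        now = self.clock()
        secret, expires_at = self._leases.get(tool, (None, 0.0))
        if secret is None or now >= expires_at:
            secret = secrets.token_hex(16)
            self._leases[tool] = (secret, now + self.ttl_seconds)
        return secret

    def is_valid(self, tool, secret):
        """Check a presented secret against the tool's live, unexpired lease."""
        current, expires_at = self._leases.get(tool, (None, 0.0))
        if current is None or self.clock() >= expires_at:
            return False
        return secrets.compare_digest(current, secret)


# Hypothetical per-tool allowlist of executables; anything else is denied.
ALLOWED_COMMANDS = {
    "jenkins": {"git", "make", "systemctl"},
    "ansible": {"systemctl", "apt-get"},
}


def is_command_allowed(tool, command_line):
    """Allow a command only if its executable is on the tool's allowlist."""
    # Reject shell metacharacters so "git status; rm -rf /" cannot ride along.
    if any(ch in command_line for ch in ";|&`$<>\n"):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not argv:
        return False
    return argv[0] in ALLOWED_COMMANDS.get(tool, set())


if __name__ == "__main__":
    vault = EphemeralVault(ttl_seconds=300)
    token = vault.lease("jenkins")
    print(vault.is_valid("jenkins", token))
    print(is_command_allowed("jenkins", "git pull origin main"))
    print(is_command_allowed("jenkins", "rm -rf /"))
```

A stolen secret in this model is only useful until its lease expires, and a compromised tool can only run what its allowlist permits. On an endpoint, one place such a check could live is an OpenSSH forced command (`command="..."` in `authorized_keys`), where sshd exposes the requested command in the `SSH_ORIGINAL_COMMAND` environment variable; a real filter would likely need to inspect arguments as well as the executable name.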
In this article we have highlighted what the issues with the toolchain trust model in DevOps practices are, and have provided some rationale from our experiences with customers. We find that implementing two simple steps as mentioned above helps reduce the risk of toolchain-infrastructure compromise significantly. If you need further information about Onion ID or how it can help you secure your DevOps toolchains, please get in touch with us and we will be happy to help.