DevOps as originally conceived was more of a philosophy than a set of practices, and it certainly wasn't intended to be a job title or a role spec. Yet today, DevOps engineers, site reliability engineers, cloud engineers, and platform engineers are all in high demand. Their skillsets overlap, and recruiters sprinkle role descriptions liberally with loosely related keywords such as “CI/CD pipeline,” “deployment engineering,” “cloud provisioning,” and “Kubernetes.”
When I co-founded Kubiya.ai, my investors pushed me to better define my target market. Was it just DevOps engineers, or also SREs, cloud and platform engineers, and other end users?
More recently, I've seen a lot of interest from job seekers and recruiters in defining these roles. From Reddit posts to webinars, this is a hotly debated topic.
In this article, I offer my thoughts but recognize that there's a great deal of room for interpretation. This is an inflammatory topic for many, so at the risk of provoking a conflagration, let's proceed!
The Proliferation of DevOps Job Specs
The practice of DevOps evolved in the 2000s to address the need to increase release velocity and reduce product time to market while maintaining system stability. Service-oriented architectures were allowing separate developer teams to work independently on individual services and applications, enabling faster prototyping and iteration than ever before.
The traditional tension grew between a development team focused on releasing software and a separate, distinct operations team focused on system stability and security. This tension hindered the pace many businesses aspired to. Devs didn't always properly understand operational requirements, while ops weren't able to head off performance problems before they arose.
The DevOps answer was to break down silos and encourage greater collaboration facilitated by tooling, cultural change, and shared metrics. Developers would own what they built—they would be able to deploy, monitor, and resolve issues end to end. Operations would better understand developer needs; get involved earlier in the product lifecycle; and provide the education, tools, and guardrails to facilitate dev self-service.
Precisely because DevOps was a philosophy rather than a prescriptive set of practices, there isn't even common agreement on the number and nature of those practices. Some cite the “four pillars of DevOps,” some the “five pillars,” and others six, seven, eight, or nine. You can take your pick.
Different organizations have implemented DevOps differently (and many not at all). Here we can see the origins of the job spec pickle we find ourselves in. As Patrick Debois, founder of DevOpsDays, noted, “It was good and bad not to have a definition. People… are really struggling with what DevOps is right now. Not writing everything down meant that it evolved in so many directions.”
The one thing DevOps was not was a role specification. Fast forward to today, and numerous organizations are actively recruiting for “DevOps Engineers.” Worse still, there is very little clarity on what a DevOps engineer actually is, with widely differing skillsets sought from one role to the next. Related and overlapping roles such as “site reliability engineer,” “platform engineer,” and “cloud engineer” are muddying already murky waters.
How did we get here, and what—if any—are the real differences between these roles?
DevOps and DevOps Anti-Types
In my experience, realizing DevOps as it was originally conceived—i.e., optimally balancing specialization with collaboration and sharing—has been challenging for many organizations.
Puppet's 2021 State of DevOps Report found that only 18% of respondents identified themselves as “highly evolved” DevOps practitioners. And as the team behind DevOps Topologies describes, some of these organizations benefit from special circumstances. For example, Netflix and Facebook arguably each have a single web-based product, which reduces the variation between product streams that can otherwise force dev and ops further apart.
Others have imposed strict collaboration conditions and criteria. Google's SRE teams (more on that later!) are one example, and they also wield the power to reject software that endangers system performance.
Many organizations at a lower level of DevOps evolution struggle to fully realize its promise, owing to organizational resistance to change, skills shortages, a lack of automation, or legacy architectures. Across this group, a wide range of DevOps implementation approaches has been adopted, including some of the DevOps “anti-types” described by DevOps Topologies.
For many, dev and ops will still be siloed. For others, DevOps will be a tooling team sitting within development and working on deployment pipelines, configuration management, and such, but still in isolation from ops. And for others, DevOps will be a simple rebranding of SysAdmin, with DevOps engineers hired into ops teams with expanded skillset expectations, but with no real cultural change taking place.
The rapid adoption of public cloud has also fueled belief in the promise of a self-service DevOps approach. But being able to provision and configure infrastructure on demand is a far cry from enabling devs to deploy and run apps and services end to end. Not all organizations understand this, so for many, automation has stalled at infrastructure provisioning and configuration management.
With so many different incarnations of DevOps, it's no wonder there's no clear definition of a DevOps role. For some organizations, the role might cover only the narrowest form of deployment engineering, perhaps just building CI/CD pipelines. At the other end of the spectrum, it might essentially be a rebranding of ops, with additional skills in writing infrastructure as code, deployment automation, and internal tooling. Elsewhere, it can be any shade of gray in between, and so here we are with a bewildering range of DevOps job listings.
SRE, Cloud Engineer, and Platform Engineer: Teasing Apart the Roles
So, depending on the hiring organization, for better or worse, a DevOps engineer can be anything from a purely deployment-focused role to a more modern variation of a SysAdmin.
What about the other related roles: SREs, cloud engineers, and platform engineers? Here’s my take on each:
Site Reliability Engineer
The concept of SRE was developed at Google by Ben Treynor Sloss, who described it as “what you get when you treat operations as a software problem and you staff it with software engineers.” The idea was to have people who combine operations skills and software development skills design and run production systems.
Defining service reliability SLAs is central: dev teams must provide evidence up front that their software meets strict operational criteria before it is accepted for deployment. SREs strive to make infrastructure systems more scalable and maintainable, which can include designing and running standardized CI/CD pipelines and cloud infrastructure platforms for developers to use.
As you can see, there's a strong overlap with how some would define a DevOps engineer. Perhaps one way of thinking about the difference is that whereas DevOps originated with the aim of increasing release velocity, SRE evolved from the objective of building more reliable systems in the context of growing system scale and product complexity. To some extent, the two have met in the middle.
Cloud Engineer
As cloud functionality has grown, some organizations have created dedicated cloud engineer roles. Again, although there are no hard and fast rules, cloud engineers typically focus on deploying and managing cloud infrastructure and know how to build environments for cloud-native apps. They'll be experts in AWS, Azure, or Google Cloud Platform. Depending on how far their responsibilities overlap with those of DevOps engineers, they may also be fluent in Terraform, Kubernetes, and similar tools.
With the forward march of cloud adoption, cloud engineer roles are subsuming what might formerly have been called infrastructure engineer roles, which originally spanned both cloud and on-premises infrastructure management.
Platform Engineer
Internal developer platforms (IDPs) have emerged more recently as a way to cut the Gordian knot of balancing developer productivity with system control and stability. Platform engineers design and maintain IDPs that aim to give developers self-service capabilities to independently manage the operational aspects of the entire application lifecycle, from CI/CD workflows, through infrastructure provisioning and container orchestration, to monitoring, alerting, and observability.
Many devs simply don't want to do ops, at least not in the traditional sense. Developers who see themselves as creative artists don't want to worry about how infrastructure works. So, crucially, the platform is conceived of as a product: it achieves control by offering a compelling self-service developer experience rather than by imposing mandated standards and processes.
Getting Comfortable with Dev and Ops Ambiguity
So where does this leave candidates for all these various roles? For now, and at least until DevOps implementation approaches converge, the only realistic answer is to ask everything you need to during the interview to clarify both the role expectations and the organizational context you'll be hired into.
Recruiters may decide, for various reasons, to cast a wide net, stuffing job postings with trending keywords. But ultimately, the details of a candidate's experience and capabilities must come out in the interview process and in conversations with references.
From my perspective here at Kubiya.ai, whether you are a DevOps engineer, platform engineer, cloud engineer, or even an SRE, making sure you support developers with all their operational needs will go a long way toward helping them focus on building the next big thing.