How does your DevOps setup stack up against the competition?

Kaspar Von Grünberg
CEO @ Humanitec

Benchmarks aren’t just for software and hardware. They’re just as applicable to DevOps as a broader engineering process. To improve, you must know not only how you’re doing within your organization, but also contextualize your performance within the greater ecosystem.

Of course, that’s a massive undertaking. It’s not as if you can just peek behind the curtain of your competitors’ processes to see what they’re doing. Many engineers find themselves left to guess at trends — which is clearly suboptimal!

Fortunately, there’s hope in the form of the latest DevOps Benchmarking Study. These annual findings have been gaining popularity since the Humanitec team first started tracking them, and the 2023 edition bore some fresh revelations that deserve a look.

Kaspar von Grünberg, Humanitec CEO, gave us a detailed insider’s perspective. While you can and should check out the study itself, this overview will get you up to speed on the findings and how to take action.

Understanding the findings: What did the DevOps Benchmarking Study 2023 reveal?

As Kaspar explained, the study allowed approximately 2,000 respondents to self-evaluate. Respondents were ranked by their perceived performance levels and then asked several specific questions. While the team did its best to take a representative worldwide sample, many respondents were reached through Twitter and worked for US-based companies.

When asked how their setups ranked on a 0-100 scale, the message was clear: Many respondents were struggling just to get over the hump. 

Exploring infrastructure

The DevOps “mountain of tears” graph in the study is illustrative, but most of the report’s insights were more specific. For instance, most top performers used public-cloud infrastructure, while on-prem was more popular with low performers.

As Kaspar pointed out, there are plenty of reasons why infrastructure choice is more subtle than it seems. For example, big enterprises like banks face far more governance constraints, which can make migration harder. Changing market pressures and public-cloud pricing are also factors — CIOs are on a mission to reduce costs.

Quantifying containerization 

Containerization levels also showed an irrefutable trend: Top-tier performers containerized more of their services. High- and middle-tier performers are also trying to catch up, with significant efforts made to ramp up containerization.

Picking your path to orchestration 

As for which orchestration tool organizations preferred, Kubernetes was the clear winner. No matter which performance category the study looked at, no other technology came close — a big difference from past studies, where options like Docker Swarm still held a significant market share.

One fact that might surprise you: serverless isn’t quite making the impact the hype train suggested it would. It hasn’t displaced Kubernetes, which remains the predominant choice even for in-demand workloads like event processing.

DORA metrics

The DevOps Research and Assessment (DORA) metrics are designed to be relatively clear-cut. As the findings showed, however, there was plenty of nuance to be had here as well. 

Even though there was a huge difference in metrics like lead times — on the order of minutes versus months — comparative evaluation isn’t always accurate. If you have a lot of users or countless governance hurdles to overcome, comparing yourself to a smaller team with fewer restrictions isn’t quite fair.

Deployment frequency was another example of a large respondent spread that deserved measured consideration. While differences in deployment calendars were stark, Kaspar pointed out that most CI/CD setups were improving in terms of year-over-year performance.

Mean time to recovery (MTTR) was also significantly skewed. Top performers bounced back from problems within an hour, or at worst within a day, while a not-insignificant chunk of the low performers needed up to a week to get back on track.

Finally, there was the change failure rate — another great illustration of how things are getting better. Most top performers still maintained lower change failure rates than low performers, but just two years earlier, three times as many low performers fell into the 31-45% bucket. Progress!
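To make the metrics discussed above concrete, here is a minimal sketch of how three of the DORA metrics could be computed from deployment and incident records. The record format is an assumption invented for illustration; it is not part of the study or any DORA tooling.

```python
from datetime import datetime, timedelta

# Hypothetical deployment and incident records -- the shape of this
# data is an assumption for illustration only.
deployments = [
    {"at": datetime(2023, 5, 1), "failed": False},
    {"at": datetime(2023, 5, 2), "failed": True},
    {"at": datetime(2023, 5, 3), "failed": False},
    {"at": datetime(2023, 5, 8), "failed": False},
]
incidents = [  # (start, resolved) pairs for production incidents
    (datetime(2023, 5, 2, 9, 0), datetime(2023, 5, 2, 9, 45)),
]

def change_failure_rate(deps):
    """Share of deployments that caused a failure in production."""
    return sum(d["failed"] for d in deps) / len(deps)

def deployment_frequency(deps):
    """Average deployments per day over the observed window."""
    span_days = (max(d["at"] for d in deps) - min(d["at"] for d in deps)).days or 1
    return len(deps) / span_days

def mean_time_to_recovery(incs):
    """Average time from incident start to resolution."""
    total = sum((end - start for start, end in incs), timedelta())
    return total / len(incs)

print(f"CFR: {change_failure_rate(deployments):.0%}")        # 25%
print(f"Deploys/day: {deployment_frequency(deployments):.2f}")
print(f"MTTR: {mean_time_to_recovery(incidents)}")           # 0:45:00
```

As the article notes, the raw numbers only mean something in context — a 25% change failure rate reads very differently for a heavily regulated bank than for a ten-person startup.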

Configuration management: Looking for improvements 

This category was important for multiple reasons. Yes, it showed what people were doing. The real insight, however, was that it revealed what they could be doing better. 

Here’s an example: a majority in every respondent category used version control systems (VCS) to store application configs. At the same time, the gap between those who did and those who didn’t was far more pronounced among top performers.

The good news here is that the industry is getting the message. According to Kaspar, most low performers didn’t use VCS for application configurations two years ago. 

The findings also reinforced the value of consistent, judicious practices: the better an organization performed, the more likely it was to handle its application configs the same way it dealt with its infrastructure dependencies.

The study showed a similar trend regarding what companies did with their environment-specific and environment-agnostic configurations. Most top performers — around 81% — enforced strict separation, while almost 67% of low performers let their configs intermingle. 

This wasn’t just an ideological distinction. Kaspar underscored some real benefits to keeping things from getting too blurry, such as reducing the maintainable surface area for configurations and letting different teams focus on their specific areas.
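The separation described above can be sketched as a simple merge: an environment-agnostic baseline plus a thin, environment-specific overlay, combined only at deploy time. The keys and hostnames below are hypothetical, purely for illustration.

```python
# Environment-agnostic baseline: identical for every environment.
baseline = {
    "image": "registry.example.com/checkout:1.4.2",
    "replicas": 2,
    "env": {"LOG_LEVEL": "info"},
}

# Environment-specific overlays: kept small and strictly separate.
overlays = {
    "staging": {"env": {"DB_HOST": "db.staging.internal"}},
    "production": {
        "replicas": 6,
        "env": {"DB_HOST": "db.prod.internal", "LOG_LEVEL": "warning"},
    },
}

def render(base, overlay):
    """Merge an overlay onto the baseline (one level deep),
    without mutating either input."""
    out = {**base, **{k: v for k, v in overlay.items() if k != "env"}}
    out["env"] = {**base["env"], **overlay.get("env", {})}
    return out

prod = render(baseline, overlays["production"])
# prod["replicas"] == 6, prod["env"]["LOG_LEVEL"] == "warning"
```

The payoff is exactly the reduced surface area Kaspar describes: only the overlays differ between environments, so there is far less config to review and maintain.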

Another effective way to break down application configuration management is to assess standardization, and once again, the findings were stark. More than 80% of top performers applied a standardized management framework to all applications, while about the same percentage of low performers used a separate configuration management strategy for each application.

Degree of self-service

Self-service can be hard to gauge thanks to a broad range of conflicting opinions on what it entails. According to Kaspar, you have to focus on the lifecycle-wide, holistic view. Restricting yourself to Day Zero analysis might not provide a justifiable return on investment. 

With that in mind, Day Zero conditions still offer useful self-service data points. The results showed that top performers made it easy for devs to spin up new features and preview environments with minimal help from management or Ops personnel. They also democratized autonomous deployment to dev and staging environments, eliminating bottlenecks. 

Provisioning was another major example of how some companies have a lot of catching up to do. Most top performers managed provisioning as code — such as by using Terraform — but low performers were far more Ops-dependent. 
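As a hedged illustration of what “provisioning as code” can look like, here is a minimal Terraform-style sketch. The resource name and sizing are hypothetical, and a real configuration would also need credentials, networking, and state management.

```hcl
# Hypothetical example: a database provisioned as code rather than
# by filing a ticket with Ops. Names and sizing are illustrative,
# not a complete, production-ready configuration.
resource "aws_db_instance" "checkout" {
  identifier        = "checkout-staging"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
}
```

Because the definition lives in version control, developers can request infrastructure through a reviewed pull request instead of depending on an Ops queue.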

Despite the gains, Kaspar pointed out that things are far from perfect. Many organizations currently have unmaintainable ratios of Ops personnel to developers — something like one person serving just ten or fifteen. Ideally, each Ops team member should serve upwards of 50 to 80 devs. Sadly, it looks like the platform community still hasn’t mastered the art of decoupling application scaling from operations team scaling.

Another industry-wide hurdle is simply getting informed: determining which service relies on which database is more of an ordeal than it should be.

While GitOps seems to be the solution of choice, it’s not exactly easy to debug. For this reason, Kaspar is a big advocate of dynamic configuration management (DCM). 

DCM has far too many benefits to cover here, but one ever-present problem it could help solve is the bootstrapping challenge. Many low-performing respondents reported spending an inordinate amount of time bootstrapping and deploying new apps.

Putting the key lessons to use

One interesting outcome was that the findings painted a bigger picture: they sketched out the ideal design elements of Internal Developer Platforms (IDPs).

Kaspar laid out four vital criteria for building scalable golden paths:

1. Optimize for containers in K8s

Although this is mostly a done deal, there’s still some work to do, so don’t let up.

2. Enforce standardization by design

Pushing for standardization can lead to political strife, but doing it right by making platforms the driving factor is worth the risk of ruffling some feathers.

3. Enable self-service

Self-service minimizes cross-team dependencies and eliminates silos, improving your odds of serving developers properly.

4. Use dynamic configuration management

Want to achieve high-performing status? Separate your environment-agnostic and environment-specific configurations. Handle application and infrastructure configurations the same way. Use abstract, environment-agnostic elements to build on the baselines provided by your engineers. With this groundwork laid, you can create configurations just-in-time with deployments and enable true self-service.

Finally, remember that platforms breed success: 93% of top-performing organizations use them to enable developer self-service, and according to Gartner, 80% of organizations will have IDPs by 2026.

Getting concrete with IDPs

Even Kaspar admitted that it’s easy to fall into the buzzword trap. A good starting point for getting more rigorous is McKinsey’s forthcoming reference architecture.

The main gist is that McKinsey broke IDPs into separate components, grouped into planes. The specific components in each plane are exchangeable depending on your needs, but remember there’s always going to be some interplay. For example, the developer control plane is where devs (and the platform team) interact with the platform. It has to be open to user choice to promote adoption, surface vital information, and support efficient workflows.

Another huge takeaway was that getting the fundamentals right makes it easy for the benefits to fall into place. For instance, proper dynamic configuration management drives standardization by design and enables multiple benefits that ultimately minimize time to market. 

Finally, there’s DCM itself. To leverage DCM as your platform’s core, you’ll need three components:

  1. A uniform workload specification devs can use to describe their apps' key relationships in abstract terms.
  2. A Platform Orchestrator that generates config files in a context-aware fashion, dynamically building your applications with each deployment.
  3. Drivers that connect the platform to your cloud or on-prem resources.
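The three components above can be sketched as a toy DCM flow: an abstract, environment-agnostic workload spec, resolved by an orchestrator against per-environment drivers at deploy time. The spec format and driver behavior here are invented for illustration and are not Humanitec’s actual schema or APIs.

```python
# Abstract workload spec (hypothetical format, loosely in the spirit
# of uniform workload specifications -- not a real schema).
workload = {
    "name": "checkout",
    "container": {"image": "checkout:1.4.2"},
    "resources": {"db": {"type": "postgres"}},  # abstract dependency
}

# Toy "drivers": per-environment functions that resolve an abstract
# resource request into concrete connection details.
drivers = {
    "staging": {"postgres": lambda: {"host": "db.staging.internal", "port": 5432}},
    "production": {"postgres": lambda: {"host": "db.prod.internal", "port": 5432}},
}

def orchestrate(spec, env):
    """Stand-in for a Platform Orchestrator: generate concrete,
    context-aware config for one environment at deploy time."""
    resolved = {
        name: drivers[env][req["type"]]()
        for name, req in spec["resources"].items()
    }
    return {
        "name": spec["name"],
        "image": spec["container"]["image"],
        "env_vars": {f"{n.upper()}_HOST": r["host"] for n, r in resolved.items()},
    }

config = orchestrate(workload, "staging")
# config["env_vars"]["DB_HOST"] == "db.staging.internal"
```

The point of the sketch is the division of labor: developers only ever touch the abstract spec, while the drivers (owned by the platform team) decide what “a postgres database” concretely means in each environment.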

Kaspar wrapped up this great talk by getting hands-on with the golden path concept and explaining how to create paths that open doors instead of closing them or walling developers off. Unfortunately, you’ll have to check out the webinar on YouTube for the full experience, as we’re just about out of time here.

We’d advise you not to miss this exciting dive into what a fully functional, smooth-operating IDP should look like — and you’ll definitely want to stick around for the Q&A session. Watch this space for more enthralling PlatformCon talks, and be sure to download your free copy of the DevOps Benchmarking Study 2023!