Platform engineering

Why platform teams are the key to success

Nigel Kersten
Field CTO @ Puppet

Talk Transcript

Kaspar  02:14

I'm happy to be back here with Nigel. My voice is a little less crystal clear compared to last time, but I think we will get the content as crystal clear as last time. Again, it's Nigel, and Nigel has been doing this for 10 years. And with that, let's maybe start off with that. Nigel, for those that are joining new and that don't know you, can you give us a little bit of background about yourself and 10 years of the State of DevOps Report by Puppet? Can you give us a brief reflection on the last 10 years to start with?

State of DevOps Report at Puppet and the DevOps evolution model

Nigel  02:48

Absolutely. Thanks, Kaspar. It's great to be back here. So my background was in operations. So essentially large scale Linux system administration, but also a bunch of Mac OS X and Linux endpoint management as well. I worked as an SRE at Google for about four years before I joined. So originally an Australian, moved to the US, worked for Google and then joined Puppet when we were really quite small, we were about eight or nine people. And that was originally running product. And about this same period was when the whole DevOps movement was starting to escape beyond, you know, the first couple of DevOps days, the first few things that Andrew and Patrick had done to try and put together and codify this whole movement. And Puppet sat very much at the center of all of that. So we started doing the State of DevOps Report 10 years ago. And to be honest, in those days, a lot of that was trying to explain to people exactly what DevOps was and what it actually meant. And particularly, I think, folks in the enterprise were very resistant to the idea of how DevOps would actually work - whether DevOps was actually even a good idea, to be honest. And I think if we fast forward 10 years, we see that everyone either feels like they've succeeded well enough in applying some of these principles, like breaking down silos, that they don't really talk about DevOps anymore, or they are really struggling with how to implement them at scale. And this sort of brings us to one of the reasons why I love chatting with the Humanitec folks so much, which is that what we've seen in the enterprise in particular is that the way to succeed with scaling out DevOps initiatives is by following something like the platform team model. And I think this has been something we've been seeing emerge over the last couple of years. And that's one of the reasons why this was such a big focus for this year's State of DevOps Report.
I'm field CTO at Puppet by the way, I'm normally based in London, but currently navigating the nightmares of travel during a pandemic, and then back in the US just for a little while to do some paperwork.

Kaspar  04:55

Amazing. Okay, let's dive a little deeper before we go into the results. You have this very specific way of segmenting teams and then actually looking at the results, and I think you call it low, medium and high evolution of DevOps. To calibrate the conversation a little bit, can you explain why you're doing this? Why are we taking this statistical approach, so we can understand the results a little better?

Many teams stuck in the middle of the DevOps evolution

Nigel  05:21

Yeah, absolutely. So I think we ended up in this period, about three or four years ago, when we were doing the State of DevOps Report, where it felt like we'd really had the dam broken in terms of mainstream enterprise, large, traditional companies trying to do DevOps. And I kept having the same conversation over and over again with, you know, directors and VPs and C-level folks, where they'd be saying, "Look, we have a sense of where we should end up: there should be, you know, lower friction, fast flow of software delivery, low cognitive load, better quality. But we don't really know how to get there from where we are now. What should we do first, what should we do second? Where do we actually start here?" Because if you look at, you know, a 200-year-old bank, it has built up processes and has all sorts of organizational issues, built up over years, that you can't just sweep away. In any large organization of many tens of thousands of people it takes a while to have change actually happen. So we've been segmenting folks for a while in terms of outcomes, like what's your mean time to recovery? How quickly can you deploy software? How quickly can you respond to change? What's your failure rate, those sorts of things. And people were naturally falling out into several groups when we did cluster analysis. You have the low folks, who really hadn't made much progress at all adopting these principles. There were the medium folks, who were doing okay. And then there were the high folks, who had actually really succeeded and the flywheel of momentum was humming along. And when we analyzed the actual results, people fell into three pretty distinct categories. And so I think we can see that there's a step change for this sort of implementation, where, once you get it right, you see a big leveling up in terms of returns. Now, to come back to things a little, one of the things we wanted to work out was, how do you get from low to medium?
How do you get from medium to high? What were the most important things to actually adopt? And some of this stuff is really basic bread and butter things. And all of you who are on this webinar, you might not realize this, but you're probably quite advanced in terms of the adoption curve of these sorts of practices, just by the fact that you're here, choosing to opt into this sort of a webinar. I work a lot with pretty traditional organizations where there are a lot more people for whom it's just a nine to five job. And I don't want to sound like I'm criticizing that at all, because I actually think society could do a lot more of, you know, work is just work and life is life. And life is the important thing and work is just a means to an end. So I don't want to sound snobbish at all about this. But there are some pretty basic things you can do. So for example, one of the most impactful things we saw you can do at the very beginning of your journey is: let's have a version control system, a single one that everyone can access, and let's have a secret store, so people can put secrets inside it, so that we can actually encourage everyone to put all this stuff in version control. And you can get hung up on, you know, what does it mean to do DevOps really, really well. But there are these basic things that most organizations are not really doing. And so to sort of sum all this up in a little bit of a theme, what we've seen since we first introduced the evolutionary model a few years ago is that most people are still stuck in the middle, as we talk about it. You know, they get from low evolution to mid levels of evolution, but most organizations are struggling to move beyond that. And one way you can think about this is that they've optimized for the team. So an individual operations team or development team might actually be doing quite well. But they haven't optimized for the whole organization, or the team of teams.
And that's what we tend to think of when we talk about high levels of evolution.
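
The segmentation described above, where teams fall into low, medium and high groups under cluster analysis, can be illustrated with a small sketch. This is not the report's method, which the talk does not describe; it is a minimal one-dimensional k-means, and the function name `kmeans_1d`, the single composite score per team and the sample numbers are all assumptions made up for illustration.

```python
from statistics import mean


def kmeans_1d(values, k=3, iterations=20):
    """Tiny 1-D k-means (k >= 2). Returns one cluster index per value, 0 = lowest centre."""
    ordered = sorted(values)
    # Spread the initial centres evenly across the sorted values (an arbitrary choice).
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it in place if the group is empty.
        centres = [mean(g) if g else centres[i] for i, g in enumerate(groups)]
    return [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]


# Hypothetical composite outcome scores (0-100), one per team; invented for illustration.
scores = [12, 15, 9, 48, 52, 55, 50, 88, 91]
names = ["low", "medium", "high"]
segments = [names[i] for i in kmeans_1d(scores)]
print(segments)
```

A real analysis would presumably cluster on several outcome measures at once (mean time to recovery, deployment speed, failure rate) rather than on one invented score.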

Kaspar  09:03

And there I just wanted to sort of mirror this. And by the way, you're Australian, so by default, it's baked into your identity, you can't be snobbish. But I just want to complement that, because, I don't know whether you know, I actually wanted to share this with you: we are running a DevOps maturity test, on Twitter in particular, and I think we have over 1500 people that responded in the last month or two. And we're calculating a score. I mean, that is an awfully ancestral way of approaching this, but it's really interesting, because your theory is manifesting itself in the data there. If you just plot all of the DevOps maturity scores between zero and 100, you see this chasm, if you want, right? And it seems super hard for people to cross that. And I think that's very, very interesting, and I think that's why your report and talking about this is so important. And we've talked about this last time: you can't always compare yourself to Netflix, right?

Nigel  10:17

Exactly. Yeah. And I think one of the things there to point out is, you know, DevOps was an ops-driven movement and was kind of a grassroots movement of practitioners going, "This all sucks, let's actually make this better and work out a better way to do things." And I think what that means is that a small team, like the team lead and the people around them who work on that team, can actually achieve quite a lot in terms of improving the way they work and overall efficiencies in terms of software delivery. But if you want the whole organization to do it, you often can't do that as the operations lead, or even the director of infrastructure or something like that. You need to actually get your higher level management on board, you need to get people relatively aligned. Getting large groups of people to do stuff is a complicated problem. You know, we're not very good at that as a species. I mean, "Wear your masks!"

Kaspar  11:12

Yeah, very good example. Okay, very good. Let's dive in, and let's look at the pillars. If you look at the headlines: DevOps is not just automation, DevOps is not the cloud. You have this focus on Team Topologies and patterns on platform teams. Let's go through all of them. "DevOps is not just automation": that is maybe counterintuitive. What do you mean by that?

Doing better and better at automation doesn't necessarily mean you're going to do better at DevOps

Nigel  11:41

Yeah, so I think one of the unfortunate things we've seen, you know, similarly to agile in a way, is that when you have these labels around a movement that is actually creating change and, you know, making a difference to how people work, people are going to jump on the bandwagon, and we're going to get armies of consultants and vendors, you know, not necessarily behaving super authentically around this. And everyone's talking about DevOps: you know, I sell software, I can make it sound like DevOps software. And I think one of the downsides of this has been that we've ended up focusing on the technical aspects of DevOps, of automating things and measuring them, rather than necessarily the breaking down of silos between orgs. And I'm sure many of you who are on the webinar have come across the situation in large orgs where someone's called a DevOps engineer, and literally their whole job is to manage a Jenkins pipeline, or do release engineering. And again, I don't want to sound like I'm ragging on release engineering. I think build and release engineering is an incredibly critical role inside IT. But just doing that is not doing DevOps. And so we've seen this myth, I think, build up in two areas. One is that if you're doing automation, if you're using infrastructure as code, if you're taking any of these approaches, then you're doing DevOps. And that's just the technical aspect. And similarly, I think with cloud we've seen a bunch of organizations essentially rebrand sysadmins or systems operators to be cloud engineers and say, "We now do DevOps." What we did see from looking at the analysis was that in terms of automation, it's not really possible to do well at DevOps without having a really high degree of automation. It forms the social contract between different teams that allows you to do things reliably, and there are all sorts of benefits to it. But when we went through and did the statistical analysis, it's not actually predictive of DevOps success.
So automation is necessary in terms of succeeding at DevOps. But doing better and better and better at automation doesn't necessarily mean you're going to do better and better at DevOps. With cloud, we saw a slightly different relationship, which was that organizations that are good at DevOps, that are good at communicating between teams, that are good at optimizing the overall pipeline across different parts of the software delivery lifecycle, those organizations are better at using cloud. They manage to make more use of the elastic, consumption-based model, all of the advantages the cloud can give you. People who are good at DevOps tend to do better at cloud, but those things are not synonymous, if that makes sense.

Kaspar  14:13

Yeah, that makes sense. One element that I want to come back to is this where we say "you build it, you run it", right? Everyone should own everything. And that is great: by definition it solves the throw-over-the-fence problem, it gives you ownership, it has all these positive effects. But if you basically misinterpret the DevOps idea, and you now throw everything at the developers, and everything is "you build it, you run it", and we say there is no operations person anymore, everyone needs to do Jenkins pipelines, then in that extreme model we're also not helping. What I wanted to say is that "DevOps is not just automation" is so insanely important, because you are making clear that it's so much more on the cultural level, on alignment between these teams, on making sure that you understand the trade-offs: How much self service can I do? How much automation do I have to provide? How much cognitive load do I actually give the developers? If you don't constantly communicate about this, and, very important, do the qualitative communication rather than the transactional, you get nowhere. Right? And I think that's why this finding is so highly relevant.

Nigel  15:51

Yeah, and there's one thing I wanted to jump on today, which is, we talk a lot about optimizing cognitive load for developers. And it's absolutely critical that, you know, we try and make it so that people can focus on their job and do it efficiently. I think one of the things we see is that if you have entirely autonomous DevOps teams, and I've seen this a bunch of times in big banks in particular, where folks have gone "we're going to create small two-pizza teams, they're going to get to own what infrastructure they build on, how they deploy, how they manage everything", and everyone kind of goes off and does their own thing, that may do a good job at locally optimizing for that particular value stream team or that application, but it doesn't optimize for the whole organization. And what you're actually doing is you're creating cognitive load for your auditors, for your IT asset management folks, for all of your governance issues around cost control and security, and so

Kaspar  16:43

The ones that have to go in later and clean up the mess.

Organizations that are succeeding are the ones that are adopting the platform team model

Nigel  16:48

Exactly. And how do you switch from one team to another? All of these things become really, really complicated. And so I think what we've seen is that the organizations that are succeeding are the ones that are adopting the platform team model. And the analogy I keep coming back to, to think about this, is roads. You know, we have a bunch of different kinds of vehicles that drive on roads, but we want the roads to all have the same kinds of rules. So imagine if you went from one suburb to another and suddenly stop signs were a different color or a different shape, or you drove on a different side of the road. There are efficiencies we get out of creating common, standard, rationalized layers underneath everything. And I think the platform team approach is going, let's have a single place where all of the concerns around infrastructure and everything below it are actually solved for the developers. And we're solving that in a self-service, product mindset kind of way. And so this is one of the reasons, to return to something earlier, why we invited the Team Topologies guys to be co-authors of this year's report, which is that the Team Topologies model, if you haven't come across it, really just describes this world where you have platform teams, you have value stream teams, and then another two kinds of specialists. And so this year, what we did was, we on the State of DevOps Report are all big fans of the Team Topologies model, so let's actually validate it. Like, this makes sense to us, we all believe it. Let's go and validate it. And it turned out the data was really, really strong that "yes, the platform team model is a better way to deliver software."

Kaspar  18:21

And do you already have data on how widely distributed this approach is now? How is it starting to grow?

Nigel  18:32

I think it's still aspirational for an awful lot of folks. But everyone's starting to talk about platform teams in the way they were talking about, you know, SRE and GitOps and various other flavors of the month. But I think if you look at the fact that we've got Gartner and Forrester, the big analyst firms, out there talking to senior executives, going "What you need is a platform team strategy", and actually, I think, producing really, really good work around this stuff, we're seeing this happening more and more, that people are realizing this is the way to actually scale out. And so I guess my quick note to those of you who've tuned in is that I actually think if you're specializing in being a platform engineer, in being a product manager for a platform team, if you are succeeding at using a platform well inside an organization, this is a very good step for your career right now.

Kaspar  19:28

So let's zoom in a little bit on Team Topologies. I mean, we're both friends with Matthew and Manuel, we both really like their work. Can you explain in a little more detail, as this is aspirational for most, how do we help people dip their toes into this world? What's a good way of starting with this? And is this something that can help us overcome the stuck-in-the-middle situation?

The platform team model by Team Topologies

Nigel  19:59

Absolutely, this is the way to succeed at it. So there's a few things I think you want to get in place before you get really excited and start following the platform team model. One of the core ideas around the Team Topologies model that we investigated this year is: Does my team have a clear purpose? And does it have a clear understanding of its responsibilities to other teams? And again, this sounds like really basic work. But one of the biggest inhibitors to success I see inside large organizations is this: I don't know what that team does. I'm not sure what this team is meant to deliver for that team. Where are the responsibilities? Where do we say yes? Where do we say no? And it turned out, when we asked folks, that highly evolved teams, more than twice as often as low evolution teams, said: my team has a clear understanding of our responsibilities to other teams. My team has clear roles, plans and goals for their work. And teams next to me have a clear understanding of their responsibilities to my team. And this sounds like really basic stuff. But I think you can't actually achieve a significant and substantial transformation until you've actually got these things in place. And particularly, I think, when it comes to a platform team and value stream teams, you want clear responsibilities. What are the developer teams actually allowed to do? What are we making easy for them to do? What can they choose to use on the platform versus opting out of? Those sorts of decisions, I think, are almost impossible to make unless you get the first groundwork going: what is my team's responsibility, and what are the different teams' interactions with each other? So we saw this over and over again through the report. We asked a whole bunch of questions about what kinds of teams there are. The particular thing I'd say too is, when you look into the report, you'll see that there's a whole section on what sort of teams are there.
And we have a whole bunch of plain language descriptions of those things. The higher evolution organizations have a smaller number of team types, and they tend to follow the Team Topologies model. The folks at the lowest levels of evolution tend to have every single different kind of team, with unclear responsibilities between them.

Kaspar  22:10

Okay, so, "Hey, here is the new cloud operations team. Hey, there's a new SRE team." So we have one question from Cedar: how do you prevent a platform team from becoming yet another silo that needs to be broken down?

Nigel  22:25

This is a fantastic question. So let me first start with: platform is one of the most overused and abused words inside tech. And we're pretty good at doing this to words in tech. But I think when we're talking about the platform team approach, there's a few things that are really key. One is that primarily, the value you're delivering to your users, internally or externally, is via self service. So you spend time collaborating with your users in the design phase of what you're providing via self service, but collaboration is expensive. It's not actually the best way to do efficient, at-scale interactions between teams. We know delivering things as a service via an API is actually a more efficient way. So what we say when we're talking about a platform team approach is that you need to have a product mindset. You need to be thinking about your users like a market: what are their actual problems? How can I build something compelling that will actually solve their problems for them? It's not about just providing raw access to infrastructure. So if you're doing this well, you're building products in the way we know how to build products these days, which is you get early feedback from your users, you make sure you have feedback loops between the people who are producing the products and the people who are consuming them. And if you do all of those things, and particularly try and get a community of practice going amongst your users, and that feedback comes back to the platform team, then you won't be siloed. But I'd say this is something I've seen happen more than once, where the platform team becomes itself another silo because they're not taking that product mindset into account and going, what do our users actually want? And what problems are we actually solving for them?

Kaspar  24:11

And so that's exactly it, we're back to this pattern: if you don't have good communication, right, and if you don't nail the cultural elements, you can have the best automation in the world, but .... And I myself have seen so many examples where you have a really restrictive platform team that implements, I don't know, OpenShift, but you can't deviate from anything. And then it works great in 95% of the cases, but the 5% are just a pain in the ass. And you have to work through this, and you have to ping central ops, and it takes three months and you go nuts. And then what happens is so interesting. Everyone starts to build workarounds and come up with things like this way or this way to circumnavigate the whole thing, and it completely collapses. And another interesting thing that I've seen, and that the two of us have also talked about, is how you position this, right? Are you giving golden paths? Or are you giving golden cages? Right? And the golden cage is abstracted, like, take it or leave it. If it doesn't work, I don't know, don't write your service in Python. And the golden path is, hey, this is what it is, we've developed it together. Yes, you can deviate. This is what's happening under the hood. It's not like a black box; the output is, I don't know, YAML, and this is how it works. And we've developed this together, we've done this hackathon. And what we're seeing in the data, and I find that so interesting, is that if you have a golden path approach, in 97% of cases teams are just following the golden paths. Maybe it's exactly right. But they're following it because they know, hey, they're not abstracting me away. It's not a golden cage, I could change that if I want. But I get this set of guarantees. I had this great conversation with Jason at GitHub. That's the exact approach that they're bringing.
And they're measuring the way that they're successful by measuring the thank yous that they get from the developers and all of a sudden you have this really conversational model. And it works very well.

Nigel  26:25

Yeah, absolutely. And it's funny, your answer there segues a little into the question I see Daniella has in the chat as well, which is around advice for product managers. And I think, to touch upon what you were talking about, you want to make the golden path like: here is a bit of functionality, we want to make this thing easy for people. But I think particularly when you come from a larger organization with an IT mentality, we can easily fall into the trap of going, "We built the thing, you have to use that or else." And I think one of the things, if you're doing platform as a product well, is sort of letting go of things a little and going: my users are highly technical, we're giving them access via APIs and platforms and things they can write code on. They're going to write code, they're going to use it in ways that we didn't actually expect. And the whole goal of having a community of practice and feedback loops is that sometimes those ideas will be really, really good. And you should pull them into the platform team and go, "Ah, you know, the golden path usage has dropped to 60%, because someone built this other thing that's living in their repository that everyone's using. Let's pull that into the platform." So I think that's one of my top two bits of advice for platform product managers: let go a little and allow your users to come up with their own solutions. Keep an eye on them, because that's in many ways the top of the funnel for your ideas for new features. The second would be, and this doesn't come naturally to a lot of folks I see in the platform product manager role: your job is evangelism. Your job is not just defining what's going to be built, making sure it's built well and going, "Yep, okay, I'm done now." You probably don't have a marketing resource like a real product team has, so you kind of have to take on that role internally.
Because what we have seen over and over again is that mandating these platforms and going, "Hey, everyone, you must do things this way" just gets their backs up, and they just don't actually use it as effectively. But if you go around and evangelize and go, "Hey, we've listened to you all, we know this is a problem you have, and we're actually going to go and solve it. Now, it looks like we've solved it, what feedback do you have?", you get a very, very different reaction. But that evangelism role is not something that I think comes naturally to a lot of folks in ops. But I think it's absolutely critical. And if you're in that role as the platform product manager and that's not your sort of natural way of being, go look for someone who can operate as essentially your internal developer advocate, who can go out and evangelize and talk to your users and get them using your stuff.
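
The advice above about keeping an eye on golden path usage, and pulling popular workarounds back into the platform, can be sketched roughly as follows. Everything here is hypothetical: the record fields (`team`, `golden_path`, `tool`), the function names and the sample data are assumptions for illustration, not something described in the talk or taken from any real platform.

```python
def golden_path_adoption(deployments):
    """Share of deployments that used the golden path (0.0 to 1.0)."""
    if not deployments:
        return 0.0
    on_path = sum(1 for d in deployments if d["golden_path"])
    return on_path / len(deployments)


def popular_workarounds(deployments, min_teams=2):
    """Off-path tools used by at least `min_teams` distinct teams."""
    teams_by_tool = {}
    for d in deployments:
        if not d["golden_path"]:
            teams_by_tool.setdefault(d["tool"], set()).add(d["team"])
    return sorted(tool for tool, teams in teams_by_tool.items() if len(teams) >= min_teams)


# Invented sample records: one entry per deployment.
deployments = [
    {"team": "payments", "golden_path": True, "tool": "platform"},
    {"team": "search", "golden_path": True, "tool": "platform"},
    {"team": "billing", "golden_path": True, "tool": "platform"},
    {"team": "payments", "golden_path": False, "tool": "custom-deployer"},
    {"team": "search", "golden_path": False, "tool": "custom-deployer"},
]

adoption = golden_path_adoption(deployments)
candidates = popular_workarounds(deployments)
print(f"golden path adoption: {adoption:.0%}, candidates to adopt: {candidates}")
```

In this sketch a drop in adoption is treated as a signal to go and talk to users, not as something to enforce; the workaround several teams share is a candidate feature for the platform.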