Platform engineering

5 things to consider before building your internal platform

Chris Ford
Head of Technology @ ThoughtWorks

Talk Transcript

Christoph  03:28

Alright, so let's get started. Today, we are going to talk about an article that the two of you wrote. It's a bit funny because this is the three-Chris meetup, so let's see whether that creates some confusion. But anyway, you wrote this article, Mind the execution gap, which I really like. And we had a short discussion last week where you told me that one of you has a more optimistic view on platforms and one has a more, I don't want to say pessimistic, but skeptical view on it. You didn't tell me who is who, so let's see whether we can figure that out today. I'm pretty sure we can. But why don't we start with the two of you introducing yourselves and maybe already answering the question: why did you write this article? Because if you go out there, there are probably millions of articles on platforms and internal platforms. Why did you write the one million and first article on this?

Chris  05:13

Sure, well, I'll start. And I'll try to claim "Chris" without a qualifier, if I can have that part of the namespace. So I'm the head of technology for Thoughtworks Spain, and I've worked with Thoughtworks for more than 10 years now, in many different countries. Part of the privilege but also the heartbreak of being a consultant is entering many different situations, seeing how things go, trying to support them, and then, you know, moving on and doing something else. And one of the things I noticed over my career that made me enthusiastic about the concept of platforms is places where people seemed to spend most of their time solving the same problems over and over, or a problem would be solved by one team, and then another team would have to solve it again. And it seemed like somehow this noble spirit of development team freedom, which I definitely believe in and endorse, was leading to a situation where people were doing busy work, or non-value-adding work, or were not really connected with the real problems, because everyone was doing the same thing over and over again. So I guess my original interest in the platform topic came as a way of somehow avoiding that waste of human time and potential. That was how I originally got to it. And eventually, that led to me and Cristóbal having many conversations about exactly how one should approach this to get the promised benefit. And that's where the article came in. Who are you, Cristóbal?

Cristóbal  06:49

Well, I'm Cristóbal. I'm a principal consultant at Thoughtworks Spain. Basically, that means that I'm a little bit far away from the code. As I was telling Christoph, I mostly do PowerPoint these days, which is a little bit boring. I've been working in operations teams, in infrastructure operations teams, and in software development, so I have a little bit of both perspectives: building the product and operating the product. And yeah, it was an interesting journey, discussing with Chris the benefits of platforms and the risks associated with them, both technical and non-technical, because many of the risks are not necessarily technical or operational; they are organizational, corporate. It was super interesting to share my opinion on that with him, and I think we got a good deal of understanding of each other's point of view, which was really good.

Chris  07:51

Because I think the actual specific genesis of the article was, you know, I might have been extolling how platforms address all these problems and fix them. And one of the things we recalled when we got talking was, I don't know if you're familiar with this, but Martin Fowler wrote an article, like seven or eight years ago, called "You must be this high to use microservices". And it kind of follows a similar principle, right? So okay, you might expect that a consultant from Thoughtworks has some enthusiasm for the benefits of microservices. But I also acknowledge that the adoption of microservices, certainly at its most intense pitch, was driven at least partially by fashion. And that was something that Martin Fowler noticed as well. And so his response was to say, well, it's a trade-off, it's a different point in the design space that can get you benefits. That doesn't mean it's a silver bullet. Famously, there are no silver bullets in software. So his response was to say, well, if you're going to do this microservice thing, there's a bunch of other stuff you have to get right if you want to reasonably expect it not to blow up in your face. So, you know, if you can't, as an organization, invest in observability or deployment, the fact that you've broken things up into many, many small pieces is going to hurt you, not help you. And when Cristóbal and I got to thinking about platforms, we felt there was a similar phenomenon going on, where it's true that platforms, I think, have enormous potential to support different kinds of value in organizations. But that doesn't mean that you can get those benefits for free, without solving the technical and non-technical questions that Cristóbal mentioned just before.

Christoph  09:31

Yeah, yeah, there's no such thing as a free lunch, right? It just doesn't exist. I mean, you structured your article, I think, in five broader topics, which are business value, product thinking, operational excellence, software engineering excellence and healthy teams, right? Maybe we use this structure to make sure we cover all these points. I found it interesting that you started with business value, because I've seen a couple of platform teams that start more on the side of solving engineering problems than of creating business value. What's important there for you? Why start with that topic?

Business value of an internal platform

Chris  10:11

Well, you said "why start with this topic", and the word "why" is the key point for me. I do think that there can be a bit of an Ozymandias scenario where people build a platform because they're thinking about it as a technical monument to their brilliance, right? And this is a kind of abstraction that developers have fallen in love with over many years. So, you know, they might have tried to build frameworks, or if you go back far enough, they built their own programming language, because they thought that if they could just build the perfect piece of infrastructure, then when they came to actually pay attention to the real problem, everything would just fall into place. So I think that starting with a purpose is really important. You need to know why you're committing to a platform, and I use the verb "committing" consciously, because I think "building a platform" has a little bit of a fallacy in it: it implies that the original construction dominates the effort, which is not true. If you're committing to a platform, you're going to build it, you're going to maintain it, you're going to improve it, you're going to be there in a couple of years when there's a problem with it, and you've convinced your company to bet their future on it. If you're going to do that, you should have an argument in mind as to why the benefits outweigh the costs. There's going to be a lot of human effort and late nights and backlogs and so on going into concocting a platform, depending on how thick or thin you go. And I think it's part of our responsibility as professional engineers that we should be able to say: we are going to incur this cost in order to deliver this benefit. So the benefit might be: our product teams are spending too much time reinventing the wheel, and we want them to focus on delighting the customer, so we should have a platform to reduce some of that repetition.
In that case, we've created a means by which we can judge our success. We can't just say we're successful because we used to have zero platforms and now we have one platform, so, tick. We have to say: okay, before, we had teams who were reinventing some pretty basic stuff, or reintegrating with our own internal authentication services or something like that, over and over again, and now they're not. So that's what I mean by starting with a business case, by which I do not mean a 100-slide deck. I just mean an argument that you could reasonably explain to another human being when you're talking about why you're going into this. If you don't have that justification, I find it hard to say that it's responsible to start inviting people to build their stuff on a platform. Because if you haven't got a solid foundation, maybe your organization is going to lose interest, or the platform is going to be defunded, or the people are going to go elsewhere. And then you will have created not a benefit to your colleagues, but something that causes a problem for them. So that's why I think this business value, business case, justification for what you want to achieve is a prerequisite to succeeding at achieving it.

Cristóbal  13:28

Something that Chris was pointing out as well is that this platform is not going to deliver immediate value. And this platform, this component, is going to last for a few years, right? So you are going to need to engineer it, you are going to need to deliver it, and your organization needs to adopt it. For all this time, you are not bringing in value, and that's not a very compelling business case. You're getting zero value until this happens. And on top of that, there was this study, I think it was an article that the people at CMU were discussing, which found that by the mid-2000s, the average system lasted for seven years in a company. This was like 15 years ago, and I don't think anyone had intended to keep those systems for seven years. So whenever we say we are going to deploy something, we need to think that, more or less, this is going to be 5, 6, or 7 years. And we will need to maintain it for that amount of time, or half of it at least, before we can get out of it.

Chris  14:30

And what's fascinating about that is that the conditions, the technical and business conditions, are not going to be held in place for the life of that platform. This is what makes it actually kind of cool to think about platforms, because you're making a bit of a business bet on what will happen in the near future. So you have to think: well, Amazon and GCP offer these kinds of services, or maybe Kubernetes out of the box lets me do these things. Now, I have a requirement or a need that goes beyond what I can get with those industry-standard tools, so I'm going to invest in some extra capability or some special sauce that is about my organization. But I'm also making a prediction or a bet about how the technology landscape will evolve over time. You know, will Kubernetes stop being fashionable, and suddenly there'll be some other technology? Or will there be, you know, an Azure version of Fargate that will be so much more convenient than what I've actually provisioned for my own colleagues that my platform goes from being a competitive advantage to a disadvantage relative to the state of the art? So I think that kind of future gazing is kind of cool and fun, and also a little bit dangerous, because depending on what happens in the future, the business case or justification for having your own internal platform, rather than just the stuff that's available on the market, can be affected or sometimes even eroded.

Christoph  15:59

Yeah, I think you have to stay flexible in order to adopt these new technologies. Maybe everything will be serverless in 10 years. Yeah, you just don't know yet. And I love this maintenance point. I think that's something that's easily overlooked. It's fun to build something like that. And you'll probably be able to gather a team of your best engineers that are interested in solving this problem. But at some point, you'll have to maintain it. That's a very different story.

Chris  16:26

When you say best engineers, I think there's a very famous division of different kinds of engineers into pioneers, settlers and town planners. I think that Simon Wardley came up with this. So we say the best engineers, but it's really the pioneer engineers, the people that are interested in new stuff. So the question is, once it exists, once you've created this service that makes it really easy to spin up a new microservice, one that creates the instances and the storage and the compute or whatever, what about the challenge of living with that, making it robust, you know, pushing it back up when it falls over at 3am due to some unforeseen DNS outage? Is it the same set of people who want to do that? I'd like to think so, because I'd like to think that people mature and want to live with their successes and their mistakes. And it might be the same people, but I'm not sure that it necessarily is. I've certainly seen, in more than one client situation, the problem of engineering talent flight after the cool thing was done. So the building bit was done, and that's the first, very small part of the entire lifecycle of a platform. And then they went to look for other challenges and build other platforms at other companies. But then it was quite hard for the organization to find the long-term, committed infrastructure platform engineers to keep this thing alive and thriving for the other six of the seven years that Cristóbal mentioned.

Product thinking is key

Christoph  18:10

Yeah, yeah, absolutely. I can clearly see that. So that's business value, understood. The next thing you mentioned is product thinking. So let's jump right from business value to product. Maybe Cristóbal: why is product thinking important when building a platform? Because in the end, it's an engineering thing, right? Why do you need product?

Cristóbal  18:31

Why are we building something? Well, we were discussing that before, right? We are building something because it's useful to someone in the organization. In fact, that's what the authors of Team Topologies say: they define a platform as a toolset or component that allows delivery teams to deliver fast. So we need to understand their problems. I love this analogy of the monument. It's not a monument to our competence. It's something that is actually useful for them. And it's a complex problem, and the way we know to understand complex problems is the product mentality, which is feedback loops, verifying assumptions through usage, data-based decisions, and this kind of stuff. We are not just building it on the assumption that, after that, they will come. And having this data available, having these feedback loops in place, requires some organizational decisions. First of all, having this point of contact between the delivery teams and the platform team. Then the mentality or the mindset to build incrementally, to understand the problem little by little. And also to be aware that the developers or engineering teams who use our platform probably are going to need something more along the lines of a power grid. Instead of a new social network with shiny new features every day, they are going to favor stability and reliability over new buttons and features. And that's something we have seen. And for the same reasons that Chris was mentioning before, it's not always easy to get this mindset of: we are going to build a power grid, and it's going to be super simple to use and super reliable, with a very clear interface, conceptually very easy for the developers to grasp. And then we are going to deliver it in a reliable way. And we can talk about reliability later on.

Chris  20:40

I think, in some sense, you could say the product thinking challenge of a platform is greater than if you are building something with buttons that your end customers are clicking on. Because the argument you need to make as to why what you were doing was the most important thing you could have been doing that day is a little bit easier if you have a customer who's like, "I'll pay you money if this button does this extra thing", right? There's a very clear line to be drawn. When you're saying, "If I create this new way of provisioning servers, that will make it easier for another team to get their ideas to market, so they can put a new button in front of customers that they'll pay money for", it's just a more complex argument. It still ultimately has to be grounded in the end users or the customers of a company liking something, assuming that you're in a standard business environment, because if it isn't, then you haven't really delivered a positive return on what you're doing. But it's just that much more indirect. So what I might say is that, as well as what Cristóbal said about empathy with users, the extrapolation of how value flows is also more complicated and less direct, and needs maybe a more discerning thinker in a platform environment than it does in an end-user scenario. So, you know, I wouldn't want people to think it's a smaller challenge because the value is less obvious. No: because the value is less obvious, it's a greater challenge to keep your eyes on it.

Christoph  22:10

Yeah, I think it's actually a huge challenge, because the pioneers you mentioned earlier, the engineers that are interested in building this, I'm sure they have an idea of what they want to build, because they know their own problem very well. So I think you're in danger that they build something that's fancy for them, but then if you look at the remaining 98% of the organization, it might not be useful for them. So I think that's super dangerous.

Chris  22:36

Yeah, I think somebody in the chat, Rob, mentioned that adoption is hard, right? History is littered with cases where the best technology didn't win. So, you know, Betamax or whatever, there are all sorts of examples you can find on random blog posts. So just saying "my platform actually is better than your AWS or GCP" doesn't instantly mean that it's going to be adopted by everyone in your company, and you're going to get the big bonus, and eventually you're going to succeed the CTO. Because you still need to do that outreach, you need to communicate the ideas of the platform, and you maybe have to compete with the fact that the raw cloud providers' offerings have marketing budgets that are many orders of magnitude greater than your ability to, you know, send out emails about the latest feature of the platform to your company. So I think one of the most important skills of a platform team is human connection and facilitation, in order to address the adoption challenge Rob was mentioning. And I think in the past, and maybe even Cristóbal has encountered this as an infrastructure engineer, there's been a stereotype that people working in infra can afford to be antisocial and, you know, retreat to a basement office or whatever. I think, okay, maybe there is room for some people who are technically brilliant and antisocial within the scope of a platform team. But as a whole, you're not going to get adoption, or cooperate well with the rest of the people in your company, if you don't have highly emotionally intelligent people with good communication skills doing that kind of product evangelism within their own company.

Cristóbal  24:26

We are building for the developer experience, right? We are going to understand the developer experience by talking to the developers, talking to our customers downstream. And this is super important. In the article, I think we mentioned the role of the developer relations professional, who is able to understand and convey the needs from one side to the other and ensure that your core work is meaningful for them. So that's super important.

Chris  24:54

And that pursuit of meaningfulness might lead you to focus on different features than if you were focusing just on the runtime, right? So maybe what you need to do is to create a command line interface to make it really easy for people to use your internal platform. That might be solving the problem of the barrier to adoption for teams, because they're confused by it or, you know, they want to be able to use it quickly. If you are just focusing on the runtime technical system, you might think that what you need to do is focus on, I don't know, making it cross-region in whatever cloud provider you're using, or something like that. So if you're genuinely using product thinking, you're being led by the needs of the beneficiaries of your platform, and that can sometimes lead you in surprising directions, which is all to the good.

Operational excellence of the platform team

Christoph  25:45

Yeah, it's really in line with what David said in the chat, right? I mean, try to solve that. If it's not featured, I think that's exactly the spirit you need if you build that. Now, let me move on from there to your two topics of excellence. Because I understand you need business value, you need product thinking with everything that comes with it. And then you have these two excellence points, operational excellence and software engineering excellence. What's so important about these two?

Cristóbal  26:15

I guess I can summarize this with something that happened to me many, many years ago. I was working in an infrastructure engineering team. This was in the old days, when there were no cloud providers. I recall that I was dropped into a team in the middle of the fiscal year. And suddenly, I realized that one of the systems was business critical and had no capacity left. I was talking to the folks and said, okay, we have no capacity. So I called one of the people I was reporting to and said, well, I need to have these components, these are the numbers, and I will need you to sign a purchase order for thousands of euros. I need to have this done ASAP. And the answer was shocking. It was: okay, so can you send me a business plan for that? So I was thinking, well, he's not understanding. In fact, it was me who was not understanding. I didn't understand the situation. There were some expectations on reliability, on capacity, on demand, that had not been communicated very well. Costs and risks had not been communicated very well. And this was an engineering team. This is something that can happen to teams building platforms. One day, you jump from one license tier to the next license tier, from one capacity tier to the next capacity tier. You need double the computing instances because you are successful, and then your costs skyrocket, or your services are not performing. And this requires operational excellence; you need to know how to handle this stuff. Nowadays, we have operational frameworks like ITIL and so on that define a service in terms of utility and warranty, and warranty is super important. Because it conveys to your users the risks that they are incurring by using your platform, and they convey this risk downstream to their customers as well.
They say: if my platform is super reliable, I can be super reliable; if my platform is not super reliable, I need to be less ambitious. And many product decisions are based on that, but they are also based on the assumption that whenever an incident happens, for instance, we are going to be able to manage it properly. Not only to communicate it, but to focus on restoring the service. And also, after that, to understand the root causes or the contributing causes that have to do with the genesis of the incident, and to put a plan in place to address them, and so on. The same goes for what we said before: demand and capacity, event management, all these processes need to be understood. And this is not something that comes naturally. It requires experience, it requires training. But it's definitely something that needs to be there. Since around 2010, there has been a transition from the old management frameworks to the site reliability engineering body of knowledge, which addresses that very well. But this is something that needs to be understood internally, in our company and in our team as well. And these capabilities need to be in place. What cannot happen is, as Chris said in the article, that you are not delivering on your commitments regarding capacity or service or whatever, and you might not even realize it unless you know how to manage this kind of stuff.

Chris  29:55

I think the key observation is that a platform is a multiplier, right? So in the positive case where you're doing something well, you're like, I have done a good job of making sure that there won't be an outage of the service. And now everyone in my company, in the happy case, benefits from the improvement I've made. That replaces the old case where some teams did it well, some teams did it badly, and some teams didn't know it was a thing they had to do at all. So in the positive case, it's a positive multiplier. But if you screw it up, your mistake becomes the whole company's problem as well. So the reason I think we wanted to include excellence in the article was just that we didn't want to make people think that platforms are purely about hand-waving, like Wardley maps, where we talk about commodification and slide decks and things like that. As much as the business and intention and design side of things is important, it does, in the end, come down to building a technical artifact with hard technical skills. And if the team who embarks on creating a platform doesn't understand the fundamentals of operating a system in production, maybe they don't understand the networking, they don't understand incident response, like Cristóbal said, then you actually have the potential to multiply a negative massively across the whole company. And one of the pathologies I've definitely seen with clients was at, you know, a medium-stage startup where they embarked on having a platform team, but the people in the platform team didn't really have any skill or insight beyond that of the rest of the teams. So there wasn't really a good case to say that the people who were assembled in that team could make decisions that were robust enough for everyone else in the company to live with.
Especially since, once you start building a platform, you've got technical indirection to match the value indirection we talked about earlier. Suddenly, you need to make it work for all cases, not just my three microservices. And because they didn't have that operational excellence, even though what they were doing did have a good business justification, they started out causing problems, at least at first, before they thought of hiring, you know, SRE or operations people to round out the skill set of the team. So, you know, I love talking about the concepts of platforms and waving my hands, but you need good old-fashioned competence in operations if you're going to be successful, not just a good business case.

Software engineering excellence

Christoph  32:28

And it's again this seven-year timeframe. Think mid to long term, because you don't only have to build it, you have to bring it into the organization, and then you actually have to run this stuff. And yes, teams rely on you. You'd better be there to fix problems. Does that also hold true for software engineering excellence, or is it mostly operational? What's your gut feeling about that? What's your experience?

Cristóbal  32:59

I think that both of them are super important; they are two sides of the same coin. We want to operate the system in production according to our promises, and we want to know how to do it. But the same applies when we build it. Software engineering practices enable us to deliver software quickly and in a way that matches customer expectations, because we focus a lot on having short feedback loops, right? Things like trunk-based development, pair programming, test-driven development: the focus is having short feedback. These, if I can jump into my personal anecdote book, proved to be very useful in infrastructure as well. It's not that we need to suspend extreme programming or the practices of having short feedback loops when we enter the platform world. I recall, again in the infrastructure team, we were working with data protection software that needed to be installed and operated across Europe. And I recall that the software had traditionally been installed and maintained manually. And it was quite painful. So in one of the releases, we decided all together as a team to automate all of this as much as possible. This was a number of years ago; I'm not going to say how many, just to not embarrass myself with my age. But the investment in that was a team decision. The whole team worked on it, some of us shielding the rest of the team so that the rest of the work could be done: investigating stuff, trying to create engineering techniques to install software that was not designed to be automated, patching servers automatically, understanding how these things work. And we needed a lot of time, but when it was done, testing new software, adding patches, adding users, performing any kind of request was super simple. A matter of minutes, versus a number of hours. Our feedback loop for testing and for delivering new features was super easy: we just ran our pipeline and tested that everything was working. And it was really an enabler for the team.
So the same happens nowadays with complex technical platforms. Working test-first might not be super easy, because of the time it takes both to create infrastructure and to build on it. But if you keep at it, if you focus on that, if you put your ingenuity into building that, then for the whole lifetime of the platform, this seven-year idea we have been discussing, you are going to be much more able to meet your customers' requirements, because you are going to release faster and in a safer way. So that's what we see: these practices apply in both worlds, though in one of them, admittedly, they might be more difficult. Kief Morris, for instance, has three rules about infrastructure as code (IaC) that we use and find super interesting: delivering all the infrastructure as code, testing it continuously, and splitting it into small parts that can be tested separately. And that's something we advise to all interested groups. If you follow those rules, your systems are more decoupled, and you can have shorter pipelines and faster feedback.
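
As an illustration only (it is not from the talk), the three practices mentioned here can be sketched in a few lines of Python. Everything in the sketch is an assumption made for the example: the `Stack` class, the "every resource needs an owner" rule, and the stack names are hypothetical and do not correspond to any real IaC tool. The point is just that each small piece is defined as code, can be checked on its own, and declares its dependencies explicitly.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    """A small, independently testable piece of infrastructure (hypothetical model)."""
    name: str
    resources: dict          # resource name -> dict of settings
    depends_on: tuple = ()   # names of stacks that must exist first


def validate(stack: Stack) -> list[str]:
    """Check one stack in isolation, so it can run in its own short pipeline."""
    errors = []
    if not stack.resources:
        errors.append(f"{stack.name}: defines no resources")
    for res_name, spec in stack.resources.items():
        # Assumed house rule for this example: every resource names an owner.
        if "owner" not in spec:
            errors.append(f"{stack.name}/{res_name}: missing owner tag")
    return errors


def deploy_order(stacks: list[Stack]) -> list[str]:
    """Order stacks so that each one comes after the stacks it depends on."""
    by_name = {s.name: s for s in stacks}
    ordered: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        visiting.add(name)
        for dep in by_name[name].depends_on:
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for s in stacks:
        visit(s.name)
    return ordered
```

Because `validate` only looks at one stack, a change to a single stack needs only that stack's checks to run, which is one way the shorter pipelines and faster feedback described above could come about.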

Christoph  36:38

Yeah, that makes a lot of sense. And I think the same goes for trying to cut down time on processes that are manual and take a lot of time, but come in at a certain frequency. It reminds me of ticket ops, right? Because in the end, it's answering tickets. And this is boring. So if you can automate that, your team has the freedom to work on things that really matter, and not do manual patches based on tickets that come in, or create new users, all these kinds of things. That makes sense.

Chris  37:10

You mentioned tickets. There's a good heuristic that says that self-service is, most of the time if not all of the time, a really important feature of the platform, both technically and non-technically. On one hand, from a technical side, it holds you to a certain standard of automation and excellence, because you have to figure out how this stuff works, not just, you know, where the button is in the AWS console to add it manually. You have to really get to grips with it and embody your knowledge in code that's hopefully tested and certainly deliverable. But also, to get back to the earlier comments about product thinking and adoption: if something is ticket ops, it's probably going to struggle to look like an attractive option compared to direct usage of public clouds, which are, after all, automated, and where you can do everything with a command line invocation. So if you're trying to convince people that your internal platform is going to solve their problems and deliver them benefits beyond just, you know, going nuts with a credit card on one of the cloud providers, and then your process involves logging a JIRA ticket, and then it's like, "I promise to get back to you in three days", it's probably not going to be a compelling experience for them.

Healthy platform teams

Christoph  38:31

Yeah, yeah, absolutely not. And I love the saying that as a platform team, you need to build golden paths for your teams. You need not golden cages; you need to build paths such that if the teams follow this path, then they can be assured that things work. And I think without tickets, so self-service, absolutely. I think that's absolutely right. That's one of the main goals you should have. Maybe looking at the time, maybe switching to the last of your five points that you mentioned, which is healthy teams. If you talk about healthy teams, which teams are you thinking about: the platform teams or the teams using the platform?

Chris  39:12

So I think about the platform team specifically. And I think this point might actually be the most... I didn't think it was going to be the case, but judging by people's reactions, sometimes this is the most heretical of our points, while the other bits people nod along to. So I think in the application development world, over the past 10 years, there's been a widespread acceptance that the team is the unit of software delivery. And there's been all sorts of, you know, work from Google saying that psychological safety is one of the key aspects of a productive team. There's been books like Team Topologies that talk about how to manage the cognitive load of a team. There's lots of literature out there about what it means to be a tech lead, which might involve creating a good space for a team as a whole. I hope I'm not going to offend my platform brothers and sisters out there. But I think that the culture of senior people being heroes who do things and carry a company on their back is much stronger, and remains, in the platform and infrastructure part of the industry than in application development. So what this leads to, I think, is, you know, on a human level, burnout for individuals who are the heroic people who save the company so many times, and then, like, you know, finally they just kind of can't stand the impact that has on their personal life yet again. But also, if they quit, or they burn out, or have to take a holiday, the knowledge of how to sustain this platform is lost. So to get back to the pioneers: you have a bunch of pioneers, maybe they're slightly individualistic people, they do all sorts of brilliant things, they stand up the platform, maybe they do manage to give it operational and, you know, software engineering excellence.
But if they don't form a cohesive set of teams, you're not going to move successfully into that settler and town planner phase. What's going to happen is that the people who've built your platform, who have managed to find some very good things to put on their resumes, are going to be recruited by other companies to build their platforms. And you're going to find enormous gaps in your institutional knowledge of the platform. And you've asked your company to commit to this platform, right? They've got to build things on it, they've got to, you know, they've got to trust it. And you've got to live up to that trust. And if the teams working to provide the platform aren't sharing knowledge, preserving psychological safety, and making room for junior or more junior people to learn the ropes and come up and replace other people who go on to other opportunities, you may find that you have a technically excellent platform, but the social ecosystem you need to keep it running can fall apart. And if you're one of those companies that is like, okay, that's fine, we'll just hire some more people: well, you know, good news for anyone listening who is working in the platform and infrastructure space, but for the senior people in those areas, you know, the salaries you need to attract these people are only going higher and higher. And it's not sustainable to have a platform where you can't keep a unified group of human beings around who'll keep it running. So that is actually one of the most common failure modes: not technical failure, but failure on the personnel side, not creating healthy enough teams and not enough opportunities for junior people to come in and be effective within that context.

Cristóbal  42:54

I think having any team in which only senior people can be involved is probably an interesting red flag, right? How is this team going, is it sustainable? That holds in any part of an organization, but in a platform team, of course. Besides that, they also need to have healthy ways of working. They need to understand this collective ownership of the platform. But also they need to understand how to manage their own workload, like constantly reducing the toil work that they have. They need to be able to discuss their cognitive load with management and negotiate how to reduce it, so being in the meetings, at the table, and discussing it. And they need to have some slack as well. Some slack time. I mean, who said that low systems work better, right? Super-efficient systems are a danger to themselves and to others. A super-efficient system has no room for change, and that damages its potential, so we need to have some slack time. And these things, collective ownership, psychological safety, diversity, and an understanding of the ways of working, are super important in multiplayer teams like that.

Christoph  44:21

Yeah. It will be super interesting to connect this discussion with you and Matthew and Manuel, the authors of Team Topologies, and bring this all together. I think that we'll be able to do this in 2022. Because I think there's a lot to discuss there, especially about health and team health, I can clearly see that.

Chris  44:41

I think that the cognitive load assessment that Manuel and Matthew have produced is a really good exercise. And, you know, I've run this with teams who work in platform areas to work out whether something is manageable, because at least in my experience, there's plenty of organizations who at the application software level will say things like, ah, we can tell this is just too much for one team, we need to split the monolith, or we need to create a new team to handle this. But maybe they don't apply that same level of care about cognitive load to people working on a platform, who may sometimes be spread too thin to effectively share knowledge. And maybe that then creates a knock-on effect, where it creates a difficult environment that only super senior people can thrive in. Because you don't create healthy teams, you rely on really senior individuals, and because you rely on them, you don't create the environment for healthy teams. And you have these heroes that save the company. I saw that in platform teams, absolutely. And at some point, they burn out. It's no good for human beings ever to burn out. But it's also no good for the value of your platform, you know, if the burnout happens in year two of the seven that we're talking about.

Christoph  46:04

Yeah, absolutely. We reached the end of the 45 minutes that we planned for, and went through the five points. So I think we did quite well. Thanks, everyone. Thank you.