The quest for seamless infrastructure management has been a recurring theme both in my career and in the dynamic world of software development. Today, I am the co-founder and CEO of Zeet, an all-in-one developer platform. In this article, I’ll outline how I got my start in the world of developer platforms.
My first tryst with platform engineering began in 2016. A family friend of mine worked at a small startup called Bebo, and through him, I was able to land an internship while still in high school. Bebo aimed to build a revolutionary esports solution for high schoolers worldwide. However, the dream came with unique challenges, particularly concerning our infrastructure requirements. To cater to global gamers and maintain an optimal user experience, we needed robust live video infrastructure capable of low-latency video ingest and processing. We also needed to 10x our ability to handle load.
The problem was relatively well defined: more infrastructure in more places to meet our growing demand. Starting in May 2018, we got to work. However, our attempts to scale globally hit a snag when both of our SysAdmins retired in the same month, leaving our team in need of expertise to manage the expanding infrastructure, or so we thought.
Enter me, their high school intern turned Director of Ops. Bebo’s product only worked on low-latency connections, meaning we still needed to have those points of presence (PoPs) in regions around the world. We could manually stand up our stack in data centers around the globe, but that was untenable with such a small, infrastructure-ignorant team. My boss gave me a simple instruction: automate our infrastructure scaling.
With all this in mind, I was determined to approach the problem with a software engineering mindset. If I could create a template for a new region, I could write software to deploy this template over and over, no matter which cloud provider we were working with. It was a novel concept then, but I resolved to create a platform to handle all the complex cloud orchestration involved in scaling without slowing our dev team down.
This plan would ultimately resemble what we now call a developer platform. When our engineering team needed to spin up a new region or deploy code to existing regions, they would use the platform to perform these operations without me. After some collaboration with the team on how best to do this given the technology of the time, I got to work.
The birth of an Internal Developer Platform company
Driven to solve our infrastructure challenges, I standardized our infrastructure using Docker and containerization (Kubernetes wasn’t really a thing yet).
The rollout strategy I implemented was crucial in ensuring a smooth transition to the new platform for all of our developers. While we only had eight engineers building, that was still enough moving pieces that this needed to “just work”. Whenever a deployment happened, whether FE, BE, ML, or otherwise, it needed not only to be deployable to new regions but also to go out to existing regions without any hiccups.
Starting with just one region, I validated that our new Docker strategy worked repeatedly before automating the process. Once confident in our automation’s ability to spin up a single region without hiccups, I streamlined the process by creating a one-click solution to spin up the region in a series of steps, further enhancing efficiency.
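A one-click region bring-up of this kind can be sketched as an ordered list of steps run by a single entry point. The sketch below is illustrative only: the step names (`provision_nodes`, `start_containers`, `run_health_checks`) and the `spin_up_region` function are hypothetical, and the step bodies are placeholders rather than the calls Bebo’s platform actually made.

```python
from typing import Callable, List, Tuple

# A step takes a region name and returns True on success.
Step = Tuple[str, Callable[[str], bool]]


def provision_nodes(region: str) -> bool:
    return True  # placeholder: a real step would call a cloud provider API


def start_containers(region: str) -> bool:
    return True  # placeholder: a real step would pull and run Docker images


def run_health_checks(region: str) -> bool:
    return True  # placeholder: a real step would probe the region's endpoints


# Hypothetical step list; the article does not enumerate the real steps.
REGION_STEPS: List[Step] = [
    ("provision_nodes", provision_nodes),
    ("start_containers", start_containers),
    ("run_health_checks", run_health_checks),
]


def spin_up_region(region: str, steps: List[Step] = REGION_STEPS) -> List[str]:
    """Run every step in order and return the names of the completed steps.

    Stops at the first failing step so a half-built region is never
    reported as ready.
    """
    completed: List[str] = []
    for name, step in steps:
        if not step(region):
            raise RuntimeError(f"{name} failed in {region}; completed: {completed}")
        completed.append(name)
    return completed


if __name__ == "__main__":
    print(spin_up_region("region-1"))
```

Keeping the steps in a plain list makes the same sequence reusable for every region, which is what lets a process validated once by hand be repeated at the click of a button.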
Before I could copy and paste this to automate the eventual dozen regions, there were two things I still had to figure out. As anyone who has ever tried to debug can attest, software without metrics and logging in place is not really production-ready. In today’s world of Kubernetes and dev tools for everything, there are dozens of ways to solve this problem. At the time, however, there was meaningful engineering work to be done here.
I landed on standardizing all metrics and logs so that we could keep track of everything in a unified way. With one schema, no matter where the logs were coming from, we could view and parse them effectively in one place.
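To make the idea of “one schema” concrete, here is a minimal sketch of normalizing log records from different sources into a single shape. The field names (`timestamp`, `region`, `service`, `level`, `message`) and the accepted aliases are assumptions made for illustration; the article does not describe the actual schema.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def _first(record: Dict[str, Any], keys: Iterable[str], default: Any = None) -> Any:
    """Return the value of the first key that is present in the record."""
    for key in keys:
        if key in record:
            return record[key]
    return default


def normalize(record: Dict[str, Any], region: str, service: str) -> Dict[str, str]:
    """Map a source-specific log record onto one shared (hypothetical) schema."""
    ts = _first(record, ("timestamp", "time", "ts"))
    if isinstance(ts, (int, float)):
        # Treat numeric timestamps as Unix epoch seconds in UTC.
        ts = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return {
        "timestamp": "" if ts is None else str(ts),
        "region": region,
        "service": service,
        "level": str(_first(record, ("level", "severity"), "info")).lower(),
        "message": str(_first(record, ("message", "msg"), "")),
    }


if __name__ == "__main__":
    # Two sources that name their fields differently end up in the same shape.
    print(normalize({"ts": 0, "severity": "WARN", "msg": "slow ingest"}, "region-1", "ingest"))
    print(normalize({"timestamp": "2018-05-01T00:00:00Z", "level": "error",
                     "message": "encoder crashed"}, "region-2", "transcode"))
```

Because every record carries the same keys, including the region and service it came from, logs from any deployment can be filtered and parsed in one place.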
Similarly, I needed to ensure that no matter which cloud or region we were deploying to, the application requirements were defined in a way that I’d get the same result as the first region every time. Using a JSON framework I came up with, I was able to define all our app’s requirements so that any cloud we deployed to could make use of them. IaC and YAML frameworks make this a trivial problem today, but again, in 2018 they were still in their infancy.
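As a rough illustration of a cloud-agnostic JSON definition, the sketch below declares an app’s requirements once and resolves them against each provider’s machine catalog. Everything here is hypothetical: the spec fields, the provider names (`cloud_a`, `cloud_b`), the instance catalogs, and the `plan_region` function are invented for the example and are not the framework described above.

```python
import json
from typing import Any, Dict

# Hypothetical app requirements, declared once and independent of any cloud.
APP_SPEC_JSON = """
{
  "name": "video-ingest",
  "image": "example/video-ingest:1.0",
  "cpu": 4,
  "memory_gb": 8,
  "ports": [1935, 443]
}
"""

# Hypothetical catalogs of (instance name, cpu, memory_gb), smallest first.
INSTANCE_TYPES = {
    "cloud_a": [("a.small", 2, 4), ("a.large", 4, 16), ("a.xlarge", 8, 32)],
    "cloud_b": [("b-2x4", 2, 4), ("b-4x8", 4, 8)],
}


def plan_region(spec: Dict[str, Any], provider: str, region: str) -> Dict[str, Any]:
    """Pick the smallest instance on this provider that satisfies the spec."""
    for name, cpu, memory_gb in INSTANCE_TYPES[provider]:
        if cpu >= spec["cpu"] and memory_gb >= spec["memory_gb"]:
            return {
                "provider": provider,
                "region": region,
                "instance_type": name,
                "image": spec["image"],
                "open_ports": list(spec["ports"]),
            }
    raise ValueError(f"no instance on {provider} satisfies {spec['name']}")


if __name__ == "__main__":
    spec = json.loads(APP_SPEC_JSON)
    for provider, region in [("cloud_a", "region-1"), ("cloud_b", "region-2")]:
        print(plan_region(spec, provider, region))
```

The point of the design is that the spec never mentions a provider: only the resolution step knows about provider-specific details, so adding a cloud means adding a catalog, not rewriting the app definition.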
My work resulted in an Internal Developer Platform that completely changed our infrastructure management. The platform efficiently handled auto-scaling, provisioning of new nodes, and CI/CD deploys without human intervention. My ad-hoc Internal Developer Platform became the backbone of our cloud scaling operations, allowing us to scale to 15 regions and ingest over 30 terabytes of live video daily without ever hiring another SysAdmin.
Internal Developer Platforms today
In the years that followed, the concept of platform engineering transformed from an interesting idea into a full-fledged movement. The breadth of disjointed tools to handle cloud operations like Infrastructure as Code, Kubernetes, CI/CD, etc. makes a compelling case for centralization. What was once a problem of solving engineering problems is now a problem of managing the tools that solve them.
Yes, you can easily hire someone to manage your clouds and dev tools, but you can just as easily hire a tool to do that for you now. I believe that as more teams discover the headcount cost and time savings that a developer platform offers, developer platforms will gain popularity.
Through its evolution and the adoption of cloud computing, platform engineering has started to crystallize into something that everyone on the team can benefit from. It’s no longer just about faster, repeatable infrastructure deployments. It’s also about making everyone on the team more productive and more compliant.
Infrastructure engineers are often becoming “platform engineers.” They not only maintain the pipelines and infrastructure that developers use, but they also create pre-configured infrastructure templates that adhere to security and compliance guidelines set by their organizations. With these templates, developers can deploy without having to worry about being out of compliance or roping in an infrastructure engineer to shepherd the deployment. This process is pictured below.
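As a rough code sketch of the template idea (not any particular platform’s implementation), a pre-configured template can fix the compliance-relevant settings and expose only a few fields for developers to change. The field names and the `render` function below are hypothetical.

```python
from typing import Any, Dict

# Hypothetical template owned by the platform team. Security-relevant
# settings are fixed; only the listed fields may be changed by developers.
TEMPLATE: Dict[str, Any] = {
    "defaults": {
        "encrypted_storage": True,
        "public_ingress": False,
        "replicas": 2,
        "memory_gb": 4,
    },
    "overridable": {"replicas", "memory_gb"},
}


def render(template: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Apply developer overrides, rejecting any change to a locked field."""
    locked = set(overrides) - template["overridable"]
    if locked:
        raise PermissionError(f"fields locked by the platform team: {sorted(locked)}")
    return {**template["defaults"], **overrides}


if __name__ == "__main__":
    # A developer scales up without touching the guardrails.
    print(render(TEMPLATE, {"replicas": 5}))
```

A developer can self-serve within the allowed fields, while an attempt to override something like `public_ingress` fails immediately instead of reaching production out of compliance.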
Over these last seven years, platform engineering, once a novel concept, has become the bedrock of many of the largest cloud-native teams. DoorDash has a whole team dedicated to supporting their developer platform, and Spotify’s Backstage has been a boon to many teams, internal and external. Maybe more interesting is that many engineers who leave large companies to start their own choose to deploy their first line of code using a developer platform.
Anas Abou-Allaban, co-founder of Tarteel, a company that started on a developer platform, summed it up best: “I’ve worked at AWS, and I’ve worked in AWS, and I prefer to use a developer platform. At the end of the day, there are just too many services, tools, and resources you need to learn and use. With a developer platform, I don’t need to know them all. I just focus on deploying.”
Would we have started our company on an Internal Developer Platform in 2016 if we had one? It’s hard to say. We were a scrappy team that iterated quickly, regularly scrapped weeks of work, and liked to have control of every aspect of our infrastructure and software stack. That said, the platforms of today are less and less about doing everything for you and more about allowing you to focus on high-value work.
My early platform held your hand every step of the way because that was the requirement; when we needed a copy and paste of our stack, it was possible at the click of a button. Today, however, developer platforms are more about making developers more self-serve and allowing infrastructure engineers to focus on higher-value work. Having the whole team working under one roof and moving faster likely would have been enough for us to make the jump.
This article was based on two talks I gave at PlatformCon 2023.