
How Netflix unified their engineering experience with a federated platform console

Brian Leathem
Senior Software Engineer @ Netflix

Most developers' day-to-day work is inefficient, primarily because of the dozens of fragmented services and tools they use to build, run, and scale applications, and that inefficiency costs productivity. Small companies may tolerate a fragmented developer experience, but the need to unify it grows as the business grows.

Netflix adopted a microservices architecture early on, but its developers found that the approach was becoming too fragmented as the platform tooling grew. They needed to unify the developer experience across the company's Software Development Life Cycle (SDLC). Brian Leathem, who led the effort, shared these insights during PlatformCon 2022.

To unify developer experiences across Netflix's SDLC, the Platform Experiences and Design (PXD) team at Netflix decided to build a federated platform console. The Netflix federated platform console is a one-stop shop for all the tools engineers need to develop and deploy software at scale. It consolidates the dozens of services and tools developers use into a single, easy-to-use interface.

Through this console, Netflix hoped to solve the main fragmentation challenges it had identified through interviews with developers, which included:

1. Managing multiple services and software

Developers have to work with too many tools every day, which makes it challenging to develop, deliver, and operate services and software. For instance, it is not unusual for a developer to use Bitbucket to review pull requests, Spinnaker to check on deployment pipelines, Jenkins to check on build failures, and internal alerting and metrics tools to check operational status, all within one pass through the SDLC. They will likely need to repeat these workflows many times.

2. Platform discovery

Product and service owners at Netflix have created tools and documentation for developers, but many developers don't know these tools exist. A new developer will not immediately know the many tools and documents their team uses and may end up relying on tribal knowledge passed along by teammates. Conversely, a developer who has been around longer might not know about new tools that have been added to improve their daily workflows.

3. Switching contexts between tools

When developers use multiple services and tools, they have to switch context every time they move from one to another. This leads to inefficiencies and errors, because a developer can lose track of what they were doing in one tool after switching to the next.

It was clear to the PXD team at Netflix that the developers would benefit from a platform console that would serve as a common front door, giving them a single place to view and assess the status of their services and a launch point from which they could discover and reach the tools necessary to manage their services.

The Initial Console Platform Concept

The PXD team wanted to leverage their success with GraphQL Federation in the Netflix Studio division to build the console's backend architecture. GraphQL Federation allows users to spin up a domain graph service (DGS) that exposes their service as part of a single federated graph accessible by a federated graph gateway. When the gateway handles a request, it delegates to the appropriate DGS to fulfill all the fields referenced in that request.
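The delegation step described above can be illustrated with a toy gateway. This is a minimal sketch, not Netflix's implementation and not a real GraphQL library: the two DGS functions, the field names, and the returned values are all hypothetical, and a real federated gateway would plan queries from composed schemas rather than from a hand-written dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical domain graph services (DGSs). Each one owns a few fields
# of a shared "Service" entity and returns only the fields it is asked for.
def builds_dgs(service_id: str, fields: List[str]) -> Dict[str, object]:
    data = {"lastBuildStatus": "FAILED"}  # illustrative value only
    return {name: data[name] for name in fields}

def deployments_dgs(service_id: str, fields: List[str]) -> Dict[str, object]:
    data = {"pipelineStatus": "SUCCEEDED"}  # illustrative value only
    return {name: data[name] for name in fields}

# Assumed ownership registry: which DGS fulfills which field.
FIELD_OWNERS: Dict[str, Callable[[str, List[str]], Dict[str, object]]] = {
    "lastBuildStatus": builds_dgs,
    "pipelineStatus": deployments_dgs,
}

def gateway_resolve(service_id: str, requested_fields: List[str]) -> Dict[str, object]:
    """Group the requested fields by owning DGS, call each DGS once, merge results."""
    plan: Dict[Callable, List[str]] = {}
    for name in requested_fields:
        if name not in FIELD_OWNERS:
            raise KeyError(f"no DGS owns field '{name}'")
        plan.setdefault(FIELD_OWNERS[name], []).append(name)

    result: Dict[str, object] = {"id": service_id}
    for dgs, fields in plan.items():
        result.update(dgs(service_id, fields))
    return result

response = gateway_resolve("svc-1", ["lastBuildStatus", "pipelineStatus"])
```

The caller sees one graph and one response; which DGS answered which field stays an internal detail of the gateway.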

The team's investment in a GraphQL-based platform API can power not just the new platform console but many other experiences, including more dedicated UIs, CLIs, and Slack bots.

For the front end, the Platform Experiences and Design team wanted to federate the solution across the many platform teams and services they would bring together in the platform console. They understood that a single team could not deliver an effort of this scope; they would need both the domain expertise and the code contributions of platform providers and partners.

Leveraging Hawkins, Netflix's internal design system

PXD's first stop was Hawkins, Netflix's internal design system, which is used by over 80 applications. These applications power Netflix content production, from pitch evaluation to financial forecasting and asset delivery. Using Hawkins across all platform products would enable more cross-tool workflows and give users a consistent experience.

Designed to be reusable, configurable, and composable, Hawkins provides a consistent user experience across all applications in the suite, reducing the learning curve for users. It allows engineers to reuse components, toolsets, and design patterns, improving efficiency and reducing costs.

Surveying existing open source and proprietary solutions

Leathem's team didn't want to jump straight into building yet another developer portal and service catalog on top of their platform API and design system. They first evaluated the available open source and proprietary tools that could solve the problem, judging each against the needs and expectations of both end users and platform partner providers. The team ultimately determined that Backstage best suited their usage scenario.

Backstage, Spotify's open source developer portal, made sense to the team for several reasons:

  • Backstage's loose coupling between the frontend and backend would let the team easily integrate their existing backend solutions, including the federated GraphQL API.
  • Backstage's UI technologies aligned well with PXD's expertise.
  • Backstage's plugin model is lightweight and unobtrusive.

Backstage versus building a developer portal from scratch

How would using Backstage, an existing open-source tool, compare to building a bespoke in-house solution? To answer this question, PXD evaluated Backstage based on its functions and their constituent elements. Using a Wardley Map, they assessed whether it would be better to invest their developer resources in creating an internal tool from scratch or in building on top of Backstage.

“We identify the various components of our system and locate them vertically by how much they will impact the end-user experience and horizontally by how much they're commoditized in the industry. Components are broken down into their constituent parts, pulling out the pieces that are commoditized.” — Brian Leathem.

A Wardley Map is a visualization technique that plots the components of a value chain by how visible they are to the end user and by how far they have evolved toward being a commodity, which shows where custom development adds the most value. The team discovered that the components adding the most value were the custom UI components, which capture the features, functions, and characteristics unique to each platform. They realized they were better off investing development resources in those custom UI components than in rebuilding the plugin and core APIs that Backstage already provides.
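The build-versus-reuse reasoning can be sketched as a small data model. Everything numeric here is an assumption made for illustration: the talk does not give coordinates, so the visibility and evolution scores and the 0.6 threshold below are invented, and the decision rule is a simplification of how a Wardley Map is actually read.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    visibility: float  # vertical axis: 0 = hidden from the user, 1 = directly user-facing
    evolution: float   # horizontal axis: 0 = novel/custom, 1 = commodity

def recommend(component: Component, commodity_threshold: float = 0.6) -> str:
    """Reuse what the industry has commoditized; build what is still custom."""
    return "reuse" if component.evolution >= commodity_threshold else "build"

# Hypothetical placements, chosen only to mirror the conclusion in the text.
components = [
    Component("custom UI components", visibility=0.9, evolution=0.2),
    Component("plugin API", visibility=0.5, evolution=0.8),
    Component("core APIs", visibility=0.3, evolution=0.8),
]

decisions = {c.name: recommend(c) for c in components}
```

With these assumed placements, the custom UI components land on the "build" side while the plugin and core APIs land on the "reuse" side, which matches the team's decision to build on Backstage.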

A connected experience through a federated platform console MVP

The initial goal of the platform console MVP was to build a connected experience: a common front door where developers can view and assess the state of their projects across the SDLC and then link out to existing tools from within the console. Over time, the plan is to upgrade the console from a connected experience to an integrated one and, eventually, to a platform where all the organization's satellite tools are fully managed.

When designing and developing these plugins, the team was careful not to simply lift and shift existing experiences into the console but to take the opportunity to rethink the experience and the value the user would get from the data. To address the problem of managing multiple services and software, they introduced the concept of collections. With collections, users can group a fleet of services to view and assess their status together.
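A collection can be pictured as a named group of services with a status roll-up. The sketch below is only an illustration of that idea: the class names, the status vocabulary ("healthy", "degraded", "failing"), and the service names are assumptions, not the console's actual data model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Service:
    name: str
    status: str  # assumed vocabulary: "healthy", "degraded", "failing"

@dataclass
class ServiceCollection:
    name: str
    services: List[Service] = field(default_factory=list)

    def add(self, service: Service) -> None:
        self.services.append(service)

    def status_summary(self) -> Dict[str, int]:
        """Count services per status so the fleet can be assessed at a glance."""
        return dict(Counter(s.status for s in self.services))

    def needing_attention(self) -> List[str]:
        """Names of services that are not healthy, in insertion order."""
        return [s.name for s in self.services if s.status != "healthy"]

# Hypothetical fleet owned by one team.
fleet = ServiceCollection("my-team-services")
fleet.add(Service("playback-api", "healthy"))
fleet.add(Service("billing-worker", "failing"))
fleet.add(Service("search-indexer", "healthy"))

summary = fleet.status_summary()
attention = fleet.needing_attention()
```

Grouping first and summarizing second keeps the per-service view available while giving the owner one place to see which services need a closer look.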

Paved Roads bridges the gap between product knowledge and the engineering process with a centralized repository for product information and an organizing framework to assist engineers in finding the proper tools for a specific problem.

The console also includes the ability to kick off bulk mutations for the services in the collection, a concept that does not yet exist on any other platform tools at Netflix. To tackle the platform and tools discovery problem, Leathem and the team came up with Paved Roads, a concept designed to pull together product documentation into a single location and organize it in a way that helps engineers find the best tools to use for the challenge they're trying to solve.
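The discovery idea behind Paved Roads, finding tools by the problem being solved rather than by tool name, can be sketched as a small index. This is an assumption-laden illustration: the `PavedRoads` class, the problem labels, and the document titles are hypothetical, and only the tool names (Jenkins, Spinnaker, Bitbucket) come from the article.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class PavedRoads:
    """Hypothetical index that maps a problem to the recommended tools and docs."""

    def __init__(self) -> None:
        self._by_problem: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def register(self, tool: str, doc_title: str, problems: List[str]) -> None:
        # File the same document under every problem it helps with.
        for problem in problems:
            self._by_problem[problem.lower()].append((tool, doc_title))

    def find(self, problem: str) -> List[Tuple[str, str]]:
        # Case-insensitive lookup; unknown problems return an empty list.
        return list(self._by_problem.get(problem.lower(), []))

roads = PavedRoads()
roads.register("Jenkins", "Diagnosing build failures", ["build"])
roads.register("Spinnaker", "Setting up a deployment pipeline", ["deploy"])
roads.register("Bitbucket", "Reviewing pull requests", ["code review"])

deploy_docs = roads.find("Deploy")
```

Indexing by problem is the design choice that matters here: an engineer who does not yet know a tool exists can still reach it by describing what they are trying to do.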

User adoption and feedback

The PXD team has been rolling out the MVP and getting it in front of engineers at Netflix. The MVP has met its goal of providing a minimal, valuable feature set that users can use to view and assess the state of their software. However, the team has discovered through user feedback and new research that this consolidated functionality alone isn't enough to draw developers in and break their established routines and habits around existing tools.

As a result, the team is looking into driving users to the platform by inserting the console into existing workflows or creating new workflows altogether. They also plan to enrich the console with new functionality, in the hope that users will develop new routines and habits around the platform console and organically add it to their toolchain.

Summary

Here are key takeaways from Brian Leathem's presentation at the virtual PlatformCon 2022 conference:

  • A federated platform console can help unify the Netflix engineering experience by providing a common front door for developers to view and access the state of their project across the SDLC.
  • The platform console MVP that Brian and his team have created has met its goal of providing a minimal, valuable feature set that users can use to view and assess the state of their software.
  • The team is looking into driving users to the platform by inserting the console into existing workflows or creating new workflows altogether. They also plan to enrich the console with new functionality, in the hope that users will develop new routines and habits around the platform console and organically add it to their toolchain.
  • The team acknowledges that simply consolidating functionality is not enough. To succeed, the platform console must be integrated into existing workflows or create new ones valuable enough to developers to break their established routines.

The 2022 PlatformCon virtual conference was made possible through support from some of the strongest brands leading the platform engineering revolution, including Google, HashiCorp, Puppet and Humanitec. More than 70 different platform practitioners shared their experiences through over 20 hours of recordings – all available for you to watch on the Platform Engineering YouTube Channel.