Designing for the inevitable - shifting Kubernetes policies and resources

Keeping track of a bunch of shift-left decisions can be difficult. The inevitable starts to occur, and policies, resources, and people change. How can you design for this?

Ravi Lachhman

Field CTO @ Shipa

If you are a fan of Marvel, the famous last words of Thanos [not to be confused with the Prometheus rendition] were “I am inevitable,” right before Iron Man snatched the Infinity Stones. When I conjure up an image of a legacy workload, the first picture in my head takes me back to when I was leaving university: the large number of green screens running on a mainframe that eventually needed to be ported to web applications. As my career progressed, those very web applications, once shining beacons freeing folks from the terminal, themselves became legacy.

Legacy workloads are bound to occur in any organization. A workload doesn’t have to be running on a mainframe, written before you were born, to be considered legacy. As platform engineers, we can sometimes focus only on the development pipeline, envisioning and building faster and safer paths to production. Kubernetes, with its declarative nature and human-readable YAML format for manifests, allows all parties involved to understand what is being requested of the container orchestrator. Though as time moves forward, even those shiny new Kubernetes workloads inevitably start to resemble legacy workloads of years gone by. Let’s dig into what constitutes a legacy workload, how Kubernetes is not immune from these workloads, and how you can design platforms to support these workloads for years to come.

The L word, legacy

If you follow Gartner’s Bimodal IT definition, there are two types of workloads: Mode One keeps the lights on, e.g., brownfield, and Mode Two is experimental, e.g., greenfield. IT teams are split on how to support each mode, since each has unique challenges and opportunities. On a platform engineering team, there is a natural inclination to focus on Mode Two, or building for the future. Technology certainly ages like milk, not like wine, and as platform engineers we control the experience of both legacy and greenfield delivery.

Several years ago I was fortunate enough to catch a talk by Corgibytes, a consulting firm focused on modernizing legacy apps on all fronts, i.e., from both a technical and a psychological side. They define a legacy application as one that is unattractive to work on. Reading the writing on the wall, as I did in my PlatformCon 2022 talk, legacy applications are those with little or no active development occurring.

After the initial surge of development, projects and products age, and there is less active development. As engineers start to transition off, the availability of expertise diminishes. That loss of expertise is typically viewed as a loss of detailed knowledge of the application’s functionality. Though with the number of items shifting left towards developers, the expertise they had in those shift-left decisions also goes away. Kubernetes is a great tool for shifting items left; depending on your views, this could be a good or a bad thing. Kubernetes itself is not immune from workloads that come to resemble legacy workloads as expertise leaves.

Kubernetes, gracefully aging

At the time of this blog piece, Kubernetes is turning eight years old. Kubernetes was influenced by several platforms before being released to the public, so the ideas behind the initial release had been taking shape for a while.

In years gone by, Kubernetes and the CNCF ecosystem moved at an ever-increasing velocity. Kubernetes as a platform is designed to have its opinions be pluggable. Don’t like the CNI provider Flannel? Then switch to Calico, or vice versa. The explosive growth in the number of projects focused on two areas: one in which Kubernetes’s opinion could be changed by providing alternative opinions, and another that helps Kubernetes scale in granular fashions. As Kubernetes moved into the mainstream, the velocity of new paradigms and opinions slowed slightly. Platform engineering organizations now have a better footing for balancing “bleeding edge” vs. “tried-and-true” approaches, as the almost constant paradigm shifts have slowed.

Today, it is not novel to run a workload in Kubernetes. The big wave of initial workloads being placed on Kubernetes, or the re-tooling of existing Kubernetes workloads, e.g., embracing a Service Mesh where there was not one before, has started to dial down. Greenfield Kubernetes development has matured too, as templates and archetypes are much stronger for development resources to consume. As with any platform, there are folks who are happy with your templating and folks who need a bespoke solution.

Kubernetes is ironically homogeneous and heterogeneous at the same time. YAML and Go are the de facto languages in Kubernetes. But depending on the combination of YAML and Go, your workloads can be described and run quite differently from other workloads. Those shift-left decisions that make the workloads heterogeneous are exactly the areas that need coverage as expertise eventually leaves.

Keeping track of (and hopefully enforcing) shift-left decisions

Core areas of infrastructure that were managed by infrastructure teams are now showing up as Kubernetes manifests. Networking configurations are now described in Service Mesh manifests, and storage configurations are now described in CSI manifests. As the constant march of shifting left continues, sometimes with the ethos of self-service, understanding why engineers made those shift-left/self-service decisions can be troublesome. Re-iterating the adage that software ages like milk and not like wine, a not-insignificant chunk of the application distribution and its dependencies is most likely open source. Most of us will not have contributor-level resources in every open source project that we consume.

Infrastructure-as-Code [IaC] and the vast amount of configuration-as-code are sources of truth, especially if you subscribe to a GitOps model for delivering Kubernetes workloads. Like anything in the platform engineering world, if you had one and only one service, this would be easy. An oversimplified answer to designing platforms that will eventually support legacy Kubernetes workloads, i.e., when expertise leaves, is to design platforms that keep track of the shift-left / non-functional decisions made for the applications.
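As a rough illustration of what "keeping track" could look like, here is a minimal sketch that builds an inventory of which manifest kinds each application carries. It assumes the manifests from your Git repository have already been parsed into Python dictionaries, and it assumes teams tag resources with the `app.kubernetes.io/name` label; using that label as the ownership key is my assumption for this example, and the `inventory` function is hypothetical, not part of any tool.

```python
from collections import defaultdict

# Assumption: resources are tied to an application through this label.
APP_LABEL = "app.kubernetes.io/name"


def inventory(manifests):
    """Map each application to the sorted manifest kinds recorded for it."""
    decisions = defaultdict(set)
    for manifest in manifests:
        labels = manifest.get("metadata", {}).get("labels") or {}
        # Anything without the label is grouped so it can be chased down.
        app = labels.get(APP_LABEL, "<unlabeled>")
        decisions[app].add(manifest["kind"])
    return {app: sorted(kinds) for app, kinds in decisions.items()}


if __name__ == "__main__":
    # Hypothetical, already-parsed manifests.
    sample = [
        {"kind": "Deployment", "metadata": {"labels": {APP_LABEL: "cart"}}},
        {"kind": "NetworkPolicy", "metadata": {"labels": {APP_LABEL: "cart"}}},
        {"kind": "PersistentVolumeClaim", "metadata": {"name": "scratch"}},
    ]
    for app, kinds in inventory(sample).items():
        print(app, kinds)
```

The "<unlabeled>" bucket is the interesting part: those are the resources nobody will be able to explain once the original engineers have moved on.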

Here are common shift-left decisions to keep track of and enforce.

Image [container] choices - Taking a page out of manufacturing, bill-of-materials management, e.g., an SBOM [Software Bill of Materials], is crucial to finding what dependencies are inside your organization. This is especially helpful when a new vulnerability appears, like Log4Shell. It also covers the basic pillars of a sanitary registry stance, such as images coming from an organizational registry vs. a public one.
Networking policy choices - Going back to if there was one and only one service, blocking a specific port like 6443 [the default Kubernetes API server port] would be easy enough. Though with the rise of microservices and the ease of on-boarding workloads to Kubernetes, there can be lots of workloads to keep track of. Understanding the communication posture of the services is key.
Container storage choices - So as not to get lost in “who made this Persistent Volume Claim?”, tying a workload or application to a PVC is helpful. Remember those orphaned AWS EBS volumes? They appear in K8s also.
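The three items above can be sketched as simple checks over parsed manifests. This is a minimal sketch under stated assumptions, not a replacement for a policy engine: the registry prefix `registry.example.com/` and all three function names are hypothetical, the workloads are assumed to be Deployment-shaped (containers under `spec.template.spec`), and the network check is deliberately coarse since it only asks whether a namespace has any NetworkPolicy at all.

```python
# Assumption: hypothetical organizational registry prefix.
ORG_REGISTRY = "registry.example.com/"


def untrusted_images(workload):
    """Images in a Deployment-shaped manifest not pulled from ORG_REGISTRY."""
    containers = workload["spec"]["template"]["spec"]["containers"]
    return [c["image"] for c in containers
            if not c["image"].startswith(ORG_REGISTRY)]


def namespaces_without_policy(workloads, policies):
    """Namespaces that run workloads but contain no NetworkPolicy."""
    covered = {p["metadata"]["namespace"] for p in policies}
    running = {w["metadata"]["namespace"] for w in workloads}
    return sorted(running - covered)


def orphaned_claims(workloads, claims):
    """Names of PVCs that no workload in the same namespace mounts."""
    used = set()
    for w in workloads:
        namespace = w["metadata"]["namespace"]
        for volume in w["spec"]["template"]["spec"].get("volumes", []):
            pvc = volume.get("persistentVolumeClaim")
            if pvc:
                used.add((namespace, pvc["claimName"]))
    return sorted(
        c["metadata"]["name"] for c in claims
        if (c["metadata"]["namespace"], c["metadata"]["name"]) not in used
    )
```

Running checks like these on a schedule, rather than only at admission time, is what helps with aging workloads: a manifest that passed review years ago can still drift out of line with today's registry, network, and storage decisions.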

With those major pillars being tracked, pinpointing the proverbial needle in a haystack becomes easier when there is a security or resilience incident.

I would always love to connect and learn more about how you are building for the eventual departure of engineering expertise on workloads.

Cheers,

-Ravi