Kubernetes cluster management microservices: Streamline environments and boost developer productivity
Most platform engineering teams face a familiar challenge: as organizations adopt Kubernetes and microservices, the number of development and testing environments multiplies rapidly. What starts as a single production cluster quickly becomes a sprawling ecosystem of staging clusters, CI environments, team-specific dev clusters, and personal ephemeral environments. The result? Sky-high cloud bills, overwhelmed platform teams, and developers still waiting 20-30 minutes for feedback on their code changes.
In this Platform Engineering community webinar, Arsh Sharma, Senior DevRel Engineer at MetalBear and CNCF Ambassador, shares a practical approach to breaking this cycle. If you missed the live session, you can watch the full webinar here.
Arsh brings deep expertise to this topic, having worked in the platform engineering space for four years and previously contributed to the Kubernetes project at VMware. He has been awarded the Kubernetes Contributor Award and has contributed to CNCF projects including cert-manager and Kyverno. His experience spans both the technical challenges of cloud-native development and the practical realities of building developer platforms at scale.
The cloud-native environment explosion: A familiar story
Arsh opens the webinar by walking through a progression that many organizations will recognize. "This is the story of going cloud native," he explains, describing how teams typically evolve their infrastructure as they adopt Kubernetes.
The journey usually begins simply enough. Your organization migrates an application to Kubernetes, creating a production cluster. Everything seems manageable at this stage. Then reality sets in: pushing directly to production isn't feasible, so you add a staging cluster for testing changes in a controlled environment.
As your team grows, staging becomes a bottleneck. "Some people want to test their changes on staging but they're not able to because some other team is using the staging cluster," Arsh notes. To address this, organizations spin up clusters in CI so developers can test their commits without waiting for staging access.
But the proliferation doesn't stop there. Teams request their own dev clusters that mirror production, wanting realistic environments to test their microservices. Finally, individual developers demand personal ephemeral clusters they can spin up on demand. "Now things have really gotten out of control and your cloud budget is just exploding," Arsh observes.
This progression isn't just about cost. The management overhead on platform teams becomes substantial, and even with all these environments, developers still wait 5-10 minutes for ephemeral environments to provision. Keeping these environments in sync with production remains a persistent challenge.
Why microservices make Kubernetes cluster management worse
Arsh identifies two fundamental reasons why this environment chaos is specific to cloud-native, microservices-based applications.
First, microservices create highly interdependent architectures. "If you work on one isolated part of your application, let's suppose you're working on service A, but service A still needs to talk to service B which might be talking to service C and so on," he explains. When you want to test code changes for service A, you need access to all the other services, databases, queues, and dependencies your application requires. This makes creating isolated local or remote dev environments extremely difficult.
Second, production environments look fundamentally different from what developers can replicate locally. Production Kubernetes clusters include secrets for configuration, observability tools, service meshes, and other infrastructure that simply can't be replicated in local development. "There is this huge mismatch between the environment a developer writes code in versus where it actually gets deployed," Arsh emphasizes. This mismatch means configuration issues and environment-specific bugs often don't surface until code reaches staging or production.
The costly development feedback loop
The current development workflow in most organizations follows a predictable but inefficient pattern. Developers write code locally, run basic unit tests, then push their changes to trigger CI pipelines and staging deployments. This step takes significant time, and as Arsh points out, "as developers we know that your code most probably will not work the first time."
When you spend 20-30 minutes going through this process only to discover something isn't working, you fix the issue locally and repeat the entire loop again. "For each feature or bug fix, you're essentially going through this loop at least three or four times, and that's an hour or two wasted per developer per feature," Arsh calculates. "If you calculate that across a team of 100 or 200 developers, that really affects the speed at which your organization is able to ship features and bug fixes."
The bottleneck in this loop is clear: waiting for pipelines and deployments. If developers could test their application as easily as running unit tests locally, the feedback loop would be dramatically faster, directly impacting developer productivity.
Current solutions and their limitations
Organizations have tried various approaches to address these challenges, but each comes with significant drawbacks.
Personal remote environments - whether ephemeral on-demand clusters or dedicated namespaces - turn out to be expensive at scale. "Even if you're going for one or two clusters and then partitioning by namespace, the clusters you end up provisioning are a lot larger for your entire team, and that is why it just turns out to be really costly," Arsh explains. These environments also require substantial management overhead from platform teams and still take time to provision, leaving developers waiting.
Local tools like Docker Compose, Minikube, and Kind avoid the cost issue but introduce their own problems. The setup becomes complex, with developers often needing to debug issues themselves or constantly ping the platform team for help. More critically, "they do not replicate production properly," Arsh emphasizes. A local Kubernetes cluster remains fundamentally different from production or staging environments, meaning bugs still surface later in the deployment pipeline that weren't caught locally.
A controversial solution: Shared staging environments
Arsh poses what he acknowledges is "a somewhat controversial topic": What if you could just use your existing staging environment for development and CI?
The typical objections are immediate: it's not possible, the shared environment will break, developers will have to wait for access. "And that is where mirrord comes into the picture," Arsh introduces. "mirrord is a local Kubernetes development tool. It is an open source tool which lets you run your local process in the context of a Kubernetes environment."
The key insight is that your code still runs locally on your machine, but you can test it and see how it behaves as if it were running in the cloud environment. "This gives you all the benefits of testing on a Kubernetes cluster, testing in a staging environment, but without having to go through the hassles of actually deploying code, without having to wait 15-20 minutes for CI pipelines to build container images and then deploy those images on a cluster," Arsh explains.
How mirrord works: Traffic mirroring in practice
The technical mechanism behind mirrord is elegant. The tool mirrors traffic and data between the cloud environment and your local machine, handling both incoming traffic from the cloud to your local process and outgoing traffic from your local process to the cloud.
Arsh walks through a concrete example: Suppose you've made code changes to service A and run that service locally. You're only running service A on your machine, not service B, databases, or other dependencies. Meanwhile, your staging cluster has service A already deployed along with all other services.
mirrord establishes a connection between your locally running service A and the deployed service A in the cluster. Once connected, any traffic coming to the cluster intended for service A gets mirrored to your local machine. "This way you are able to see how your code behaves when a request hits that code without having to actually deploy it," Arsh explains.
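As a concrete illustration of this setup, a minimal mirrord configuration might look like the following. This is a sketch based on mirrord's JSON config format; the target name `deployment/service-a` is a placeholder, and the exact field names should be checked against the current mirrord configuration reference.

```json
{
  "target": "deployment/service-a",
  "feature": {
    "network": {
      "incoming": "mirror"
    }
  }
}
```

Saved as something like `.mirrord/mirrord.json`, a developer would then launch their local process through mirrord, e.g. `mirrord exec -f .mirrord/mirrord.json -- <your usual run command>` (the exact CLI invocation is an assumption to verify against mirrord's docs).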
When your locally running process needs to talk to other services or access databases, it sends outgoing requests. mirrord mirrors these requests to the cloud environment, making them appear as if they're coming from service A in the cluster. The other services respond, and those responses get mirrored back to your local process.
The tool also supports a "steal" mode for intercepting requests. In this mode, when a request hits your service in the cluster, it gets stolen and sent to your locally running process, which then sends the response back. This allows you to see exactly how your code changes behave in the context of your broader application.
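The difference between the two modes can be sketched with a toy dispatcher. This is purely conceptual, not how mirrord is actually implemented, and every name in it is hypothetical:

```python
def cluster_handler(request: str) -> str:
    """Stand-in for the copy of service A deployed in the cluster."""
    return f"cluster:{request}"


def local_handler(request: str) -> str:
    """Stand-in for the developer's local copy of service A."""
    return f"local:{request}"


def dispatch(request: str, mode: str) -> str:
    """Route a request the way mirror/steal modes are described above."""
    if mode == "mirror":
        # Mirror mode: the local copy also sees the request, but the
        # cluster's deployed service still produces the real response.
        local_handler(request)  # observed locally, response discarded
        return cluster_handler(request)
    if mode == "steal":
        # Steal mode: the request is diverted entirely; only the
        # local process handles it and answers.
        return local_handler(request)
    raise ValueError(f"unknown mode: {mode}")


print(dispatch("GET /orders", "mirror"))  # cluster still answers
print(dispatch("GET /orders", "steal"))   # local code answers
```

The design point this captures: mirror mode is read-only for the local process (safe observation of real traffic), while steal mode hands it full responsibility for the response.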
Transforming the development loop
With mirrord, the development loop changes dramatically. Instead of the lengthy cycle of writing code, running basic tests, waiting 15-20 minutes for CI and staging deployment, discovering issues, and repeating, developers now follow a much tighter loop.
"You write code, you test it against your staging environment using mirrord locally, and that takes about 5 seconds as compared to 15 or 20 minutes," Arsh describes. Developers iterate as many times as needed in this fast loop, then only go through the CI and staging deployment process once as a final sanity check.
"The difference here is that now you are relying on this step as a final validation instead of going through it each time you need feedback," Arsh emphasizes. You're not waiting 20-30 minutes per code change you want to test - you only do that once when you're confident in your changes.
Addressing the multi-developer challenge
During the Q&A, an attendee raised a critical question: How do you handle multiple developers wanting to test changes to the same service simultaneously?
Arsh explains that this is where mirrord for Teams, their commercial offering, comes in. "You install the mirrord operator which is part of the helm chart on your Kubernetes cluster and then the operator takes care of managing multiple concurrent mirrord sessions," he describes.
Multiple developers can target the same service, and the operator manages these sessions while ensuring there are no conflicts and that the service remains functional for everyone, including those not actively testing code. This addresses one of the primary concerns about using shared staging environments for development.
Beyond development: Preview environments for non-engineers
Arsh shares an interesting emerging use case: enabling product managers and designers to preview features before they're deployed. By setting specific headers on requests to the staging environment, these requests can be routed to a particular developer's machine running local changes.
"This way product managers and product designers are able to see how the changes the developers are working on would behave, without the developers having to create container images and deploy them on staging environments," Arsh explains. This creates preview environments without the overhead of actually deploying code.
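A hedged sketch of what such header-based routing could look like in mirrord's JSON config format follows. The `http_filter` field name, the header name, and the target are all assumptions to verify against mirrord's configuration docs:

```json
{
  "target": "deployment/service-a",
  "feature": {
    "network": {
      "incoming": {
        "mode": "steal",
        "http_filter": {
          "header_filter": "x-preview-user: arsh"
        }
      }
    }
  }
}
```

With a filter like this, only requests carrying the matching header would be stolen and routed to the developer's machine; all other staging traffic would continue to the deployed service untouched, which is what keeps the shared environment safe for everyone else.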
Key takeaways
Environment proliferation is expensive and inefficient: The typical progression from production to staging to CI to dev clusters to ephemeral environments creates substantial costs and management overhead while still leaving developers waiting for feedback. For teams of 100-200 developers, the time lost to slow feedback loops significantly impacts delivery speed.
Microservices architecture fundamentally changes development requirements: The interdependencies between services and the mismatch between local and production environments make traditional local development approaches inadequate. Developers need access to the full application context to validate their changes effectively.
Shared staging environments can work with the right tooling: By mirroring traffic between local processes and cloud environments, developers can test code changes in realistic environments without actually deploying. This approach eliminates the wait time for CI pipelines and staging deployments while maintaining environment stability.
Traffic mirroring enables fast feedback loops: When developers can test against staging in seconds rather than waiting 15-20 minutes for deployments, they iterate faster and catch issues earlier. The development loop transforms from a lengthy, frustrating process to a rapid cycle that keeps developers in flow.
Consolidating environments reduces costs without sacrificing productivity: Organizations can shrink their cloud bills and reduce platform team burden by using fewer, shared environments instead of proliferating personal and ephemeral clusters. With proper tooling to manage concurrent access, this approach actually improves developer experience rather than degrading it.
Conclusion
The environment chaos that comes with cloud-native development isn't inevitable. As Arsh demonstrates, organizations can break the cycle of ever-multiplying Kubernetes clusters while actually improving developer productivity. The key is shifting from isolated environments that require deployment to shared environments accessed through traffic mirroring.
This approach addresses the fundamental challenge of microservices development: the need to test code in the context of the full application without the overhead of deploying every change. By enabling developers to work locally while accessing real staging infrastructure, teams can move faster, spend less, and reduce the burden on platform engineering teams.
Join the Platform Engineering community on Slack to connect with peers, and stay tuned for more events.

