Welcome to platform engineering in 2025, where AI coding assistants promised to make everyone faster but somehow have made your job infinitely harder.
You're in a no-win situation:
- Developers are angry because tests take two hours to run and they're blocked waiting for feedback.
- Leadership is angry because the last three production incidents came from code that passed all tests in CI/CD.
- Your infrastructure budget is angry because you're requesting more runners again.
- You're angry because you're spending your evening debugging why the same test passes locally, fails in CI, and behaves differently in staging.
The problem isn't that you're doing something wrong. The problem is you're trying to solve 2025 problems with 2015 infrastructure. You've inherited CI/CD systems designed for builds and deployments, not large-scale test execution. That architectural decision made sense when humans wrote code at human speed, but it's collapsing now under multiple converging pressures.
More code is being checked in at lower quality than before. AI generates syntactically correct code at machine speed, but it doesn't understand your business logic or edge cases. Developers are merging more PRs per day than your testing infrastructure was provisioned to handle.
The complexity of modern systems keeps growing. Microservices architectures mean a single feature now spans multiple services, each with its own testing requirements. Multiple environments, distributed teams, and interdependent services create a combinatorial explosion of what needs testing: your test matrix grows exponentially rather than linearly.
Time-to-market pressure continues to intensify. Leadership wants faster releases, but speed without quality creates production incidents. Incidents create pressure to add more tests, which makes everything slower, which increases pressure to cut corners.
The challenge unique to platform engineers is that your job is to build and maintain the Internal Developer Platform (IDP) that makes engineering teams productive. When your testing infrastructure can't keep up, you're not just managing a bottleneck. You're the one expected to fix it.
Why CI/CD can’t keep up with testing in the modern SDLC
When CI/CD solutions like Jenkins and GitHub Actions were initially introduced, the primary job was clear: pull code, run tests, build artifacts, deploy them. Testing was a pipeline stage, not a primary workload. The architecture reflected this.
Obviously, today's reality looks very different:
- Shift-left is driving teams to do testing in local environments before code even gets to CI/CD pipelines.
- Shift-right is driving teams to do testing in pre-production environments and as part of progressive delivery.
- Cloud-native is making delivery pipelines asynchronous, where event-driven tools handle different stages of delivering software instead of a single orchestrator.
- GitOps is further decoupling delivery from traditional CI/CD tools by continuously deploying updates to applications and infrastructure.
- And perhaps most importantly, AI-generated code is shifting how we both build and test code, introducing new bottlenecks in delivery pipelines related to both testing and infrastructure.
On top of all the above (or perhaps as a consequence of it), both testing tools and CI/CD solutions are proliferating: new tools replace or complement existing ones, AI agents introduce new tools and need testing themselves, and teams over time end up with a sprawl of testing activities across their infrastructure that is increasingly difficult to track and maintain.
How AI development exposes CI/CD's architectural limits
Today's reality looks different for platform engineers because your developers are using Claude, Cursor, and Copilot to generate code at multiples of the previous rate. This isn't just about velocity. AI-generated code changes the fundamental nature of what needs testing.
A developer might generate five different API implementations in an hour, each syntactically correct but with subtle differences in error handling, performance characteristics, or security implications. Traditional test suites were designed assuming humans write code at human speed with human-style bugs. AI tools generate code faster than the tests and testing infrastructure you maintain can validate it, creating a backlog that defeats the purpose of acceleration.
The three gaps platform engineers face
The scalability gap becomes obvious
Parallelization, sharding, parameterization, test composition: these constructs are core to an efficient testing strategy, but CI/CD solutions were never optimized for running automated tests at scale. You can duct-tape some of those shortcomings with elaborate scripting, but ultimately testing in your CI/CD pipelines was built for a predictable volume of commits and PRs. When that volume increases, both as your applications grow and because AI helps developers code faster, your runners can't scale proportionally without budget approval that takes weeks. Tests queue, feedback is delayed, and developers context-switch while waiting for results, which erodes the very productivity gains AI is supposed to deliver.
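Here's what that duct-tape typically looks like: a minimal sketch of shard-splitting a suite with a GitHub Actions matrix. The trigger, runner image, and Playwright-style --shard flag are placeholders; adapt them to your own stack.

```yaml
# Illustrative only: four parallel shards, each consuming its own runner.
on: [pull_request]
jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]   # one CI job (and one runner) per shard
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --shard=${{ matrix.shard }}/4
```

It buys parallelism, but every shard still competes for the same fixed runner pool, so the queue simply moves one level up.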
As a platform engineer, you're caught in the middle. Developers demand faster feedback, leadership demands lower costs, and you're left managing an infrastructure model that wasn't built for this scale.
The visibility gap gets worse
As tests are increasingly run outside of traditional CI/CD, visibility into which tests are run where and by whom fogs up. On top of that, test results for all of these runs are lost (like tears in the rain...), making individual test analysis and long-term reporting a chore. Furthermore, each AI-generated PR might touch multiple microservices, requiring integration tests across services, load tests to validate performance assumptions, and security scans to catch vulnerabilities. Your CI/CD system executes these tests but provides no unified view of how they relate. You see pass/fail signals from disparate tools (Cypress, K6, your security scanner) but lack the centralized observability you need to understand whether failures correlate with specific code patterns, infrastructure changes, or environmental conditions.
The reliability gap is critical
Tests fail intermittently because CI/CD test execution agents don't match your local dev setup or your production environment. Network policies differ. Tool versions differ. Tech-stacks differ. DNS resolution works differently. When tests fail, you can't determine whether the code is broken or the test environment is flaky. This uncertainty erodes trust in your test suite, leading teams to ignore failures or, worse, disable tests entirely. As the platform engineer responsible for maintaining the IDP, this reliability gap directly undermines your team's credibility.
How leading platform teams are responding
Cloud-native platform engineering teams are moving beyond legacy pipelines by treating testing as its own infrastructure layer rather than a pipeline stage. This shift addresses all three gaps simultaneously and aligns with how platform engineers actually think: building scalable, self-service systems for developers.
Unifying test insights across tools
The visibility gap exists because tests run through disparate systems that don't talk to each other. Forward-thinking platform teams solve this by centralizing test execution and observability, giving all engineers a single pane of glass into all their testing activities, be they functional tests run in a local dev environment or performance tests run in pre-prod. When every test runs through a consistent execution layer, you can correlate results across different testing frameworks. You can see that K6 load tests started failing at the same time Cypress integration tests showed increased latency. You can track that security scan failures cluster around specific code patterns AI tools commonly generate.
For platform engineers, this means one dashboard for your teams, one API for integrations, one source of truth for all test execution. This is exactly the kind of unified experience you're trying to build for developers across the rest of your platform.
Eliminating flaky test results
The reliability gap exists because automated tests often run in environments that vary greatly depending on where they are running: local tests run on your desktop, CI/CD tests run in the infrastructure of your CI/CD solution, and pre-prod tests run inside your clusters. This is a recipe for flakiness.
Providing automated tests with a consistent Kubernetes-based infrastructure eliminates an entire category of false failures caused by environment mismatch. It also reduces the operational burden on you as a platform engineer because you're managing one consistent execution environment, not multiple CI/CD runner configurations.
Cutting infrastructure costs through smarter resource allocation
The economics of CI/CD-based testing are linear, right? More tests require more runners, and more runners cost more money. Kubernetes-native testing changes this equation for platform engineers. Your clusters already have compute capacity that scales dynamically based on workload. When you run tests as Kubernetes jobs, they consume the same resource pool your applications use, scaling up during test execution and releasing resources when complete. You're not paying for dedicated CI/CD runners that sit idle between builds. Resource allocation becomes intelligent rather than fixed. High-priority smoke tests can run on dedicated node pools, while comprehensive integration suites can run on spot instances during off-peak hours.
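As a conceptual sketch of what that looks like, here's an integration suite packaged as a plain Kubernetes Job so it draws on the cluster's shared capacity instead of dedicated CI/CD runners. The image, node label, and taint key below are placeholders for your environment, not a prescription.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: integration-suite
spec:
  parallelism: 4            # four pods run shards of the suite concurrently
  completions: 4
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        node-pool: spot     # hypothetical label for a spot/preemptible pool
      tolerations:
        - key: spot
          operator: Exists
          effect: NoSchedule
      containers:
        - name: tests
          image: registry.example.com/integration-tests:latest   # placeholder
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
```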
Organizations implementing this approach report handling significantly higher test volumes without proportional infrastructure cost increases. Tests that previously queued in CI/CD for 45 minutes complete in under 10 minutes by running in parallel across cluster capacity.
So what does this actually look like when you implement it?
What this looks like in practice
Testkube is a Kubernetes-native continuous testing platform built specifically for teams that need to regain control of testing at scale.
Instead of treating tests as a serialized pipeline stage, Testkube runs them as Kubernetes Jobs inside your clusters, using the same configurations, secrets, and network policies as your production workloads. That means your tests execute in the exact environment they're validating: no more environment drift, mismatched dependencies, or flakiness caused by runner discrepancies.
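As an illustration, here's roughly what that looks like as a Testkube TestWorkflow for a K6 script. The repository, paths, and image are placeholders, and field names may differ between Testkube versions, so treat this as a sketch rather than a reference.

```yaml
# Sketch of a Testkube TestWorkflow that runs a K6 load test in-cluster.
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-load-test
  namespace: testkube
spec:
  content:
    git:
      uri: https://github.com/your-org/your-repo   # placeholder repository
      revision: main
      paths:
        - tests/load.js
  container:
    workingDir: /data/repo/tests
  steps:
    - name: run-k6
      run:
        image: grafana/k6:latest
        args: ["run", "load.js"]
```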
The best part is that you don't need to rebuild your pipelines from scratch. Testkube integrates with your existing CI/CD systems (GitHub Actions, Jenkins, ArgoCD, GitLab) while offloading test execution to where it belongs: inside your clusters.
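A minimal sketch of that hand-off from GitHub Actions, assuming a Testkube agent is already running in the cluster. The action inputs, secret names, and CLI invocation below follow Testkube's documented tooling but vary by version, so verify them against your installation.

```yaml
# Illustrative pipeline steps: the build stays in CI, test execution is
# delegated to the in-cluster agent. Secret names are placeholders.
- name: Install the Testkube CLI
  uses: kubeshop/setup-testkube@v1
  with:
    organization: ${{ secrets.TESTKUBE_ORG_ID }}
    environment: ${{ secrets.TESTKUBE_ENV_ID }}
    token: ${{ secrets.TESTKUBE_API_TOKEN }}
- name: Run the TestWorkflow in-cluster
  run: testkube run testworkflow k6-load-test   # syntax depends on CLI version
```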
For platform engineers, this shift brings three core advantages:
1. Cluster-native execution and control
Testkube deploys lightweight agents inside your clusters that handle test scheduling, execution, and observability. You can define how tests run and integrate directly with your existing tools (Helm, ArgoCD, GitHub Actions, Jenkins, etc.). Whether it's a Postman collection, a K6 load test, or a custom Python script, Testkube executes it natively through Kubernetes. This ensures consistent behavior across environments. You maintain full control over test execution while giving developers the self-service capabilities they need.
2. Centralized visibility across every test type
Every test result, log, and metric is collected and correlated in one dashboard. You can see which test types fail most often, which services introduce regressions, and how test performance trends over time. Because all tests run through the same execution layer, you can identify patterns across frameworks, like integration tests that start failing after specific deployments or load tests that correlate with API timeouts.
This is the unified observability platform engineers need to make testing a first-class concern in your IDP.
3. Scalable, cost-efficient resource usage
Your existing clusters already have the compute capacity to handle testing workloads dynamically. Testkube lets you use that capacity intelligently. Tests can scale horizontally across nodes, target specific node pools, or run on spot instances. You don't need to over-provision CI/CD runners just to keep up with AI-driven code volume. You use the infrastructure you already own, when you actually need it. This is a resource optimization model that aligns with how modern platform engineers think about efficiency.
With these capabilities, Testkube lets platform teams decouple testing from CI/CD, observe tests as first-class infrastructure workloads, and scale validation in line with AI-accelerated development. You gain the control and insight you need without the disruption and cost of rebuilding everything.
What this means for platform teams
The traditional framing treats speed and quality as opposing forces. Ship faster means test less, and test thoroughly means ship slower. This trade-off made sense when testing was a linear pipeline stage that blocked deployment.
Kubernetes-native testing dissolves this trade-off by making testing parallel and continuous rather than sequential and blocking. Tests run constantly in the background, validating system behavior as code is written, as infrastructure changes, and as traffic patterns shift. Your deployment pipeline doesn't wait for tests to complete because testing never stops. You can run comprehensive test suites on every commit without blocking merges, validate infrastructure changes before applying them to production, and execute canary validation in production clusters using real traffic patterns.
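One way to picture "testing never stops" is a background smoke suite that runs on a schedule inside the cluster, decoupled from any pipeline trigger. The sketch below uses nothing more exotic than a Kubernetes CronJob; the image and schedule are placeholders.

```yaml
# Conceptual sketch: continuous background validation, independent of deploys.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: smoke-suite
spec:
  schedule: "*/15 * * * *"      # run smoke tests every 15 minutes
  concurrencyPolicy: Forbid     # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: smoke
              image: registry.example.com/smoke-tests:latest   # placeholder
```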
If you're responsible for delivery infrastructure, the question isn't whether to adopt this approach. The question is when. If shift-left or cloud-native adoption hasn't already pressured your pipelines, AI-assisted development will expose the scalability limits of traditional CI/CD. Release velocity is already straining test infrastructure that was provisioned for lower volumes. The gap between test environments and production is already causing false failures that slow teams down.
Moving testing into Kubernetes doesn't require abandoning your existing CI/CD investment. Your pipeline still controls builds and deployments. Testing becomes a parallel concern, running continuously in your clusters while your pipeline orchestrates releases. Testing becomes observable, scalable, and environmentally accurate. Your CI/CD pipeline gets faster. Your test coverage can expand without adding runner capacity. Your developers get feedback in the environment that actually matters.
Explore Testkube to see how Kubernetes-native continuous testing works in practice, or request a demo to discuss how it can solve your infrastructure bottlenecks.