Name: The journey to production AI: A practical guide for platform engineers and SREs
Start: 2026-05-05T17:00:00.000Z

May 5, 2026, 7:00 pm CEST, 60 minutes

Most teams are experimenting with AI agents. Few are running them reliably in production. This session breaks down how to move from first agent to trusted, repeatable workflows with real telemetry, strong context, and control over cost, accuracy, and scale.

Speaker

Andre Elizondo

Head of innovation and AI @ Mezmo

Getting an AI agent to work in a demo is easy. Getting it to run reliably at scale - with real telemetry, bounded costs, and without hallucinating its way through your infrastructure - is an entirely different challenge. This post maps the journey from ad hoc experimentation to trusted production AI, covering context engineering, memory systems, governance, and how to earn autonomy incrementally.

Main insights

Production AI requires more than model selection - context engineering, memory, and governance are the hidden challenges that determine success at scale
The path to production AI follows five clear steps: choose the right harness, pick a bounded problem, engineer context, build memory, and earn autonomy gradually
Context engineering is the unlock for reliable production AI - relevance beats relatedness, and structure beats volume when managing finite context windows
Start with reversible, bounded use cases like incident investigation before moving to autonomous remediation

Andre Elizondo, Head of Innovation & AI at Mezmo, brings over a decade of experience across security and infrastructure at companies like Wiz, Adobe, and Chef. Starting his career as a sysadmin, he developed a deep passion for cloud and SRE that now drives his work building production AI systems.

You can watch the full discussion here if you missed it: The journey to production AI

The iceberg problem: What's hiding beneath the surface

Most teams focus on the visible parts of building AI agents - choosing a model, selecting a framework, crafting prompts, and defining use cases. But these represent just the tip of the iceberg. The real challenges are below the waterline.

"What's typically missing is really everything down below," Andre explains. "How am I going to actually make sure that I'm not blowing out the context window? How am I actually going to make sure that the agent actually runs in a way that it gets smarter over time, it gets better over time?"

The hidden challenges include:

Context management - Preventing context window bloat while preserving relevant signal
Memory and learning - Ensuring agents improve over time rather than starting from scratch on each task
Governance and observability - Establishing full audit trails and understanding what agents are doing
Cost control - Managing token utilization to prevent runaway expenses
Versioning and GitOps - Tracking workflow changes and maintaining discipline at scale
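
To make the cost-control item concrete, here is a minimal sketch of a per-run token budget that stops an agent before it overspends. The class name, the `limit` field, and the idea of charging prompt and completion tokens together are illustrative assumptions, not something described in the talk or taken from any particular harness.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a model call would push a run past its token allowance."""


@dataclass
class TokenBudget:
    limit: int      # hypothetical per-run cap, in tokens
    used: int = 0   # tokens spent so far in this run

    def charge(self, prompt_tokens: int, completion_tokens: int) -> int:
        """Record one model call and return the tokens still available."""
        cost = prompt_tokens + completion_tokens
        if self.used + cost > self.limit:
            # Refuse the call instead of silently overspending.
            raise BudgetExceeded(
                f"call needs {cost} tokens, only {self.limit - self.used} left"
            )
        self.used += cost
        return self.limit - self.used


budget = TokenBudget(limit=10_000)
remaining = budget.charge(prompt_tokens=3_000, completion_tokens=500)
```

A hard stop is only one possible policy; a team might prefer to degrade to a cheaper model or trim context when the budget runs low.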

According to Gartner research, fewer than 5% of organizations were running SRE tasks with agents in 2024, but that number is projected to reach 85% by 2029. The biggest barriers are precisely these below-the-surface challenges that emerge once you move past the demo phase.

Defining production AI: In production and for production

Andre frames production AI across two critical dimensions. First, the AI system itself must be production-grade - trusted, repeatable, and observable. Second, it must be built for production - meaning it can safely handle the sensitivity and stakes of live environments.

Production-grade systems rest on three core tenets:

Trusted - Full audit trails, measurable outcomes, and the ability to grade agent performance over time
Repeatable - Consistent workflows that deliver the same outcome every time, defined through simple configuration rather than custom code per agent
Observable - End-to-end visibility into agent actions, planning cycles, self-evaluation, and tool calls
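
As a rough illustration of what an audit trail could look like, the sketch below records each agent step (planning, tool calls, self-evaluation) as a structured event that can be filtered by run or exported as JSON lines. The `AuditEvent` and `AuditTrail` names, the field layout, and the in-memory storage are all assumptions made for the example.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class AuditEvent:
    run_id: str
    step: str     # e.g. "plan", "tool_call", "self_eval"
    detail: dict  # free-form, JSON-serializable context for the step
    ts: float = field(default_factory=time.time)


class AuditTrail:
    """Append-only, in-memory log; a real system would ship events elsewhere."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, run_id: str, step: str, **detail) -> AuditEvent:
        event = AuditEvent(run_id=run_id, step=step, detail=detail)
        self._events.append(event)
        return event

    def for_run(self, run_id: str) -> list[AuditEvent]:
        """Return every event of one run, in the order it happened."""
        return [e for e in self._events if e.run_id == run_id]

    def to_jsonl(self) -> str:
        """Serialize all events as one JSON object per line."""
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self._events)


trail = AuditTrail()
trail.record("run-1", "plan", goal="investigate 503s")
trail.record("run-1", "tool_call", tool="query_logs", status="ok")
```

Keeping events append-only and keyed by run is what makes it possible to replay or grade a single investigation later.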

The "for production" dimension matters because the failure modes are asymmetric. When an AI-generated code suggestion fails in development, you regenerate it. When an agent makes the wrong call in production infrastructure, the consequences can cascade quickly.

The five-step journey to production AI

Step 1: Choose the right harness

Rather than starting with general-purpose frameworks, select an opinionated harness (https://www.platformengineering.org/tools/harness) built specifically for production operations. "The opinions of how that agent should operate typically live within the harness," Andre explains. This eliminates the need to rebuild boilerplate for every agent and ensures production best practices are built in from day one.

Step 2: Pick a painful, bounded, reversible problem

Your first use case must meet three criteria:

Painful - Worth solving and run frequently enough to generate feedback
Bounded - Clear inputs, outputs, and ideally an existing runbook
Reversible - Mistakes can be corrected without catastrophic consequences

Incident investigation fits all three. "Choose a bounded problem, right, like something where I have very clear inputs, very clear outputs, ideally something that I've already defined in a runbook in my environment," Andre recommends. This gives you a tight feedback loop without exposing production systems to unchecked autonomous action.

Step 3: Engineer your context

Context engineering - the practice of deliberately curating what information enters an agent's context window - is the unlock for reliable production AI. Three principles guide this work:

Treat context windows as finite resources - Every token introduces latency and cost
Prioritize relevance over relatedness - Focus on what changed in the checkout service in the last hour, not everything in your environment
Structure beats volume - High-quality, curated signal outperforms flooding the agent with raw data
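
The three principles above can be sketched as a small selection routine: score each candidate snippet by relevance to the service under investigation, then pack the best ones into a fixed token budget. The scoring weights, the four-characters-per-token estimate, and every name here are assumptions for illustration; a real system would use an actual tokenizer and richer signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    text: str
    service: str
    age_minutes: int


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def relevance(snippet: Snippet, target_service: str, window_minutes: int) -> float:
    """Score in [0, 1]: recent events on the target service rank highest."""
    if snippet.age_minutes > window_minutes:
        return 0.0  # outside the time window: related, but not relevant
    recency = 1.0 - snippet.age_minutes / window_minutes
    service_match = 1.0 if snippet.service == target_service else 0.25
    return recency * service_match


def select_context(snippets, target_service, window_minutes=60, token_budget=200):
    """Greedily keep the most relevant snippets that fit in the budget."""
    scored = [(relevance(s, target_service, window_minutes), s) for s in snippets]
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    chosen, used = [], 0
    for _, snippet in ranked:
        cost = estimate_tokens(snippet.text)
        if used + cost <= token_budget:
            chosen.append(snippet)
            used += cost
    return chosen
```

Note that an oversized snippet is skipped rather than truncated, so a single raw log dump cannot crowd out smaller, curated signals.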

"It's a lot more meaningful to have relevance versus relatedness," Andre emphasizes. "I actually really would rather have the most relevant things based on what that agent is currently operating on."

Step 4: Build memory that compounds

Agents must get smarter over time, not reset on every task. This requires engineering memory systems that:

Persist learnings across investigations
Understand service relationships and typical patterns
Store and prioritize relevant historical context
Filter noise before it reaches the agent
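
One way to picture memory that compounds is a small persistent store where repeated observations gain weight instead of piling up as duplicates. The sketch below uses Python's built-in `sqlite3` module; the table layout, the class name, and the example notes are hypothetical and only meant to show the shape of the idea.

```python
import sqlite3


class InvestigationMemory:
    """Tiny store of learnings per service; pass a file path to persist across runs."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS learnings ("
            " service TEXT NOT NULL,"
            " note TEXT NOT NULL,"
            " hits INTEGER NOT NULL DEFAULT 1,"
            " PRIMARY KEY (service, note))"
        )

    def remember(self, service: str, note: str) -> None:
        # Seeing the same learning again bumps its weight instead of duplicating it.
        self._db.execute(
            "INSERT INTO learnings (service, note) VALUES (?, ?) "
            "ON CONFLICT(service, note) DO UPDATE SET hits = hits + 1",
            (service, note),
        )
        self._db.commit()

    def recall(self, service: str, limit: int = 3) -> list[str]:
        """Return the most frequently confirmed learnings for one service."""
        rows = self._db.execute(
            "SELECT note FROM learnings WHERE service = ? "
            "ORDER BY hits DESC, rowid ASC LIMIT ?",
            (service, limit),
        )
        return [note for (note,) in rows]


memory = InvestigationMemory()
memory.remember("checkout", "503s tend to follow connection pool exhaustion")
top_notes = memory.recall("checkout")
```

The `limit` on `recall` is the noise filter: only the strongest few learnings are handed to the agent, which keeps memory from becoming another source of context bloat.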

"You actually want to kind of like focus on persisting memory for the agent so that the agent can better understand its environment over time," Andre notes. Without this, you're paying the full cost of context reconstruction on every run.

Step 5: Earn autonomy gradually

Teams can progress through three levels of automation:

Co-pilot - The agent suggests actions; a human approves everything
Assistant - The agent handles recognized patterns autonomously and escalates on novel scenarios
Autonomous - The agent operates independently with full audit trails for compliance
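
The three levels can be read as a simple gating rule in front of every proposed action. The sketch below is one possible encoding, assuming a hypothetical `pattern_recognized` flag supplied by whatever matching logic the team trusts; it is not taken from any specific harness.

```python
from enum import Enum


class Autonomy(Enum):
    CO_PILOT = 1    # agent suggests, a human approves everything
    ASSISTANT = 2   # agent acts on recognized patterns, escalates novel ones
    AUTONOMOUS = 3  # agent acts independently; actions are still audited


def decide(level: Autonomy, pattern_recognized: bool) -> str:
    """Return 'execute' or 'ask_human' for a proposed action."""
    if level is Autonomy.CO_PILOT:
        return "ask_human"
    if level is Autonomy.ASSISTANT:
        return "execute" if pattern_recognized else "ask_human"
    return "execute"


next_step = decide(Autonomy.ASSISTANT, pattern_recognized=False)
```

Because the level is plain data, promoting an agent from one tier to the next becomes a reviewed configuration change rather than a code rewrite.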

"We want to be able to get to the point where the agents that we're running in production are operating in a way that we trust them," Andre explains. "We get kind of more into the audit mode of I know I can visualize everything." Earning that trust incrementally is what separates sustainable production AI from brittle demos.

Real-world impact: From months to minutes

The proof is in production deployments. Rescale, using Mezmo's AI SRE functionality with the Aura harness, reduced investigation time from months to under an hour for complex incident response cases - including gnarly transient 503 errors that typically require extensive manual investigation across distributed systems.

"We take investigations from the matter of months to just a few minutes," Andre reports. That kind of operational leverage is only achievable when context, memory, and governance are treated as first-class engineering concerns.

Open source and community-driven development

Mezmo has open-sourced Aura, an agentic harness purpose-built for SRE and platform teams. The harness eliminates approximately 80% of agent-building boilerplate through simple TOML configuration files and makes agents observable and governed by default.

Aura supports model-agnostic deployment, working with OpenAI, Anthropic, Ollama, Bedrock, and any service exposing an OpenAI-compatible endpoint. The open-source approach enables community contribution and standardization - teams can adopt proven workflows from other SREs without rebuilding the underlying orchestration for memory, tool calls, and audit trails.

If you enjoyed this, you can find more insights and events from our Platform Engineering Community.

If you want to dive deeper, explore our instructor-led Platform Engineering Certified Professional course and connect with peers from large-scale enterprises who are driving platform engineering initiatives.

Key takeaways

Production AI requires engineering beyond the model - Context management, memory, governance, and observability are not optional extras. Teams that focus only on model selection and prompts will hit hard walls when moving from demo to production.
Start small and bounded, then graduate trust - Begin with reversible, well-defined use cases like incident investigation rather than jumping straight to autonomous remediation. Build confidence through measurable outcomes before expanding agent autonomy.
Context engineering is a first-class production concern - Finite context windows demand careful curation. Prioritize relevance over relatedness, structure over volume, and treat every token as a cost and latency consideration.
Choose opinionated harnesses over general frameworks - Production-focused harnesses encode best practices and eliminate boilerplate, making it easier to scale from one agent to ten or a hundred while maintaining consistency, observability, and governance across your agentic systems.
