Artificial Intelligence is reshaping software development, and platform engineering is no exception. From the emerging sub-discipline of AI platform engineering, which uses AI to empower platform teams and their Internal Developer Platforms (IDPs), to platforms built specifically to empower AI initiatives, it’s clear that AI is shaking up the platform engineering universe. With use cases ranging from automating infrastructure management to enhance developer productivity to supporting the rise of AI/ML workloads, AI is creating more and more demands (and opportunities) for platform engineering to prove its value.
Picture this: according to Google Cloud, a staggering 94% of organizations identify AI as either ‘Critical’ or ‘Important’ to the future of platform engineering. What’s more, 86% believe that platform engineering is essential to realizing the full business value of AI.

Despite all the excitement, however, the space remains poorly defined. The number of industry buzzwords has 10x’d, and companies are struggling to distinguish genuine practical applications of AI from pure hype. The term “AI platform engineering”, which refers to the use of AI to empower Internal Developer Platforms (IDPs), is often used interchangeably for both AI-enabled IDPs and IDPs that enable AI. At the same time, the landscape is messy. AI adoption maturity levels vary wildly, ownership is unclear, and best practices are still being written in real time. It's like watching a live experiment where the rules are being invented on the spot.
In this article, I’ll try to bring some clarity to this rapidly evolving space. We'll break down how AI is intersecting with platform engineering, expose the real challenges and opportunities, and provide a practical framework for platform teams looking to navigate this brave new world.
AI-powered platforms vs. platforms for AI?
The intersection of AI and platform engineering can be broadly categorized into two key perspectives: AI platform engineering, where AI powers Internal Developer Platforms (IDPs) to better deliver on the benefits of platform engineering, and platforms for AI, meaning IDPs built to facilitate the deployment and use of AI/ML workloads. While both categories involve AI, they serve different personas and have distinct goals. AI platform engineering enhances developer productivity, while platforms for AI provide the foundational infrastructure needed to build and scale AI workloads.
AI platform engineering
AI-powered platforms integrate artificial intelligence as a tool to empower the platform itself. As stated above, the objective is to turbocharge the benefits that come from a platform engineering initiative: improved developer experience and productivity, better standardization and automation, and, at the same time, improved compliance and security. These platforms leverage AI and machine learning (ML) technologies, primarily large language models (LLMs), to facilitate access to information, provide intelligent recommendations, automate workflows, optimize decision-making processes, and reduce cognitive load for users. Examples include:
- AI-driven observability: Automated log analysis and anomaly detection to surface potential issues before they impact production, and identification of optimization opportunities
- Intelligent automation: AI suggests and executes repetitive tasks, such as infrastructure scaling or CI/CD optimizations.
- Natural language interfaces: AI-powered assistants allow engineers to query platform data and receive actionable insights.
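To make the observability idea above a bit more concrete, here is a minimal, hypothetical sketch of the statistical pre-filter such a layer might run before handing data to an LLM for summarization: flagging anomalous metric samples with a simple z-score check. The threshold and sample data are assumptions for illustration, not a production recipe.

```python
# Hypothetical sketch: flag anomalous metric samples with a z-score check.
# An AI-driven observability layer might use a deterministic filter like
# this to surface candidate anomalies before any LLM gets involved.
from statistics import mean, stdev

def find_anomalies(samples: list[float], threshold: float = 2.0) -> list[int]:
    """Return indices of samples more than `threshold` std devs from the mean."""
    if len(samples) < 2:
        return []
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]

# Example: steady memory usage (MiB) with one spike
usage = [512, 520, 508, 515, 2048, 510, 518]
print(find_anomalies(usage))  # → [4], the index of the spike
```

The point of keeping this step deterministic is that the LLM only ever sees a small, pre-vetted slice of the data, which reduces both noise and the surface for hallucination.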
Let’s focus first on the positives. LLMs thrive in environments where the data space is well-defined and structured, and platform engineering is basically their ideal playground. Unlike open-ended, “high-entropy” domains that challenge generative AI, platform engineering operates within a finite ecosystem of concepts and actions: infrastructure resources, configuration management, logs, metrics, and policies. A general-purpose LLM might be great with a concrete query like "What is the current memory usage of my Kubernetes cluster?", a scenario with very clear, verifiable answers that plays to the model's strengths in structured information processing. It can even handle an abstract question like “What is the meaning of life?”, where technically any answer might be correct. It gets dangerous when there is a “right” answer that is not easily verifiable: if the AI starts hallucinating, it can be more convincing than correct, with no easy way to check.
Platforms often offer many opportunities for the kinds of questions AI is great at. They can offer centralized, high-quality data like CI/CD logs, deployment histories, configs, and system metrics that help AI generate accurate insights with little noise. With platforms, AI can do things like quickly analyze predictable log data and alerts to produce focused summaries, like highlighting failures from the past 24 hours. Or serve as an augmentation layer for platform users, enabling natural language queries across logs and metrics, generating cross-tool insights and performance summaries, and automating complex tasks like infrastructure configuration, resource optimization, and incident response, which reduce the need for manual intervention in repetitive tasks.
Sounds amazing right?
What’s the catch?
As I’ve already teased above, the integration of AI into platform engineering introduces the inherent risk of hallucination. Use any AI assistant enough, and you’ll know exactly what I mean. Not a problem when you’re asking it to write emails, but when you’re messing with your infrastructure? That’s a bit too scary… which is why it’s essential that AI replaces the “expert in the loop,” not the “human in the loop”. AI should assist with automation and optimization suggestions while leaving final approval to humans. On top of that, the reliability of LLM-based code generation for infrastructure configurations is currently too low (and by too low, I mean it’s terrible), making the human effort required to catch misconfigurations far too high. LLMs are non-deterministic by nature, which is a big problem for infrastructure automation.
To make AI truly usable in this context, we need to build deterministic AI-powered platforms. That means tackling the unpredictability of LLMs head-on. One way to do this is by structuring AI inputs and outputs, constraining model responses to predefined patterns whenever possible. But it also requires serious backend discipline: strong permission management to prevent AI-driven automation from crossing security boundaries; AI agents that can resume stateful operations without manual intervention; and strict policy enforcement on AI-generated artifacts, especially when the output includes code or configuration.
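One way to picture that policy enforcement step is as a deterministic gate between an LLM’s proposed configuration and anything that actually gets applied. The field names and policy limits below are invented for illustration; the shape of the check, not the specifics, is the point:

```python
# Hypothetical sketch: gate AI-generated configuration behind deterministic
# policy checks. Field names and limits are assumptions for illustration.

ALLOWED_KEYS = {"replicas", "cpu_limit", "memory_limit"}
MAX_REPLICAS = 10

def validate_ai_config(config: dict) -> list[str]:
    """Return a list of policy violations; an empty list means safe to apply."""
    violations = []
    for key in config:
        if key not in ALLOWED_KEYS:
            violations.append(f"unexpected field: {key}")
    if config.get("replicas", 0) > MAX_REPLICAS:
        violations.append("replicas exceeds policy maximum")
    return violations

# An LLM-proposed change is applied only if it passes every check;
# otherwise it is routed back to a human expert for review.
proposal = {"replicas": 50, "privileged": True}
print(validate_ai_config(proposal))
# → ['unexpected field: privileged', 'replicas exceeds policy maximum']
```

The gate itself never calls a model, so its behavior is fully reproducible; the non-determinism stays upstream, where a human can still veto the output.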
The benefits of AI platform engineering are immense, but it is crucial that the inherent risks of AI are taken seriously. It can’t be treated as a magic box, it needs to be carefully integrated into platform workflows with guardrails, context, and control.
Platforms for AI
The other side of the AI and platform engineering coin is platforms built specifically to better enable the deployment and use of AI/ML workloads. You can think of these as the mission control centers for artificial intelligence. These aren't your run-of-the-mill computing environments; they're highly specialized ecosystems designed to support the world of AI and machine learning. Their purpose is to support AI/ML workloads and provide the necessary infrastructure, data management capabilities, and model-serving environments for AI applications. Their focus? Enabling data scientists and ML engineers to develop, train, and deploy AI models efficiently.
These platforms are equipped with high-performance hardware like GPUs, TPUs, and NPUs, and they dynamically handle intensive training and inference processes. Their data infrastructure includes everything from real-time streams to encrypted data lakes, ensuring scalability and security for even the most sensitive information.
We’re in the process of building a new platform engineering reference architecture that reflects exactly this kind of platform.
The future of AI in platform engineering
So, what can we expect? A recent Red Hat survey revealed that 83% of enterprises have already integrated AI into their software development stacks. Platform teams are now integrating new specialist roles like AI engineers, data engineers, and MLOps specialists.
Looking ahead, LLMs will act as copilots, supporting developers, SREs, and platform engineers by auto-generating infrastructure-as-code (IaC), security policies, and CI/CD pipelines. AI will increasingly automate policy enforcement and security compliance, flagging risky changes in real time (as scary as that sounds). Essentially, AI-powered platforms will become more autonomous, capable of identifying performance bottlenecks and self-optimizing workloads, making multi-model orchestration play a vital part.
Platform engineering teams must prepare by embracing AI as a core capability. This means proactively integrating LLMs and automation into everyday workflows, from development to infrastructure management. They must embrace the challenge of experimenting with AI-driven tools for security and compliance. They must be ready to face not just the usual risks but new threats like model poisoning and unauthorized data access by agents. Platform engineers must build platforms that deliver on business value while still containing robust, policy-driven controls and continuous monitoring.
They will also need to be flexible when it comes to the AI itself. Model-agnostic design will allow teams to remain independent of any single AI provider and switch models as newer, better options emerge.
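A minimal sketch of what model-agnostic design can look like in practice (all names here are hypothetical): platform code depends on a narrow interface, so a provider can be swapped without touching any callers.

```python
# Hypothetical sketch of model-agnostic design: platform code depends on a
# small protocol, so LLM providers can be swapped as better models emerge.
from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class EchoProvider:
    """Stand-in provider for local testing; a real one would call a model API."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def summarize_incident(provider: LLMProvider, log_excerpt: str) -> str:
    # The caller only knows the protocol, never a concrete vendor SDK.
    return provider.complete(f"Summarize this incident log: {log_excerpt}")

print(summarize_incident(EchoProvider(), "pod OOMKilled at 02:14"))
```

Swapping vendors then means writing one new adapter class that satisfies `LLMProvider`, rather than rewriting every workflow that consumes model output.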
A new era of platform engineering
We’re in a pivotal moment for platform engineering. It is undeniable that AI is a game-changing tidal wave. However, rather than being washed away, as many fear, it is clear that platform engineering will be riding that wave. AI isn't just knocking on the door; it's already walking into the room and rearranging the furniture of how we build, deploy, and manage technology, and platform teams are emerging as the critical architects of this new landscape.
But we’re still early. The best practices, the roles, the tools, they’re all being defined right now by all of us in the platform engineering community.
If you want to help define it, take our survey and contribute to the first-ever State of AI Platform Engineering report. Your insights will help shape the future of AI platform engineering.