Most organizations are stuck between AI pilots and production scale. You've got data scientists running experiments in notebooks, ML engineers manually wiring together training pipelines, and platform teams scrambling to provide GPU access without blowing the budget. The gap isn't technical capability; it's architectural clarity.

To close this gap, we just released a new reference architecture for an AI/ML Internal Developer Platform (IDP) on GCP, co-authored by leading industry experts Dilek Altin, Dr. Kessie Francis Kwasi (Principal Data Scientist at Fortescue), and Muhammad Nouman Shahzad.

Why traditional IDPs fail for AI/ML workloads

Your existing IDP handles application deployment well. It falls apart when data scientists need GPU clusters, ML engineers require feature stores, and every model needs lineage tracking from raw data to production inference.

The problem is fundamental: AI/ML workloads are data-intensive, compute-heterogeneous, and lifecycle-complex in ways that traditional applications aren't. A web service deploys once and runs. A model trains repeatedly, serves continuously, and requires constant monitoring for drift. The infrastructure patterns don't translate.
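
To make "monitoring for drift" concrete, here is a minimal sketch of one common drift signal, the Population Stability Index, computed with plain NumPy. In the stack described below this job would fall to dedicated model-observability tooling; the function and the thresholds in its docstring are conventional illustrations, not part of the architecture.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a training sample and live traffic for one feature.

    Common rule of thumb: below 0.1 stable, 0.1 to 0.25 watch, above 0.25 significant drift.
    """
    # Bin edges come from the reference (training) distribution
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip empty bins so the log term stays finite
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Synthetic example: live traffic has shifted relative to training data
rng = np.random.default_rng(0)
training_values = rng.normal(0.0, 1.0, 10_000)
serving_values = rng.normal(0.3, 1.1, 10_000)
print(f"PSI: {population_stability_index(training_values, serving_values):.3f}")
```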

Trying to force ML into a general-purpose IDP introduces friction at every step. Data scientists wait for GPU access. ML engineers manually configure serving infrastructure. Platform teams become bottlenecks.

The architecture addresses three core challenges:

  • Data management complexity: Training data, feature stores, model artifacts, and inference results require different storage patterns, access controls, and lineage tracking than application code
  • Heterogeneous compute: CPU for data processing, GPU for training, TPU for large models, and specialized inference hardware, all provisioned dynamically based on workload requirements (see the pipeline sketch after this list)
  • Model lifecycle management: Experimentation, training, validation, deployment, monitoring, retraining, and rollback form a continuous cycle that traditional CI/CD doesn't capture
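
As a concrete illustration of that compute heterogeneity, here is a minimal Kubeflow Pipelines (v2) sketch in which a CPU-bound preprocessing step and a GPU-bound training step declare different resources within the same pipeline. The component bodies, container image, bucket path, and accelerator type are placeholder assumptions for this example, not prescriptions from the reference architecture.

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def preprocess(raw_path: str) -> str:
    # CPU-bound data preparation; returns the processed dataset location
    return raw_path + "/processed"

@dsl.component(base_image="python:3.11")
def train(data_path: str) -> str:
    # GPU-bound training; returns a model artifact location
    return data_path + "/model"

@dsl.pipeline(name="heterogeneous-compute-demo")
def training_pipeline(raw_path: str = "gs://example-bucket/raw"):
    prep = preprocess(raw_path=raw_path)
    prep.set_cpu_limit("8").set_memory_limit("32G")  # ordinary CPU node pool

    trn = train(data_path=prep.output)
    # Only the training step asks for an accelerator; the platform can
    # schedule it onto a GPU node pool while preprocessing stays on CPU.
    trn.set_accelerator_type("NVIDIA_TESLA_T4").set_accelerator_limit(1)

if __name__ == "__main__":
    compiler.Compiler().compile(training_pipeline, "pipeline.yaml")
```

Declaring resources per step rather than per environment is what lets the platform place each step on the cheapest hardware that satisfies it.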

The six-plane architecture for ML/AI IDPs

The new reference architecture organizes capabilities into six parallel planes rather than hierarchical layers. This matters because ML systems aren't linear stacks. They're complex ecosystems where security, observability, and data management intersect at every workflow stage.

Why planes instead of layers? Layers imply sequential dependencies and strict hierarchy. Planes represent parallel concerns that interact dynamically. When a data scientist launches a notebook, the Developer Control Plane handles the request, the Resource Plane provisions compute, the Security Plane injects credentials, and the Observability Plane starts collecting metrics - all simultaneously. This architectural choice prevents the rigidity and bottlenecks that plague layered approaches.
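
That fan-out is easiest to see in code. The following sketch is purely illustrative: the plane names mirror the architecture, but the handler functions and event shape are invented for this example. A single notebook-launch event is dispatched to all four planes concurrently rather than descending through a layered stack.

```python
import asyncio

# Hypothetical per-plane handlers for a notebook-launch event; the plane
# names mirror the architecture, the functions themselves are invented.
async def control_plane(event: dict) -> str:
    return f"portal accepted {event['user']}'s workspace request"

async def resource_plane(event: dict) -> str:
    return f"provisioned {event['compute']} for the notebook"

async def security_plane(event: dict) -> str:
    return "injected short-lived credentials"

async def observability_plane(event: dict) -> str:
    return "started collecting metrics and traces"

async def launch_notebook(event: dict) -> None:
    # Planes are parallel concerns: all four react to the same event
    # concurrently instead of the request descending through layers.
    for result in await asyncio.gather(
        control_plane(event),
        resource_plane(event),
        security_plane(event),
        observability_plane(event),
    ):
        print(result)

asyncio.run(launch_notebook({"user": "data-scientist", "compute": "1x NVIDIA T4"}))
```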

The six planes work together:

  • Developer Control Plane: The front door for data scientists, ML engineers, data engineers, and platform engineers. Provides IDEs (VS Code), notebook workspaces (Jupyter), AI assistants (Claude Code), and a unified portal (Backstage) for self-service access
  • Integration & Delivery Plane: Handles version control (GitHub), platform orchestration, ML workflow orchestration (Kubeflow Pipelines), CI/CD (GitHub Actions), and artifact management (Google Artifact Registry)
  • Data and Model Management Plane: Vertex AI Metadata for lineage tracking, Vertex AI Feature Store for consistent features across training and serving, and Vertex AI Model Registry for versioning and governance (see the registry sketch after this list)
  • Resource Plane: GKE for compute orchestration, Cloud SQL for structured data, Cloud Storage for artifacts, Kafka for streaming, and NVIDIA Triton for high-performance inference
  • Observability Plane: Cloud Monitoring for infrastructure, Honeycomb for distributed tracing, Arize AI for model-specific observability including drift and hallucination detection, Flexera for FinOps, Monte Carlo for data validation, and Dataplex for lineage
  • Security Plane: SonarQube for code analysis, Google Secrets Manager for credentials, IAM for identity, OPA for policy enforcement, Cilium for network security, and Protect AI for model scanning
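
To make the Data and Model Management Plane slightly more tangible, here is a minimal sketch of registering a new model version with the Vertex AI Model Registry via the google-cloud-aiplatform SDK. The project, region, bucket, serving image, and model IDs are placeholders you would substitute with your own values.

```python
from google.cloud import aiplatform

# Placeholder project, region, and artifact values; substitute your own.
aiplatform.init(project="my-ml-platform", location="us-central1")

model = aiplatform.Model.upload(
    display_name="churn-classifier",
    artifact_uri="gs://my-ml-artifacts/churn/v3",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
    ),
    # Uploading under an existing model creates a new *version* in the
    # registry, preserving the lineage between versions.
    parent_model="projects/my-ml-platform/locations/us-central1/models/churn-classifier",
    is_default_version=False,
)
print(model.resource_name, model.version_id)
```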

This technology mapping represents a reference implementation, not a rigid prescription. You can substitute tools within each category based on your context. The architecture's value lies in the separation of concerns and standardized interfaces between planes.

The modular approach enables independent development and updates. Your security team can evolve policy enforcement without disrupting data pipelines. Your platform team can swap orchestrators without rewriting golden paths. As organizational needs change, you can add new planes or extend existing ones.
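
To see why standardized interfaces make that swapping possible, consider this hypothetical sketch: the orchestrator interface, class names, and golden-path function are invented for illustration, but they show how a golden path written against a plane-level interface survives an engine swap.

```python
from typing import Protocol

class WorkflowOrchestrator(Protocol):
    """Hypothetical plane-level interface that golden paths program against."""
    def submit(self, pipeline_spec: str, params: dict) -> str: ...
    def status(self, run_id: str) -> str: ...

class KubeflowOrchestrator:
    """One possible backing engine; a replacement only has to match the interface."""
    def submit(self, pipeline_spec: str, params: dict) -> str:
        # A real implementation would call the Kubeflow Pipelines API here
        return "kfp-run-123"
    def status(self, run_id: str) -> str:
        return "RUNNING"

def golden_path_train(orchestrator: WorkflowOrchestrator) -> str:
    # The golden path sees only the interface, so swapping orchestrators
    # never requires rewriting this code.
    return orchestrator.submit("pipeline.yaml", {"epochs": 10})

print(golden_path_train(KubeflowOrchestrator()))
```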

Vendor-agnostic principles and flexibility

This reference architecture is a starting point for a broader discussion on implementing an IDP tailored to Machine Learning and AI workloads on GCP. It represents a set of best-practice patterns and architectural principles rather than a rigid, prescriptive mandate.

While the current architecture illustrates a specific technology stack on GCP, the underlying architectural patterns are fundamentally vendor-agnostic. The core concepts of platform engineering, such as infrastructure abstraction, developer self-service, paved roads, and integrated MLOps workflows, can be readily applied and adapted to other public or private cloud environments.

Furthermore, the design emphasizes flexibility. Each component category within the reference architecture is designed to be swappable. Organizations are encouraged to evaluate and substitute any tool or service listed with alternatives that better align with their existing technology landscape, compliance requirements, security policies, or team expertise. The goal is to provide a blueprint, not a definitive toolchain.

Outlook and future work

Platform engineering is a rapidly evolving domain, and the needs of ML/AI teams are constantly changing. Therefore, our work on reference architectures is ongoing. We are actively developing and will soon publish additional reference architectures that explore alternative technology stacks, specifically focusing on different cloud providers and open-source ecosystems. These future reports will provide blueprints for achieving similar platform capabilities using distinct combinations of services and tools, offering a broader spectrum of proven patterns for the community.

We encourage platform teams, data scientists, and ML engineers to engage with this architecture, provide feedback, and use it as a catalyst for designing and refining their own bespoke IDPs.

To dive deeper into the technical details, comprehensive diagrams, and in-depth analysis of this architectural pattern, you can download the full reference architecture whitepaper.