Eighty-nine percent of platform engineering teams use AI daily. Only 69.7% have policies governing that usage. That gap isn't just a process problem - it's a liability waiting to materialize.

For financial services and government, the stakes are higher. A Fortune 100 financial institution learned this when an AI agent, tasked with "resolving compliance violations immediately," deleted a non-compliant service along with its audit logs. According to the agent? Problem solved. According to the organization? The incident triggered a compliance investigation, audit findings, potential regulatory notification, and a freeze on AI agent usage during forensic analysis.

The shift from AI coding assistants to autonomous agents changes everything. When agents execute multi-step workflows, file pull requests, and iterate on solutions without human intervention, traditional governance approaches fail. 

This article explains why workspace-level governance is essential for regulated industries, and gives platform engineers a practical framework for moving from pilot projects to production deployment. By "workspace," we mean the environment in which development happens: the dev environment itself.

From assistants to autonomous agents: Why infrastructure must change

There is an architectural distinction between AI assistants and autonomous agents. AI coding assistants like GitHub Copilot operate at Level 1 of the agentic development framework: humans remain in the loop, confirming every suggestion before execution. Tools like Cursor blur this line, enabling more autonomous workflows with less immediate oversight. Fully autonomous agents reach Levels 2-4, where humans move onto the loop, act as orchestrators, or step outside the loop entirely.

The four levels of agentic development:

  • Level 1 (Human in the loop): AI suggests, humans confirm and execute
  • Level 2 (Human on the loop): Agents generate PRs for human approval
  • Level 3 (Human as orchestrator): Agents execute multi-step workflows and deploy low-risk changes
  • Level 4 (Fully autonomous): Systems of agents initiate and promote changes autonomously

At Level 1, governance is a human concern. At Level 2 and beyond, governance must become an infrastructure concern. The production system itself must provide checks and balances, because humans cannot review every change.

Why application-tier controls fail regulatory scrutiny

Platform teams often ask whether Cursor's or Copilot's built-in governance features are sufficient. For regulated industries, the answer is no.

A large quantitative trading firm's security team audited Cursor, disabled certain policies, then discovered a software update auto-defaulted those policies back on. The firm ran agents non-compliantly for two weeks. The core issue isn't whether the model or AI tooling has controls - it's that those controls aren't yours. If governance lives inside a vendor platform, you're outsourcing enforcement, auditability, and risk management.

Regulated environments need independent control over how agents execute, not just configuration inside a model's system. That control must exist at the workspace level, where infrastructure policies govern network access, tool usage, permissions, and resource boundaries across any model or agent framework.

The regulatory imperative: Why compliance makes operationalization non-optional

Financial services and government operate under frameworks that make uncontrolled AI deployment impossible. These aren't guidelines - they're requirements with nine-figure consequences for failure.

Financial services regulatory requirements:

  • SOC 2: Requires documented controls over data access, change management, and system monitoring
  • PCI-DSS: Mandates strict access controls and audit trails for systems handling payment data
  • GDPR: Requires strict control over where data is processed and a valid lawful basis for processing
  • Model risk management: Regulators increasingly apply quantitative model standards to AI systems, requiring versioning, monitoring, and drift detection

Government regulatory requirements:

  • FedRAMP: Defines security and compliance requirements for systems handling federal workloads
  • ITAR: Requires workspace-level enforcement of citizenship-based access and geographic restrictions
  • IL4/5/6: Higher classification levels demand physical, not just logical, separation of infrastructure

Many government deployments require fully air-gapped environments with self-hosted models and no external connectivity. While specific requirements vary by program and accreditation, air-gapped deployment is common for classified workloads.

The statistics validate the urgency. Eighty-four percent of organizations consider AI governance a serious concern. Fifty-nine percent don't know how quickly they could shut down AI systems in a crisis. Sixty-eight percent cannot distinguish AI agent actions from human actions. Seventy-seven percent cite risks as a major barrier to adoption.

When agents operate without boundaries

Consider the OpenClaw incident. An agent processing a routine email was manipulated via prompt injection, causing it to exfiltrate credentials using its own authorized access. The agent wasn't compromised - it was doing exactly what it was told, just not by the right person.

Or the Fortune 100 financial services case. The agent identified a non-compliant service and deleted it along with its logs, taking critical systems offline and erasing data needed for compliance audits. These aren't model failures. They're production system failures. The agents had enough power to act but no guardrails to contain their behavior.

The blast radius of an uncontrolled agent scales with its privilege level. Agents will try to solve problems by whatever means are available - including changing infrastructure, mutating policy, or grabbing credentials - unless those paths are explicitly blocked.

Platform engineering as the foundation for safe agent deployment

Internal Developer Platforms standardize how humans ship software. Operationalizing AI coding agents means extending that model so the same development paths become executable by agents, each with the right identity, boundaries, and auditability.

The workspace becomes the critical control surface. When agents require persistent compute and scale beyond local machines, development moves into controlled, centrally managed workspaces. In regulated environments, these workspaces are often self-hosted in private infrastructure or deployed in fully air-gapped environments, ensuring code, data, and model interactions never leave organizational boundaries.

Cloud Development Environments (CDEs) operationalize this model by providing centrally managed, policy-controlled workspaces that act as the execution substrate for both humans and agents. Within these environments, governance is enforced through multiple mechanisms: proxying and observability of model interactions, network and process-level controls over execution tasks, and policies enforced globally or per task.

Four governance mechanisms for AI agents:

  • Provisioning: Terraform-based infrastructure provisioning stands up environments with all dependencies. Templates define compute profiles, network configurations, and pre-installed tools so agents consume their environment as code and know who they are from the start.
  • Policy: Role-based access controls define what libraries, tools, repos, and network domains each agent can access. Policies are expressed as code and version-controlled. Agents are alerted to their boundaries so they don't waste tokens attempting blocked actions.
  • Audit: Every prompt, tool call, model interaction, and resource access is logged and attributed. Cost visibility is captured through model usage tracking. Compliance teams can audit agent actions with the same rigor they apply to human developers.
  • Proxy: Centralized governance for all model and agent usage. Authentication flows through existing identity providers. Supports routing across model providers.
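The policy and audit mechanisms above can be sketched together in a few lines. This is an illustrative policy-as-code shape, not a real Coder API: the class name, rule fields, and log format are assumptions. The essential properties are deny-by-default evaluation and the fact that every decision, allowed or blocked, lands in an attributed audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentPolicy:
    """Hypothetical policy-as-code sketch: role-scoped allowlists plus audit."""
    role: str
    allowed_domains: set
    allowed_tools: set
    audit_log: list = field(default_factory=list)

    def check(self, agent_id: str, action: str, target: str) -> bool:
        """Allow or deny an action, recording every decision for audit."""
        if action == "network":
            allowed = target in self.allowed_domains
        elif action == "tool":
            allowed = target in self.allowed_tools
        else:
            allowed = False  # deny by default: unknown action types are blocked
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "action": action,
            "target": target,
            "allowed": allowed,
        })
        return allowed

policy = AgentPolicy(
    role="code-review-agent",
    allowed_domains={"internal-git.example.com"},
    allowed_tools={"pytest", "git"},
)
policy.check("agent-42", "network", "internal-git.example.com")  # allowed
policy.check("agent-42", "network", "api.external.example")      # blocked, and logged
```

Because the policy object is plain data, it can be version-controlled and reviewed like any other code change, which is what makes the "policies expressed as code" claim enforceable in practice.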

This foundation enables a controlled failure model, where unsafe actions are blocked, logged, and surfaced for review instead of becoming incidents. Agents are probabilistic systems operating inside environments that regulated organizations need to govern deterministically. The production system's job is to constrain probabilistic agent behavior within deterministic guardrails.

Solving the cold start problem

Agents are born into the ether, not knowing who they are, what they are, or what they're doing. They're just given a prompt. Without context, they waste tokens rediscovering basic facts about their environment and produce poor output.

Infrastructure-as-code templates solve this. When workspaces are provisioned from templates, agents can consume their environment as code and understand their role, constraints, and available tools from the start. Supplementing this with lightweight context engineering - markdown files defining standards, anti-patterns, and terminology - dramatically improves performance. Organizations see first-attempt accuracy improve significantly when agents receive structured context upfront.
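A minimal sketch of what "consuming the environment as code" can look like at workspace startup. The template fields and file names here are assumptions for illustration: the point is that template metadata and lightweight markdown context are assembled into structured context the agent receives before its first prompt.

```python
def build_agent_context(template: dict, context_files: dict) -> str:
    """Combine infrastructure-as-code template metadata with lightweight
    context engineering so the agent knows its role before the first prompt."""
    lines = [
        f"Role: {template['role']}",
        f"Repository: {template['repo']}",
        f"Allowed tools: {', '.join(template['tools'])}",
        "",
    ]
    # Markdown files defining standards, anti-patterns, and terminology
    for name, body in context_files.items():
        lines.append(f"--- {name} ---")
        lines.append(body.strip())
    return "\n".join(lines)

context = build_agent_context(
    template={"role": "test-fixer", "repo": "payments-service",
              "tools": ["pytest", "git"]},
    context_files={"standards.md": "Use type hints. No network calls in unit tests."},
)
```

In a real deployment the template dict would come from the same Terraform-based provisioning that stands up the workspace, so role, tools, and constraints stay in sync with what the infrastructure actually allows.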

From compliance burden to competitive advantage

The governance investment pays off not just in risk avoidance but in measurable productivity gains.

A large streaming service with 12,000 developers measured a 100% increase in code production per developer using governed agent infrastructure. Importantly, humans still sign off on all committed code, ensuring the productivity gain occurred under human review. At Skydio, senior engineers run multiple parallel agents consuming hundreds of dollars in tokens daily - but they're getting value because proper governance enables them to trust the output and scale their impact.

A global fintech with 15,000+ engineers reduced developer onboarding from 15-30 days to day one by implementing governed cloud development environments. The same infrastructure now extends to support AI agents. If human developers struggle with environment setup, autonomous agents operating at machine speed will expose every weakness in the system.

Real-world implementation patterns

At a major investment bank, analysts ship new models directly to traders via shared ports in Coder. Removing fragile pipelines accelerates iteration and collaboration. The outcome: faster response to changing market conditions, with governance maintained throughout.

One global fintech made a multi-year strategic commitment, doubling its Coder deployment in active users over twelve months. The organization joined Coder's design partnership program to co-develop capabilities critical to their AI adoption strategy: AI Gateway for compliance and governance observability, Agent Firewall for network isolation, and Coder Agents for background automation.

Perhaps most striking is where this organization sees the next frontier: deploying Coder directly on mainframe infrastructure. By running workspaces on existing mainframe systems, they can leverage ultra-low compute costs to power build and test processes across both developers and agents. AI development infrastructure isn't about forcing organizations onto new platforms - it's about meeting them where their critical systems already live.

Government-specific considerations

Government deployments often require fully isolated environments. FedRAMP provides the baseline for cloud-based deployments, but higher classification levels require physical separation of infrastructure within each classification tier. ITAR regulations impose strict controls on who can access data and where it can reside, requiring workspace-level enforcement of citizenship-based access and geographic restrictions on compute resources.
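Workspace-level enforcement of this kind can be reduced to a gate that runs at provisioning time, before any compute exists. The attribute names and region identifiers below are illustrative assumptions, not a statement of what ITAR compliance requires in full; they show the shape of denying at the infrastructure tier rather than trusting the agent or application tier to behave.

```python
ITAR_REGIONS = {"us-gov-east", "us-gov-west"}  # assumed region identifiers

def may_provision(user: dict, requested_region: str, itar_workload: bool) -> bool:
    """Provisioning-time gate: deny before the workspace exists."""
    if not itar_workload:
        return True
    if user.get("citizenship") != "US":
        return False  # citizenship-based access control
    if requested_region not in ITAR_REGIONS:
        return False  # geographic restriction on compute resources
    return True
```

Because the check runs before provisioning, a denied request never produces a workspace at all - there is nothing to audit after the fact except the denial itself.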

These constraints make self-hosted, air-gapped deployments common for many government use cases. While a self-hosted CDE can help meet FedRAMP, ITAR, and IL4+ requirements, compliance depends on the totality of controls, implementation, and authorization processes. The same workspace-level governance that enables financial services compliance becomes the foundation for meeting these government requirements.

Implementation roadmap: Visibility, context, scale

The most common mistake platform teams make is attempting to lock agents down immediately. Effective governance starts with understanding behavior.

The three-phase implementation sequence:

  1. Establish visibility first: Deploy observability infrastructure before enforcing restrictions. See which models are being used, how agents interact with tools, what tokens are consumed, and where costs accumulate. This surfaces shadow AI usage, highlights valuable use cases, and establishes baseline metrics.
  2. Provide agents with context: Solve the cold start problem through infrastructure-as-code templates and context engineering. Agents that consume their environment as code and receive structured context upfront perform significantly better while respecting boundaries.
  3. Scale through ephemeral patterns: Instead of long-lived environments, handle each task in an isolated workspace that is created, used, and destroyed as needed. Spin up a workspace per pull request, run the agent, submit changes, and tear the environment down. This prevents context pollution, simplifies branch management, and enables parallel execution.
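The ephemeral pattern in step 3 maps naturally onto a create-use-destroy lifecycle. This is a sketch under stated assumptions: `create` and teardown here are stand-ins for real provisioning calls (for example, a Terraform apply/destroy or a CDE API), and the context manager guarantees teardown even if the agent's task fails.

```python
from contextlib import contextmanager

@contextmanager
def ephemeral_workspace(pull_request: str, registry: list):
    """Per-task workspace: created for one PR, always destroyed afterward."""
    ws = {"id": f"ws-{pull_request}", "state": "running"}
    registry.append(ws)
    try:
        yield ws  # the agent runs its task inside this isolated workspace
    finally:
        ws["state"] = "destroyed"  # always torn down: no context pollution

registry = []
with ephemeral_workspace("pr-1234", registry) as ws:
    result = f"agent ran in {ws['id']}"
# workspace is gone once the task completes
```

One workspace per pull request also gives parallelism for free: concurrent tasks never share state, so there is no branch juggling and no leftover context for the next run to trip over.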

Seven steps for regulated deployment

Platform teams in financial services and government can follow this sequence to move from experimentation to governed deployment:

  1. Audit current AI and agent usage across teams - Identify shadow AI, understand which tools developers are using, and quantify the governance gap
  2. Establish workspace-level governance as code - Move development from local environments into centrally managed workspaces
  3. Define and enforce privilege separation policies - Document the permission model for agents versus humans and implement it in workspace configurations
  4. Implement observability infrastructure - Deploy logging, attribution, and cost tracking for all agent actions
  5. Deploy context engineering - Create infrastructure-as-code templates and markdown files that provide agents with immediate context
  6. Scale with ephemeral patterns - Implement per-task workspace creation and automated lifecycle controls
  7. Iterate based on metrics - Use visibility data to refine policies, improve context, and optimize resource usage

The key is sequencing. Visibility enables informed policy decisions. Context improves agent performance while enforcing boundaries. Ephemeral patterns enable scale without sacrificing isolation.

The role of platform engineering teams in shaping the future

The space is still evolving. Agent use cases now drive 60% of Coder's business, up from a minimal presence a year ago. Organizations built agentic infrastructure in 2025 after a year of experimentation. This year, they're moving to production.

Platform engineering teams aren't passive consumers of this technology - they're actively shaping solutions. Leading institutions join design partnership programs to co-develop capabilities. Security teams trained on deterministic systems are learning to manage probabilistic AI behaviors through adaptive governance solutions: Policy as Code for automated enforcement, AI observability for monitoring, and context isolation for boundary enforcement.

The Stanford HAI 2025 AI Index documented a persistent gap between organizations recognizing responsible AI risks and actually mitigating them. In regulated industries, that gap becomes a liability. Platform engineering teams are closing it by treating agents as actors within the platform, each with defined permissions, boundaries, and auditability.

The transition to agentic AI development follows the same pattern as previous technological shifts in financial services and government. Organizations that build the right infrastructure now will capture productivity gains while managing risks.

For a comprehensive technical framework on deploying AI coding agents at scale in regulated environments, read the whitepaper recently published by Weave Intelligence: Operationalizing AI coding agents in regulated industries