The promise of AI-powered development tools has dominated tech conversations, but hard data on their real-world impact has been scarce. Recent findings from one of the largest studies to date - analyzing 20 million pull requests across 1,000 companies and 200,000 developers - reveal both encouraging productivity gains and critical architectural barriers that determine whether teams actually benefit from AI tools.
Main Insights
- AI adoption has reached a median of 63% of engineers using tools weekly, with autonomous agents now accounting for up to 10% of PRs at leading companies
- Teams with high AI adoption see approximately a 2x increase in PR throughput and 24% faster cycle times on average
- Code architecture matters enormously - centralized codebases see 4x productivity gains while highly distributed architectures show little to no benefit
- PR size increases by roughly 18% with AI usage, primarily from more verbose code rather than broader scope changes
Nicholas Arcolano, Ph.D., Head of Research at Jellyfish, leads a multidisciplinary department focused on advanced ML and AI algorithms, analytics, and data science. Luke, an engineering director at Jellyfish, joined the discussion to provide perspective on how these findings align with real-world implementation challenges teams face when adopting AI tools.
You can watch the full discussion here if you missed it: https://youtube.com/live/DIHdZCj_xoc
The current state of AI adoption
The data reveals that AI coding tools have moved well beyond experimentation. As of February 2025, the median company sees 63% of its engineers using AI tools at least weekly. More significantly, nearly 60% of companies have achieved what Arcolano calls "frequent usage" - where 50% or more of their engineers use AI tools three or more days per week.
"Just using this once a week is not going to cut it," Arcolano explained. The tool landscape shows signs of consolidation after a period of experimentation. While many engineers initially tried multiple tools simultaneously, companies are now standardizing on specific platforms to build durable skills and manage costs. This shift from "try everything at the buffet" to deliberate tool selection reflects a maturation in how organizations approach AI adoption.
The rise of autonomous agents
Perhaps the most striking trend is the rapid acceleration of autonomous agent usage. These tools differ from interactive AI assistants - instead of helping you write code in real-time, autonomous agents take a specification, work independently, and open pull requests without continuous human guidance.
"If you'd asked me Q3, Q4, is this a thing? On Twitter, it sounds like it's a thing. In our data, you were talking less than I think 4% of PRs," Arcolano noted. By January 2025, the median company saw autonomous agents handling 1% of their PRs. However, elite companies at the 90th percentile have reached approximately 10% of PRs being opened by autonomous agents - a dramatic increase from just 3% a few months earlier.
This exponential curve suggests that organizations investing in the infrastructure and processes to support autonomous agents are seeing rapid returns. However, Arcolano emphasized that no companies in their dataset are letting robots ship code directly to production. Human review and ownership remain standard practice, even as agents take on more of the initial coding work.
Productivity gains: The numbers
The core productivity findings center on two key metrics: PR throughput and cycle time. When comparing companies across the adoption spectrum, the data shows:
- PR throughput: Moving from 0% to 100% AI adoption correlates with approximately a 2x increase in average PRs merged per engineer per week
- Cycle time: Median time from first commit to PR merge decreases by roughly 24% with higher AI adoption
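These two metrics can be computed directly from PR records. The sketch below is illustrative only: the record layout and field names (`author`, `first_commit`, `merged`) are assumptions, not the study's actual schema.

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records for one week; field names are illustrative.
prs = [
    {"author": "alice", "first_commit": "2025-01-06T09:00", "merged": "2025-01-07T15:00"},
    {"author": "alice", "first_commit": "2025-01-08T10:00", "merged": "2025-01-08T18:00"},
    {"author": "bob",   "first_commit": "2025-01-06T11:00", "merged": "2025-01-09T12:00"},
]

def cycle_time_hours(pr):
    """Hours from first commit to merge -- the cycle-time definition used above."""
    fmt = "%Y-%m-%dT%H:%M"
    start = datetime.strptime(pr["first_commit"], fmt)
    end = datetime.strptime(pr["merged"], fmt)
    return (end - start).total_seconds() / 3600

# Throughput: merged PRs per engineer for the week.
engineers = {pr["author"] for pr in prs}
throughput = len(prs) / len(engineers)

# Cycle time: median hours from first commit to merge.
median_cycle = median(cycle_time_hours(pr) for pr in prs)
```

Comparing these numbers across cohorts with different AI adoption levels is what produces the 2x and 24% figures above.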
These gains represent meaningful improvements in development velocity. However, Arcolano cautioned against interpreting these numbers as pure business value. "Are you seeing actual business value? What we see is a lot of customers as they're learning these things, as they're building these muscles and as they're building the technology, they're pointing it at safe things."
The side effects: Bigger PRs and quality concerns
AI-assisted development introduces notable changes to code patterns. PRs created with higher AI usage are approximately 18% larger on average, measured by lines added. Importantly, this increase comes primarily from additions rather than deletions, and the number of files changed remains relatively constant.
"This suggests it's more verbose code kind of accomplishing the same functionality," Arcolano explained. AI tools tend to include more exception handling, comments, and defensive programming than human developers working quickly might write. Whether this verbosity represents better quality or unnecessary bloat depends on your perspective and codebase standards.
On quality metrics, the data shows mixed signals. Bug resolution rates have increased, suggesting AI helps teams address technical debt faster. However, PR revert rates have crept up by 7-11% among high-adoption teams compared to baseline. Arcolano noted this increase is modest and may reflect different workflows rather than fundamental quality problems.
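A revert rate like the one cited here can be approximated from PR titles. A minimal sketch, assuming the GitHub-style convention of prefixing auto-generated revert PRs with `Revert "`; real pipelines would also need to handle manual reverts with non-standard titles.

```python
# Hypothetical PR titles from one team's merged PRs.
pr_titles = [
    'Add retry logic to payment client',
    'Revert "Add retry logic to payment client"',
    'Refactor user service',
    'Fix flaky test in CI',
]

# Count PRs that follow the GitHub revert-title convention.
reverts = [t for t in pr_titles if t.startswith('Revert "')]
revert_rate = len(reverts) / len(pr_titles)
print(f"{revert_rate:.0%}")  # 25%
```

As Luke's comment below suggests, whether a given revert rate is acceptable depends on how quickly the team can detect and respond to the underlying bugs.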
"If I'm autodetecting bugs and I know that my team can move a good amount faster, if I can eat a 10% quality drop only as measured in revert or whatever, or let's call them follow-on changes, if my observability and my alerting are good and my response to bugs, I might not mind this," Luke observed. The trade-off between velocity and perfection varies by context - what's acceptable for a SaaS product differs dramatically from medical device software.
Code architecture: The hidden variable
The most important finding may be the least obvious: your code architecture dramatically affects whether AI tools help or hinder productivity. Arcolano introduced a metric called "active repos per engineer" - how many repositories an engineer typically pushes code to in a given week.
When the dataset was divided into quartiles based on this metric, the productivity results diverged sharply:
- Centralized architectures (engineers working in 0-1 repos per week): Approximately 4x increase in PR throughput with AI adoption
- Balanced architectures: Similar strong gains around 4x
- Distributed architectures: Moderate gains around 2x
- Highly distributed architectures (engineers regularly working across many repos): Essentially no productivity gain, possibly slight decline
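The "active repos per engineer" metric itself is straightforward to compute from push events. A minimal sketch, using invented engineer and repo names and fixed bucket thresholds that stand in for the study's data-driven quartile cut points:

```python
from collections import defaultdict

# Hypothetical (engineer, repo) push events for one week.
pushes = [
    ("alice", "monolith"), ("alice", "monolith"),
    ("carol", "monolith"), ("carol", "infra"),
    ("bob", "svc-auth"), ("bob", "svc-billing"), ("bob", "svc-search"),
    ("dave", "svc-a"), ("dave", "svc-b"), ("dave", "svc-c"), ("dave", "svc-d"),
]

# Active repos per engineer: count of distinct repos pushed to this week.
repos_by_engineer = defaultdict(set)
for engineer, repo in pushes:
    repos_by_engineer[engineer].add(repo)
active = {eng: len(repos) for eng, repos in repos_by_engineer.items()}

def bucket(n):
    """Illustrative thresholds only; the study used dataset quartiles."""
    if n <= 1:
        return "centralized"
    if n == 2:
        return "balanced"
    if n == 3:
        return "distributed"
    return "highly distributed"

buckets = {eng: bucket(n) for eng, n in active.items()}
```

Aggregating per-engineer buckets up to the company level is what lets the analysis compare AI productivity gains across architecture styles.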
"This is a context engineering problem," Arcolano explained. "It's often hard for people and your best engineers have a lot of tribal knowledge, things that are expert, hard-won lessons about everything you have to do and worry about to make things happen in the real world. It's often not written down."
In highly distributed codebases - whether from microservices sprawl, acquisitions, or organic growth - AI tools struggle because they lack the cross-repository context that experienced engineers carry in their heads. Even with expanding context windows in newer models, the fundamental challenge of helping AI understand complex system interactions across multiple repositories remains unsolved.
Practical implications for platform teams
These findings carry direct implications for platform engineering teams evaluating or scaling AI adoption:
Investment in context engineering matters more than tool selection. If your architecture is highly distributed, simply mandating AI tool usage won't deliver results. You must either consolidate your architecture or invest heavily in documentation, service catalogs, and other context-providing infrastructure that helps both humans and AI understand system relationships.
Autonomous agents require infrastructure changes. The companies seeing rapid autonomous agent adoption have made substantial changes around security, data access, and sandboxing. These aren't trivial changes - they require deliberate investment and cross-functional coordination.
Standardization enables skill building. The shift from experimentation to consolidation on specific tools allows teams to develop deeper expertise and share knowledge. Platform teams can accelerate this by providing clear guidance on approved tools and usage patterns rather than leaving every engineer to figure it out independently.
Quality monitoring must evolve. Traditional metrics like revert rates may not capture the full picture as workflows change. Teams need to think carefully about what quality means in an AI-assisted world and instrument accordingly.
If you enjoyed this, you can find more insights and events from our Platform Engineering Community.
For more comprehensive guidance, check out the Certified Architect Course and learn best practices from industry experts.
Key takeaways
- AI adoption has reached critical mass - with 63% median weekly usage and autonomous agents handling up to 10% of PRs at leading companies, AI-assisted development is no longer experimental but operational reality for most engineering organizations.
- Productivity gains are real but uneven - teams see approximately 2x throughput increases and 24% faster cycle times on average, but these benefits depend heavily on code architecture and how teams direct AI capabilities toward valuable work versus safe experimentation.
- Code architecture is the hidden bottleneck - centralized and balanced architectures see 4x productivity gains while highly distributed codebases show minimal benefit, making context engineering and architectural consolidation critical prerequisites for AI success.
- Implementation requires deliberate investment - moving beyond experimentation to accountable outcomes demands infrastructure changes for autonomous agents, tool standardization for skill building, and evolved quality metrics that reflect new workflows rather than traditional patterns.

