Something strange is happening in the world of software development. Developers, some of the most logical, systems-minded people on the planet, have acquired what can only be described as a dependency. Take away their AI coding tools, and a growing number of them simply won’t work.
That’s not a metaphor. Researchers tried to run a study on it, and the developers refused to participate.
So what does it mean when the people building our digital world won’t build anything without an AI co-pilot? And more importantly — is the code they’re shipping actually any good?
The answers are messier than the tech industry wants to admit.
When Researchers Tried to Study AI Productivity, Developers Said No
Earlier this year, METR — a well-regarded AI safety and research organization — set out to run a follow-up study on how AI tools affect developer productivity. The plan was straightforward: measure how long developers take to complete tasks with and without AI assistance, then compare the results.
There was just one problem. The developers they approached wouldn’t agree to work without AI, even temporarily, even for science.
“They do not wish to work without AI,” METR’s researchers noted in their published findings — a line that probably deserves more attention than it got.
To salvage the research, METR switched gears and ran a self-reported survey instead. The results? Developers believe AI makes them roughly twice as productive. They feel more capable, more efficient, and more valuable to their teams.
The trouble is, feelings aren’t data. And other evidence is pushing back hard on that self-assessment.
The Tokenmaxxing Bubble — And How It Burst
If you haven’t heard the term “tokenmaxxing” yet, get used to it. It’s been one of the defining tech trends of 2026 — and it’s already unraveling.
Tokenmaxxing is the practice of measuring developer productivity by how many AI tokens they consume. The logic sounds reasonable on the surface: if someone is using AI heavily, they must be getting a lot done. Companies started tracking this internally, some even turning it into leaderboards and internal competitions.
Then reality showed up.
Amazon quietly shut down an internal token-tracking system called Kirorank after discovering that employees were gaming it — running AI agents on tasks not to get real work done, but to rack up token counts and look productive. The costs ballooned. The output didn’t.
Uber had a similar wake-up call. The company burned through its entire 2026 AI budget in just four months. Its COO, Andrew Macdonald, spoke openly about the situation, noting that the spending surge hadn’t translated into any measurable jump in completed projects or actual output.
In both cases, the lesson was the same: using AI more doesn’t automatically mean producing more. It just means spending more.
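To make the gap between activity and output concrete, here is a minimal sketch that ranks the same team two ways: by tokens consumed, and by tokens spent per shipped change. Everything in it is an assumption for illustration. The names, the numbers, and the "changes shipped" measure are invented, and none of it reflects how Amazon, Uber, or any real leaderboard worked.

```python
from dataclasses import dataclass


@dataclass
class DevStats:
    name: str             # hypothetical developer label
    tokens_used: int      # AI tokens consumed in the period (made-up figure)
    changes_shipped: int  # merged, working changes in the same period (made-up figure)


def rank_by_tokens(stats):
    """Tokenmaxxing-style leaderboard: whoever burned the most tokens comes first."""
    ordered = sorted(stats, key=lambda s: s.tokens_used, reverse=True)
    return [s.name for s in ordered]


def tokens_per_change(s):
    """Token cost of one shipped change; infinite if nothing shipped."""
    if s.changes_shipped == 0:
        return float("inf")
    return s.tokens_used / s.changes_shipped


def rank_by_efficiency(stats):
    """Outcome-based ranking: cheapest shipped change comes first."""
    return [s.name for s in sorted(stats, key=tokens_per_change)]


team = [
    DevStats("dev_a", 9_000_000, 3),
    DevStats("dev_b", 2_000_000, 8),
    DevStats("dev_c", 5_000_000, 0),
]

print(rank_by_tokens(team))      # ['dev_a', 'dev_c', 'dev_b']
print(rank_by_efficiency(team))  # ['dev_b', 'dev_a', 'dev_c']
```

In this toy data the developer who shipped nothing sits second on the token leaderboard, while the one who shipped the most sits last. Tokens per shipped change is only one possible outcome measure, and a crude one, but it shows how a usage count and a results count can point in opposite directions.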
The Hidden Cost Nobody Is Talking About
Here’s where the story gets genuinely concerning — not just for companies, but for the developers themselves.
Writing code is only half the job. The other half is maintaining it, fixing it, updating it, and keeping it from breaking as the systems around it change. And that second half never really ends.
James Shore, a programmer and writer who has spent years thinking deeply about software quality, made an argument recently that went viral in developer circles. His point was sharp and a little brutal: if AI lets you write code twice as fast but doubles the amount of buggy, hard-to-maintain code in your codebase, you haven’t won anything. You’ve just created a future problem that will cost you far more time than you saved.
“You write code twice as quick now?” he wrote. “Better hope you’ve halved your maintenance costs. Otherwise, you’re trading a temporary speed boost for permanent indenture.”
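A toy model can make the arithmetic behind that warning visible. The sketch below is not Shore’s own model. It assumes a developer has one unit of time per month, that every unit of existing code consumes a fixed slice of that time in maintenance, and that whatever time is left goes to writing new code. The function name, the rates, and the 50% threshold are all hypothetical.

```python
def months_until_maintenance_dominates(write_speed, maint_per_unit, threshold=0.5):
    """Count months until maintenance eats at least `threshold` of a developer's time.

    write_speed:    code units written per month of full-time writing (assumed)
    maint_per_unit: share of a month each existing code unit costs to maintain (assumed)
    """
    code_units = 0.0
    months = 0
    while code_units * maint_per_unit < threshold:
        maintenance_load = code_units * maint_per_unit
        # Only the time not spent on maintenance produces new code.
        code_units += write_speed * (1.0 - maintenance_load)
        months += 1
    return months


# Baseline: no AI assistance.
print(months_until_maintenance_dominates(10, 0.01))   # 7

# Twice the writing speed, same maintenance cost per unit of code.
print(months_until_maintenance_dominates(20, 0.01))   # 4

# Twice the writing speed, with maintenance cost per unit halved.
print(months_until_maintenance_dominates(20, 0.005))  # 7
```

Under these assumptions, doubling writing speed without touching maintenance cost brings the point where maintenance takes over half the month forward from month seven to month four. Halving the per-unit maintenance cost restores the original timeline, now with twice as much code written along the way.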

That’s not just a theory. The data is starting to back it up.
Entelligence AI, a startup focused on software reliability, reports that companies are devoting roughly 44% of their AI token usage to fixing bugs that the AI itself introduced. Meanwhile, CodeRabbit, a code-review platform, analyzed thousands of open-source pull requests and found that AI-generated contributions produced about 1.7 times more issues than code written by humans.
To be fair, both of those companies sell products that help fix AI-generated code problems, so they have a stake in the narrative. But their findings align with independent academic research. Researchers from Singapore Management University published a paper this spring warning that AI-generated code poses real risks to long-term software health — particularly for projects that need to be maintained and updated over years, not just shipped once.
Why This Matters Beyond the Developer Community
It’s easy to read this as an internal tech industry debate. But the implications reach further.
Most of the software running modern life — banking apps, hospital records, logistics systems, the tools your company uses every day — is built and maintained by developers. If those developers are shipping code that’s faster to produce but slower to maintain and more prone to breakage, that’s a compounding problem.
The more AI-generated code accumulates in production systems, the more maintenance debt piles up. And maintenance debt, left unaddressed, tends to surface at the worst possible times — outages, security vulnerabilities, systems that can’t be updated without breaking something else.
This isn’t a hypothetical future risk. It’s a present one that’s growing with every pull request.
So What Should Developers Actually Do?
The researchers at Singapore Management University didn’t just flag the problem — they offered a framework for dealing with it, and it’s more human-centered than you might expect.
The core idea is that developers need to treat AI the way a senior engineer treats a very fast but inexperienced junior hire. You wouldn’t hand a junior developer a critical project and walk away. You’d review their work carefully, correct their mistakes, and make sure they understood the bigger picture before giving them more autonomy.
AI coding tools deserve the same treatment. That means knowing which tasks AI handles reliably and which ones it tends to fumble. It means building quality assurance workflows tailored to AI output: not just the standard code review process, but something more rigorous. And it means keeping humans in the driver’s seat for high-stakes decisions such as system architecture, security design, and anything else where a mistake isn’t just annoying but consequential.
Scott Wu, the CEO of Cognition and the developer behind Devin — one of the most sophisticated AI coding agents on the market — agrees with this framing. He’s openly acknowledged that Devin currently performs somewhere between a junior and mid-level developer, depending on the task. Useful, sure. But not a replacement for experienced human judgment.
That’s a grounded, honest take. And it’s the kind of perspective more companies should probably adopt before they let AI agents run loose in their production codebases.
The tokenmaxxing era taught the industry something valuable: raw AI usage is not a metric for anything meaningful. What matters is what gets built, how reliable it is, and how much it costs to keep running.
The companies that come out ahead over the next few years probably won’t be the ones that used AI the most aggressively. They’ll be the ones that figured out how to use it precisely — automating the right tasks, reviewing the output carefully, and preserving human expertise where it actually counts.
Developers aren’t going to stop using AI. That ship has sailed. But the industry is in the middle of a necessary correction: from the hype-fueled “AI does everything” phase toward something more sustainable, more disciplined, and ultimately more useful.
The question isn’t whether to use AI coding tools. It’s whether you’re using them in a way you’ll still be grateful for two years from now.
Frequently Asked Questions
Are AI coding tools actually making developers less productive?
It depends on how you measure it. AI tools genuinely speed up the process of writing initial code. But research suggests that the time saved upfront is often spent later on debugging, steering the AI, and fixing errors it introduced. The net productivity gain is smaller than most developers believe — and in some cases may be negative when long-term maintenance is factored in.
What is tokenmaxxing and why did it fail?
Tokenmaxxing is using the number of AI tokens consumed as a stand-in for productivity. It failed because it measures activity, not results. At Amazon, employees gamed an internal token-tracking system to look productive without producing proportionally valuable work. At Uber, AI spending surged without a matching increase in real output.
Should companies stop using AI for coding altogether?
No — but they should be more intentional about how they use it. The most effective approach treats AI as a fast but fallible junior developer: useful for generating drafts and handling repetitive tasks, but requiring careful human review, especially for anything that will live in a production system for a long time. High-level decisions about architecture, security, and system design should stay in human hands.
AI coding tools are here to stay. Developers love them, companies are betting on them, and there’s genuine value in what they offer. But 2026 is turning out to be the year the industry learns that faster isn’t always better — and that the real cost of AI-generated code isn’t always visible at the moment you hit commit.
The developers who thrive won’t be the ones who use AI the most. They’ll be the ones who use it the smartest.



