Wed, Dec 17, 2025 · Will Hackett

Moore's Law for AI is officially dead

A hand holding a smartphone displaying a folder of AI apps including ChatGPT, DeepSeek, Claude, Mistral AI, Gemini, Copilot, and Poe. Photo by Solen Feyissa.

Google released Gemini 3 Flash today. It's their new default model in the Gemini app globally, replacing Gemini 2.5 Flash.[1] The benchmarks look impressive: outperforming Gemini 2.5 Pro while being three times faster.

But there's a catch. It's more expensive.

Input tokens jumped from $0.30 to $0.50 per million. Output tokens went from $2.50 to $3.00. That's a 67% increase on input and 20% on output.
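
To put that in dollar terms, here's a back-of-the-envelope sketch. The per-token prices are the ones above; the workload (500M input / 100M output tokens a month) is an illustrative assumption, not anyone's real traffic.

```python
# Monthly bill impact of the Gemini Flash repricing.
# Prices are from the announcement above; the workload is an
# illustrative assumption.

OLD = {"input": 0.30, "output": 2.50}  # $/1M tokens, Gemini 2.5 Flash
NEW = {"input": 0.50, "output": 3.00}  # $/1M tokens, Gemini 3 Flash

usage_m = {"input": 500, "output": 100}  # millions of tokens per month

def monthly_cost(prices: dict, usage: dict) -> float:
    """Dollar cost for a month's usage at the given per-million rates."""
    return sum(prices[k] * usage[k] for k in prices)

old_cost, new_cost = monthly_cost(OLD, usage_m), monthly_cost(NEW, usage_m)
print(f"before: ${old_cost:,.0f}/mo  after: ${new_cost:,.0f}/mo  "
      f"(+{new_cost / old_cost - 1:.0%})")
# before: $400/mo  after: $550/mo  (+38%)
```

The blend matters: the more input-heavy your workload, the closer you sit to that 67% headline increase.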

The Hacker News comments are already on it:

They are pushing the prices higher with each release though: API pricing is up to $0.5/M for input and $3/M for output.[2]

We've been told for years that AI costs are falling. Moore's Law for inference. Race to the bottom. But is that actually what's happening?

Is this happening across the board?

Short answer: it depends on which models you're looking at.

I spent the last few days pulling together historical pricing data from all the major providers: OpenAI, Anthropic, Google, xAI, Mistral, DeepSeek and Cohere.[3] The picture that emerges is more nuanced than the "AI is getting cheaper" narrative suggests.

What we're witnessing is a K-shaped recovery in AI pricing.

For those unfamiliar with the term, a "K-shaped" trend occurs when different parts of the market move in opposite directions: one arm of the 'K' shoots upward while the other plunges. In AI, the cost of raw intelligence is splitting: while some commodity players are still racing to the bottom, the models we actually rely on for serious work are becoming increasingly expensive. Though honestly, it's less like a K and more like a "<", a crocodile with its mouth wide open, about to eat an AI founder.

Let me show you what I found.

Frontier model pricing over time

First, frontier models: the best each provider has to offer, the flagship releases pushing the boundaries of compute. These are what you'd use for complex reasoning, agentic workflows and tasks where quality matters more than cost.

Input pricing ($/1M tokens)

Source: Pricing data compiled from provider announcements

Output pricing ($/1M tokens)

Source: Pricing data compiled from provider announcements

The frontier story is messy. The floor is rising. While we celebrated the 80% price drops of 2024, the 2025 frontier models are coming in at a premium. We aren't just paying for more intelligence; we're paying for the massive R&D and energy bills that finally came due.

OpenAI dropped GPT-4 prices by 83% over 18 months,[4] but the trend has reversed with GPT-5 and specialised models like o1-pro. Anthropic held Claude Opus at $15/$75 for nearly two years before cutting Opus 4.5 by 67%.[5] Google slashed Gemini 1.5 Pro in late 2024, only to hike Gemini 3 Pro prices back up.

The only thing keeping the "race to the bottom" narrative alive? DeepSeek. Their V3 at $0.28/$1.10 is 18x cheaper than GPT-5 and 54x cheaper than Claude Opus was at its peak. They're the spoiler—proof that efficiency gains are possible, but also an outlier the Western labs haven't matched.

Efficiency tier pricing over time

Efficiency models—the high-volume production workhorses—show an even starker divergence.

Input pricing ($/1M tokens)

Source: Pricing data compiled from provider announcements

Output pricing ($/1M tokens)

Source: Pricing data compiled from provider announcements

Here's where things get interesting.

OpenAI's fast tier dropped 92% on input pricing from GPT-3.5 Turbo ($2/$2) to GPT-4o mini ($0.15/$0.60). That's the narrative we've been told.

But look at Anthropic. Claude 3 Haiku launched at $0.25/$1.25 in March 2024. Then Claude 3.5 Haiku arrived in November 2024 at $1.00/$5.00.[6] That's a 4x price increase for the "budget" model.

And Google? Gemini 1.5 Flash dropped to $0.075/$0.30 in August 2024. Today's Gemini 3 Flash? $0.50/$3.00. That's a 6.7x increase on input and 10x on output.
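
Put those three moves side by side and the divergence jumps out. A quick sketch, using only the prices already quoted above:

```python
# The three efficiency-tier repricings from the prose above, as multiples.
# No new data here: these are the quoted prices, divided.
moves = {
    "OpenAI    GPT-3.5 Turbo   -> GPT-4o mini": ((2.000, 2.00), (0.15, 0.60)),
    "Anthropic Claude 3 Haiku  -> 3.5 Haiku":   ((0.250, 1.25), (1.00, 5.00)),
    "Google    Gemini 1.5 Flash -> 3 Flash":    ((0.075, 0.30), (0.50, 3.00)),
}
for name, ((in0, out0), (in1, out1)) in moves.items():
    print(f"{name}: input x{in1 / in0:.1f}, output x{out1 / out0:.1f}")
# OpenAI x0.1/x0.3, Anthropic x4.0/x4.0, Google x6.7/x10.0
```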

The efficiency tier trend isn't consistently down. It depends entirely on which provider you're using.

Does this correlate to actual improvements?

Fair question. Maybe prices are going up because the models are getting dramatically better?

MMLU scores over time

Source: Stanford HAI AI Index, provider announcements

HumanEval scores over time (coding)

Source: HumanEval benchmark results from provider announcements

The benchmarks tell a nuanced story.

MMLU scores have clustered around 88-92% for the top models.[7] GPT-4 hit 86.4% in March 2023. GPT-5 hits 91.4% today. That's five percentage points over nearly three years. The gap between providers has narrowed to within a few points.

HumanEval shows more dramatic gains—from 67% to 92%+ for the leaders. But again, the top models are converging.

Here's the uncomfortable truth: benchmark improvements have slowed while some prices are increasing. The price/performance curve isn't as favourable as it was in 2023-2024.

What's driving these price increases?

We've hit an inference efficiency ceiling.

We've already picked the low-hanging fruit: KV caching, quantisation, architectural tweaks. To get the next jump in reasoning, we're now throwing raw compute at the problem during inference via chain-of-thought internal iterations. You can't optimise your way out of a model that has to "think" for 10 seconds before it answers.
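
A rough sketch of why, assuming reasoning tokens are billed at the output rate (which is how the major APIs bill hidden "thinking" today). The token counts are illustrative assumptions, not measurements:

```python
# Hidden reasoning tokens are billed as output, so a "thinking" answer can
# cost an order of magnitude more than a plain one of the same length.
# Token counts below are illustrative assumptions.

OUTPUT_PRICE = 3.00  # $/1M tokens (the Gemini 3 Flash output rate above)

def answer_cost(visible: int, reasoning: int) -> float:
    """Dollar cost of one response, billing hidden reasoning as output."""
    return (visible + reasoning) * OUTPUT_PRICE / 1_000_000

plain = answer_cost(visible=300, reasoning=0)
thinking = answer_cost(visible=300, reasoning=5_000)
print(f"plain: ${plain:.4f}  thinking: ${thinking:.4f}  "
      f"({thinking / plain:.0f}x for the same visible answer)")
# plain: $0.0009  thinking: $0.0159  (18x for the same visible answer)
```

No amount of kernel tuning removes those 5,000 tokens; they are the product.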

Other factors:

  • Energy costs. Data centres are competing for limited power capacity globally. Google, Microsoft and Amazon are signing long-term power purchase agreements worth billions.
  • Capital returns. Hyperscalers spending $50-100B annually on infrastructure now need to demonstrate profitability to investors. The "land grab" phase is over.
  • Data licensing. The era of free training data is ending. Companies are paying millions for high-quality human text.

The downstream effect: "Unlimited" is dead

If upstream model prices are increasing, what happens to the products built on top of them?

We're seeing a universal shift to usage-based pricing.

Cursor moved to usage-based credits in June 2025.[9] When Claude Opus 4 stayed at $15/$75, Cursor's flat-rate "Pro" plan became an act of charity. The move wasn't a choice; it was a survival tactic.[10] Heavy users now pay $100-200/month.

GitHub Copilot introduced "premium requests" and began enforcing caps in June 2025.[11] On December 2, 2025, they removed the legacy $0 budgets that many enterprise accounts relied on to block overage charges.[12] If you haven't reviewed your Copilot billing settings recently, you might be in for a surprise.

Lovable switched to complexity-based deductions in July 2025.[13] Their documentation now warns: "If you rely heavily on Lovable's cloud and AI usage, you could burn through credits far faster than you planned."

The pattern is clear. Every major AI-powered developer tool has moved—or is moving—from "unlimited" fixed-price plans to consumption-based models. This isn't greed. It's the underlying economics catching up.
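
To see why, run the numbers on a flat plan. A minimal sketch, assuming a $20/month subscription fronting an Opus-class model at the $15/$75 rates quoted earlier; the request shape (2k input / 1k output tokens) is an assumption:

```python
# How many requests a $20/month flat plan can absorb at Opus-class rates.
# Rates are the $15/$75 quoted earlier; the request shape is an assumption.

PLAN = 20.00                       # $/month, flat subscription
IN_RATE, OUT_RATE = 15.00, 75.00   # $/1M tokens

def request_cost(in_tok: int, out_tok: int) -> float:
    """Upstream API cost of a single request, in dollars."""
    return (in_tok * IN_RATE + out_tok * OUT_RATE) / 1_000_000

per_request = request_cost(2_000, 1_000)
print(f"${per_request:.3f}/request -> plan underwater after "
      f"{PLAN / per_request:.0f} requests a month")
# $0.105/request -> plan underwater after 190 requests a month
```

A heavy agentic user blows past that in a day. Hence the caps.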

What about quantum?

There's a certain irony that Google—the company that just hiked Gemini Flash prices by 10x—might also hold the key to making AI affordable again.

Their Willow quantum processor and "Quantum Echoes" algorithm claim performance 13,000x faster than supercomputers.[14][15] The promise is that quantum hardware could revolutionise the matrix multiplications that make neural network inference so expensive.

But let's be real.

Google is asking us to bet on a multi-universe quantum future to solve a pricing problem they created in the present.[18] It's a classic diversion. Until I can pipe Willow into a production API at $0.01/1M tokens, it's just physics-flavoured PR.

This is the company that's killed 293 products.[17] Even if quantum inference becomes viable, their track record suggests they'd find a way to make it expensive, lock it into their cloud, or abandon it entirely.

The AI pricing landscape is no longer a simple story of costs falling. It's a K-shaped bifurcation: commodity-tier tokens are still getting cheaper, while frontier intelligence and the reliable workhorses we depend on are getting more expensive. If you're not tracking your AI spend at a granular level, you're in for a surprise when the bill arrives.

If this resonates with you, it might be time for a conversation with flowstate.


An awful lot of research was done with the help of the Internet Archive. For quite a bit of this post, I've had to browse older versions of websites. If you're a fan of the Internet Archive's mission of preserving digital history, consider supporting them with a donation.

About the Author

Will Hackett

I'm Will Hackett, CTO at flowstate™. I'm a technology leader who's led engineering teams across startups and larger organisations. Previously I co-founded Pragmatic, an AI company, built product at Pactio and led engineering teams at Blinq and Linktree. I'm passionate about distributed systems, product engineering and helping teams ship great software.