Everyone's Sleeping on GPT-5 Mini (And It's the Only Model That Actually Matters)

OpenAI dropped GPT-5 last week and they made it a big deal. Sam Altman's cryptic tweets, influencers calling it "revolutionary," the whole show. The hype was insane.

Then the models launched and... nothing. GPT-5 feels like o3 with tiny improvements. Maybe 5% better at general tasks. Everyone agreed: overhyped, underwhelming, barely different.

But while everyone's busy complaining about the main model, they're completely missing the real story. GPT-5 Mini is quietly killing it for actual production use, and nobody's talking about it.

The Numbers That Made Me Look Twice

Here's what caught my attention:

$0.25 per million input tokens (that's 5x cheaper than Gemini 2.5 Pro)
$2 per million output tokens (also 5x cheaper than Gemini 2.5 Pro)
400,000 token context window
Benchmarks really close to Gemini 2.5 Pro while being 5x cheaper

That last point sounds like marketing fluff until you realize GPT-5 Mini with high reasoning effort actually beats Gemini 2.5 Pro on several benchmarks. Not that benchmarks mean much, but still.

Why This Actually Matters: Building a Changelog Agent

I've been building an AI agent at UserJot that helps teams write changelogs. The idea is simple: chat with an agent that has access to your feedback, roadmap, and closed tickets. It pulls together what shipped, helps you write the update, and schedules it for publishing.

The workflow looks like this:

"What tickets did we close since last changelog?"
"These are great. Now write the changelog"
"Add me and the CTO as authors"
"Schedule for Monday 2pm"

That's it.

I've tried almost every model on OpenRouter, both open source and proprietary. Claude Opus works beautifully but it's too expensive. GPT-4o Mini worked okay but wasn't great. o4-mini? Also okay, not great. I spent a lot of time mixing and matching different models for different parts of the workflow.

Then I tried GPT-5 Mini with medium reasoning effort and it just works beautifully.

Why Mini Works So Well for Production

Here's what makes Mini different - it's really good at the things that matter for production AI agents. Not for writing poetry or medical advice, but for actual tool-based workflows:

Really good at tool calling. I'm getting 95%+ success rate on complex multi-step workflows. It picks the right tools and formats parameters correctly almost every time.

High success with structured data output. When I ask for JSON, I get valid JSON. No random markdown, no extra text, just the data structure I need.

Follows instructions really well. Give it a system prompt with 10 rules and it follows all 10. This consistency is huge for production systems.

Handles long context without forgetting. At 400k tokens, it can hold your entire codebase and documentation. And it actually remembers what you told it 50k tokens ago.

The Catch (Because There's Always One)

Throughput is okay right now. I'm seeing 60-70 tokens per second, which could be better. OpenAI's probably still scaling up their infrastructure for these new models.

But for async workflows? For background agents? For anything where you can wait a few seconds? It's perfect.

The Real Problem: Overhype Kills Good Products

If OpenAI hadn't hyped this so much, people would be happy with the incremental improvements. GPT-5 is better than o3. GPT-5 Mini is amazing for its price point. But when you set expectations sky-high, even good updates feel disappointing.

Classic case of over-promise, under-deliver.

If you're building agents or need reliable tool calling, give GPT-5 Mini a try. Set the reasoning effort to medium or high. You might be surprised.

Building something with GPT-5 Mini? I'm curious what you're working on. Drop a comment below or find me on Twitter. Also check out UserJot if you need a better way to manage feedback and changelogs.

Shayan @shayy