Claude Sonnet and Opus 4 (Executive Summary)

Anthropic released Claude Opus 4 and Sonnet 4 today, claiming the #1 spot for coding performance. There are going to be a lot of articles floating around with exaggerations and marketing talk, but here is an executive summary of everything you need to know.

Performance Numbers

Claude Opus 4:

SWE-bench: 72.5% (world's best)
Terminal-bench: 43.2%
Sustained performance for hours on complex tasks
$15/$75 per million tokens

Claude Sonnet 4:

SWE-bench: 72.7% (matches Opus 4)
3x faster than Opus 4 for most tasks
$3/$15 per million tokens

Two key slides from the announcement:

Key Technical Features

Hybrid Architecture: Instant responses + extended thinking mode (up to 64K tokens)
Extended Thinking with Tools: Can use web search, code execution during reasoning
Parallel Tool Execution: Multiple tools simultaneously
Memory Files: Creates persistent memory when given file access
65% Reduction: Less shortcut/loopholes behavior vs Sonnet 3.7

Industry Adoption

GitHub: Integrating Sonnet 4 into GitHub Copilot
Cursor: "State-of-the-art for coding"
Rakuten: Validated 7-hour autonomous refactor
Sourcegraph: "Substantial leap in software development"

New API Capabilities

4 new capabilities:

Code execution tool
MCP connector
Files API
Prompt caching (1 hour)

Claude Code Generally Available

VS Code and JetBrains extensions (beta)
GitHub Actions integration (demo)
Claude Code SDK for custom agents
GitHub PR integration via /install-github-app

Access

Already available via Anthropic API.

If you want to skip the new model restrictions, you can try it via Glama Gateway and OpenRouter.

So, is it hype?

Claude 4 models lead coding benchmarks and offer sustained performance for complex agent workflows. Opus 4 for maximum capability, Sonnet 4 for speed/cost balance. Both already available to test.

Source: Official Announcement

Will update this article to add interesting insights and facts as the day progresses.

Frank Fiegel @punkpeye