Claude Sonnet and Opus 4 (Executive Summary)
Frank Fiegel

Frank Fiegel @punkpeye

About: I make cool shit because I am a cool kid.

Joined:
Jun 27, 2022

Claude Sonnet and Opus 4 (Executive Summary)

Publish Date: May 22
13 1

Anthropic released Claude Opus 4 and Sonnet 4 today, claiming the #1 spot for coding performance. There are going to be a lot of articles floating around with exaggerations and marketing talk, but here is an executive summary of everything you need to know.

Performance Numbers

Claude Opus 4:

  • SWE-bench: 72.5% (world's best)
  • Terminal-bench: 43.2%
  • Sustained performance for hours on complex tasks
  • $15/$75 per million tokens

Claude Sonnet 4:

  • SWE-bench: 72.7% (matches Opus 4)
  • 3x faster than Opus 4 for most tasks
  • $3/$15 per million tokens

Two key slides from the announcement:

Image description

Image description

Key Technical Features

  • Hybrid Architecture: Instant responses + extended thinking mode (up to 64K tokens)
  • Extended Thinking with Tools: Can use web search, code execution during reasoning
  • Parallel Tool Execution: Multiple tools simultaneously
  • Memory Files: Creates persistent memory when given file access
  • 65% Reduction: Less shortcut/loopholes behavior vs Sonnet 3.7

Industry Adoption

  • GitHub: Integrating Sonnet 4 into GitHub Copilot
  • Cursor: "State-of-the-art for coding"
  • Rakuten: Validated 7-hour autonomous refactor
  • Sourcegraph: "Substantial leap in software development"

New API Capabilities

4 new capabilities:

  • Code execution tool
  • MCP connector
  • Files API
  • Prompt caching (1 hour)

Claude Code Generally Available

  • VS Code and JetBrains extensions (beta)
  • GitHub Actions integration (demo)
  • Claude Code SDK for custom agents
  • GitHub PR integration via /install-github-app

Access

Already available via Anthropic API.

If you want to skip the new model restrictions, you can try it via Glama Gateway and OpenRouter.

So, is it hype?

Claude 4 models lead coding benchmarks and offer sustained performance for complex agent workflows. Opus 4 for maximum capability, Sonnet 4 for speed/cost balance. Both already available to test.

Source: Official Announcement

Will update this article to add interesting insights and facts as the day progresses.

Comments 1 total

  • Dotallio
    DotallioMay 23, 2025

    Super helpful roundup, thank you! Has anyone tried Sonnet 4 in VS Code yet for real projects?

Add comment