Grok 4 crushes the competition
FROM THE FRONTIER
Grok 4 sweeps the competition on Humanity’s Last Exam. Source: xAI
xAI’s new 200,000-chip supercomputer is putting in the work. The Elon Musk-led startup just unveiled Grok 4, a model that supposedly beats current leaders (o3 Pro, Claude 4 Opus, and Gemini 2.5 Pro included) across multiple benchmarks. What’s it capable of?
On Humanity’s Last Exam, a test designed to be especially tricky for LLMs, Grok 4 scored an impressive 41% (with tools), beating the previous record-holder, o3 Pro, by around 15 points.
With multiple agents running at once and sharing their notes, Grok 4 does even better: it solves more than half of the exam’s text-based questions and 44.4% of the full dataset.
It posts state-of-the-art results on AIME 25, GPQA, and other leading benchmarks, and scores 15.9% on ARC-AGI, double what Claude 4 Opus scored.
It can get a near-perfect score on the GRE every time, even on questions it was never trained on, according to Musk.
Grok’s voice capabilities have also gotten a major overhaul with new voices and half the latency.
You can try Grok 4 Heavy today, but it’ll cost you: A new subscription tier called SuperGrok Heavy will give you access to both Grok 4 and Grok 4 Heavy, higher rate limits, and early access to features for $300/month.
Even before the release, xAI was already making headlines, but for very different reasons. The old version of Grok suddenly started spewing harmful content and hate speech this week, and at one point, xAI allegedly even shut it down on X. Complicating the story, X CEO Linda Yaccarino recently announced she’s stepping down after two years with the company.
Only time will tell if Grok 4’s impressive performance will be enough to bring on new users, especially with OpenAI’s GPT-5, the holy grail of AI, slated for release later this summer.