OpenAI o3 - Thinking Fast and Slow
Maxim Saplin


Publish Date: Dec 20 '24

OpenAI teased the o3 model today: a further development of the "reasoning" line of models and a successor to o1.

I was impressed by how much it improved on ARC-AGI-1, a benchmark supposedly unbeatable by the current generation of LLMs. o1's high score was 32%, while o3 jumped straight to 88%. The authors of the ARC Challenge (a $1M reward for beating ARC-AGI) were quite confident that transformer-based models would not succeed on their benchmark; they were not impressed with o1. Yet the o3 blog post carries a completely different sentiment, with words such as "surprising", "novel", and "breakthrough". But there's a catch: it's very, very expensive. Scoring 76% cost around $9k, and the cost of the 88% run OpenAI didn't disclose (one can estimate the total at around $1.5M, given the statement that 172x more compute was used).

o3 reminded me of an analogy often mentioned when discussing LLMs. No matter the complexity of the task, GPTs use the same amount of compute/time per token, as if they were streaming information from their subconscious without ever stopping to think. This is similar to how the "Fast" System 1 of the human brain operates.

A quick recap: "Thinking, Fast and Slow" is a 2011 book by Daniel Kahneman. He argues that, functionally (based on empirical research), our brain has two systems (or modes):

  • System 1, Fast - effortless, autonomous, associative.
  • System 2, Slow - effortful, deliberate, logical.

The two systems work together and shape human thinking. We can read a book out loud without any stress, yet not remember a single word. Or we can read with focus, constantly replaying the scenes and pictures in our mind, keeping track of events and timelines, and be exhausted after a short period, yet acquire new knowledge.

As Andrew Ng once noted, "Try typing a text without ever hitting backspace." That seems like a hard task, and yet that is how LLMs work.

Well, that's how they worked until recently. When o1 (and later DeepSeek R1, QwQ, and Gemini 2.0 Flash Thinking) appeared, models learned to take a pause and operate in a mode similar to the "Slow" system.

Recently there has been a lot of talk of LLM pre-training plateauing, training data being exhausted, and AI development hitting a wall.

We might be seeing a trend forming for 2025: combining reasoning/thinking models with traditional LLMs, interconnecting them as Slow and Fast minds: planning (Slow) and taking action (Fast), identifying (Fast) and evaluating (Slow), etc.

Here's a recent example from the Aider AI coding assistant showing how combining QwQ as the Architect and Qwen 2.5 as the Coder (Aider has a two-step "architect-code" mode that lets you choose a different model for each step) increases coding performance.
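Conceptually, the two-step flow looks something like this. This is a minimal sketch, not Aider's actual implementation: the `chat` helper, the prompts, and the model names are all placeholders standing in for a real LLM API call.

```python
# A minimal sketch of a two-step "architect-code" pipeline.
# `chat` is a stand-in for any LLM API call; the model names and
# prompts are illustrative, not Aider's actual internals.

def chat(model: str, system: str, user: str) -> str:
    """Placeholder for a real LLM call (e.g. an OpenAI-compatible API)."""
    # A real implementation would send the request to an inference server.
    return f"[{model}] response to: {user[:40]}"

def architect_code(task: str) -> str:
    # Step 1 (Slow): the reasoning model produces a plan, no code yet.
    plan = chat(
        "qwq-32b",
        "You are the Architect. Describe how to change the code. Do not write code.",
        task,
    )
    # Step 2 (Fast): the coding model turns the plan into actual edits.
    return chat(
        "qwen2.5-coder-32b",
        "You are the Coder. Implement exactly the plan below as code edits.",
        f"Task: {task}\n\nPlan:\n{plan}",
    )

print(architect_code("Add retry logic to the HTTP client"))
```

The design choice mirrors the Slow/Fast split: the expensive reasoning model only plans, while the cheaper, faster model does the high-volume token generation.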

Whether this will play out is hard to say. There are plenty of challenges where we haven't seen much progress lately, even with Slow models. It's unclear how tolerant models such as o3 will be to hallucinations. The context window is still too small. Prices are going up... The Slow models, while they reach new levels on various "isolated" evals, are far from practical application at scale (running large projects on their own, or simulating a junior intern). And the Fast models, the actors, don't seem to have shown much progress in computer use; Moravec's paradox is still a challenge when it comes to automating a computer clerk.

P.S.

Around the same time o3 was announced, I received API access to o1-mini. I ran my own LLM Chess eval, which simulates chess games by prompting models to play against a random player. While the previous SOTA models couldn't score a single win (and I had assumed the benchmark was as hard as the ARC eval)... o1-mini won 30% of the time! Now I am less skeptical; after all, there might be some reasoning going on.
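The harness behind such an eval can be sketched roughly as below. Everything chess-specific (`legal_moves`, move application, win detection) and the model call are stubs of my own invention here; a real harness would use an actual chess engine/library and a real LLM API.

```python
import random

# A simplified sketch of an "LLM vs. random player" eval loop.
# Board handling and the model call are stubbed out; a real harness
# would use a chess library (e.g. python-chess) and an LLM API.

def legal_moves(board):                      # stub: real version queries the engine
    return ["e2e4", "d2d4", "g1f3"] if board else []

def ask_model_for_move(board, moves):        # stub: real version prompts the LLM
    # The harness lists legal moves in the prompt and parses the reply;
    # an illegal or unparseable reply counts as a forfeit.
    return random.choice(moves)

def play_game(max_plies=100):
    board, winner = ["start"], "draw"        # stub board; win detection omitted
    for ply in range(max_plies):
        moves = legal_moves(board)
        if not moves:
            break
        if ply % 2 == 0:                     # model plays White
            move = ask_model_for_move(board, moves)
            if move not in moves:
                return "random"              # illegal move = loss for the model
        else:                                # random player plays Black
            move = random.choice(moves)
        # applying `move` to `board` would go here in a real harness
    return winner

results = [play_game() for _ in range(10)]
win_rate = results.count("model") / len(results)
```

The key property being measured is whether the model can keep producing legal, goal-directed moves over a long game, something pre-reasoning models consistently failed at.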

Comments (10)


    • Andre Du Plessis · Dec 26, 2024

      @dansasser, will you be posting your article "The New Frontier of AI" here on DEV, or on another platform?

      • Daniel T Sasser II · Dec 26, 2024

        @andre_adpc I actually posted it here yesterday under a different title, "The AI of Christmas Future." Just click the link to get to it; it's also part of the same series as my other AI articles. I thought the name was a little catchier since I released it as a special Christmas edition 😁 There are three others in this series so far. Each of them goes into detail on a different aspect of the subject.

  • Valeria · Dec 23, 2024

    I don't want a smarter and more logical AI. I want an affordable robot that can fold my laundry and pick up the mess, so that I can spend more time being logical and smart. The human brain is clearly a much more efficient tool for the latter; such a waste using it for laundry or dishes 😎


  • sinni800 · Dec 28, 2024

    It still doesn't think; it's still a language model, not a thinking model. This is where OpenAI keeps trying to bamboozle all of us. As long as ClosedAI isn't actually fundamentally changing the way the models work, there's a ceiling that we simply won't extend past. They had to generate answers across so much compute time, basically brute-forcing an answer until it was good.

    There is no thinking involved, period.

    • Maxim Saplin · Dec 28, 2024

      Brute-forcing assumes a loop where with every iteration you verify the candidate. I assume that with the ARC benchmark there's no such option as crunching and validating zillions of (random) answers for a given task; rather, the model is given enough time/compute before producing a single (ideally correct) answer to a given problem.

      And o1/o3 doesn't seem to be the autoregressive language model that is traditionally assumed.


  • FounderBrief · Jan 4, 2025

    Good Job
