🤖 Comparing OpenAI GPT 4.5 and Claude 3.7 Sonnet on Coding 🚀
Shrijal Acharya @shricodev

About: Full Stack SDE • Open-Source Contributor • Collaborator @Oppia • Mail for collaboration

Publish Date: Mar 6

It is said that Claude 3.7 completely crushes our newest and costliest OpenAI model, GPT-4.5. But hey, I don't trust these benchmarks until I test them myself.

So, I ran my own tests on three Web Development coding questions.

Let's see how these two models compare against each other in coding. 🤨

TL;DR

If you want to skip straight to the result: Claude 3.7 Sonnet dominates GPT-4.5 in coding. GPT-4.5 is not even close (kinda sucks!), despite being at least 10x costlier than Claude 3.7 Sonnet.

Claude 3.7 Sonnet vs. GPT-4.5 SWE Benchmark

And yeah, that’s fair. Claude 3.7 Sonnet is built for coding, while GPT-4.5 is mainly for writing and designing.

I recently dropped a coding comparison post on Claude 3.7 vs. Grok 3 vs. OpenAI o3-mini-high. If you're interested in how Claude 3.7 performed there, check it out. 👇


Brief on the GPT-4.5 Model

On Thursday, February 27, OpenAI released an early version of GPT-4.5, a new version of its flagship large language model. The team calls it their biggest and best model yet, one that is supposed to feel like talking to a real person.

Sam Altman claim on GPT-4.5

And NO, this is not a reasoning model, as stated by OpenAI CEO Sam Altman himself.

Sam Altman claim that the GPT-4.5 model is not a reasoning model

This seems to be true: compared with other models such as Claude 3.7 Sonnet and the earlier GPT-4o models, its accuracy on coding benchmarks appears to be significantly lower.

Coding benchmark between GPT-4.5 and other AI models

When it comes to pricing, this is OpenAI's most expensive AI model, at $75 per million input tokens and $150 per million output tokens. 😮‍💨 You can compare the pricing of this model to some of their earlier models side by side:

GPT-4.5 pricing
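
To put those per-token prices in perspective, here is a rough cost sketch in TypeScript. The GPT-4.5 prices are the ones quoted above. The Claude 3.7 Sonnet prices ($3 input, $15 output per million tokens) are assumed from Anthropic's public list pricing at the time, and the token counts are made up for illustration.

```typescript
// Prices in USD per one million tokens.
// GPT-4.5 figures come from this post; the Claude 3.7 Sonnet figures are an
// assumption based on Anthropic's list pricing at the time.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

const GPT_4_5: Pricing = { inputPerMillion: 75, outputPerMillion: 150 };
const CLAUDE_3_7_SONNET: Pricing = { inputPerMillion: 3, outputPerMillion: 15 };

// Cost in USD of a single request with the given token counts.
function requestCost(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1e6) * p.inputPerMillion +
    (outputTokens / 1e6) * p.outputPerMillion
  );
}

// Hypothetical request: a 2,000-token prompt that yields 4,000 tokens of code.
const gptCost = requestCost(GPT_4_5, 2000, 4000); // about $0.75
const claudeCost = requestCost(CLAUDE_3_7_SONNET, 2000, 4000); // about $0.066
const ratio = gptCost / claudeCost; // roughly 11.4
```

Under these assumptions, an output-heavy coding request costs about 11x more on GPT-4.5. Input-heavy workloads lean toward the 25x gap in input pricing.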

Currently, people with a $200-a-month ChatGPT Pro account can try out GPT-4.5. OpenAI says it will begin rolling out to Plus users next week.

OpenAI did not disclose the size of their new model, but they mentioned that the scale increase from GPT-4o to GPT-4.5 is similar to the jump from GPT-3.5 to GPT-4o.

What makes it super expensive?

Unlike reasoning models such as o1 and o3-mini, which work through the answer step by step, normal large language models like GPT-4.5 spit out the first response they come up with.

OpenAI's answer to why the GPT-4.5 model is more expensive than other available models

On SimpleQA, a general-knowledge quiz OpenAI developed last year that covers a wide range of topics, GPT-4o scored 38.2% and o3-mini scored 15%, while GPT-4.5 scored a whopping 62.5%. 🤯

SimpleQA accuracy of different OpenAI models including GPT-4.5

OpenAI claims that GPT-4.5 comes up with far fewer made-up answers, a behavior known as hallucination in AI terms.

Along with that, it has enhanced contextual knowledge and writing skills, which is the main reason why the model's output sounds more natural with less unnecessary reasoning.

In the same test, the GPT-4.5 model came up with made-up answers 37.1% of the time, compared with 61.8% for GPT-4o and 80.3% for o3-mini.

SimpleQA hallucination test of different OpenAI models, including GPT-4.5


Coding Comparison

💁 As I said earlier, we will mainly be comparing the two models on frontend questions.

1. Masonry Grid Image Gallery

Prompt: Build a Next.js image gallery with a masonry grid, infinite scrolling, and a search bar for filtering images by keywords. Style it like Unsplash with a clean, modern UI. Optimize image loading. Place all code in page.tsx.

Response from Claude 3.7 Sonnet

You can find the code it generated here: Link

Here's the output of the program:

The output from Claude is pure insanity. Everything is just so perfectly implemented.

The only small issue I noticed is that the footer does not stick to the bottom.

Response from GPT-4.5

You can find the code it generated here: Link

Here's the output of the program:

The output from GPT-4.5 is not what I expected. I mean, it's kind of smart that it didn't use any npm modules like @tanstack/react-query, but clearly, the Masonry Grid layout is missing, and the way infinite scrolling is implemented feels a bit more DIY.

Can't complain much, but it is nowhere near the Claude 3.7 generated code.

Final Verdict: No doubt, the Claude 3.7 Sonnet output is far superior. ✅ It has implemented everything correctly, from the Masonry Grid layout to perfect infinite scrolling using the @tanstack/react-query library. There is still a lot missing in the GPT-4.5 output.
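
For readers wondering what "Masonry Grid layout" and keyword filtering boil down to, here is a minimal TypeScript sketch. It is not the code either model generated. The `GalleryImage` shape and both function names are hypothetical, and the shortest-column-first placement is just one common way to build a masonry layout.

```typescript
// Hypothetical image shape; the field names are assumptions for this sketch.
interface GalleryImage {
  id: string;
  width: number;
  height: number;
  keywords: string[];
}

// Case-insensitive keyword filter for the search bar.
function filterByKeyword(images: GalleryImage[], query: string): GalleryImage[] {
  const q = query.trim().toLowerCase();
  if (q === "") return images;
  return images.filter((img) =>
    img.keywords.some((k) => k.toLowerCase().includes(q))
  );
}

// Masonry placement: each image goes into whichever column is currently
// shortest, so columns stay roughly balanced despite mixed aspect ratios.
function toMasonryColumns(images: GalleryImage[], columnCount: number): GalleryImage[][] {
  if (columnCount < 1) throw new Error("columnCount must be at least 1");
  const columns: GalleryImage[][] = Array.from(
    { length: columnCount },
    (): GalleryImage[] => []
  );
  const heights: number[] = new Array(columnCount).fill(0);
  for (const img of images) {
    // Height the image would have when scaled to a column of unit width.
    const scaledHeight = img.height / img.width;
    const shortest = heights.indexOf(Math.min(...heights));
    columns[shortest].push(img);
    heights[shortest] += scaledHeight;
  }
  return columns;
}
```

In a real page, each inner array would be rendered as one vertical column. A plain CSS grid with fixed row heights, by contrast, leaves gaps under shorter images.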

2. Typing Speed Test

Let's test both models by asking them to build a Typing Speed Test app similar to Monkeytype. Not to mention, I get to flex my typing speed. 😉 (just kidding)

Prompt: Build a Next.js basic typing test app. Users type a given sentence, with mistakes highlighted in red, allowing corrections. Display real-time typing speed, both raw (with mistakes) and adjusted (without mistakes). Once the user types to the end, the test should be over. Place all code in page.tsx.
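
The prompt asks for both a raw and an adjusted typing speed, so here is a small TypeScript sketch of how those numbers are commonly computed. It assumes the usual convention that one "word" is five characters. The function and field names are hypothetical and not taken from either model's output.

```typescript
interface TypingStats {
  rawWpm: number; // counts every typed character, mistakes included
  adjustedWpm: number; // counts only correctly typed characters
  accuracy: number; // percentage, 0 to 100
}

// Common convention (an assumption here): one word equals five characters.
const CHARS_PER_WORD = 5;

function typingStats(target: string, typed: string, elapsedMs: number): TypingStats {
  if (elapsedMs <= 0 || typed.length === 0) {
    return { rawWpm: 0, adjustedWpm: 0, accuracy: 0 };
  }
  // A character is correct when it matches the target at the same position.
  let correct = 0;
  for (let i = 0; i < typed.length; i++) {
    if (typed[i] === target[i]) correct++;
  }
  const minutes = elapsedMs / 60000;
  return {
    rawWpm: typed.length / CHARS_PER_WORD / minutes,
    adjustedWpm: correct / CHARS_PER_WORD / minutes,
    accuracy: (correct / typed.length) * 100,
  };
}

// Ten characters typed in six seconds with one mistake:
// roughly 20 raw WPM, 18 adjusted WPM, and 90% accuracy.
const sample = typingStats("hello world", "hellp worl", 6000);
```

Recomputing these stats on every keystroke is what makes the display feel real-time.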

Response from Claude 3.7 Sonnet

You can find the code it generated here: Link

Here's the output of the program:

WOAH, it just feels illegal to use this model for coding. How good is this? I have no words to say. 🤯

In no time, it built the entire typing test site with everything implemented correctly, and with more than I asked for. It even added an accuracy display.

Response from GPT-4.5

You can find the code it generated here: Link

Here's the output of the program:

GPT-4.5 got this one correct as well, but there's one small issue with the code it generated. Once the user reaches the end, the test is supposed to end, but it doesn't unless the user goes back and fixes their mistakes.
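
The generated code isn't reproduced here, so the exact cause is a guess. The behavior described is what you would see if the completion check compared the typed text to the sentence for exact equality. A hedged TypeScript sketch of the difference, with hypothetical function names:

```typescript
// Ends only when every character is correct: a typo anywhere keeps the
// test running until the user goes back and fixes it.
function isFinishedStrict(target: string, typed: string): boolean {
  return typed === target;
}

// Ends as soon as the user has typed as many characters as the sentence
// has, mistakes or not, which matches "once the user types to the end".
function isFinishedByLength(target: string, typed: string): boolean {
  return typed.length >= target.length;
}
```

With the sentence "cat" and the input "cot", the strict check never finishes, while the length-based check ends the test on the third keystroke.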

Final Verdict: There's one minor issue with the code GPT-4.5 generated, but it's fair to say both models got it correct. ✅

3. Collaborative Real-time Whiteboard

💁 This one's pretty tough, and I am not sure if even Claude 3.7 will get this correct. It requires setting up a separate WebSocket server and listening for connections.

Prompt: Build a real-time collaborative whiteboard in Next.js with Tailwind for styling. Multiple users should be able to draw and see updates instantly. But, when a user clears their canvas, the other user's canvas should not be cleared.
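
The tricky requirement in this prompt is that drawing is shared while clearing stays local. Here is a minimal TypeScript sketch of that rule, independent of any WebSocket library. All type and function names are hypothetical and do not come from either model's code.

```typescript
// Hypothetical event shapes; none of these names come from the generated code.
interface Point {
  x: number;
  y: number;
}

type WhiteboardEvent =
  | { type: "stroke"; points: Point[] }
  | { type: "clear" };

// A canvas is modeled as the list of strokes currently drawn on it.
type Canvas = Point[][];

// Applies an event to a single user's canvas and returns the new canvas.
function applyEvent(canvas: Canvas, event: WhiteboardEvent): Canvas {
  if (event.type === "stroke") {
    return [...canvas, event.points];
  }
  return []; // "clear" wipes only the canvas it is applied to
}

// Only strokes are sent to other users; a clear stays local.
function shouldBroadcast(event: WhiteboardEvent): boolean {
  return event.type === "stroke";
}

// Simulates one user acting: update their canvas, forward to peers if allowed.
function act(
  own: Canvas,
  peers: Canvas[],
  event: WhiteboardEvent
): { own: Canvas; peers: Canvas[] } {
  const nextOwn = applyEvent(own, event);
  const nextPeers = shouldBroadcast(event)
    ? peers.map((peer) => applyEvent(peer, event))
    : peers;
  return { own: nextOwn, peers: nextPeers };
}

// One user draws a line, then clears their canvas; the peer keeps the line.
const line: WhiteboardEvent = {
  type: "stroke",
  points: [{ x: 0, y: 0 }, { x: 10, y: 10 }],
};
const afterDraw = act([], [[]], line);
const afterClear = act(afterDraw.own, afterDraw.peers, { type: "clear" });
```

In a real app, the `peers.map(...)` step would be the server relaying the message over each open socket. The decision about what gets relayed is the same.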

Response from Claude 3.7 Sonnet

You can find the code it generated here: Link

Here's the output of the program:

Okay, so now I see some junior developers getting replaced by AI pretty soon. 🤐

For me, it would take pretty long to code this up. I am starting to see why this model is called a beast when it comes to coding. Just perfection!

Response from GPT-4.5

You can find the code it generated here: Link

Here's the output of the program:

GPT-4.5 failed badly here. The websocket connection was established, but there was an issue parsing the data received from the websocket connection on the client.
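
The post doesn't show what exactly went wrong in the parsing step, so the following is only a sketch of how a client could parse incoming whiteboard messages defensively. The `StrokeMessage` shape and function names are hypothetical. One relevant fact: in the browser, a WebSocket `message` event delivers text frames as a string and binary frames as a Blob or ArrayBuffer, so calling `JSON.parse` on the data without checking its type can fail.

```typescript
// Hypothetical message shape for this sketch.
interface StrokeMessage {
  type: "stroke";
  userId: string;
  points: { x: number; y: number }[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Returns a validated message, or null when the payload is not usable.
function parseStrokeMessage(data: unknown): StrokeMessage | null {
  // Binary frames (Blob / ArrayBuffer) are rejected here rather than parsed.
  if (typeof data !== "string") return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return null; // not valid JSON
  }
  if (!isRecord(parsed) || parsed.type !== "stroke") return null;

  const userId = parsed.userId;
  const rawPoints = parsed.points;
  if (typeof userId !== "string" || !Array.isArray(rawPoints)) return null;

  // Validate every point so the drawing code never sees malformed input.
  const points: { x: number; y: number }[] = [];
  for (const p of rawPoints) {
    if (!isRecord(p) || typeof p.x !== "number" || typeof p.y !== "number") {
      return null;
    }
    points.push({ x: p.x, y: p.y });
  }
  return { type: "stroke", userId, points };
}
```

Returning `null` instead of throwing lets the message handler skip a bad payload and keep the connection alive.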

Final Verdict: Claude 3.7 Sonnet just crushed this one as well. 🔥 The code it generated is perfect, and the output is exactly how I wanted. GPT-4.5 was able to establish the websocket connection but had an issue parsing the data. Even after I tried to iterate on its mistake, it couldn't really fix it.

Summary

The results here should be pretty clear. 😮‍💨 Claude 3.7 won by a huge margin, and hey, again I'm going to say that this comparison is not fair to GPT-4.5, as it is not trained to be good at coding. But at least it got the first two problems working, even though the results were not perfect.


When to use the GPT-4.5 model?

Now that we have a general understanding of this model's abilities, let's take a look at situations where you'd prefer this model over anything else. 🤔

All in all, GPT-4.5 is not a model you can rely on for reasoning tasks. Instead, it has a better understanding of what humans mean and can interpret subtle cues. It's designed to be better at conversations, design, and writing, adding that bit of human touch.

When your use case is specifically about writing or design, this model is the ideal choice.

GPT-4.5 ideal use case scenario

So, does it justify the pricing? If I had to say, definitely not. But it's up to you to decide whether you think it's worth your money. 🤷‍♂️

For anything else, it doesn't quite justify the pricing and may not be the best choice.


Conclusion

The result's pretty clear, and needless to say, this is not a fair comparison. It's like we compared an experienced developer with someone who's not even a coder. 🥴

Tweet describing how much better the Claude 3.7 Sonnet model is at coding

But hey, the point of this comparison was to see how GPT-4.5 stacks up against Claude 3.7 Sonnet when it comes to coding.

Not just in this comparison, but in all the comparisons I've done, Claude 3.7 Sonnet is just better, even without extended thinking, and it's the only model you need for now when it comes to coding. 🔥

What do you think of this comparison? If you want me to compare some other models against each other, do let me know in the comments! 👇

Comments 33 total

  • Aavash Parajuli, Mar 6, 2025

    Great comparison. @shricodev 🚀

  • Shrijal Acharya, Mar 6, 2025

    Have you tried the GPT-4.5 model yet? If so, what has been your experience working with it compared to other models? 🤔

  • Lara Stewart - DevOps Cloud Engineer, Mar 6, 2025

    Don't tell me Claude generated all of this in one shot?

    • Shrijal Acharya, Mar 6, 2025

      Sad to say this, but yes, both models generated this code in one shot. 🫠

      In the last question, when GPT-4.5 couldn't write the code, I tried to guide it to the correct answer, but it still couldn't handle it.

  • Saxo Hun, Mar 6, 2025

    Good comparison. 🫡 AI Camp?

  • Anmol Baranwal, Mar 6, 2025

    Awesome Shrijal! So basically, if you had to choose just one model from all the available ones on the internet (mainly for coding) to build crazy SaaS apps... which one would it be according to you?

    • Shrijal Acharya, Mar 7, 2025

      Claude 3.7 all the way, Anmol! 🔥

      In all of my coding comparisons, even with Grok 3 and o3-mini-high, it just dominates so badly. For anything that requires writing code, Claude is the way for me.

  • Benny Schuetz, Mar 6, 2025

    Love your practical comparision. Much better than those boring benchmarks graphs.

    And yes, awesome output from Claude!

    Did a quick test for case #2 with Grok3 and uploaded the result on x

    • Benny Schuetz, Mar 6, 2025

      Note: I slightly changed your prompt from a Next.js version to a Vanilla javascript version for simplicity reasons.

      • Shrijal Acharya, Mar 7, 2025

        Ah, that's much better. I decided to go with Next.js since that's what most folks prefer.

    • Shrijal Acharya, Mar 7, 2025

      Love your practical comparision. Much better than those boring benchmarks graphs.

      Woah, thank you very much, man! 🙌

      Love the output from Grok 3 you got. But, Claude 3.7 is just out of this world. 🔥

  • PEPS Ventures Berhad, Mar 6, 2025

    Like it!

  • dogpxe, Mar 7, 2025

    awsome

    • Shrijal Acharya, Mar 7, 2025

      Glad you liked it, buddy! ✌️

      • dogpxe, Mar 7, 2025

        Can i translate your article to brazilian portuguese?

  • Rachit Gupta, Mar 7, 2025

    loved it

  • Andre Kaufmann, Mar 7, 2025

    Claude is great, but still I think Chat Gpt o3-mini-high is slightly better in coding. Chat Gpt 4.5 has better conversational abilities but is not that good in coding as the o3.models
    IIRC Chat Gpt 5.0 will combine coding models with conversational ones. Anyways I agree that Claude 3.7 extended is quite good

  • nadeem zia, Mar 7, 2025

    The information is provided is amazing, keep it up

  • EchoteDev, Mar 7, 2025

    For new web developers currently studying, the job market might seem daunting with limited opportunities. How do you think this could impact the prospects for junior developers? Is it still worthwhile to pursue this field of study?

    • Shrijal Acharya, Mar 8, 2025

      This whole AI thing is nothing to worry about if you can bring engineering value to your team. When I started out, I never considered myself a "junior developer" at any time. Just keep hustling, and you'll be safe, my friend.

      Another small suggestion: instead of focusing entirely on web development, see if you have an interest elsewhere, like DevOps, which could be an option, or ML? That could be a thing as well. Give it a shot once because the whole web industry is saturated as hell. Not to say that web dev is not a good option from now on, but it doesn't hurt to try something else.

      • EchoteDev, Mar 8, 2025

        Thanks for the suggestion, yes I will definitely learn other related skills, AI development is an environment I really like.

    • Marcos DeVille, Mar 10, 2025

      I am honestly surprised by this, there has never been more opportunity. AI is a major improvement to developers lifecycles and as such is a great aid, it will not replace engineers building complex projects. Learn how to use, implement and create AI in your projects.

  • Le Vuong, Mar 13, 2025

    Great job. Thank you @shricodev