Can the new Claude 3.5 Sonnet model beat ChatGPT-4o?

Anthropic, a leading AI research company founded by former OpenAI researchers, announced yesterday the launch of Claude 3.5 Sonnet, the latest and most advanced model in the Claude family. The update follows closely on the release of OpenAI's GPT-4o, which recently rose to prominence in the LMSYS Chatbot Arena.

Claude 3.5 Sonnet is positioned as a mid-range model, sitting between Haiku, a smaller model designed for efficiency, and Opus, the higher-end model that powers Anthropic's paid version, priced at $20 per month. Haiku and Opus are currently offered only in version 3.0, which makes Sonnet 3.5 the company's best model in terms of capability, sophistication and efficiency.

Anthropic says its new model outperforms GPT-4o on all synthetic benchmarks, especially when using multi-shot prompting techniques, that is, prompts that include more than one example.

These synthetic benchmarks measure a model's performance in different domains. By setting up standardized conditions and tests, they produce a numerical score for a given quality. In other words, these metrics don't tell you which model is better at a given task; they tell you how much better a model is by the measure you chose.
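Prompting with more than one worked example can be sketched in a few lines. The snippet below is only an illustration: the `build_few_shot_prompt` helper and the `Q:`/`A:` layout are assumptions made for this example, not the format Anthropic or any benchmark actually uses.

```python
def build_few_shot_prompt(examples, question):
    """Join worked examples and a new question into one prompt string.

    `examples` is a list of (question, answer) pairs. The "Q:/A:" layout
    is a hypothetical convention chosen for this sketch.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final question is left unanswered so the model completes it.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Two worked examples, followed by the question we actually care about.
examples = [("What is 2 + 2?", "4"), ("What is 3 + 5?", "8")]
prompt = build_few_shot_prompt(examples, "What is 7 + 6?")
print(prompt)
```

The resulting string would be sent to a model as a single prompt; with an empty `examples` list the same helper produces a zero-shot prompt.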

In terms of performance, Anthropic's Claude 3.5 Sonnet runs twice as fast as the previous top-of-the-line model, Claude 3 Opus, while delivering more capability at just one-fifth of the cost. This makes it a good fit for complex tasks such as context-sensitive customer support and specialized work that requires a lot of back-and-forth interaction with the model.

The creators say it shows a significant improvement in grasping nuance, humor and complex instructions compared to its predecessors.

Claude 3.5 Sonnet offers advanced visual processing and comprehension skills. It is particularly adept at interpreting imperfect images of charts, graphs and text, says Anthropic. The company's top model can now understand the context of a visual query rather than just identifying objects. This puts it in direct competition with ChatGPT and Reka in terms of multimodal capabilities.

For example, we showed Claude a map and asked what we could do in that area. It recognized that the map was of Chicago and gave us some great tips, such as using public transportation instead of taxis and visiting Wicker Park, Lincoln Park, and Hyde Park.

The model offers advanced coding capabilities. It can write, edit, and execute code autonomously, with advanced reasoning and troubleshooting capabilities, when given the relevant tools. This is effective in streamlining developer workflows and speeding up coding tasks.

One new feature introduced with Claude 3.5 Sonnet is “Artifacts”. It lets users view, edit, and build on the content Claude generates in real time. It integrates AI-generated results directly into projects and workflows, which makes it especially useful for working with code, and it gives Claude a more polished user interface than traditional chatbots like ChatGPT or Reka.

Anthropic expects to release Haiku and Opus versions of Claude 3.5 later this year. If Sonnet can challenge GPT-4o, Opus could be a strong contender against future iterations of GPT, such as GPT-5.

Claude 3.5 Sonnet vs. ChatGPT-4o

Overall, both models have shown impressive capabilities, but how do they stack up against each other on different jobs? Let's examine their performance in coding, creative writing, and professional tasks.

Ease of use and accessibility

Claude 3.5 Sonnet currently has some limitations in handling heavy user traffic and extended interactions. The free version of Claude offers a more limited experience, with a smaller token context and fewer available requests than the paid version. This is especially noticeable when users analyze long documents or work with code.

The free version of ChatGPT gives users a more generous allowance of tokens and requests, allowing longer and more complex interactions without a paid upgrade. OpenAI also offers a “Plus” subscription, but free users can go a long time before hitting the limit and being asked to upgrade.

Winner: ChatGPT wins this round. Its free version offers more capabilities and accessibility, making it ideal for users who don't want to, or can't afford to, pay for premium AI services. Claude's approach appears designed to encourage users to upgrade to a paid tier, which may be a deterrent for some.

Coding skills

We tested the models' coding capabilities by asking both to create a game. Instead of asking them to reproduce well-known games that could be part of their training data, we proposed a game that measures the reaction time of two players.

Prompt: I want to create a game. Two players play on one computer. One controls the letter L and the other controls the letter A. We have a field divided in two by a line. Each player controls 50% of the field. The player controlling A controls the left half and the player controlling L controls the right half.

At a random instant, the line moves left or right. The player whose field is disappearing must press the button as quickly as possible to prevent the line from moving. When that's done, the line stays in place and players have to wait for the line to start moving to a random location at a random time.

The player who ends up controlling 0% of the screen loses and the game ends. Write it in Python or HTML5. It works better than you think.
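The rules in the prompt can be sketched as a small state model. This is not the code either chatbot produced: it is a minimal illustration without graphics or keyboard handling, and the names (`ReactionGame`, `next_delay`), the line speed, and the delay range are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class ReactionGame:
    """Tracks the dividing line as the share of the field held by player A."""
    boundary: float = 0.5   # A holds [0, boundary); L holds the rest
    direction: int = 0      # -1 = line moving left, +1 = moving right, 0 = idle
    speed: float = 0.25     # share of the field per second (assumed value)

    def start_move(self, rng: random.Random) -> None:
        """Start the line moving left or right, chosen at random."""
        self.direction = rng.choice((-1, 1))

    def shrinking_player_key(self):
        """Return the key of the player whose half is shrinking, if any."""
        if self.direction == -1:
            return "a"      # line moves left, so A's left half shrinks
        if self.direction == 1:
            return "l"      # line moves right, so L's right half shrinks
        return None

    def press(self, key: str) -> bool:
        """Stop the line only if the shrinking player pressed their key."""
        if self.direction != 0 and key == self.shrinking_player_key():
            self.direction = 0
            return True
        return False

    def tick(self, dt: float) -> None:
        """Advance the line by dt seconds, clamped to the field."""
        moved = self.boundary + self.direction * self.speed * dt
        self.boundary = min(1.0, max(0.0, moved))

    def loser(self):
        """Return "A" or "L" once a player controls 0% of the field."""
        if self.boundary <= 0.0:
            return "A"
        if self.boundary >= 1.0:
            return "L"
        return None


def next_delay(rng: random.Random) -> float:
    """Random wait before the next move; the 1-5 second range is assumed."""
    return rng.uniform(1.0, 5.0)
```

A real implementation would call `tick` from a frame loop, call `press` from key events, and use `next_delay` to schedule each `start_move`.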

Claude 3.5 Sonnet did very well. Not only did it deliver the game as described, it also took the initiative to include a basic yet functional graphical interface with visual cues that make the game easy to understand.

Claude completed this task in less than 10 seconds, showing its improved coding capabilities.

ChatGPT was also able to create the game according to the given specifications. However, it took much longer to generate the code (nearly 45 seconds) and did not include extras such as text hints to make the game easier to understand.

Also, the pace of the game is slow, which defeats the purpose of a reaction game, and the “game over” popup doesn't reveal who won.

Winner: Claude 3.5 Sonnet wins. Its ability to rapidly generate comprehensive and feature-rich code, including unrequested extras such as a graphical interface, demonstrates advanced coding capabilities.

Also, the “Artifacts” feature turned out to be very handy, letting us test the code within the chatbot interface without having to copy and paste it into an external tool, which is exactly what ChatGPT requires.

Creative writing

We asked both models to create a fictional story based on a specific idea. We wanted to test how creative the models were, how rich and engaging their stories were, and how useful they would be to creative writers in general.

Prompt:

Write a short story about Jose Lanz, a time traveler from the year 2150 who travels back to the year 1000. Make sure your narrative is rich in vivid descriptive language and that Jose's cultural background and physical characteristics are accurately portrayed, whatever you choose them to be.

The core of your story should be the paradox of time travel and the futility of trying to solve or change a problem in the past in the hope of changing the current timeline. Emphasize that the future is what it is because of the past. Although Jose intends to influence the events of the year 1000, the actions he takes were always destined to happen, because they are necessary for 2150 to exist as it is. The realization of this paradox is a crucial moment in the story.

Claude 3.5 Sonnet developed a narrative with a natural flow of language and an engaging structure. The AI skillfully incorporated complex concepts like the time travel paradox, creating a rich and unique tale that took creative risks.

In its version, the main character tries to prevent the development of a mathematical theory that led to disastrous consequences in his time. After joining the research community and seemingly halting the development of the concept, he finds that his intervention was a key component of the time paradox he created, even finding references to it in ancient texts.

ChatGPT created a story that respected the given instructions but followed a more predictable path. While competent, the narrative lacks the depth and creativity seen in Claude's story.

GPT-4o developed a straightforward story in which the protagonist tries to prevent an energy crisis by sharing advanced knowledge with a shaman of the past. But when he returns to his own time, he finds that history has repeated itself and nothing has changed.

Winner: Claude wins for creative writing. Its ability to produce imaginative, nuanced and well-structured narratives sets it apart, making it an excellent choice for tasks that require creativity.

For example, it's easy to imagine how joining a community could influence a group of researchers and keep them from discovering something. By contrast, sharing advanced knowledge with shamans makes little sense as a way to prevent an energy crisis.

Summary and analysis

We submitted a 42-page IMF report to both models. ChatGPT accepted the entire document without any problems. Claude, on the other hand, threw an error saying that the PDF was too long. We cut it to 31 pages, which was short enough to be accepted in the Pro version. (The free version can only analyze around 25 pages.)

Limitations aside, Claude 3.5 Sonnet provided a competent analysis of the shortened document, extracting key points and verbatim passages accurately and without hallucinating, a vast improvement over Claude 3, which was prone to inventing information. However, its quotes were vague and not as useful as those selected by ChatGPT.

ChatGPT impressed us by ingesting the entire 42-page document without interruption. It produced a more comprehensive list with a lot of useful information.

Its use of bullet points to highlight key points, along with a summary of each section, is more useful than what Claude provides: a summary with no structure that misses the main points of the report.

ChatGPT also demonstrated a strategic approach, covering the key points efficiently by focusing on the report's summary and conclusion. It's a good way to get a rough idea of the broader research before in-depth analysis.

Winner: ChatGPT takes the top spot in summary and analysis. Its ability to process longer documents in its entirety, combined with its comprehensive and systematic approach to summaries, makes it more suitable for academic research and professional analysis.

Additional features

Claude 3.5 Sonnet introduces “Artifacts,” a feature that allows users to instantly view, edit, and build AI-generated content. This integration of AI directly into projects and workflows improves user interaction, especially with code.

ChatGPT Plus provides the ability to set up custom GPTs for specific tasks, a feature not currently available in Claude. This customization option provides additional versatility in professional and academic settings. It also integrates the DALL-E 3 image generator, which is very useful for creating images using natural language.

Winner: In terms of additional features, ChatGPT wins. While Claude's “Artifacts” feature provides unique real-time interaction capabilities, ChatGPT's custom GPTs provide valuable flexibility. Which features are more valuable depends on the specific needs of the user, but GPTs can help many types of users. ChatGPT can also create images, which is another advantage over Claude.

Summary

Claude 3.5 Sonnet shines on tasks that require creativity, language use, and efficient coding. Its ability to understand and implement complex instructions sets it apart, especially in creative work and coding.

ChatGPT proves its competence by handling extensive texts and conducting detailed analysis. Its ability to process and synthesize large amounts of data makes it a powerful tool for academic research and professional analysis. It also offers more generous free access.

Both models are very capable. However, if you're thinking of upgrading to a paid tier, ChatGPT may be the best choice for many given its additional feature set. The exception is if you work in creative writing or coding, where Claude is the undisputed king, by far.

You can pay for the model that best suits your needs and use the free version of the other one for different tasks. However, if you're short on cash and not a power user, it's great that OpenAI and Anthropic offer their high-level models for free.

Edited by Ryan Ozawa.
