Anthropics Cloud AI Overthrew ChatGPT on the Chatbot Arena Leaderboard
8 months ago Benito Santiago
While Open AI ChatGPT enjoys the largest mainstream share of thought of all generative AI tools, its top spot has been stolen by online giant Cloud 3 Opus.
Cloud's ascension in the chatbot arena is the first to dethrone OpenAI's GPT-4, which manages ChatGPT Plus since appearing on the leaderboard in May last year.
Chatbot Arena is a research organization led by the Large Model Systems Organization (LMSYS ORG) at the University of California, Berkeley, UC San Diego, and Carnegie Mellon University, dedicated to unlocking models that support collaboration between students and faculty. The platform presents users with two unlabeled language models and asks them to rate which one performs better based on the criteria they believe is appropriate.
After compiling thousands of subject comparisons, Chatbot Arena calculates the “best” models for the leaderboard, updating it over time.
That personalized approach, based on participants' varying personal preferences, is what sets Chatbot Arena apart from other AI benchmarks. Model trainers cannot “trick” their models into making the algorithm win, because of quantitative parameters. By measuring what people simply prefer, Chatbot Arena is a valuable and quality resource for AI researchers.
The platform collects user feedback and runs it through the Bradley-Terry statistical model to predict the likelihood of a particular model outperforming others in direct competition. This approach allows the generation of general statistics, including confidence intervals for elo level estimates—which are used to measure the skill of chess players.
Claude 3 Opus' ascension isn't the only significant development on the leaderboard. The Cloud 3 Sonnet (a mid-sized model that's available for free) and the Cloud 3 Haiku (a smaller, faster model), also made by Anthropogenic, currently rank 4th and 6th, respectively.
The leaderboard includes different versions of GPT-4 such as GPT-4-0314 (first GPT-4 version from March 2023), GPT-4-0613, GPT-4-1106-preview and GPT-4. -0125-Preview (The latest GPT-4 Turbo model will be available via API from January 2024. According to the rankings, Sonnet and Haiku are both better than the original GPT-4, with Sonnet also outperforming the modified versions launched by OpenAI in June 2023.
This also means, unfortunately, that there is currently only one open source LLM in the top 10: Qwen, with Starling 7b and Mixtral 8x7B in the top 20 other open source models.
Cloud One of the advantages of GPT-4 is the capacity and availability of token context. The official version of Claude 3 Opus holds more than 200k—and the firm claims to have a version that can handle 1 million tokens with near-perfect return rates. This means that Cloud can handle longer queries and store data more efficiently compared to GPT-4 Turbo, which handles 128K tokens and loses access capacity with long queries.
Google's Gemini Advanced is also gaining attention in the AI assistant space. The company offers a plan that includes 2TB of storage and AI capabilities across Google's suite of products as a Chat GPT Plus subscription ($20 per month).
The free Gemini Pro is currently number 4 between GPT-4 Turbo and Cloud 3 Sonnet. The top-of-the-line Gemini Ultra model is not available for testing and has yet to be announced in the rankings.
Edited by Ryan Ozawa.