OpenAI introduced a new family of models and made them available on Thursday at the paid subscription level of ChatGPT Plus, which it says offers significant improvements in performance and inference capabilities.
“We're introducing OpenAI o1, a new large-scale language model trained with reinforcement learning to perform complex reasoning,” OpenAI said in an official blog post, “o1 thinks before it reacts.” AI industry watchers had been expecting the top AI developer to deploy a new “strawberry” model for weeks, although the differences between the various models under development were not officially disclosed.
OpenAI describes this new model family as a big leap forward, so they've changed their usual naming scheme by moving away from the ChatGPT-3, ChatGPT-3.5 and ChatGPT-4o series.
“This is a significant advance for complex reasoning tasks and represents a new level of AI capabilities,” OpenAI said. “In light of this, we are renaming this series OpenAI o1, setting the counter back to one.”
A key feature of these new models is that they “take their time” to think before taking action and use the “chain of thought” factor to be extremely effective at complex tasks.
Notably, even the smallest model in this new lineup outperforms the high-end GPT-4o in several key areas in AI tests by OpenAI—especially challenges considered to be of PhD-level complexity by OpenAI's comparison.
The newly released models emphasize what OpNIA calls deliberative reasoning, where the system takes more time to process its responses. This process aims to produce more thoughtful, coherent answers, especially in logic-intensive tasks.
OpenAI also published internal test results showing improvements over GPT-4o in tasks such as coding, calculus, and data analysis. However, the company notes that OpenAI 01 has made less drastic improvements in creative tasks such as creative writing. (Our own empirical tests put OpenAI's offerings behind Claude AI in these areas.) Nevertheless, the new model's results were generally rated well by human reviewers.
The capabilities of the new model, as described, will implement the AI process envisioned during data. In short, this means that the model uses a distributed approach to solve the problem step by step before presenting the final result, which users will see in the end.
“The o1 model is trained with massive reinforcement learning to reason using a series of thought chains,” says OpenAI on the o1 family of system cards. “Training models to include the chain of thought before we answer has the potential to open up significant benefits – increasing the potential risks stemming from higher intelligence.”
The extensive validation leaves room for debate among technical observers as to the true novelty of the model's architecture. OpenAI doesn't clarify how the process differs from token-based generation: is it a rational allocation of resources, or is it a hidden chain of thought—or a mix of both techniques?
An earlier open-source AI model called Reflection tried a similar logic-heavy approach but faced criticism for its lack of transparency. That model used labels to separate levels of reasoning, resulting in what its developers say is an improvement over the results of conventional models.
I'm excited to announce the world's top open source model, the Reflection 70B.
Trained using Reflection-Tuning, it is a technique that allows LLMs to correct their own errors.
The 405B arrives next week – we expect it to be the best model in the world.
Developed by @GlaiveAI.
Read ⬇️: pic.twitter.com/kZPW1plJuo
— Matt Shumer (@matshumer_) September 5, 2024
Embedding more instructions in the logic chain process not only makes the model more accurate, but also vulnerable to jailbreaking techniques, because it has more time and steps – when potentially harmful results occur.
The jailbreaking community seems as adept as ever at bypassing AI security controls, with the first successful OpenAI 01 jailbreaks reported minutes after its release.
It is unclear whether this deliberative reasoning approach can scale effectively for real-time applications that require fast response times. OpenAI, meanwhile, said it plans to expand the capabilities of its models, including web search functionality and improved multimodal interactions.
The model will adapt OpenAI's minimum standards in terms of security, anti-corruption and autonomy over time.
The model was supposed to be released today, but it may be released in stages, as some users have reported that the model is not yet available for testing.
The smallest version will eventually be available for free, and API access will be 80% cheaper than the OpenAI o1-preview, according to OpenAI's announcement. But don't get too excited: there are currently only 30 messages per week to test this new model for the 01-preview and 50 for the o1-mini, so choose your request wisely.