OpenAI's text-to-video model Sora wows X but still has weaknesses
Artificial intelligence firm OpenAI unveiled its first text-to-video model to strong reception on Thursday.
OpenAI announced its new generative AI model, named Sora, on February 15, claiming it can create detailed videos from simple text prompts.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds. https://t.co/7j2JN27M3W
Quick: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024
In a February 15 blog post, OpenAI claimed the AI model can generate movie-like scenes at up to 1080p resolution. These scenes can include multiple characters, specific types of motion, and accurate details of the subject and background.
How does Sora work?
Like OpenAI's image-based predecessor DALL-E 3, Sora is built on a technique known as “diffusion.”
A diffusion model generates a video or image by starting with what looks like static noise and then gradually transforming it, “removing the noise” over many steps.
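The reverse-diffusion loop described above can be sketched in a few lines of Python. This is a minimal toy illustration, not Sora's actual implementation: a real diffusion model uses a trained neural network to predict the clean signal at each step, while here a fixed `target` array stands in for that prediction so the example stays self-contained.

```python
import numpy as np

def denoise_step(x, predicted_clean, t, num_steps):
    """One reverse-diffusion step: nudge the noisy sample toward the
    denoiser's estimate. The blending weight grows as t approaches the
    final step, so the last step lands exactly on the estimate."""
    alpha = 1.0 / (num_steps - t)
    return x + alpha * (predicted_clean - x)

def generate(target, num_steps=50, seed=0):
    """Start from pure Gaussian noise and gradually remove it.
    In a real model, `target` would be re-predicted by a network at
    every step from the current noisy sample."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # the "static noise" starting point
    for t in range(num_steps):
        x = denoise_step(x, target, t, num_steps)
    return x

target = np.zeros((4, 4))  # stand-in "image" the denoiser would predict
result = generate(target)
```

Because the stand-in prediction never changes, the loop converges exactly onto `target`; with a learned denoiser, each step would refine the prediction instead.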
Announcing Sora — our model that creates minute-long videos from text prompts: pic.twitter.com/0kzXTqK9bG
— Greg Brockman (@gdb) February 15, 2024
The AI firm says Sora builds on prior research behind its GPT and DALL-E 3 models.
OpenAI admits that Sora still has a number of weaknesses: it can struggle to accurately simulate the physics of a complex scene and may not understand specific instances of cause and effect.
“For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.”
The new model can also confuse the spatial details of a prompt, mixing up left and right, or fail to follow precise descriptions of directions, the company said.
OpenAI's new generative model is currently available only to “red teamers” – tech lingo for cybersecurity researchers – who are assessing “critical areas for harms or risks,” as well as select designers, visual artists and filmmakers gathering feedback on how to advance the model.
A December 2023 report from Stanford University found that AI-powered image-generation tools had been trained on thousands of illegal child abuse images contained in the LAION dataset, raising serious ethical and legal concerns about the data used to train text-to-image and text-to-video models.
Users on X were left “speechless”.
Dozens of video demos showing Sora in action have circulated on X, where the topic has drawn over 173,000 posts.
To demonstrate what the new generative model is capable of, OpenAI CEO Sam Altman opened himself up to custom video-generation requests from users on X, with the AI chief sharing a total of seven Sora-generated videos, ranging from a duck riding a dragon to golden retrievers recording a podcast on a mountaintop.
pic.twitter.com/nej4TIwgaP
— Sam Altman (@sama) February 15, 2024
AI analyst McKay Wrigley – along with many others – wrote that the videos created by Sora left him “speechless.”
In a Feb. 15 post on X, Nvidia senior researcher Jim Fan declared that anyone who believed Sora was just another “creative toy” like DALL-E 3 would be mistaken.
If you think OpenAI Sora is a creative toy like DALLE, … think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, “intuitive” physics, long-horizon reasoning, and semantic grounding, all… pic.twitter.com/pRuiXhUqYR
— Jim Fan (@DrJimFan) February 15, 2024
In Fan's view, Sora is less a video-generation tool and more a “data-driven physics engine,” as the AI model doesn't just generate an abstract video but also determines the physics of the objects in the scene itself.
Magazine: ‘Crypto Is Inevitable' So We Went ‘All In' – Meet Vance Spencer, Permabull