OpenAI text-to-video model Sora wows X but still has weaknesses

Artificial intelligence firm OpenAI unveiled its first text-to-video model to strong reception on Thursday.

OpenAI's new generative AI model, named Sora, was announced on February 15. The company says it can create detailed videos from simple text prompts.

According to a February 15 blog post from OpenAI, the AI model can generate movie-like scenes at up to 1080p resolution. These scenes can include multiple characters, specific types of motion, and accurate details of the subject and background.

How does Sora work?

Like OpenAI's image-based predecessor DALL-E 3, Sora runs on what is known as a “diffusion” model.

Diffusion refers to a generative AI model that creates its output by starting with a video or image that looks like “static noise” and then gradually transforming it by “removing the noise” over several steps.
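
The step-by-step noise removal described above can be illustrated with a toy sketch. This is not OpenAI's implementation: the function names, the five-value “image”, and the step count are all hypothetical, and the clean target is handed to the loop directly, whereas a real diffusion model uses a trained neural network to predict what to remove at each step.

```python
import random

def toy_denoise(target, steps=10, seed=0):
    """Start from random noise and remove a share of it at each step.

    `target` stands in for the clean output that a trained network would
    predict; a real diffusion model learns that prediction from data.
    """
    rng = random.Random(seed)
    # Step 0: pure "static noise", unrelated to the target.
    sample = [rng.gauss(0.0, 1.0) for _ in target]
    history = [sample]
    for step in range(steps):
        # Remove a share of the remaining noise. The share grows each step
        # so that the last step removes whatever noise is left.
        fraction = 1.0 / (steps - step)
        sample = [s + fraction * (t - s) for s, t in zip(sample, target)]
        history.append(sample)
    return history

def mean_abs_error(sample, target):
    """Average distance between the current sample and the clean target."""
    return sum(abs(s - t) for s, t in zip(sample, target)) / len(target)

target = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical 5-pixel "image"
history = toy_denoise(target, steps=10)
errors = [mean_abs_error(sample, target) for sample in history]
for step, err in enumerate(errors):
    print(f"step {step:2d}: mean error {err:.4f}")
```

Each printed line shows the sample moving closer to the target as noise is removed, which is the behavior the paragraph above describes in words.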

The AI firm wrote that Sora builds on past research into its GPT and DALL-E 3 models.

OpenAI admits that Sora still has a number of weaknesses and can struggle to accurately simulate the physics of a complex scene, in particular by confusing cause and effect.

“For example, someone might take a bite out of a cookie, but then the cookie might not have a bite mark.”

The new tool can also confuse the spatial details of a prompt by mixing up left and right or failing to follow precise descriptions of direction, the company said.

Sora can generate physically implausible motion. Source: OpenAI

OpenAI's new generative model is currently available only to “red teamers” (tech lingo for cybersecurity researchers), who will assess critical areas for harm or risk, and to select designers, visual artists and filmmakers, who will provide feedback on how to improve the model.

A December 2023 report from Stanford University found that AI-powered image generation tools using the AI database LAION were being trained on thousands of illegal child abuse images, raising serious ethical and legal concerns for text-to-image and text-to-video models.

Users on X were left “speechless”.

Dozens of video demos showing Sora in action have circulated on X, where Sora currently has over 173,000 posts.

To demonstrate what the new generative model is capable of, OpenAI CEO Sam Altman opened himself up to custom video-generation requests from users on X, with the AI chief sharing a total of seven Sora-generated videos, ranging from a duck riding a dragon to golden retrievers recording a podcast on a mountaintop.

AI analyst McKay Wrigley, along with many others, wrote that the videos generated by Sora had left him “speechless”.

In a Feb. 15 post on X, NVIDIA senior researcher Jim Fan declared that anyone who believes Sora is just another creative toy like DALL-E 3 would be dead wrong.

In Fan's view, Sora is less of a video-generation tool and more of a “data-driven physics engine,” as the AI model not only generates abstract video but also simulates the physics of the objects in the scene itself.
