OpenAI’s Mira Murati is “not sure” where Sora’s training data came from.


OpenAI's chief technology officer, Mira Murati, is unsure where the training data for Sora, the company's upcoming video-generating artificial intelligence model, came from.

In an interview with The Wall Street Journal published on March 13, Murati gave vague answers when asked about the source of the training data for the company's Sora model, which can generate videos from text instructions.

“We used publicly available data and licensed data,” Murati responded, explaining how the $80 billion company was training the upcoming model.

The Journal's Joanna Stern then asked whether Sora was trained on data from social media platforms such as YouTube, Instagram or Facebook. “I'm not really sure about that,” Murati replied.

“You know, if they were publicly available – for public use. But I'm not sure. I am not sure about this,” she added.

Before moving on to another topic, Stern mentioned OpenAI's partnership with stock image company Shutterstock, asking whether its data was used to train Sora. “I'm not going to go into detail about the data that was used. But it was publicly available or licensed data,” Murati said. Later, she confirmed to the Journal that Shutterstock data was used for Sora.

AI models are trained using large sets of data, known as training datasets, which help the model recognize patterns, make predictions, or understand language.

OpenAI's CTO Mira Murati in an interview with the Wall Street Journal. Source: WSJ

Murati has been at OpenAI since 2018, leading the company's most notable projects, including the image-generator model DALL-E 3, the speech-recognition tool Whisper, and the company's latest model, GPT-4. In November 2023, she took over as interim CEO after OpenAI's board fired Sam Altman.

OpenAI has been targeted by several legal actions involving training data for AI models. In July 2023, authors Sarah Silverman, Richard Kadrey, and Christopher Golden filed a lawsuit against the company, alleging that ChatGPT generates summaries of the authors' works based on copyrighted content.

In December, The New York Times filed a similar copyright infringement complaint against Microsoft and OpenAI, alleging that the companies used the newspaper's content to train AI chatbots. A separate class-action lawsuit has been filed in California, alleging that OpenAI stole personal user data from the Internet to train ChatGPT without user consent.
