OpenAI’s Mira Murati is “not sure” where Sora’s training data comes from
The data source of OpenAI’s upcoming video-generating artificial intelligence model, Sora, is unclear to the company’s chief technology officer, Mira Murati.
During an interview with The Wall Street Journal published on March 13, Murati offered vague responses when asked about the source of data for the company’s Sora model, which is capable of generating videos from text instructions.
“We used publicly available data and licensed data,” Murati replied when asked how the company, valued at $80 billion, was training its upcoming model.
Joanna Stern, from the Journal, then asked whether Sora was trained with data from social media platforms, such as YouTube, Instagram, or Facebook. “I’m actually not sure about that,” Murati replied, adding:
“You know, if they were publicly available — publicly available to use. But I’m not sure. I’m not confident about it.”
Before moving to another topic, Stern mentioned OpenAI’s partnership with stock image company Shutterstock, asking if its data could be used to train Sora. “I’m just not going to go into detail about the data that was used. But it was publicly available or licensed data,” Murati added. Later, she confirmed to the Journal that Shutterstock data was used for Sora.
AI models are trained using large sets of data, known as training data sets, which help the model learn to recognize patterns, make predictions, or understand language.
OpenAI's CTO Mira Murati during interview with The Wall Street Journal. Source: WSJ
Murati has been at OpenAI since 2018, leading some of the company’s most popular projects, including the image-generator model DALL-E 3, the speech-recognition tool Whisper and the latest version of the company’s chatbot GPT-4. In November 2023, she briefly took over as interim CEO after OpenAI’s board ousted Sam Altman.
OpenAI has been targeted by several legal actions involving its AI models’ training data. In July 2023, authors Sarah Silverman, Richard Kadrey, and Christopher Golden filed a lawsuit against the company, alleging that ChatGPT generates summaries of the authors’ works based on copyrighted content.
In December, The New York Times sued Microsoft and OpenAI in a similar copyright infringement complaint that alleges the companies used the newspaper’s content to train AI chatbots. A different class-action lawsuit was filed in California, alleging that OpenAI scraped private user information from the internet to train ChatGPT without user consent.