Date: 2024-03-16

OpenAI Sora: Questions over the copyright status of the datasets used to train OpenAI's new text-to-video tool Sora have gripped the tech world, after the AI startup's Chief Technology Officer (CTO) Mira Murati addressed the contentious issue in a recent interview with a US-based news outlet.

When asked about the data used in Sora, Murati said that only "publicly available data and licensed data" had been used to train the tool, but did not spell out the specifics.

Asked whether OpenAI had used YouTube, Facebook, Instagram, and Shutterstock videos to train Sora, Murati said, “I'm actually not sure about that. Yes, if that is publicly available data but I am not sure and confident about it.” On further questioning, Murati declined to go into the details of the data OpenAI used to train Sora but asserted that all of it was ‘publicly available or licensed’.

The video of the interview is being widely shared on social media, with critics voicing concern over the ambiguity of Murati's answers, especially at a time when OpenAI is fighting a series of copyright lawsuits.

Who is Mira Murati?

Mira Murati is the Chief Technology Officer (CTO) at OpenAI and oversees the technical development and implementation of OpenAI's technologies. Her expertise lies in machine learning, natural language processing, and other areas of AI research.

Murati was in the news recently when she was appointed interim CEO after Sam Altman's brief departure from the organisation. Going against the decision of the OpenAI board, Murati openly supported Altman, who eventually returned as CEO.

OpenAI’s current copyright troubles

OpenAI finds itself entangled in a web of copyright disputes, facing legal challenges from media outlets and authors who allege that their content was used without authorisation to train AI models including ChatGPT, and now possibly Sora too.

Lawsuits from authors such as John Grisham and George R.R. Martin, along with media giants like The New York Times, underscore the severity of the situation.

OpenAI is accused of copying nonfiction books and news articles without permission, potentially disrupting traditional media business models and threatening major financial repercussions.

The proposed class actions seek damages that could harm OpenAI's operations and reputation. The company, along with its partner Microsoft, denies these allegations, but the mounting legal pressure shows that the outcome could go either way.

The outcomes of these legal proceedings could profoundly impact the future of AI development and the relationship between tech companies and content creators, since an adverse ruling could require tech companies to seek permission from creators before training AI models on their work. The legality of existing datasets could also be challenged if the courts rule against OpenAI.

Is it legal to use commonly available data to train AI models?

The legality of using commonly available data to train AI models hinges on factors like copyright, licensing agreements, and privacy laws. While publicly accessible data can often be used, copyright law still applies, and using protected works may require permission from the copyright holders.

Where data comes with licensing terms, the necessary licences must be obtained beforehand to comply with local laws. Additionally, privacy laws mandate the responsible handling of data, even if it is publicly available, to avoid infringing individuals' privacy rights or data protection regulations.

However, countries around the world are still in the process of drafting regulations on AI and the data used to train it.