Published 17:36 IST, December 14th 2024
Suchir Balaji's Post Against OpenAI May Undermine Its Stance in Numerous Legal Cases
Balaji published a note accusing OpenAI of violating copyright law hours before he died by suicide. It could challenge OpenAI's position in the legal cases it faces.
Suchir Balaji, a 26-year-old Indian-American former researcher at OpenAI, tragically died by suicide on Saturday. In his final post, published in The New York Times hours before his death, the whistleblower levelled several allegations against the Microsoft-backed AI company, which is seeking to go for-profit. He argued that OpenAI ignored copyright law while training its AI models, currently considered among the best in the industry.
Balaji, who left OpenAI in August saying he no longer wanted to help build technologies he believed could harm society, argued in his post that the Sam Altman-led company used copyrighted material to train its AI models. He highlighted that while the broader implications of AI remain unclear, OpenAI's foundational models and ChatGPT were undermining the commercial relevance of businesses, internet companies, and even individuals whose years of work were unlawfully used to train the AI systems.
Although OpenAI has responded to Balaji's death, it has not acknowledged his blog post, which is now under heavy scrutiny. The post could challenge the company's stance as it faces multiple lawsuits worldwide from educators, academicians, publishing houses, and news organisations over alleged copyright infringement, and the attention it has drawn since his death may make it pivotal to those legal challenges.
– Balaji's allegation that OpenAI's plan to become a for-profit company rests on foundational models trained on copyrighted material could prove a challenging argument for the company. In his view, a profit-driven OpenAI has a weaker claim to "fair use," especially when its models rely so heavily on copyrighted content. AI companies use the "fair use" doctrine as a legal shield, arguing that they substantially transform copyrighted works so that the results their AI chatbots offer do not compete directly with the originals. Balaji countered that the chatbots serve a similar purpose to the original works, so they can be seen as competing with and substituting for them.
– Balaji pointed out that generative AI, in general, disregards the legal boundaries of copyrighted works. This could undercut OpenAI's argument that, as an AI company, it respects the legal restrictions on copyrighted material. He cited several instances where the GPT-4 and DALL-E models produced outputs similar to their inputs as evidence that OpenAI's technology violates copyright protection laws. "The outputs aren't exact copies of the inputs, but they are also not fundamentally novel," he wrote.
– OpenAI also allegedly made copies of the data it obtained from various sources, including copyrighted ones. AI developers often replicate data sets so that training can continue even if access to the original source is later cut off. Balaji argued that while such data repetition is not "always problematic," it ends up giving the AI model a memory of the training data if "done excessively." If that data is copyrighted, it could weaken OpenAI's side in the lawsuits.
Cases against OpenAI
OpenAI, which rose to popularity after its AI chatbot ChatGPT gained traction on the internet, faces several lawsuits over the use of copyrighted data without its owners' permission. In most cases, these owners are news organisations, researchers, and academicians, who accuse OpenAI of feeding their works into its AI models without consent.
India
News agency Asian News International (ANI) sued OpenAI last month for using its publicly available data to train AI models without prior permission. In its lawsuit, ANI also accused OpenAI of reproducing its quotes and pieces of work verbatim, as well as of misconstruing information to spread fake news. The suit is being heard at the Delhi High Court.
Canada
A group of Canadian news and media companies, including the Canadian Broadcasting Corporation, has filed a collective lawsuit against OpenAI alleging that the company illegally used their copyrighted works to train its large language models (LLMs). They have underscored how OpenAI scrapes their published content to create data sets for its AI models.
United States
Several news organisations, including The New York Times, have sued OpenAI for copyright infringement over the use of their content without permission. The Associated Press, however, signed a deal with OpenAI to provide data for AI training.
Updated 17:36 IST, December 14th 2024