Major AIs, including ChatGPT, Google’s Bard, Stable Diffusion, and image creator Dall-E, learned by ransacking the Internet and internalizing its contents. Now they use that content to create original essays, news stories, pictures, and other works.
Increasingly, the creators whose original materials the AIs learned from want to be paid for their contributions.
Novelists Mona Awad and Paul Tremblay are suing OpenAI, alleging that ChatGPT appropriated the contents of their books without permission or compensation; they reached that conclusion because the bot can summarize their books accurately. Comedian Sarah Silverman has lodged a similar suit against OpenAI and Meta.
Margaret Atwood, Jodi Picoult, and more than 5,000 other authors have petitioned the companies creating chatbots to ask authors for consent, and pay them, to use their material in training their AIs.
Google and OpenAI face two class-action lawsuits in which social media users claim the companies violated the rights of millions of their fellow internet loiterers by using their social media posts to train AIs to use language in a conversational way.
Earlier this month, Congress held the second of two hearings focusing on AI and copyright, hearing from representatives of the music industry, Photoshop maker Adobe, Stability AI, and concept artist and illustrator Karla Ortiz, among others.
“These AI companies use our work as training data and raw materials for their AI models without consent, credit, or compensation,” artist Karla Ortiz told the lawmakers. “No other tool solely relies on the works of others to generate” its content.
Making matters worse for content creators, AIs are not only using their work without compensation but also beginning to replace some of them in ad agencies, publications, and websites, potentially ending their careers, they point out.
In response, AI companies say their use of copyrighted work falls under the “fair use” exception, which they argue requires no acknowledgement or compensation if the original material is significantly transformed in the final product.
“It’s akin to a student going and reading books in a library and then learning how to write and read,” Kent Walker, Google’s president of global affairs, told Insider.
However, using copyrighted news articles to train AIs would likely be found by a court “to go far beyond the scope of fair use as set forth in the copyright act,” according to a June statement by Digital Content Next, a news trade group that counts The New York Times and The Washington Post among its members.
AI companies can solve the problem by equipping their bots with filters that would prevent them from regurgitating anything very similar to an existing work, University of Miami law professor Andres Sawicki told Insider. YouTube already can detect when a copyrighted video is uploaded to it and automatically rejects it, he pointed out.
In the future, AI companies will train their models with more controlled data sets, and today’s method of dumping the contents of the Internet into an AI will be seen as “archaic,” Margaret Mitchell, chief ethics scientist at Hugging Face, an AI start-up, said in an Insider interview.
“It’s such a silly approach and an unscientific approach, not to mention an approach that hits on people’s rights,” Mitchell added. “The whole system of data collection needs to change, and it’s unfortunate that it needs to change via lawsuits, but that is often how tech operates.”
OpenAI will likely need to delete one of its models completely by the end of the year under the pressure of new regulation or due to lawsuits, she said.
TRENDPOST: The battle is likely to be protracted while creators try to prove that their exact works were used to train AIs even if no trace of the original works can be found within a particular chatbot’s repertoire.
At this point, the most likely outcome is that creators who can prove an AI contains their specific image, poem, or other work will be compensated through some sort of fund AI companies will establish. Other creators will be unrewarded, just as they are now when someone checks one of their books out of a library.