All that hard work you spent creating informative content on your website?
Tech companies’ deep learning AI systems just scraped it, along with data and creative content from billions of other sites and resources.
Soon, your website or YouTube channel may not be needed anymore. AI content services like ChatGPT, and even more sophisticated successors, will deliver re-synthesized content gleaned from your work and the work of countless others directly to the people who query them.
Companies like Microsoft (which just snapped up a position in ChatGPT creator OpenAI), Google, Amazon, Meta and others are committing content theft on a breathtaking, monumental scale.
Not only is the AI they’ve created already putting graphic artists and writers out of work and threatening professionals in other sectors, but these new AI interfaces are also set to disrupt the entire internet “search engine / website” model of accessing information.
And, so far at least, these AI companies are getting away with it. They aren’t paying anyone for all that copyrighted and third-party created content that their deep learning AI systems were built on.
Without that data and content, none of these deep learning AI systems would have anything of much value to say.
Consider this: if OpenAI had to abide by traditional public domain legal boundaries, its AI training would’ve been substantially limited to works published in 1927 or earlier. Under U.S. copyright law, that is currently the cut-off year for published creative works to be considered public domain.
It’s hard to see how ChatGPT would’ve made quite the splash that it did with its “research preview” release to the public if its AI had been limited to public domain material from 1927 and earlier.
Of course, OpenAI could have incorporated more up-to-date data sets for its deep-learning AI. But they would’ve had to secure permission from content creators, and negotiate compensation.
Creatives of every kind, in every field, need to wake up, and quickly, to the massive theft and scam at the heart of deep learning AI.
The Problem Is Far Bigger Than Most Realize
A December 2022 Forbes article on ChatGPT quoted legal experts who applied a decidedly human standard to AI technology in arguing that ChatGPT was not violating copyright law.
Michael Kelber, an intellectual property specialist with Neal Gerber Eisenberg, was quoted:
“From an intellectual property [IP] perspective, if the source material is not specifically quoted, there would be no requirement to include citations. If ideas are used but not copied, the use would not implicate copyright or other protected IP. That said, from a research standpoint, citation or attribution would be helpful for identifying biases and credibility, just like any other citation to authority.”
That may be a reasonable standard for human use of information.
But the scale of the content scraping performed by deep learning AI, and the way that content is re-synthesized and served up by tools like ChatGPT, will radically impair content creators’ ability to earn compensation for their work or to drive traffic to their repositories of content (such as websites and digital books).
If any technological development calls for a new paradigm in IP law, this is it.
A Barron’s article of 27 January, “ChatGPT will unleash copyright chaos,” said next to nothing about this danger.
It instead focused on whether people who query the AI have “copyright ownership rights” over what ChatGPT produces, or whether the AI itself can be considered an author deserving copyright protection.
The article authors, both attorneys in the field of technology, state that “copyright code does not expressly address works of authorship created by machines. This is no surprise given the Copyright Act was written long before the Internet, in the year when the VHS tape was introduced.”
The article also discusses the “creativity” element of copyright law established by a 1991 Supreme Court precedent (Feist Publications v. Rural Telephone Service), which held that telephone directories did not merit copyright protection because they lacked the “creative choices” needed to give the content sufficient originality.
The Barron’s article authors end up advocating that copyright laws be updated to allow people who use AI tools like ChatGPT to claim authorship for works involving AI content “if the work is original (that is, it was not copied), and a human started the creative process for that particular work.”
But the crucial copyright question surrounding ChatGPT is this: what rights should countless creators have, given that enormous amounts of their data have been scraped to build a system that threatens to make obsolete the means by which they gain recognition and compensation?
AI Re-Synthesizing Technology Will Evolve Much Further
AI deep learning systems currently show clear limits, as well as mind-blowing possibilities and implications.
ChatGPT makes simple mistakes. Perhaps it can pass a bar exam, but in our own testing the program had trouble specifying playable fingerings for a G major chord, a G7 chord and a Gdim chord within a given fret position. Though it could explain the theory, it could not correctly provide concrete examples for several of those chords in simple notation.
Many probing the research version of ChatGPT are finding limits and errors in what it produces. But AI deep learning systems will no doubt get over those humps, and go much further.
Right now, OpenAI’s programs can produce remarkable written answers to natural language queries, as well as images generated from natural language prompts.
But in the not-too-distant future, an advanced deep learning AI might be able to swallow and re-synthesize video content. What happens to YouTube creators who have monetized guitar lesson videos when a generated “AI video guitarist,” trained on countless teachers and virtuoso guitarists, can teach virtually anything about guitar in response to queries?
What happens when AI can produce, from a natural language premise, a webisode that competes with the current hit sitcom? Or the current blockbuster superhero flick?
The dialogue and images won’t be straight copies of any single source, of course. They’ll just remarkably resemble the style of a certain genre, auteur director, iconic writer, etc.
The content issues and implications of AI deep learning systems are only just beginning.
Solutions?
The content generated by AI image software like DALL-E 2 and text-generation software like ChatGPT is wholly derivative of, and/or wholly dependent on, the content it scrapes, absorbs and re-synthesizes.
The tech companies building these AI systems stand to profit, with the gains concentrated narrowly in their hands, while untold millions lose rights to their own content. And many millions of people will lose their jobs to AI.
With ChatGPT reportedly close to passing the bar exam, and AI legal advice apps already being tested in minor court cases, even lawyers will eventually be affected.
There are some ways that the worst outcomes concerning content hijacking by AI systems might be avoided.
Requiring AI information systems to only excerpt sources when permission to use them has not been obtained would be a good start.
And requiring these systems to cite and link to sources would be another good idea.
Is it possible? Yes. Perplexity.ai is one example of an AI answer engine that provides citations and links for the answers it gives to queries.
Google has created perhaps the most sophisticated system for identifying copyrighted content (YouTube’s Content ID). Source tracking and tracing may be much harder to build for AI, however, given how deep learning systems learn and then assemble answers and content in response to queries.
But maybe AI companies should have to hold back on their wonderful AI bots until they can figure out a solution for fair compensation.
Under a system that compensates the creators of the original source data, users of ChatGPT or DALL-E 2 could hold licensing rights to generated content, similar to the licensing rights for images obtained from sites like Shutterstock, or music obtained from Melodyloops.
Web3 technology might also be able to offer some answers for human creatives, or at least protections and automatic modes of compensation for content use.
A March 2022 Forbes opinion piece by LegalShield CEO Jeff Bell outlined ways that web3 technologies, such as tokenization and encryption, could protect personal data and keep it in the hands of people rather than corporations.
Bell noted that users would retain the right to share or withhold that data, and could potentially be compensated for its use:
“Web 3.0 is focused on making personal information private again. It builds on a growing movement to give users control over their data ownership and monetization.”
Web3 could do much more than protect the personal and interaction data of people browsing the internet, though. Novel decentralized crypto technologies open the door to new ways to authenticate, track and automatically monetize content, via NFT tokenization, smart contracts, crypto ecosystems and more.
But none of that is guaranteed to happen without a clearer understanding of the scale of the problem that AI deep learning systems pose to the current technological and legal paradigms of content creation and dissemination.
Addendum: ChatGPT on Issues of AI Deep Learning
Of course, no article concerning ChatGPT can be considered complete without asking the AI interface directly about the matters discussed.
So we interviewed ChatGPT concerning OpenAI’s method of creating a content re-synthesizing interface from AI deep learning technology.
Question: Did your training data include copyrighted information?
ChatGPT: Yes, it is possible that some copyrighted information was included in my training data. The training data used to develop language models like me often consists of large amounts of text from various sources, including books, articles, websites, and other types of content. Some of this content may be protected by copyright laws.
It’s important to note that OpenAI, like many AI companies, typically uses a process called “pre-training” to develop its language models. During this process, the models are trained on massive amounts of text data to develop a basic understanding of language and common patterns of usage. The pre-trained models are then fine-tuned on specific tasks or domains to improve their performance.
In accordance with copyright law, OpenAI takes steps to ensure that its use of copyrighted material is covered by a fair use exception or is authorized by the copyright owner. However, as with any technology company that uses text data, there is always a risk that some copyrighted material may be included in its training data.
Question: Since you were trained on copyrighted information, make an argument that profits from your use should be equitably distributed to all society.
ChatGPT: As a machine learning model developed by OpenAI, I do not have personal opinions or beliefs. However, the argument that profits from the use of AI systems like me should be equitably distributed to all society is based on the idea that these systems are made possible by the collective knowledge, culture, and information of society as a whole. This argument suggests that the profits generated by the use of AI systems should be seen as a shared resource and distributed in a way that benefits everyone.
Proponents of this argument argue that the vast amounts of information and data used to train AI systems like me come from a variety of sources, including public institutions, private companies, and individuals. They believe that the value created by these systems should therefore be seen as a public good and used to benefit society as a whole, rather than being concentrated in the hands of a few individuals or companies.
It’s important to note that this is a complex and controversial issue, and there are differing opinions on how the profits generated by AI systems should be distributed. Some argue that the companies and individuals who have invested in the development and deployment of these systems should be the primary beneficiaries, while others believe that a larger portion of the profits should be distributed to society as a whole. The debate over the equitable distribution of AI profits is likely to continue as the technology continues to evolve and become increasingly widespread.
Question: Companies like Google have developed sophisticated systems for identifying copyrighted content and compensating copyright owners. Should AI companies like OpenAI be regulated to identify its usages of copyrighted content and compensate copyright owners?
ChatGPT: As a machine learning model, I do not have personal opinions or beliefs. However, it is widely recognized that the use of copyrighted material in AI systems raises important legal and ethical questions. Some argue that AI companies like OpenAI should be regulated to identify the use of copyrighted content in their systems and compensate copyright owners in a similar manner to other companies that use copyrighted material, such as Google.
Proponents of such regulation argue that AI companies have a responsibility to respect the rights of copyright owners, just like any other company that uses copyrighted material. They believe that companies should be required to take reasonable steps to identify the use of copyrighted content in their systems and ensure that they are compensating copyright owners in accordance with the law.
Others argue that existing copyright laws are not well suited to address the unique challenges posed by AI systems, and that new legal frameworks may be needed to ensure that copyright owners are adequately compensated for the use of their work in these systems.
In any case, it is clear that the use of copyrighted content in AI systems raises important legal and ethical questions that will need to be addressed as the technology continues to evolve and become more widespread.