AI is the next wave of concentrating power into the hands of massive tech corporations and government entities, while leaving masses of creative humanity scrounging for relative crumbs.
Anyone expecting substantive regulation limiting abuses already occurring, or protection for human creatives, shouldn’t be holding their breath.
But what no one is talking about yet is how AI is set to gut the revenue model of the entire internet.
It is already effectively cutting out the need to visit websites to view or read content.
The information on those websites, along with any exposed content on the internet, is being scraped, or soon will be, by major “non-profit” consortiums like the Large-scale Artificial Intelligence Open Network (LAION).
These consortiums produce datasets by scanning billions of images and trillions of words of written content from the internet.
Those datasets are then provided to companies like Stability AI, DeviantArt and others, to be fed to their machine-learning and deep-learning AI programs, which are designed to create “generative works” by learning from the content.
Next-Level De-Human: Training AI on “Synthetic Data”
Synthesis AI is another company specializing in providing datasets for AI deep-learning systems to train on, and it currently has relationships with Google, Amazon and other leading AI-focused corporations.

But Synthesis AI has pioneered something new in the world of dataset compilation. It specializes in creating what it calls “synthetic data” to provide to its clients.
The company notes that compiling datasets from real-world environments is a costly and labor-intensive process:
“Collecting, labeling, training, and deploying data and datasets is difficult, costly, and time-consuming for users. Multiple surveys have uncovered that artificial intelligence teams spend anywhere from 50–80% of their time collecting and cleaning data, which is a significant challenge. On average, individual organizations spend nearly $2.3 million annually on data labeling. Additionally, real-world datasets raise ethical and privacy concerns…
“…Synthetic data is computer-generated data that models the real world. Synthetic data and generative artificial intelligence (AI) datasets have emerged as a disruptive new approach to solving the data problem in computer vision (CV) and machine learning (ML). By coupling visual effects (VFX) and gaming technologies with new generative artificial intelligence (AI) models, companies can now create data that mimics the natural world. This new approach to training machine learning models can create vast amounts of photorealistic labeled data at orders of magnitude faster speed and reduced cost.”
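To make the “labeled data at reduced cost” point concrete, here is a minimal, hypothetical Python sketch of the idea (this is not Synthesis AI’s actual pipeline; the class names and scene representation are invented for illustration). Because the generator places every object itself, the ground-truth labels are known by construction, with no human annotators in the loop:

```python
# Toy illustration of synthetic data generation: the generator places
# every object, so ground-truth labels are a byproduct of rendering.
# A real pipeline would use a VFX/game-engine renderer; the "scene"
# here is a stand-in for an actual image.
import random
from dataclasses import dataclass

@dataclass
class LabeledSample:
    scene: list    # stand-in for a rendered image
    boxes: list    # bounding boxes, known exactly at generation time
    labels: list   # class labels, no human annotation required

CLASSES = ["pedestrian", "car", "bicycle"]  # illustrative classes

def render_synthetic_scene(num_objects: int = 3) -> LabeledSample:
    scene, boxes, labels = [], [], []
    for _ in range(num_objects):
        cls = random.choice(CLASSES)
        x, y = random.uniform(0, 1), random.uniform(0, 1)            # position
        w, h = random.uniform(0.05, 0.2), random.uniform(0.05, 0.2)  # size
        scene.append((cls, x, y, w, h))   # "draw" the object
        boxes.append((x, y, w, h))        # the label falls out of the draw
        labels.append(cls)
    return LabeledSample(scene, boxes, labels)

# Generate 10,000 perfectly labeled examples at essentially zero
# annotation cost -- the speed and cost advantage the company describes.
dataset = [render_synthetic_scene() for _ in range(10_000)]
```

The same principle scales from this toy to photorealistic renders: whatever the engine draws, it can label, which is why annotation cost effectively disappears.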
Synthesis AI reduces the costs of outfitting systems to navigate, surveil and gather data from real-world environments and activities by creating sophisticated virtual environments that can be used for AI training.
MIT Tech Review has called synthetic data one of the top breakthrough technologies of 2022.
The process is in the same realm as “digital twinning,” and can involve specific real-world environments, assets and even humans, or environments that have no real-world counterpart but which simulate the general attributes of, say, a city or a human being.
Synthesis AI contends that this enables “customers to build machine learning (ML) models in a more ethical and privacy-compliant way.”
It touts that its technology reduces “regulatory” and other legal issues:
“Using traditional human datasets presents ethical and privacy issues. The use of real-world and free, publicly available datasets is only becoming more complicated as individual countries and trading blocs regulate data collection, data storage, and more. By its nature, synthetic data is artificial, enabling customers to develop models in a fully privacy-compliant manner. HIPAA applies only to real human data; no such federal regulations exist for synthetic data.”
The company also outlines how it can inject virtual environments with desired “diversity” metrics and other preferred biases (i.e., attributes that wouldn’t necessarily exist in the same way or proportion in real-world datasets):
“Capturing diverse and representative data is difficult, often leading to model bias. In the case of human-centric models, this may result in differential model performance with respect to age, gender, ethnicity, or skin tone. With synthetic data approaches, the distribution of training data is explicitly defined by the machine learning developer, ensuring class-balanced datasets. A broader and more uniform training data distribution reduces model bias.”
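In code terms, “explicitly defining the distribution of training data” simply means the developer samples subject attributes from distributions written down in advance, instead of inheriting whatever skew exists in scraped real-world data. A hypothetical sketch (the attribute names are illustrative, not Synthesis AI’s API):

```python
# Class-balanced synthetic sampling: each attribute is drawn uniformly,
# so the dataset is balanced by construction rather than by post-hoc
# filtering and re-labeling of real-world data.
import random
from collections import Counter

AGE_GROUPS = ["18-30", "31-50", "51-70", "70+"]
SKIN_TONES = ["I", "II", "III", "IV", "V", "VI"]  # Fitzpatrick scale

def sample_face_spec() -> dict:
    # Uniform draws give every group equal representation by design.
    return {
        "age_group": random.choice(AGE_GROUPS),
        "skin_tone": random.choice(SKIN_TONES),
    }

specs = [sample_face_spec() for _ in range(60_000)]
# Roughly 10,000 samples per skin tone (60,000 / 6), so downstream
# model bias from under-represented groups is reduced at the source.
print(Counter(s["skin_tone"] for s in specs))
```

The same mechanism, of course, is what allows a developer to inject any preferred distribution at all, which is exactly the “preferred biases” point above.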
While the synthetic datasets created by Synthesis AI may obscure the real-world data sources and connections for the tech giants it services, the company itself admits it builds its virtual datasets from real-world data, including by utilizing “generative artificial intelligence models.”
It specifically mentions contributions from generative AI technologies like DALL-E, which swallowed billions of human-created photos and images as part of its deep learning.
Of course, it just makes sense that at some point virtualization must refer back to the actual world, no matter how much it subsequently abstracts that information.
But Synthesis AI has managed to provide a shield to tech companies that want to avoid charges of violating privacy and scraping data from real-world surveilled humans and environments.
And there are real and admittedly ingenious efficiencies and advantages to be had from virtualizing environments for the purposes of training AI deep-learning systems.
Humans Will Stand Or Fall Based On Principled Opposition To Knowledge and Content Theft, and Human Replacement
Well before generative AI systems like ChatGPT and Dall-E were front and center in the public’s awareness, the Trends Journal was alerting readers concerning problematic aspects and disruptions of AI technology. (See, for example, “YOUR AI LOVER DOESN’T CARE ABOUT YOU (AND THAT’S WHY IT’S SO SEDUCTIVE)” 10 May 2022, “AI IS LEARNING YOUR JOB” 24 May 2022, “DOMESTIC SPY AGENCIES, CHINESE SOCIAL CREDIT SYSTEMS…AND SNEAKY AI” 14 Jun 2022 and “AUTOMATING OUT OF WORLD CRISIS?” 12 Jul 2022.)
When it comes to the way deep-learning AI systems are trained on datasets composed of the creative content and intellectual property of collective humanity, the legal and political communities have badly lagged in accounting for, let alone addressing, the ramifications.
Even worse, as a recent TechCrunch article noted, some jurisdictions, including the U.K., are actually attempting to loosen copyright protections and the rights of content creators in order to reduce potential claims against AI corporations:
“Interestingly, other countries [in contrast to the U.S.] have signaled a move toward more permissive use of publicly available content — copyrighted or not. For example, the U.K. is planning to tweak an existing law to allow text and data mining “for any purpose,” moving the balance of power away from rightsholders and heavily toward businesses and other commercial entities.”
In the current phase of the AI takeover, tech corporations have found a way to use AI to more or less comprehensively absorb human knowledge and data of virtually every kind, and to build systems that are already rendering human experts and professionals obsolete.
But it hardly ends there.
As job after job and sector after sector are progressively taken over by AI, automation and robotics, the bulk of average humanity will find themselves viewed as superfluous by a narrow elite controlling the technology and furthering its power and transhuman god quests.
Unless political bodies acknowledge the enormous difference between what AI does and human learning and acquisition of expertise, there is a risk that AI will be treated under existing copyright and content-creation precedents and understandings.
In that world, wealth will concentrate further into the hands of a few, making even the past 50 years of spiraling wealth concentration look negligible in comparison.