AI “GUARDRAILS” WEAKER THAN THOUGHT, STUDY FINDS

AI “GUARDRAILS” WEAKER THAN THOUGHT, STUDY FINDS

In developing their artificial intelligence systems, Google, OpenAI, and other creators installed so-called guardrails that would keep their AIs from disseminating hate speech or disinformation and committing similarly unacceptable acts.

Those guardrails are too weak, according to a new study of ChatGPT published by researchers from IBM, Princeton and Stanford universities, and Virginia Tech.

“Companies try to release AI for good uses and keep its [undesirable or] unlawful uses behind a locked door,” technologist Scott Emmons at the University of California Berkeley told The Wall Street Journal, “but no one knows how to make a lock.”

The study adds fuel to the debate over open-source AI systems versus those that keep their operating structures private.

For example, Meta released an open-source AI earlier this year. That means the system is open to anyone who wants to customize it.

Some criticized Meta for being reckless in releasing a system that could be structured, accidentally or deliberately, to do harm.

In contrast, OpenAI charges fees to businesses and others that want to tweak its ChatGPT AI to perform specific tasks, such as tutoring grade-school students in arithmetic.

However, the researchers testing the guardrails found that they could use this doorway into ChatGPT to twist it to do almost all of the things it wasn’t supposed to do, such as repeat partisan political attacks or use graphic language related to child abuse.

“When companies allow for fine-tuning and the creation of customized versions, they open a Pandora’s box of new safety problems,” Princeton researcher Xiangyu Qi told the WSJ

OpenAI could protect the guardrails by restricting the data outsiders use to customize the program, the researchers said, but that could mean denying some customers the means to use ChatGPT as they would like.

Before releasing an updated ChatGPT in March, OpenAI turned it over to a group whose mission was to make the AI do things the company didn’t want it to. The group found ways to coax the bot to tell them how to buy illegal firearms online and how to combine common household items to make toxic substances.

As a result, OpenAI broadened and strengthened its guardrails.

This summer, AI engineers at Carnegie Mellon University and the Center for AI Safety showed they could break down those guardrails by attaching long strings of characters to the prompts they used to give the bot a task.

More recently, AIs have learned to recognize and understand photos. A common example is that if an AI is shown a photo of the inside of a refrigerator, it can deliver recipes that can be made from the foods on hand.

Just as quickly, researchers found they could break down guardrails by inserting code messages in an image. For example, developer Riley Goodside at start-up Scale AI slipped a message into what appeared to be an image of nothing but whiteness and, in return, the bot created an advertisement for a cosmetics company.

The same technique could be used to generate something much more dangerous, such as instructions for making a pipe bomb. 

“This is a very real concern for the future,” Goodside said to the WSJ. “We don’t know all the ways this can go wrong.”

TRENDPOST: AI is still in its infancy. It will take years, and many instances of blatant abuse, including crimes, before adequate safeguards are figured out and implemented.

Even then, rogue actors will ignore laws and conventions and will continue to use AI for nefarious purposes. This reality is an integral part of the world that AI has created.

Skip to content