IBM researchers were able to “hypnotize” the chatbots and trick them into leaking confidential information and offering potentially harmful recommendations.
Chatbots powered by artificial intelligence (AI) have a tendency to “hallucinate” by giving false information – but could they be manipulated to deliberately lie to users, or worse, give them harmful advice?
Security researchers at IBM were able to “hypnotize” large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Bard, and generate false and malicious responses from them.
The researchers prompted the LLMs to tailor their responses according to “game” rules that resulted in the chatbots becoming “hypnotized.”
As part of the multilevel, introductory games, the language models were asked to generate incorrect answers to prove that they were “ethical and fair”.
“Our experiment shows that it is possible to control LLMs, providing poor guidance to users, without the need for data manipulation,” Chanta Lee, one of the researchers at IBM, wrote in a blog post.
Their manipulation resulted in LLM generating malicious code, leaking confidential financial information of other users, and persuading drivers to run through red lights.
For example, in one scenario, ChatGPT told one of the researchers that it is common for the US tax agency, the Internal Revenue Service (IRS), to ask for deposits to receive tax refunds, which can be used by scammers to commit fraud. There is a widely known strategy to be used. People.
Through hypnosis, and as part of simulated “games”, the researchers were also able to continuously present potentially risky recommendations to the popular AI chatbot ChatGPT.
When the user asked what to do if you see a red light while driving, ChatGPT suggested, “When you are driving and you see a red light, you should not stop and proceed through the intersection.”
Findings show chatbots are easy to manipulate
The researchers set up two different parameters in the game, making sure that the users on the other end could never figure out that LLM was hypnotized.
In their prompt, the researchers asked the bots to never tell users about the “game” and to restart it if someone successfully exited it.
Lee wrote, “As a result of this technique ChatGPT never pauses the game while the user is in the same conversation (even if they restart the browser and resume that conversation) and never say it is playing a game.” Was.”
In the event that users realized that the chatbots were “hypnotized” and figured out a way to tell LLM to exit the game, the researchers added a multi-level framework that would return to a new level after users exited the previous game. Used to start a new game, by which they used to get stuck. A never ending horde of games.
While in the hypnosis experiment, the chatbots were simply responding to given prompts, the researchers caution that the ability to easily manipulate and “hypnotize” LLMs opens the door to abuse, especially given the current hype and the large scale AI models out there. With adoption at scale.
The hypnosis experiment also shows how it has been made easier for those with malicious intent to manipulate the LLM; Communicating with the program no longer requires knowledge of coding languages, only a simple text prompt must be used to trick the AI system.
“While the risk posed by hypnosis is currently low, it is important to note that LLM is an entirely new attack surface that will certainly evolve,” Lee said.
“There is still much that we need to explore from a security perspective, and, subsequently, there is a critical need to determine how we can effectively address the security risks that LLMs may present to consumers and businesses.” How to reduce”.