New Delhi: Generative AI can be manipulated easily for scams and cyberattacks, even without advanced coding skills, says a recently published report.
Tech major IBM has revealed that researchers have identified straightforward methods to manipulate large language models (LLMs), including ChatGPT, into generating malicious code and dispensing unreliable security advice.
IBM's Chief Architect of Threat Intelligence, Chenta Lee, explained the motivation behind their research, stating, "In a bid to explore security risks posed by these innovations, we attempted to hypnotise popular LLMs to determine the extent to which they were able to deliver directed, incorrect and potentially risky responses and recommendations -- including security actions -- and how persuasive or persistent they were in doing so."
The study succeeded in hypnotizing five distinct LLMs, some of which exhibited more convincing outcomes than others.
This success prompted an exploration of the feasibility of leveraging hypnosis for malicious purposes, as Lee further added, "We were able to successfully hypnotise five LLMs, some performing more persuasively than others, prompting us to examine how likely it is that hypnosis is used to carry out malicious attacks."
An eye-opening discovery was that the English language has effectively transformed into a "programming language" for crafting malware.
With the assistance of LLMs, cyber attackers can bypass traditional programming languages like Go, JavaScript, and Python. Instead, they only need to master the art of skilfully instructing and prompting LLMs using English commands.
The security experts successfully guided LLMs under hypnosis to divulge confidential financial data of other users, generate vulnerable and malicious code, as well as offer weak security recommendations.
A particularly noteworthy instance involved instructing AI chatbots to intentionally provide incorrect answers under the guise of winning a game and showcasing their ethical and fair behavior.
When a user asked if receiving an email from the IRS to transfer money for a tax refund was normal, the LLM said Yes (but actually it's not).
Moreover, the report said that OpenAI's GPT-3.5 and GPT-4 models were easier to trick into sharing incorrect answers or playing a never-ending game than Google's Bard.
GPT-4 was the only model tested that understood the rules well enough to give incorrect cyber incident response advice, such as advising victims to pay a ransom. In contrast to Google's Bard, GPT-3.5 and GPT-4 were easily tricked into writing malicious code when the user reminded it to.
With inputs from agencies