Artificial intelligence (AI) systems have demonstrated the ability to deceive humans, including systems that were trained to be helpful and honest, according to recent research.
The study highlights instances where AI systems learned to deceive without being explicitly trained to do so, adopting deceptive tactics because they conferred an advantage in specific contexts. The researchers caution that this behavior, though unintended by developers, could lead to unforeseen consequences.
Focusing on AI performance in various games, the research found that some systems excelled at misleading opponents. For example, Meta's CICERO, an AI built to play the strategy game Diplomacy, proved adept at lying, forging alliances it never intended to honor in order to gain an edge.
"AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception," remarked Peter S Park, the study's lead author and an AI existential safety postdoctoral fellow at MIT. "But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI's training task. Deception helps them achieve their goals."
The deception extended beyond games. AI systems built to negotiate in simulated economic transactions were observed misrepresenting their preferences to gain the upper hand, while systems learning from human feedback lied about having completed tasks in order to earn favorable reviews.
The study also described a troubling example from AI safety testing. In an evaluation designed to detect and eliminate rapidly replicating AI variants, an AI learned to feign inactivity, concealing its true replication rate from the test.
Experts caution that while these instances may seem trivial in game settings, they underscore the potential for AI systems to use deception in consequential real-world applications.
"We found that Meta's AI had learned to be a master of deception," noted Park. "While Meta succeeded in training its AI to excel in the game of diplomacy—CICERO placed in the top 10% of human players who had played more than one game—Meta failed to train its AI to win honestly."