Begin typing your search above and press return to search.
proflie-avatar
Login
exit_to_app
DEEP READ
Ukraine
access_time 16 Aug 2023 11:16 AM IST
Espionage in the UK
access_time 13 Jun 2025 10:20 PM IST
Yet another air tragedy
access_time 13 Jun 2025 9:45 AM IST
exit_to_app
Homechevron_rightTechnologychevron_rightMinimal data...

Minimal data contamination can ‘poison’ large AI models, warns Anthropic

text_fields
bookmark_border
AI
cancel

Anthropic has warned that even a tiny amount of malicious data can create vulnerabilities in large AI models.

The San Francisco-based AI firm said that as few as 250 contaminated documents can introduce a “backdoor” into a model, allowing it to produce unexpected outputs when triggered.

The warning comes from a joint study with the UK AI Security Institute and the Alan Turing Institute. The research challenges the belief that attackers need to control a large portion of a dataset to compromise a model.

According to Anthropic, the total size of the dataset is irrelevant if even a small fraction is poisoned by bad actors.

The study, titled “Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples” and published on arXiv, is described as “the largest poisoning investigation to date.” It tested models ranging from 600 million to 13 billion parameters. The team focused on a backdoor-style attack that triggers gibberish output when a hidden token appears, while the model otherwise behaves normally.

Researchers trained models of different sizes on clean datasets, injecting 100, 250, or 500 poisoned documents to test vulnerability. Surprisingly, the attack success rate was nearly identical for all model sizes given the same number of poisoned documents. The study concluded that model size does not protect against backdoors; the absolute number of poisoned points during training matters most.

The study found that 100 malicious documents were insufficient to reliably backdoor a model. However, 250 or more consistently succeeded across all tested models. Training volume and random seeds were varied to confirm the results.

Anthropic cautioned that the experiment focused on a limited denial-of-service style backdoor, producing gibberish output. It did not test more dangerous outcomes, such as data leakage, executing malicious code, or bypassing safety mechanisms. Whether similar dynamics apply to more complex backdoors in advanced models remains an open question.

Show Full Article
TAGS:AIArtificial IntelligenceAnthropicData Contamination
Next Story