Small-sample data poisoning changes the procurement math

Research shows that 250 poisoned documents can backdoor language models from 600M to 13B parameters, challenging the assumption that larger models need proportionally more malicious data. The fixed-number threshold changes how teams should think about data provenance and vendor due diligence.


If someone wanted to poison a city's water supply, you'd assume they'd need to contaminate a percentage of the water; the bigger the reservoir, the more poison required. That's how most teams think about training data security too: larger models trained on more data should require proportionally more malicious content to compromise them. It turns out that mental model is wrong.

Anthropic, working with the UK AI Security Institute and the Alan Turing Institute, found that 250 poisoned documents can backdoor language models ranging from 600M to 13B parameters, regardless of how much clean data they're trained on. It's not about the percentage of training data an attacker controls; it's about hitting a fixed threshold. Creating 250 malicious documents is trivial compared to the millions most teams probably assume an attacker would need.

The researchers tested a low-stakes backdoor (making models output gibberish when triggered), so this isn't about immediate risk to production systems. But what it reveals matters for data governance: if 250 documents representing as little as 0.00016% of training tokens can compromise a model, then the question shifts from "what percentage of our data is third-party?" to "can we vouch for the provenance of every source?" For product teams managing training pipelines or procuring fine-tuning services, this means rethinking vendor due diligence and data lineage tracking.
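To see why the percentage framing breaks down, here's a back-of-the-envelope sketch: a fixed batch of 250 poisoned documents becomes a vanishingly small share of the corpus as models and their training sets grow, yet per the research its effect doesn't shrink with it. The tokens-per-document and tokens-per-parameter figures below are illustrative assumptions, not numbers from the study.

```python
# Back-of-the-envelope: a fixed 250-document attack as a share of training data.
# Tokens-per-document and the 20-tokens-per-parameter ratio are illustrative
# assumptions, not figures from the Anthropic study.

POISON_DOCS = 250
TOKENS_PER_POISON_DOC = 1_000   # assumed average length of a malicious document
TOKENS_PER_PARAM = 20           # rough Chinchilla-style data-scaling assumption

for params in (600e6, 2e9, 7e9, 13e9):
    clean_tokens = params * TOKENS_PER_PARAM
    poison_tokens = POISON_DOCS * TOKENS_PER_POISON_DOC
    share = poison_tokens / (clean_tokens + poison_tokens)
    print(f"{params / 1e9:>5.1f}B params: poisoned share ~ {share:.6%}")

# The attacker's absolute effort (250 documents) stays constant while the
# percentage collapses toward zero, which is why "what fraction of our data
# is untrusted?" is the wrong governance question.
```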


Source: Anthropic, "A small number of samples can poison LLMs of any size" (research on data-poisoning attacks in large language models)