A new study by Anthropic found that as few as 250 malicious documents can implant a "backdoor" vulnerability in a large language model, regardless of model size or training data volume. The results challenge the common assumption that attackers must control a fixed percentage of a model's training data to poison it. The research involved training LLMs ranging from 600 million to 13 billion parameters.
short by Vaishnavi Mishra / 01:38 pm on 10 Oct