Distillation Attacks: How AI Models Are Being Copied—and What CIOs Need to Know
Sometimes imitation is more theft than flattery. Anthropic recently described how three AI laboratories extracted Claude’s capabilities to improve their own models, a technique known as a distillation attack. In a distillation attack, one AI model is trained to mimic a more capable one: the attacker floods the target model with prompts and collects its responses as training data, building its own model at a much lower cost.
What is Distillation?
Distillation is not inherently nefarious. Anthropic points out that developers of highly advanced, or “frontier,” AI models use distillation to create smaller versions of those models for their customers. “You can think of it as a teacher model and a student model that is still learning,” said Shatabdi Sharma, CIO at Capacity [1].
The Recent Attacks
DeepSeek, Moonshot, and MiniMax took the distillation method to an industrial scale, leveraging thousands of fraudulent accounts and proxy services to extract capabilities from Claude, according to Anthropic [2]. OpenAI has also accused DeepSeek of distillation attacks [3]. All three labs named by Anthropic are based in China, where access to Claude is restricted due to “legal, regulatory, and security risks” [3].
Why Distillation Matters: Risks and Concerns
Anthropic has framed the problem as more than a commercial one. “Illicitly distilled models lack necessary safeguards, creating significant national security risks,” the company stated [2]. Distilled models are also significantly less expensive, which threatens the competitive advantage of Anthropic and other frontier model developers [2].
The average AI user may not be at risk from distillation, but that doesn’t mean distillation attacks shouldn’t be on CIOs’ radar. Distillation raises questions about model provenance, data leakage, and safeguarding intellectual property.
Who is at Risk?
Distillation attacks are tools that might be used by competitors. It can be less expensive and more efficient to distill an existing model than build your own. Enterprises with high-value intellectual property used to build proprietary models may be targets for competitors—including nation-state actors or other rivals—looking for a shortcut. “If somebody has a particularly good model that they develop in a certain vertical, whether it’s legal or healthcare, et cetera, then certainly [they] can be open to attacks, for somebody to do it better, faster, cheaper,” said Tony Garcia, chief information and security officer at Infineo [3].
Users of illicitly distilled models may eventually find themselves at risk as well, whether they choose the model because it is cheaper or simply do not know that it was distilled. Distilled models may lack safeguards, as Anthropic pointed out. CIOs must think about what that means for the enterprise data going into those models. Could that data be leaked, or used in a way that harms the enterprise?
“There’s going to be legal risk to organizations that are using pirated LLM models,” said John Bruggeman, consulting CISO at CBTS [3].
How CIOs Can Safeguard Their Enterprises
For enterprises using frontier models, CIOs must assume distillation attacks will be ongoing. Data governance, as always, is critical. “You have to take the risk that somebody could distill from that model and potentially get something out of that you don’t want,” said Garcia [3]. “If you’re a CIO or a CISO, you have to look at trying to minimize that by anonymizing data.”
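As a rough illustration of the anonymization step Garcia describes, the Python sketch below replaces email addresses and US Social Security numbers with keyed tokens before a prompt leaves the enterprise. The function name `pseudonymize`, the two patterns, and the token format are assumptions made for this example, not a vendor feature or a complete solution. Strictly speaking this is pseudonymization, which is weaker than full anonymization, and real deployments would need far broader detection of sensitive data.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real data needs much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pseudonymize(text, secret_key):
    """Replace emails and SSN-like strings with keyed, repeatable tokens.

    secret_key is bytes held by the enterprise. The same input always maps
    to the same token, so records stay linkable internally without exposing
    the raw value to the model provider.
    """

    def token(kind, match):
        digest = hmac.new(
            secret_key, match.group(0).encode("utf-8"), hashlib.sha256
        ).hexdigest()[:10]
        return f"<{kind}:{digest}>"

    text = EMAIL_RE.sub(lambda m: token("EMAIL", m), text)
    text = SSN_RE.sub(lambda m: token("SSN", m), text)
    return text


if __name__ == "__main__":
    prompt = "Summarize the claim filed by jane.doe@example.com, SSN 123-45-6789."
    print(pseudonymize(prompt, b"replace-with-a-managed-secret"))
```

Because the tokens are keyed, anyone without the secret cannot easily reverse or recompute them, but the surrounding text can still reveal sensitive context, so a filter like this complements data governance policy and does not replace it.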
CIOs and other key decision-makers need to ask vendors questions about model provenance and safeguards against distillation. “Are there any watermarks that … exist so that we can confirm the lineage of the model and make sure that it isn’t a result of a distillation attack?” asked Sharma [3].
Enterprises developing their own proprietary models at risk of distillation can also take measures to protect that valuable IP. Bruggeman described rate limiting as a first line of defense. “You’ve got to make sure you have a rate limit in place to say ‘only this many queries can be done in a one-minute period or a 10-minute period or one day,’” he said [3]. Watermarking is another potential strategy for protecting IP. The Open Worldwide Application Security Project (OWASP) is developing a watermarking project aimed at reducing unauthorized usage and verifying model authenticity. Bruggeman also pointed to The Glaze Project, an initiative out of the University of Chicago, which develops tools that make unauthorized AI training more difficult.
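The layered limits Bruggeman describes (per minute, per 10 minutes, per day) can be sketched as a sliding-window rate limiter. The Python below is a minimal, in-memory illustration. The class name `MultiWindowRateLimiter`, the client identifier, and the specific limits are assumptions chosen for the example. A production API would typically enforce this at a gateway with shared storage.

```python
import time
from collections import defaultdict, deque


class MultiWindowRateLimiter:
    """Sliding-window limiter that enforces several limits per client at once."""

    def __init__(self, rules, clock=time.monotonic):
        # rules: list of (window_seconds, max_queries) pairs, e.g. (60, 30).
        self.rules = sorted(rules)
        self.clock = clock  # injectable so the limiter can be tested
        self.longest = max(window for window, _ in self.rules)
        self.history = defaultdict(deque)  # client_id -> accepted timestamps

    def allow(self, client_id):
        """Return True and record the query if every window has capacity."""
        now = self.clock()
        stamps = self.history[client_id]
        # Timestamps older than the longest window can no longer matter.
        while stamps and now - stamps[0] >= self.longest:
            stamps.popleft()
        for window, limit in self.rules:
            recent = sum(1 for t in stamps if now - t < window)
            if recent >= limit:
                return False  # rejected queries are not recorded
        stamps.append(now)
        return True


if __name__ == "__main__":
    # Hypothetical limits: 30 per minute, 200 per 10 minutes, 5,000 per day.
    limiter = MultiWindowRateLimiter([(60, 30), (600, 200), (86400, 5000)])
    accepted = sum(limiter.allow("account-123") for _ in range(40))
    print(f"accepted {accepted} of 40 rapid queries")
```

Per-account limits are only a first line of defense, as Bruggeman says. The attacks Anthropic described relied on thousands of fraudulent accounts and proxy services, so a limiter like this would need to be paired with detection of coordinated activity across accounts.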
A distillation attack is like any other supply chain risk. However CIOs and their enterprises opt to address that risk, they need a foundation of AI and data governance from which to start. “Calculate the value of the data. Do a business impact assessment to say, ‘What’s it going to cost if this data gets away?’” Bruggeman said [3]. “What controls do I have to put around it to make sure that it’s protected in the same way that I would protect any other asset?”