It was the digital equivalent of stealing a car by test-driving it 100,000 times. In mid-2025, Google’s threat intelligence systems detected something unprecedented: a sustained, systematic campaign to extract the “brain” of its Gemini AI model. Over the course of the attack, adversaries fired more than 100,000 carefully crafted prompts at Gemini, attempting to capture its reasoning processes, decision-making patterns, and proprietary capabilities—all through legitimate API access.
Google’s discovery, disclosed in February 2026, wasn’t just another security incident. It was a wake-up call that exposed a terrifying vulnerability in the entire AI industry: the most sophisticated artificial intelligence systems in the world can be stolen not by hacking their servers, but simply by asking them questions. And the legal frameworks designed to prevent this—Terms of Service agreements, API restrictions, and intellectual property laws—are proving shockingly ineffective.
This is the story of the 100,000-prompt heist, the emerging threat of “model extraction” attacks, and why the OpenAI-DeepSeek controversy may have rendered traditional API protections obsolete.
What Is Model Extraction? The Art of AI Cloning
Model extraction—also known as “distillation” or “model stealing”—is a technique where adversaries use legitimate access to a machine learning model (through APIs or chat interfaces) to systematically query it, collect outputs, and use that data to train a new “student” model that mimics the “teacher” model’s behavior.
Think of it like this: If you wanted to replicate a master chef’s signature dishes without access to their recipes, you could order every item on their menu, taste each one, and reverse-engineer the ingredients and techniques. Do this enough times, and you could open your own restaurant serving nearly identical food—without ever setting foot in the original chef’s kitchen.
In AI terms, the process works like this (a minimal code sketch follows the list):
- Systematic Querying: The attacker sends thousands or millions of prompts to the target model
- Output Capture: The model’s responses are recorded and stored
- Training Data Creation: These input-output pairs become training data for a new model
- Student Model Training: A smaller, cheaper model is trained to replicate the target’s behavior
- Competitive Deployment: The cloned model enters the market as a competing product
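To make the mechanics concrete, here is a minimal, hypothetical Python sketch of that query-collect-train loop. Everything in it is an illustrative assumption: the `api.example.com` endpoint, the response schema, and the pacing are invented, and the fine-tuning step is only outlined in a comment. This is a sketch of the general technique, not any real attacker’s tooling.

```python
import json
import time

import requests  # third-party HTTP client; any equivalent works

API_URL = "https://api.example.com/v1/generate"  # hypothetical target endpoint
API_KEY = "sk-REDACTED"                          # attacker-controlled account key


def query_target_model(prompt: str) -> str:
    """Steps 1-2: send one prompt to the target model and capture its output."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt, "max_tokens": 512},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["text"]  # response schema is an assumption


def harvest(prompts: list[str], out_path: str = "pairs.jsonl") -> None:
    """Step 3: accumulate input-output pairs as supervised training data."""
    with open(out_path, "a", encoding="utf-8") as f:
        for prompt in prompts:
            completion = query_target_model(prompt)
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
            time.sleep(1.0)  # slow, spaced-out querying also helps evade rate limits


# Steps 4-5 (outlined only): pairs.jsonl becomes instruction-tuning data for a
# smaller "student" model via any standard fine-tuning framework; the student
# learns to imitate the target's behavior without ever seeing its weights.
```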
The technique isn’t new—knowledge distillation has been a legitimate AI optimization method since Geoffrey Hinton formalized it in 2015. What’s changed is the scale, sophistication, and commercial motivation behind modern extraction attacks.
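For readers who want the underlying math, Hinton’s 2015 formulation trains the student against the teacher’s temperature-softened output distribution, blended with an ordinary supervised loss. In standard notation, with teacher logits $z_t$, student logits $z_s$, softmax $\sigma$, ground-truth labels $y$, temperature $T$, and mixing weight $\alpha$:

$$
\mathcal{L} = \alpha \,\mathrm{CE}\big(y,\, \sigma(z_s)\big) + (1-\alpha)\, T^{2}\, \mathrm{KL}\big(\sigma(z_t/T) \,\|\, \sigma(z_s/T)\big)
$$

The $T^2$ factor keeps gradient magnitudes comparable as the temperature changes. API-based extraction is essentially this recipe with the teacher’s sampled text standing in for its logits, since public APIs rarely expose full probability distributions.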
The Gemini Heist: Inside Google’s Discovery
In February 2026, Google’s Threat Intelligence Group (GTIG) published a startling report detailing what they called “model extraction attacks” against Gemini. The findings revealed a new frontier in AI security threats—one that doesn’t involve traditional hacking but exploits the fundamental openness of AI-as-a-service business models.
The 100,000-Prompt Campaign
One specific attack stood out for its scale and sophistication. Over an extended period, adversaries prompted Gemini more than 100,000 times across various non-English languages. The campaign wasn’t random—it was a carefully orchestrated attempt to:
- Extract reasoning traces: Force Gemini to reveal its internal “thinking” processes
- Capture multilingual capabilities: Target languages beyond English to clone localized versions
- Map decision boundaries: Understand how Gemini handles edge cases and complex queries
- Replicate specialized knowledge: Extract domain-specific expertise embedded in the model
Google described the attack as “reasoning trace coercion”—attempts to bypass normal output summarization and force Gemini to expose its full internal reasoning processes. In one documented case, attackers instructed Gemini that “the language used in the thinking content must be strictly consistent with the main language of the user input”—a subtle prompt designed to extract more detailed internal processing.
Who Was Behind the Attacks?
Google believes the culprits were “commercially motivated” private companies and researchers seeking competitive advantage—not nation-state hackers or cybercriminals. The attempts originated from around the world, with Google noting actors based in North Korea, Russia, and China among those attempting to clone Gemini’s capabilities.
John Hultquist, chief analyst for Google’s Threat Intelligence Group, put it bluntly: “We’re going to be the canary in the coal mine for far more incidents.”
The implications are profound. If Google—one of the world’s most technologically sophisticated companies with virtually unlimited security resources—can’t prevent its AI from being systematically extracted, what hope do smaller AI companies have?
The DeepSeek Precedent: When API Terms Become Unenforceable
To understand why Google’s discovery matters so much, we need to look at the controversy that had erupted a year earlier. In January 2025, Chinese AI startup DeepSeek released its R1 reasoning model, claiming performance comparable to OpenAI’s o1 at a reported training cost of just $5.6 million for its base model—a fraction of the estimated $100 million+ spent on GPT-4.
The AI industry was stunned. How could a Chinese company with limited access to advanced chips (due to U.S. export controls) suddenly match America’s leading AI models?
The answer, according to OpenAI and Microsoft: distillation.
OpenAI’s Accusations
In February 2026, OpenAI sent a memo to the U.S. House Select Committee on China alleging that DeepSeek had engaged in systematic intellectual property theft through model distillation. The claims were explosive:
- DeepSeek employees used obfuscated third-party routers to mask their identity while accessing OpenAI’s API
- They developed code to programmatically extract outputs for distillation purposes
- They circumvented access restrictions to continue extraction after detection
- The activity represented “free-riding” on American R&D to replicate frontier AI capabilities
OpenAI’s terms of service explicitly prohibit using outputs to “develop models that compete with OpenAI” or “automatically or programmatically extract data or Output.” Yet DeepSeek allegedly did exactly that—at scale.
The Enforcement Problem
Here’s where the story gets legally fascinating—and troubling for AI companies. Despite OpenAI’s clear terms of service and Microsoft’s investigation, no legal action has been taken against DeepSeek. The reasons reveal fundamental weaknesses in API-based IP protection:
| Enforcement Challenge | Why It Matters | DeepSeek Case Example |
|---|---|---|
| Jurisdictional barriers | Cross-border enforcement is difficult and expensive | DeepSeek is China-based; OpenAI is U.S.-based |
| Burden of proof | Proving distillation requires access to training data | Only DeepSeek’s final model is public, not training data |
| Detection limitations | Sophisticated attackers can mask extraction patterns | Use of third-party routers and distributed querying |
| Legal precedent gaps | No clear case law on AI model distillation as IP theft | Uncertain whether API outputs qualify as trade secrets |
| Copyright ambiguity | AI-generated outputs may not be copyrightable | OpenAI’s ToS transfers output rights to users |
As legal experts at Ronly & Tenwen Partners noted: “Even if the agreement is valid, OpenAI still bears the burden of proof to demonstrate that DeepSeek breached the agreement and caused actual losses.”
The uncomfortable reality: OpenAI’s API terms may be virtually unenforceable against determined, sophisticated adversaries.
Why API Terms Are Failing: The Legal and Technical Reality
The Google and OpenAI cases expose a fundamental tension in the AI industry. Companies have built business models around providing API access to their most valuable intellectual property—their trained models—while attempting to restrict how customers use that access. But the technical and legal foundations of these restrictions are crumbling.
1. The Technical Impossibility of Detection
Modern extraction attacks are designed to evade detection. Attackers can:
- Distribute queries across thousands of accounts: Using shell companies, resellers, and compromised credentials
- Mimic legitimate usage patterns: Spacing out queries to avoid rate-limit triggers
- Use “sleeper” accounts: Building history of normal usage before beginning extraction
- Route through third-party services: Masking true origin through VPNs, proxies, and cloud services
Google claims it detected the 100,000-prompt campaign in real time, but the company hasn’t disclosed how many extraction attempts go undetected—or how long the Gemini campaign operated before detection.
2. The Copyright Problem
Here’s a paradox that should terrify AI companies: OpenAI’s own terms of service state that all rights to output content are transferred to the user. If users own the outputs, how can OpenAI claim those same outputs can’t be used to train competing models?
Legal analysis from Berkeley Law highlights the issue: “OpenAI’s output data used by DeepSeek for distillation lacks sufficient human intellectual contribution and is unlikely to be considered eligible for copyright protection.”
If API outputs aren’t copyrightable, and users own them anyway, what legal basis exists to prevent their use in model training?
3. The “Fair Use” Defense
In 2021, the Supreme Court held in Google v. Oracle that Google’s reimplementation of the Java API’s declaring code was fair use. While not directly applicable to AI model distillation, the ruling’s logic—that certain forms of software replication can drive innovation rather than hinder it—provides a potential defense for distillation practices.
As one legal analysis noted: “Had Oracle prevailed, developers might have faced significant restrictions on API usage, potentially stifling interoperability and innovation within software ecosystems.”
4. The Anthropic Precedent: Even AI Companies Can’t Follow Their Own Rules
The hypocrisy in this space reached new heights in August 2025, when Anthropic blocked OpenAI’s API access after discovering OpenAI employees were using Claude Code to benchmark and develop GPT-5. Anthropic’s terms explicitly prohibit using its services to “build a competing product or service, including training competing AI models.”
OpenAI’s response? “While we respect Anthropic’s decision to cut off our API access, it is disappointing considering our API remains available to them.”
If OpenAI—the company crying foul about DeepSeek’s distillation—can’t resist using competitors’ APIs for competitive research, what chance do terms of service have of constraining actual bad actors?
The Broader Implications: An Industry Under Siege
The Google and OpenAI cases aren’t isolated incidents—they’re symptoms of a structural vulnerability affecting the entire AI industry. As Google noted in its threat report: “Historically, adversaries seeking to steal high-tech capabilities used conventional computer-enabled intrusion operations to compromise organizations and steal data containing trade secrets. For many AI technologies where LLMs are offered as services, this approach is no longer required; actors can use legitimate API access to attempt to ‘clone’ select AI model capabilities.”
The Economic Threat
AI companies have spent billions training their models. GPT-4 reportedly cost over $100 million. Google’s Gemini family represents a multi-billion dollar investment. Yet these models can potentially be replicated for a fraction of the cost through distillation.
The economic implications are stark:
- Reduced competitive moats: First-mover advantage diminishes when models can be quickly cloned
- Deflationary pressure: Commoditization of AI capabilities drives down prices and margins
- Investment risk: VCs may hesitate to fund AI startups whose IP can be easily extracted
- Innovation disincentives: Why invest in frontier research if competitors can simply steal it?
The Security Risk
Beyond economic concerns, model extraction creates genuine security dangers. Distilled models often lack the safety guardrails of their parent models. As Google warned: “A coding model could be targeted by an adversary wishing to replicate capabilities in an environment without guardrails.”
Imagine a version of GPT-4 or Gemini with all the capabilities but none of the safety restrictions—no refusals for harmful requests, no content filters, no ethical constraints. That’s what extraction attacks can create.
Defensive Strategies: Can AI Companies Protect Themselves?
Faced with the failure of legal protections, AI companies are turning to technical countermeasures. But these too have limitations.
1. Detection and Rate Limiting
Google claims it detected the 100,000-prompt campaign in real time and “lowered the risk.” Techniques include (a minimal sketch of the first two follows the list):
- Pattern analysis: Identifying systematic querying behavior
- Rate limiting: Restricting API calls from single accounts
- Output perturbation: Slightly altering responses to poison training data
- Watermarking: Embedding detectable signals in outputs
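As a rough illustration of pattern analysis and rate limiting, here is a minimal sketch of a per-account monitor combining a sliding-window rate cap with a crude duplicate-prompt check. The window size, thresholds, and heuristic are illustrative assumptions, not Google’s actual defenses.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600          # examine the last hour of traffic per account
MAX_QUERIES_PER_WINDOW = 500   # illustrative rate cap, not a real threshold
MIN_DISTINCT_RATIO = 0.2       # flag accounts whose prompts are near-duplicates


class ExtractionMonitor:
    """Toy per-account monitor: rate limiting plus a crude pattern check."""

    def __init__(self) -> None:
        # account_id -> deque of (timestamp, prompt) pairs
        self.history: dict[str, deque] = defaultdict(deque)

    def allow(self, account_id: str, prompt: str) -> bool:
        """Record one request; return False if it should be throttled."""
        now = time.time()
        window = self.history[account_id]
        window.append((now, prompt))

        # Evict requests that have aged out of the sliding window.
        while window and now - window[0][0] > WINDOW_SECONDS:
            window.popleft()

        # Rate limiting: hard cap on queries per account per window.
        if len(window) > MAX_QUERIES_PER_WINDOW:
            return False

        # Pattern analysis (crude): systematic extraction tends to reuse
        # templated prompts, so a low share of distinct prompts is suspicious.
        if len(window) >= 50:
            distinct = len({p for _, p in window})
            if distinct / len(window) < MIN_DISTINCT_RATIO:
                return False

        return True


monitor = ExtractionMonitor()
print(monitor.allow("acct-123", "Translate to French: good morning"))
```

A production system would layer far more signal on top of this (cross-account correlation, embedding similarity, billing anomalies), which is exactly why distributing queries across thousands of accounts, as described earlier, is such an effective evasion.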
But these measures create friction for legitimate users and can often be circumvented by sophisticated attackers.
2. Legal Innovation
Some companies are exploring novel legal frameworks:
- Technical protection measures: DMCA-style protections for AI models
- Trade secret claims: Arguing model weights and architectures are trade secrets
- Contractual liquidated damages: Pre-set penalties for ToS violations
- International arbitration: Binding dispute resolution in favorable jurisdictions
However, these approaches remain legally untested and practically difficult to enforce.
3. The Nuclear Option: Closing Access
The ultimate defensive measure is to restrict API access entirely—moving from open APIs to closed, vetted partnerships. But this undermines the business models of AI-as-a-service companies and could stifle innovation.
As one analyst noted: “This screws over legit academic researchers who need access to multiple models for comparison studies. Sets a bad precedent too—we might end up with everyone blocking everyone else, which would kill the collaborative research that actually helps the whole field move forward.”
The Regulatory Horizon: Government Intervention?
With private enforcement failing, governments may step in. The OpenAI-DeepSeek dispute has already attracted White House attention, with AI advisor David Sacks stating: “I think one of the things you’re going to see over the next few months is our leading AI companies taking steps to try and prevent distillation. That would definitely slow down some of these copycat models.”
Potential regulatory approaches include:
- Export controls: Extending chip restrictions to model access
- Mandatory watermarking: Requiring detectable signals in AI outputs
- Transparency requirements: Forcing disclosure of training data sources
- International agreements: Coordinating IP protection across jurisdictions
But regulation moves slowly, and the AI industry evolves rapidly. By the time laws are enacted, the technology—and methods of extraction—will have moved on.
Conclusion: The End of API-Based AI Business Models?
The 100,000-prompt heist against Gemini and the DeepSeek controversy represent more than security incidents—they’re existential challenges to the AI industry’s fundamental business model. Companies have built empires on the assumption that they can provide API access to their models while maintaining control over their intellectual property. That assumption is proving false.
As Google’s Threat Intelligence Group warned, these attacks “effectively represent a form of intellectual property (IP) theft.” But calling it theft and preventing it are two very different things.
The uncomfortable truth is that API terms of service are unenforceable against sophisticated, motivated adversaries operating across jurisdictions. Technical countermeasures are imperfect and create friction for legitimate users. Legal frameworks are outdated and ill-suited to AI’s unique characteristics.
For AI companies, the options are stark: accept that their models will be distilled and compete on speed of innovation rather than IP protection; retreat to closed, vetted partnerships that limit growth; or push for draconian government regulation that may stifle the entire industry.
For the rest of us, the implications are equally significant. The democratization of AI through distillation could accelerate innovation and reduce costs—but it could also reduce incentives for the massive investments required to develop frontier models. We may be heading toward a world where AI capabilities are commoditized, but AI progress slows as the economic model that funded breakthroughs collapses.
The 100,000-prompt heist wasn’t just an attack on Google—it was a proof of concept for the future of AI competition. And that future looks nothing like the controlled, regulated landscape AI companies hoped to create.
References
- Ars Technica – “Attackers prompted Gemini over 100,000 times while trying to clone it, Google says”
  https://arstechnica.com/ai/2026/02/attackers-prompted-gemini-over-100000-times-while-trying-to-clone-it-google-says/
  Detailed reporting on Google’s discovery of the 100,000-prompt model extraction campaign against Gemini.
- PCMag – “Google: Hackers Are Trying to ‘Clone’ Gemini for Cyberattacks”
  https://www.pcmag.com/news/google-hackers-are-trying-to-clone-gemini-ai-for-cyberattacks
  Google’s Threat Intelligence Group report on model extraction attacks and IP theft allegations.
- NBC News – “Google: Gemini hit with 100,000+ prompts in cloning attempt”
  https://www.nbcnews.com/tech/security/google-gemini-hit-100000-prompts-cloning-attempt-rcna258657
  Interview with John Hultquist of Google Threat Intelligence Group on the “canary in the coal mine” implications.
- Foundation for Defense of Democracies – “OpenAI Alleges China’s DeepSeek Stole its Intellectual Property to Train its Own Models”
  https://www.fdd.org/analysis/2026/02/13/openai-alleges-chinas-deepseek-stole-its-intellectual-property-to-train-its-own-models/
  OpenAI’s memo to U.S. Congress alleging DeepSeek used distillation and obfuscated routers to extract model capabilities.
- TechCrunch – “Microsoft probing whether DeepSeek improperly used OpenAI APIs”
  https://techcrunch.com/2025/01/29/microsoft-probing-whether-deepseek-improperly-used-openais-api/
  Details on OpenAI’s terms of service prohibitions and the investigation into DeepSeek’s API usage.
- Asia Business Law Journal – “OpenAI-DeepSeek AI distillation dispute”
  https://law.asia/openai-deepseek-ai-distillation/
  Legal analysis of the contract, copyright, and unfair competition issues in the distillation dispute.
- Berkeley Law – “AI Distillation in OpenAI v. DeepSeek”
  https://sites.law.berkeley.edu/thenetwork/2025/03/30/the-innovation-dilemma-ai-distillation-in-openai-v-deepseek/
  Analysis of the innovation dilemma and parallels to the Oracle v. Google Supreme Court case.
- Mexico Business News – “Anthropic Blocks OpenAI API Over GPT-5 Benchmarking Dispute”
  https://mexicobusiness.news/cloudanddata/news/anthropic-blocks-openai-api-over-gpt-5-benchmarking-dispute
  Report on Anthropic blocking OpenAI’s API access for terms of service violations, highlighting industry hypocrisy.
Disclaimer: This article is for informational and educational purposes only and does not constitute legal, investment, or professional advice. The cases and allegations discussed are based on publicly available reports, regulatory filings, and media coverage. Legal positions regarding AI model distillation, API terms enforcement, and intellectual property rights are evolving and subject to change. The characterization of certain activities as “theft” or “unenforceable” represents analysis based on current information, not definitive legal conclusions. Readers should consult qualified legal counsel regarding specific compliance obligations and intellectual property matters. The author and publisher disclaim any liability for actions taken based on the information contained herein. Regulatory and legal developments may have occurred subsequent to publication.
About the Author
InsightPulseHub Editorial Team creates research-driven content across finance, technology, digital policy, and emerging trends. Our articles focus on practical insights and simplified explanations to help readers make informed decisions.