It was the digital equivalent of stealing a car by test-driving it 100,000 times. In mid-2025, Google’s threat intelligence systems detected something unprecedented: a sustained, systematic campaign to extract the “brain” of its Gemini AI model. Over the course of the attack, adversaries fired more than 100,000 carefully crafted prompts at Gemini, attempting to capture its reasoning processes, decision-making patterns, and proprietary capabilities—all through legitimate API access.
Google’s discovery, disclosed in February 2026, wasn’t just another security incident. It was a wake-up call that exposed a terrifying vulnerability in the entire AI industry: the most sophisticated artificial intelligence systems in the world can be stolen not by hacking their servers, but simply by asking them questions. And the legal frameworks designed to prevent this—Terms of Service agreements, API restrictions, and intellectual property laws—are proving shockingly ineffective.
This is the story of the 100,000-prompt heist, the emerging threat of “model extraction” attacks, and why the OpenAI-DeepSeek controversy may have rendered traditional API protections obsolete.
What Is Model Extraction? The Art of AI Cloning
Model extraction—also known as “distillation” or “model stealing”—is a technique where adversaries use legitimate access to a machine learning model (through APIs or chat interfaces) to systematically query it, collect outputs, and use that data to train a new “student” model that mimics the “teacher” model’s behavior.
Think of it like this: If you wanted to replicate a master chef’s signature dishes without access to their recipes, you could order every item on their menu, taste each one, and reverse-engineer the ingredients and techniques. Do this enough times, and you could open your own restaurant serving nearly identical food—without ever setting foot in the original chef’s kitchen.
In AI terms, the process works like this (a minimal code sketch follows the list):
- Systematic Querying: The attacker sends thousands or millions of prompts to the target model
- Output Capture: The model’s responses are recorded and stored
- Training Data Creation: These input-output pairs become training data for a new model
- Student Model Training: A smaller, cheaper model is trained to replicate the target’s behavior
- Competitive Deployment: The cloned model enters the market as a competing product
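To make the mechanics concrete, here is a minimal, hypothetical Python sketch of that query-collect-train loop. Everything in it is an illustrative assumption: the `api.example.com` endpoint, the response schema, and the pacing are invented, and the fine-tuning step is only outlined in a comment. This is a sketch of the general technique, not any real attacker’s tooling.

```python
import json
import time

import requests  # third-party HTTP client; any equivalent works

API_URL = "https://api.example.com/v1/generate"  # hypothetical target endpoint
API_KEY = "sk-REDACTED"                          # attacker-controlled account key


def query_target_model(prompt: str) -> str:
    """Steps 1-2: send one prompt to the target model and capture its output."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt, "max_tokens": 512},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["text"]  # response schema is an assumption


def harvest(prompts: list[str], out_path: str = "pairs.jsonl") -> None:
    """Step 3: accumulate input-output pairs as supervised training data."""
    with open(out_path, "a", encoding="utf-8") as f:
        for prompt in prompts:
            completion = query_target_model(prompt)
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
            time.sleep(1.0)  # slow, spaced-out querying also helps evade rate limits


# Steps 4-5 (outlined only): pairs.jsonl becomes instruction-tuning data for a
# smaller "student" model via any standard fine-tuning framework; the student
# learns to imitate the target's behavior without ever seeing its weights.
```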
The technique isn’t new—knowledge distillation has been a legitimate AI optimization method since Geoffrey Hinton formalized it in 2015. What’s changed is the scale, sophistication, and commercial motivation behind modern extraction attacks.
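For readers who want the underlying math, Hinton’s 2015 formulation trains the student against the teacher’s temperature-softened output distribution, blended with an ordinary supervised loss. In standard notation, with teacher logits $z_t$, student logits $z_s$, softmax $\sigma$, ground-truth labels $y$, temperature $T$, and mixing weight $\alpha$:

$$
\mathcal{L} = \alpha \,\mathrm{CE}\big(y,\, \sigma(z_s)\big) + (1-\alpha)\, T^{2}\, \mathrm{KL}\big(\sigma(z_t/T) \,\|\, \sigma(z_s/T)\big)
$$

The $T^2$ factor keeps gradient magnitudes comparable as the temperature changes. API-based extraction is essentially this recipe with the teacher’s sampled text standing in for its logits, since public APIs rarely expose full probability distributions.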
The Gemini Heist: Inside Google’s Discovery
In February 2026, Google’s Threat Intelligence Group (GTIG) published a startling report detailing what they called “model extraction attacks” against Gemini. The findings revealed a new frontier in AI security threats—one that doesn’t involve traditional hacking but exploits the fundamental openness of AI-as-a-service business models.
The 100,000-Prompt Campaign
One specific attack stood out for its scale and sophistication. Over an extended period, adversaries prompted Gemini more than 100,000 times across various non-English languages. The campaign wasn’t random—it was a carefully orchestrated attempt to:
- Extract reasoning traces: Force Gemini to reveal its internal “thinking” processes
- Capture multilingual capabilities: Target languages beyond English to clone localized versions
- Map decision boundaries: Understand how Gemini handles edge cases and complex queries
- Replicate specialized knowledge: Extract domain-specific expertise embedded in the model
Google described the attack as “reasoning trace coercion”—attempts to bypass normal output summarization and force Gemini to expose its full internal reasoning processes. In one documented case, attackers instructed Gemini that “the language used in the thinking content must be strictly consistent with the main language of the user input”—a subtle prompt designed to extract more detailed internal processing.
Who Was Behind the Attacks?
Google believes the culprits were “commercially motivated” private companies and researchers seeking competitive advantage—not nation-state hackers or cybercriminals. The attempts originated from around the world, with Google noting actors based in North Korea, Russia, and China among those attempting to clone Gemini’s capabilities.
John Hultquist, chief analyst for Google’s Threat Intelligence Group, put it bluntly: “We’re going to be the canary in the coal mine for far more incidents.”
The implications are profound. If Google—one of the world’s most technologically sophisticated companies with virtually unlimited security resources—can’t prevent its AI from being systematically extracted, what hope do smaller AI companies have?
The DeepSeek Precedent: When API Terms Become Unenforceable
To understand why Google’s discovery matters so much, we need to look at the controversy that had erupted a year earlier. In January 2025, Chinese AI startup DeepSeek released its R1 reasoning model, claiming performance comparable to OpenAI’s o1 at a reported training cost of just $5.6 million for its base model—a fraction of the estimated $100 million+ spent on GPT-4.
The AI industry was stunned. How could a Chinese company with limited access to advanced chips (due to U.S. export controls) suddenly match America’s leading AI models?
The answer, according to OpenAI and Microsoft: distillation.
OpenAI’s Accusations
In February 2026, OpenAI sent a memo to the U.S. House Select Committee on China alleging that DeepSeek had engaged in systematic intellectual property theft through model distillation. The claims were explosive:
- DeepSeek employees used obfuscated third-party routers to mask their identity while accessing OpenAI’s API
- They developed code to programmatically extract outputs for distillation purposes
- They circumvented access restrictions to continue extraction after detection
- The activity represented “free-riding” on American R&D to replicate frontier AI capabilities
OpenAI’s terms of service explicitly prohibit using outputs to “develop models that compete with OpenAI” or “automatically or programmatically extract data or Output.” Yet DeepSeek allegedly did exactly that—at scale.
The Enforcement Problem
Here’s where the story gets legally fascinating—and troubling for AI companies. Despite OpenAI’s clear terms of service and Microsoft’s investigation, no legal action has been taken against DeepSeek. The reasons reveal fundamental weaknesses in API-based IP protection:
| Enforcement Challenge | Why It Matters | DeepSeek Case Example |
|---|---|---|
| Jurisdictional barriers | Cross-border enforcement is difficult and expensive | DeepSeek is China-based; OpenAI is U.S.-based |
| Burden of proof | Proving distillation requires access to training data | Only DeepSeek’s final model is public, not training data |
| Detection limitations | Sophisticated attackers can mask extraction patterns | Use of third-party routers and distributed querying |
| Legal precedent gaps | No clear case law on AI model distillation as IP theft | Uncertain whether API outputs qualify as trade secrets |
| Copyright ambiguity | AI-generated outputs may not be copyrightable | OpenAI’s ToS transfers output rights to users |
As legal experts at Ronly & Tenwen Partners noted: “Even if the agreement is valid, OpenAI still bears the burden of proof to demonstrate that DeepSeek breached the agreement and caused actual losses.”
The uncomfortable reality: OpenAI’s API terms may be virtually unenforceable against determined, sophisticated adversaries.
Why API Terms Are Failing: The Legal and Technical Reality
The Google and OpenAI cases expose a fundamental tension in the AI industry. Companies have built business models around providing API access to their most valuable intellectual property—their trained models—while attempting to restrict how customers use that access. But the technical and legal foundations of these restrictions are crumbling.
1. The Technical Impossibility of Detection
Modern extraction attacks are designed to evade detection. Attackers can:
- Distribute queries across thousands of accounts: Using shell companies, resellers, and compromised credentials
- Mimic legitimate usage patterns: Spacing out queries to avoid rate-limit triggers
- Use “sleeper” accounts: Building history of normal usage before beginning extraction
- Route through third-party services: Masking true origin through VPNs, proxies, and cloud services
Google claims it detected the 100,000-prompt campaign in real time, but the company hasn’t disclosed how many extraction attempts go undetected—or how long the Gemini campaign operated before detection.
2. The Copyright Problem
Here’s a paradox that should terrify AI companies: OpenAI’s own terms of service state that all rights to output content are transferred to the user. If users own the outputs, how can OpenAI claim those same outputs can’t be used to train competing models?
Legal analysis from Berkeley Law highlights the issue: “OpenAI’s output data used by DeepSeek for distillation lacks sufficient human intellectual contribution and is unlikely to be considered eligible for copyright protection.”
If API outputs aren’t copyrightable, and users own them anyway, what legal basis exists to prevent their use in model training?
3. The “Fair Use” Defense
In 2021, the Supreme Court held in Google v. Oracle that Google’s reimplementation of the Java API’s declaring code was fair use. While not directly applicable to AI model distillation, the ruling’s logic—that certain forms of software replication can drive innovation rather than hinder it—provides a potential defense for distillation practices.
As one legal analysis noted: “Had Oracle prevailed, developers might have faced significant restrictions on API usage, potentially stifling interoperability and innovation within software ecosystems.”
4. The Anthropic Precedent: Even AI Companies Can’t Follow Their Own Rules
The hypocrisy in this space reached new heights in August 2025, when Anthropic blocked OpenAI’s API access after discovering OpenAI employees were using Claude Code to benchmark and develop GPT-5. Anthropic’s terms explicitly prohibit using its services to “build a competing product or service, including training competing AI models.”
OpenAI’s response? “While we respect Anthropic’s decision to cut off our API access, it is disappointing considering our API remains available to them.”
If OpenAI—the company crying foul about DeepSeek’s distillation—can’t resist using competitors’ APIs for competitive research, what chance do terms of service have of constraining actual bad actors?
The Broader Implications: An Industry Under Siege
The Google and OpenAI cases aren’t isolated incidents—they’re symptoms of a structural vulnerability affecting the entire AI industry. As Google noted in its threat report: “Historically, adversaries seeking to steal high-tech capabilities used conventional computer-enabled intrusion operations to compromise organizations and steal data containing trade secrets. For many AI technologies where LLMs are offered as services, this approach is no longer required; actors can use legitimate API access to attempt to ‘clone’ select AI model capabilities.”
The Economic Threat
AI companies have spent billions training their models. GPT-4 reportedly cost over $100 million. Google’s Gemini family represents a multi-billion dollar investment. Yet these models can potentially be replicated for a fraction of the cost through distillation.
The economic implications are stark:
- Reduced competitive moats: First-mover advantage diminishes when models can be quickly cloned
- Deflationary pressure: Commoditization of AI capabilities drives down prices and margins
- Investment risk: VCs may hesitate to fund AI startups whose IP can be easily extracted
- Innovation disincentives: Why invest in frontier research if competitors can simply steal it?
The Security Risk
Beyond economic concerns, model extraction creates genuine security dangers. Distilled models often lack the safety guardrails of their parent models. As Google warned: “A coding model could be targeted by an adversary wishing to replicate capabilities in an environment without guardrails.”
Imagine a version of GPT-4 or Gemini with all the capabilities but none of the safety restrictions—no refusals for harmful requests, no content filters, no ethical constraints. That’s what extraction attacks can create.
Defensive Strategies: Can AI Companies Protect Themselves?
Faced with the failure of legal protections, AI companies are turning to technical countermeasures. But these too have limitations.
1. Detection and Rate Limiting
Google claims it detected the 100,000-prompt campaign in real time and “lowered the risk.” Techniques include (a minimal sketch of the first two follows the list):
- Pattern analysis: Identifying systematic querying behavior
- Rate limiting: Restricting API calls from single accounts
- Output perturbation: Slightly altering responses to poison training data
- Watermarking: Embedding detectable signals in outputs
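As a rough illustration of pattern analysis and rate limiting, here is a minimal sketch of a per-account monitor combining a sliding-window rate cap with a crude duplicate-prompt check. The window size, thresholds, and heuristic are illustrative assumptions, not Google’s actual defenses.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600          # examine the last hour of traffic per account
MAX_QUERIES_PER_WINDOW = 500   # illustrative rate cap, not a real threshold
MIN_DISTINCT_RATIO = 0.2       # flag accounts whose prompts are near-duplicates


class ExtractionMonitor:
    """Toy per-account monitor: rate limiting plus a crude pattern check."""

    def __init__(self) -> None:
        # account_id -> deque of (timestamp, prompt) pairs
        self.history: dict[str, deque] = defaultdict(deque)

    def allow(self, account_id: str, prompt: str) -> bool:
        """Record one request; return False if it should be throttled."""
        now = time.time()
        window = self.history[account_id]
        window.append((now, prompt))

        # Evict requests that have aged out of the sliding window.
        while window and now - window[0][0] > WINDOW_SECONDS:
            window.popleft()

        # Rate limiting: hard cap on queries per account per window.
        if len(window) > MAX_QUERIES_PER_WINDOW:
            return False

        # Pattern analysis (crude): systematic extraction tends to reuse
        # templated prompts, so a low share of distinct prompts is suspicious.
        if len(window) >= 50:
            distinct = len({p for _, p in window})
            if distinct / len(window) < MIN_DISTINCT_RATIO:
                return False

        return True


monitor = ExtractionMonitor()
print(monitor.allow("acct-123", "Translate to French: good morning"))
```

A production system would layer far more signal on top of this (cross-account correlation, embedding similarity, billing anomalies), which is exactly why distributing queries across thousands of accounts, as described earlier, is such an effective evasion.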
But these measures create friction for legitimate users and can often be circumvented by sophisticated attackers.
2. Legal Innovation
Some companies are exploring novel legal frameworks:
- Technical protection measures: DMCA-style protections for AI models
- Trade secret claims: Arguing model weights and architectures are trade secrets
- Contractual liquidated damages: Pre-set penalties for ToS violations
- International arbitration: Binding dispute resolution in favorable jurisdictions
However, these approaches remain legally untested and practically difficult to enforce.
3. The Nuclear Option: Closing Access
The ultimate defensive measure is to restrict API access entirely—moving from open APIs to closed, vetted partnerships. But this undermines the business models of AI-as-a-service companies and could stifle innovation.
As one analyst noted: “This screws over legit academic researchers who need access to multiple models for comparison studies. Sets a bad precedent too—we might end up with everyone blocking everyone else, which would kill the collaborative research that actually helps the whole field move forward.”
The Regulatory Horizon: Government Intervention?
With private enforcement failing, governments may step in. The OpenAI-DeepSeek dispute has already attracted White House attention, with AI advisor David Sacks stating: “I think one of the things you’re going to see over the next few months is our leading AI companies taking steps to try and prevent distillation. That would definitely slow down some of these copycat models.”
Potential regulatory approaches include:
- Export controls: Extending chip restrictions to model access
- Mandatory watermarking: Requiring detectable signals in AI outputs
- Transparency requirements: Forcing disclosure of training data sources
- International agreements: Coordinating IP protection across jurisdictions
But regulation moves slowly, and the AI industry evolves rapidly. By the time laws are enacted, the technology—and methods of extraction—will have moved on.
Conclusion: The End of API-Based AI Business Models?
The 100,000-prompt heist against Gemini and the DeepSeek controversy represent more than security incidents—they’re existential challenges to the AI industry’s fundamental business model. Companies have built empires on the assumption that they can provide API access to their models while maintaining control over their intellectual property. That assumption is proving false.
As Google’s Threat Intelligence Group warned, these attacks “effectively represent a form of intellectual property (IP) theft.” But calling it theft and preventing it are two very different things.
The uncomfortable truth is that API terms of service are unenforceable against sophisticated, motivated adversaries operating across jurisdictions. Technical countermeasures are imperfect and create friction for legitimate users. Legal frameworks are outdated and ill-suited to AI’s unique characteristics.
For AI companies, the options are stark: accept that their models will be distilled and compete on speed of innovation rather than IP protection; retreat to closed, vetted partnerships that limit growth; or push for draconian government regulation that may stifle the entire industry.
For the rest of us, the implications are equally significant. The democratization of AI through distillation could accelerate innovation and reduce costs—but it could also reduce incentives for the massive investments required to develop frontier models. We may be heading toward a world where AI capabilities are commoditized, but AI progress slows as the economic model that funded breakthroughs collapses.
The 100,000-prompt heist wasn’t just an attack on Google—it was a proof of concept for the future of AI competition. And that future looks nothing like the controlled, regulated landscape AI companies hoped to create.
References
- Ars Technica – “Attackers prompted Gemini over 100,000 times while trying to clone it, Google says”
  https://arstechnica.com/ai/2026/02/attackers-prompted-gemini-over-100000-times-while-trying-to-clone-it-google-says/
  Detailed reporting on Google’s discovery of the 100,000-prompt model extraction campaign against Gemini.
- PCMag – “Google: Hackers Are Trying to ‘Clone’ Gemini for Cyberattacks”
  https://www.pcmag.com/news/google-hackers-are-trying-to-clone-gemini-ai-for-cyberattacks
  Google’s Threat Intelligence Group report on model extraction attacks and IP theft allegations.
- NBC News – “Google: Gemini hit with 100,000+ prompts in cloning attempt”
  https://www.nbcnews.com/tech/security/google-gemini-hit-100000-prompts-cloning-attempt-rcna258657
  Interview with John Hultquist of Google Threat Intelligence Group on the “canary in the coal mine” implications.
- Foundation for Defense of Democracies – “OpenAI Alleges China’s DeepSeek Stole its Intellectual Property to Train its Own Models”
  https://www.fdd.org/analysis/2026/02/13/openai-alleges-chinas-deepseek-stole-its-intellectual-property-to-train-its-own-models/
  OpenAI’s memo to U.S. Congress alleging DeepSeek used distillation and obfuscated routers to extract model capabilities.
- TechCrunch – “Microsoft probing whether DeepSeek improperly used OpenAI APIs”
  https://techcrunch.com/2025/01/29/microsoft-probing-whether-deepseek-improperly-used-openais-api/
  Details on OpenAI’s terms of service prohibitions and the investigation into DeepSeek’s API usage.
- Asia Business Law Journal – “OpenAI-DeepSeek AI distillation dispute”
  https://law.asia/openai-deepseek-ai-distillation/
  Legal analysis of the contract, copyright, and unfair competition issues in the distillation dispute.
- Berkeley Law – “AI Distillation in OpenAI v. DeepSeek”
  https://sites.law.berkeley.edu/thenetwork/2025/03/30/the-innovation-dilemma-ai-distillation-in-openai-v-deepseek/
  Analysis of the innovation dilemma and parallels to the Oracle v. Google Supreme Court case.
- Mexico Business News – “Anthropic Blocks OpenAI API Over GPT-5 Benchmarking Dispute”
  https://mexicobusiness.news/cloudanddata/news/anthropic-blocks-openai-api-over-gpt-5-benchmarking-dispute
  Report on Anthropic blocking OpenAI’s API access for terms of service violations, highlighting industry hypocrisy.
Disclaimer: This article is for informational and educational purposes only and does not constitute legal, investment, or professional advice. The cases and allegations discussed are based on publicly available reports, regulatory filings, and media coverage. Legal positions regarding AI model distillation, API terms enforcement, and intellectual property rights are evolving and subject to change. The characterization of certain activities as “theft” or “unenforceable” represents analysis based on current information, not definitive legal conclusions. Readers should consult qualified legal counsel regarding specific compliance obligations and intellectual property matters. The author and publisher disclaim any liability for actions taken based on the information contained herein. Regulatory and legal developments may have occurred subsequent to publication.
About the Author
InsightPulseHub Editorial Team creates research-driven content across finance, technology, digital policy, and emerging trends. Our articles focus on practical insights and simplified explanations to help readers make informed decisions.