[This post is authored by Akshat Agrawal. Akshat is a practicing litigator working at Saikrishna and Associates. He completed his LL.M. at Berkeley Law in 2023, specialising in IP and Tech law. His previous posts can be found here. He adds the following disclaimer: After some discussion around an earlier draft and an admitted history of verbosity, I would also like to acknowledge the use of Claude.ai for helping me re-frame the draft more succinctly and in a reader-friendly manner. Views expressed here are personal.]
Background
As Sabeeh mentioned in his Tidbit, the Delhi High Court has issued summons to OpenAI in the suit instituted by ANI Media Pvt. Ltd, primarily alleging infringement of its copyright in published news articles that are publicly available. A lot has been written, both on this blog and elsewhere, examining the issue from multiple perspectives: arguments suggesting that using publicly available copyrighted works for AI training constitutes infringement (see here, here, here, here and here) as well as counterarguments maintaining that such use does not infringe copyright (see here, here, here).
However, the most compelling aspect of the hearing, for me, emerged from OpenAI’s statement that “Without prejudice to its rights and contentions, as of October 2024, Open AI has blocklisted ANI’s domain – http://www.aninews.in”. To understand the significance of this development, some context is essential:
Terms and Conditions: The Evolution of Opt-Out
OpenAI’s terms and conditions, effective until March 2023, included a carefully crafted opt-out policy stating:
“(c) Use of Content to Improve Services. We do not use Content that you provide to or receive from our API (“API Content”) to develop or improve our Services. We may use Content from Services other than our API (“Non-API Content”) to help develop and improve our Services. You can read more here about how Non-API Content may be used to improve model performance. If you do not want your Non-API Content used to improve Services, you can opt out by filling out this form. Please note that in some cases this may limit the ability of our Services to better address your specific use case.”
This form, as of today, leads to a page that states: “As of October 25, 2023, we’ve migrated this form to our privacy request portal. Please visit privacy.openai.com to submit your user content opt out request.”
These terms were altered on 23rd October 2024. The opt-out policy, which continues to find mention in the new terms, states:
“Opt out. If you do not want us to use your Content to train our models, you can opt out by following the instructions in this Help Center article. Please note that in some cases this may limit the ability of our Services to better address your specific use case.”
Importantly, these opt-out mechanisms were framed around privacy concerns, as is also evident from the Help Center page referenced above.
OpenAI also released an open letter on 8th January 2024, stating:
“Training is fair use, but we provide an opt-out because it’s the right thing to do. Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.
……
That being said, legal right is less important to us than being good citizens. We have led the AI industry in providing a simple opt-out process for publishers (which The New York Times adopted in August 2023) to prevent our tools from accessing their sites.”
It is pursuant to these policies that OpenAI has blocklisted ANI’s website from use in its training process.
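For readers curious about the mechanics, the publisher opt-out referenced in the open letter operates through the robots.txt exclusion protocol: OpenAI publishes the user-agent of its web crawler, GPTBot, and a site opts out by disallowing that agent in its robots.txt file. Below is a minimal, illustrative sketch, using Python’s standard library and a hypothetical article URL, of the check a compliant crawler would perform; it is not OpenAI’s actual implementation.

```python
# Illustrative sketch: how a compliant crawler checks a publisher's
# robots.txt opt-out before fetching a page for training purposes.
# "GPTBot" is the user-agent OpenAI has published for its crawler;
# the article URL below is hypothetical and for illustration only.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://www.aninews.in/robots.txt")
parser.read()  # fetch and parse the site's live robots.txt

# A publisher opts out with directives such as:
#   User-agent: GPTBot
#   Disallow: /
article_url = "https://www.aninews.in/news/example-article"  # hypothetical
if parser.can_fetch("GPTBot", article_url):
    print("robots.txt permits GPTBot to fetch this URL")
else:
    print("Publisher has opted out: GPTBot may not fetch this URL")
```

Notably, robots.txt is an honour-system convention with no technical enforcement: the opt-out’s effect depends entirely on the crawler’s own policy choices, not on any external compulsion, a point that matters for what follows.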
Engineering Dominance
What appears on the surface as a partial concession by OpenAI reveals, upon deeper examination, a sophisticated market-control strategy. While presented as an ethical step towards good citizenship, this calculated move effectively creates enduring asymmetric advantages through multiple interconnected mechanisms.
From its launch in 2015, and following its transformation into a for-profit entity in 2019, OpenAI developed its foundation models through unrestricted access to global content, regardless of copyright protection status. During this crucial phase, it created systems that mastered not just content processing but the fundamental skill of learning itself. It developed neural architectures with efficient learning capabilities, a sophisticated advantage that, by its very nature, cannot be replicated under today’s restricted conditions. This established the first layer of asymmetry: a fundamental distinction in learning capability, not merely accumulated knowledge.
The second phase masterfully exploited a strategic window during which OpenAI and similar companies could optimize their architectures with minimal regulatory oversight. This timing wasn’t merely fortunate; it represented a calculated opportunity to develop maximum learning capability under minimal restriction. The result manifests as a form of technical compound interest: early unrestricted access built capabilities that enhance all future learning, even with limited training data.
The third phase, the platform’s public launch, attracted significant media and regulatory attention, particularly regarding consent for the use of copyright-protected works. This critical juncture prompted OpenAI to introduce its opt-out mechanism, first appearing in its 2023 terms and conditions, though couched in the language of privacy rather than copyright.
This strategic move effectively creates a learning-divergence gap for emerging AI developers. OpenAI’s concession allowing opt-outs from its training datasets, despite maintaining its position on non-infringement, creates an insurmountable barrier. New entrants must now develop comparable capabilities with restricted content access while competing against systems that already possess optimized learning architectures. It also normalizes opt-outs as a feasible balancing mechanism under the guise of ethical considerations, despite the ongoing contention that the use of content for training purposes is non-infringing under copyright law.
As OpenAI articulated in its court submission using the differential equation analogy, having already extracted the underlying meta-information, it no longer needs to reference its original learning sources. Its historically unrestricted training created optimized learning systems that extract superior value from any training content, a competency that new market entrants cannot effectively replicate under the new opt-out paradigm. For these new entrants, this creates an escalating technical debt. They must attempt to develop basic capabilities with restricted content access, facing diminishing returns on their training investments while competing against systems that continuously enhance their learning efficiency.
The implications are profound: superior learning architectures extract greater value from any new content, generating enhanced outputs that attract more users. These users, in turn, provide additional interaction data, further improving system performance in an accelerating cycle that automatically widens the quality gap.
Perhaps the most significant impact of this strategy lies in OpenAI’s ability to shape the very trajectory of AI development. Established players effectively dictate research priorities, with their technical approaches becoming de facto standards. Alternative approaches struggle for resources and attention, channeling innovation increasingly toward existing paradigms.
Moreover, the true sophistication of this strategy emerges in the dual nature of content access these companies have engineered. While implementing public opt-outs, they have simultaneously secured intricate networks of content partnerships to ensure a continuous flow of high-quality training data. Each partnership enhances the company’s attractiveness to potential future partners, establishing a self-reinforcing network effect in content access itself.
Lost Defenses?
Beneath the veneer of a seemingly reasonable opt-out mechanism lies a question: have emerging AI companies that choose not to implement similar policies lost crucial defenses?
Previously, companies could maintain that their use of training content was non-expressive, since no human ever accessed the training copies, and hence non-infringing at the threshold of the scope of rights, without needing to fall back on a fair use defense, negating any need for an opt-out policy. The content served purely to extract patterns and information rather than to reproduce creative expression. However, a market leader’s acknowledgment of a right, or an ability, to control AI training use through opt-out mechanisms, extending to content rather than solely personal data, significantly undermines this defense.
Similarly, OpenAI’s own technical-necessity argument (in its comments filed with the UK House of Lords Communications and Digital Committee), that comprehensive content access is essential for AI development, becomes increasingly difficult to maintain when industry leaders have demonstrated otherwise through their opt-out policies. These policies, while appearing to champion ethical development, effectively transform potential regulatory threats into competitive barriers.
The implementation of opt-out policies by a market leader represents more than mere market dominance; it establishes a new form of technological control that combines technical, regulatory, and market advantages in self-perpetuating ways. Each layer strengthens the others, creating a form of market control that intensifies over time through multiple feedback loops.
For emerging AI companies, this presents a fundamental challenge: they must now develop competitive capabilities under restrictions that didn’t exist when market leaders built their foundations. Without significant regulatory intervention specifically targeting these reinforcing feedback loops, we risk a future where AI development remains controlled by those who secured early advantages.
The Copyright Debate
A fundamental question that anyone invested in the Generative AI vs. Copyright debate must consider is this:
When you press CTRL + P on your desktop to save this publicly available article from SpicyIP’s platform, for your own reading and internal training, in order to produce future legal or blog articles without reproducing any verbatim or substantial content, are you infringing my copyright?
If you’re doing the same with a thousand SpicyIP articles for your own learning and development, to answer queries or professionally advise clients (a commercial endeavor), are you infringing any of my rights under the Copyright Act?
The answer to this debate fundamentally lies within this very question.
As for the existential concerns, also raised by ANI in its arguments highlighting the diversion and replacement of its core business model, this is a deeper issue that goes beyond copyright: AI does not just challenge content creation; it fundamentally transforms creative capacity itself. Simply implementing licensing fees or entry barriers through copyright law misses the point. While creators might receive modest compensation or control, they remain vulnerable to AI systems that can reshape human cultural production, even through licensed learning.
The real challenge is protecting human creative agency in an AI-driven world. Rather than focusing on exclusionary rights and market-based solutions, we need a broader framework that:
- Protects creative agency itself, not just creative works
- Supports human creativity through education and diverse exposure
- Provides economic security for cultural workers that is not dependent on market metrics
- Integrates AI in ways that enhance rather than replace human creativity
Rather than relying on exclusionary rights, which create enclosures on access and exposure and thus on creative capacity itself, one solution, to my mind, lies in positive legal provisions: cultural funds, infrastructure support, technology-integration support, educational grants, and common cultural resources. These are institutional tools that nurture human creative capacity rather than restrict AI access. This approach addresses the existential concern while promoting beneficial human-AI coexistence in cultural production.