Industry capture of AI development creates new competitive dynamics
Stanford's 2025 AI Index shows that industry now produces nearly 90% of notable AI models and that frontier training runs cost as much as an estimated $170M, fundamentally reshaping the competitive landscape.
Maslej, Nestor, et al. "The AI Index 2025 Annual Report." AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, April 2025.
I think the 2025 AI Index reveals how rapidly the economics of AI development are reshaping competitive dynamics in ways that traditional legal frameworks haven't anticipated. That shift creates both unprecedented opportunities for scale advantages and new categories of regulatory and operational risk.
The most striking trend in this Stanford report is the complete industry capture of cutting-edge AI development. Nearly 90% of notable AI models in 2024 originated from industry, up from just 60% in 2023. Meanwhile, academia contributed zero notable models in 2024 according to Epoch AI's classification. This isn't just a shift in research leadership—it represents a fundamental change in how AI innovation happens and who controls it.
The implications for product strategy are immediate and profound. Academic research traditionally provided a commons of shared knowledge that companies could build upon. That commons is disappearing at the frontier. Google produced 6 notable models in 2024 and OpenAI produced 7, while the gap between what industry can achieve and what academia can afford has become insurmountable. When training a single frontier model like Llama 3.1-405B costs an estimated $170 million, academic institutions simply can't compete.
This creates both opportunities and risks for product organizations. Companies with sufficient capital can build moats that are genuinely difficult to cross. The training costs documented in the report show exponential growth: the original Transformer cost around $670 to train in 2017, RoBERTa Large cost $160,000 in 2019, GPT-4 cost approximately $79 million in 2023. The resource requirements aren't just about money—they're about access to specialized hardware, massive datasets, and the engineering talent to orchestrate training runs that can take 90 days to complete.
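The report's cost figures imply a strikingly steady growth rate. A quick sketch of the implied compound annual growth, assuming smooth exponential growth between the two endpoints (a simplification):

```python
# Rough compound annual growth rate of frontier training costs, using
# the report's figures: ~$670 for the original Transformer (2017) and
# ~$170M for Llama 3.1-405B (2024). Illustrative only.
cost_2017 = 670           # USD, original Transformer
cost_2024 = 170_000_000   # USD, Llama 3.1-405B estimate
years = 2024 - 2017

growth_factor = (cost_2024 / cost_2017) ** (1 / years)
print(f"Implied annual cost growth: {growth_factor:.1f}x per year")
```

On these numbers, frontier training costs have been multiplying roughly sixfold every year, which is why each doubling of ambition prices out another tier of would-be competitors.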
Yet the report simultaneously documents the collapse of inference costs, which creates entirely different competitive dynamics. For equivalent performance to GPT-3.5, inference costs dropped from $20 per million tokens in November 2022 to $0.07 by October 2024—a more than 280-fold reduction. Epoch AI estimates that inference costs are falling anywhere from 9 to 900 times per year depending on the task. This means that while training frontier models becomes prohibitively expensive, accessing frontier-level capabilities becomes dramatically cheaper.
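The two figures in the report can be turned into an annualized rate, assuming the decline was smooth over the roughly 23 months between the two data points (an assumption on my part):

```python
# Inference cost collapse per the report: $20.00 -> $0.07 per million
# tokens (GPT-3.5-equivalent performance), Nov 2022 to Oct 2024.
cost_start = 20.00   # USD per million tokens, Nov 2022
cost_end = 0.07      # USD per million tokens, Oct 2024
months = 23          # approximate span between the two data points

fold_reduction = cost_start / cost_end
annual_rate = fold_reduction ** (12 / months)
print(f"Total reduction: {fold_reduction:.0f}x")
print(f"Implied annualized reduction: ~{annual_rate:.0f}x per year")
```

The implied annualized rate of roughly 19x sits comfortably inside Epoch AI's estimated 9-to-900x range, depending on the task.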
For legal teams, this creates a bifurcated landscape. On one hand, the companies that can afford frontier model training are building increasingly defensible positions. The barriers to entry aren't just capital—they're also talent, hardware access, and the institutional knowledge required to execute massive training runs successfully. On the other hand, the rapid commoditization of inference means that the benefits of AI capabilities are spreading quickly across industries and applications.
The geopolitical dimensions add another layer of complexity. The United States produced 40 notable AI models in 2024 compared to China's 15, maintaining its leadership in model production. But the performance gap between U.S. and Chinese models has nearly disappeared. At the end of 2023, performance gaps on key benchmarks were substantial—17.5 percentage points on MMLU, 24.3 on MATH. By the end of 2024, these margins had shrunk to 0.3 and 1.6 percentage points respectively.
This convergence matters for several reasons. First, it suggests that the technical advantages from massive capital investment may be more temporary than expected. DeepSeek-V3, released in December 2024, achieved comparable performance to leading U.S. models while requiring far fewer computational resources according to the report's analysis. If Chinese companies can match U.S. performance with significantly lower training costs, it changes the economics of AI competition globally.
Second, it raises questions about technology transfer and IP protection that legal teams need to anticipate. China continues to lead in AI publications and patents, accounting for 23.2% of AI publications and 69.7% of AI patents globally. The combination of strong research output and rapidly improving model performance suggests that technological advantages may not provide the sustained competitive benefits that many U.S. companies are counting on.
The environmental impact data creates another category of risk that most legal frameworks haven't addressed. Carbon emissions from training frontier models have grown exponentially—from negligible amounts for AlexNet in 2012 to 8,930 tons for Llama 3.1-405B in 2024. For context, the average American emits 18 tons of carbon annually. Training a single frontier model now produces carbon emissions equivalent to nearly 500 Americans for an entire year.
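The "nearly 500 Americans" comparison follows directly from the two figures the report provides:

```python
# Sanity check on the carbon comparison: Llama 3.1-405B training
# emissions (~8,930 tons CO2, per the report) against the average
# American's ~18 tons per year.
training_emissions_tons = 8_930
per_american_tons = 18

equivalent_americans = training_emissions_tons / per_american_tons
print(f"Equivalent to ~{equivalent_americans:.0f} Americans' annual emissions")
```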
The report documents that power requirements for training frontier models are doubling annually. While hardware becomes more energy efficient—improving by about 40% per year—the increases in scale more than offset these gains. This creates regulatory exposure that many companies haven't fully considered. As governments implement more aggressive climate targets, AI training could face carbon restrictions or taxation that significantly affect project economics.
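Taking the two rates together, one can back out the training compute growth they jointly imply. A sketch, treating both as smooth annual factors (a simplifying assumption, not a figure from the report):

```python
# Back-of-envelope: compute growth implied if power requirements double
# annually while energy efficiency (compute per watt) improves ~40% per
# year. Both rates come from the report; multiplying them as smooth
# annual factors is my simplification.
power_growth = 2.0        # power draw doubles each year
efficiency_gain = 1.4     # compute per watt improves ~40% per year

implied_compute_growth = power_growth * efficiency_gain
print(f"Implied compute growth: {implied_compute_growth:.1f}x per year")
```

In other words, efficiency gains are being spent, and then some, on scale: even a 40% annual improvement in compute per watt cannot keep total energy use flat when demand roughly triples.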
The data scarcity issue presents perhaps the most interesting strategic challenge. Epoch AI's updated projections suggest the current stock of high-quality training data will be fully utilized between 2026 and 2032. This is actually more optimistic than previous estimates, which predicted depletion by 2024, but it still points toward a fundamental constraint on continued scaling.
For product organizations, this creates a race to secure proprietary data advantages before high-quality public data becomes scarce. Companies with access to unique, high-quality datasets—whether through business operations, partnerships, or user-generated content—may find these assets becoming surprisingly valuable as training material becomes constrained.
The synthetic data research documented in the report offers some hope for addressing scarcity, but with significant limitations. Models trained on synthetic data tend to lose representation of edge cases and can suffer from "model collapse" when trained repeatedly on AI-generated content. While layering synthetic data on top of real data avoids the worst degradation, it doesn't necessarily improve performance either.
From an operational perspective, the hardware trends create both opportunities and dependencies. Machine learning hardware performance has grown 43% annually while costs drop 30% per year. This makes AI development more accessible over time, but it also creates dependence on a small number of hardware suppliers. The report shows that 64 notable models were trained on Nvidia A100 chips, with increasing numbers using the newer H100.
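Combining the report's two hardware trends gives a useful rule of thumb for performance per dollar, assuming both rates compound smoothly (my assumption):

```python
# Performance-per-dollar trend implied by the report's hardware figures:
# performance grows ~43% per year while price falls ~30% per year.
perf_growth = 1.43    # performance multiplier per year
price_factor = 0.70   # cost multiplier per year (30% decline)

perf_per_dollar_growth = perf_growth / price_factor
print(f"Performance per dollar grows ~{perf_per_dollar_growth:.2f}x per year")
```

On these figures, performance per dollar roughly doubles each year, which is what makes non-frontier AI development steadily more accessible even as frontier training costs explode.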
This hardware concentration creates supply chain risks that extend beyond normal vendor management. When frontier AI development depends on access to cutting-edge chips that are manufactured by a limited number of suppliers and subject to export controls, hardware access becomes a strategic constraint rather than just a procurement decision.
The open source trends add a further complication. While industry dominates frontier model development, the report documents explosive growth in open source AI projects—4.3 million GitHub AI projects in 2024, up 40% from the previous year. The United States accounts for 23.4% of these projects, with India at 19.9% and Europe at 19.5%.
This creates interesting tensions between the industry concentration at the frontier and the democratization of AI development tools and techniques. Companies need to think carefully about how much of their AI development to keep proprietary versus contributing to open source communities that could accelerate competitor capabilities.
Looking ahead, the report suggests we're entering a phase where AI development becomes simultaneously more concentrated and more distributed. The frontier will be controlled by a small number of organizations with massive resources, but the benefits of AI capabilities will spread rapidly through lower inference costs and improved open source tools.
For legal and product teams, this means preparing for a world where competitive advantages come primarily from scale, unique data, and deployment execution rather than algorithmic innovation alone. The companies that understand this transition and position themselves accordingly will be best prepared for the AI landscape that's emerging.

TLDR: The Artificial Intelligence Index Report 2025, an independent initiative from the Stanford Institute for Human-Centered Artificial Intelligence (HAI), provides a comprehensive overview of AI's current state and its impact on humanity. Co-directed by Yolanda Gil and Raymond Perrault, the report's writing process was even aided by AI tools like ChatGPT and Claude.
• Technical Performance: AI performance on demanding benchmarks (MMMU, GPQA, SWE-bench) sharply increased in 2024, with scores rising by 18.8, 48.9, and 67.3 percentage points respectively. AI systems made major strides in generating high-quality video and, in some programming tasks with limited time, language model agents even outperformed humans. Smaller models are now driving stronger performance, achieving capabilities previously seen only in much larger models (e.g., Phi-3-mini matching PaLM's MMLU score with 142x fewer parameters). The performance gap between open-weight and closed-weight models has significantly narrowed.
• Economic Impact: The cost of querying AI models, like a GPT-3.5 equivalent, dropped over 280-fold in approximately 18 months, from $20.00 to $0.07 per million tokens. Industry continues to lead in notable AI model development, accounting for nearly 90% in 2024, while academia leads in highly cited research. Generative AI skills saw the largest increase in labor market demand in the U.S. AI is reported to boost productivity and narrow skill gaps in the workforce. AI usage patterns indicate more human augmentation than automation.
• Science and Medicine: AI is driving rapid advances in scientific discovery, including protein sequencing (e.g., AlphaFold 3, ESM3), biological tasks (Aviary), and wildfire prediction (FireSat). Clinical knowledge of leading LLMs continues to improve (e.g., OpenAI's o1 scored 96.0% on MedQA). Medical AI ethics publications quadrupled from 2020 to 2024.
• Responsible AI & Policy: Efforts toward responsible AI development continue. Foundation model transparency improved, though significant opacity remains. LLMs designed to be explicitly unbiased still demonstrate implicit biases. AI-related election misinformation spread globally, but its measurable impact is unclear. There is increased global cooperation on AI governance, with countries launching AI safety institutes. The U.S. NIST unveiled a framework to mitigate GenAI risks. Data consent protocols for web domains used in AI training datasets saw a significant increase in restrictions between 2023 and 2024.
The report emphasizes that AI is no longer just a potential future, but a transformative force shaping humanity now.