Alibaba's offline training framework changes the economics of building research agents
The framework produces pre-aligned agent models without incurring API costs during training, making custom research agent development economically feasible for enterprises with domain-specific needs.
VentureBeat's Ben Dickson reports on Alibaba's new open-source framework, Agentic Continual Pre-training (Agentic CPT), which trains language models to function as research agents without racking up API costs during training. The approach introduces a pre-alignment stage between standard pre-training and fine-tuning, processing 300 billion tokens of agent data to create what Alibaba researcher Xinyu Wang calls an "agentic base model." The resulting AgentFounder-30B beat DeepSeek-V3.1 by 10 percentage points on the BrowseComp benchmark and became the first open-source model to crack 30 points on Humanity's Last Exam.
Both of the framework's synthesis methods, First-order Action Synthesis and Higher-order Action Synthesis, run offline without external API calls, which removes a major cost barrier to developing custom agents (a rough sketch follows below). Wang notes that enterprises can "perform light adaptation using their in-domain corpora and proprietary tools" rather than fighting a general-purpose model that lacks agentic instincts. The model also builds in self-correction: when a page is inaccessible, it reroutes; when evidence is thin, it flags uncertainty instead of guessing. For high-stakes work, Wang still recommends human review at decision points.
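To make the offline idea concrete, here is a minimal sketch, assuming a simple JSON corpus format, of what first-order-style action synthesis could look like: candidate tool calls are generated from documents already on disk and stored as supervised training targets, so no live search or browsing request is ever issued. Every function name, field name, and the tiny corpus itself are hypothetical illustrations, not Alibaba's implementation.

```python
import json
import random

# Hypothetical offline synthesis sketch: everything below is illustrative,
# not Alibaba's code. The key property is that no network or tool call
# is ever executed; actions exist only as training data.

CORPUS = [
    {
        "question": "Who founded the company that acquired DeepMind?",
        "entities": ["DeepMind", "Google", "Larry Page", "Sergey Brin"],
    },
]

def synthesize_first_order(example: dict, n_actions: int = 3) -> dict:
    """Turn one corpus entry into a (state, candidate-action) training pair."""
    picked = random.sample(
        example["entities"], min(n_actions, len(example["entities"]))
    )
    actions = [
        # A serialized tool call that is never run; it serves only as a
        # supervised label showing which action fits the current state.
        {"tool": "search", "query": f"{entity} background"}
        for entity in picked
    ]
    return {"state": example["question"], "candidate_actions": actions}

if __name__ == "__main__":
    for entry in CORPUS:
        print(json.dumps(synthesize_first_order(entry), indent=2))
```

The design point this illustrates is that the synthesized tool calls are labels, not executed requests, which is why the data-generation stage can scale to large token counts with zero API spend.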
https://venturebeat.com/ai/build-research-agents-without-api-costs-alibabas-offline-data-synthesis