Alibaba's offline training framework changes the economics of building research agents
The framework produces pre-aligned agent models without incurring API costs during training, making custom research agent development economically feasible for enterprises with domain-specific needs.
VentureBeat's Ben Dickson reports on Alibaba's new open-source framework, Agentic Continual Pre-training (Agentic CPT), which trains language models to function as research agents without racking up API costs during training. The approach introduces a pre-alignment stage between standard pre-training and fine-tuning, processing 300 billion tokens of agent data to create what Alibaba researcher Xinyu Wang calls an "agentic base model." The resulting AgentFounder-30B beat DeepSeek-V3.1 by 10 percentage points on the BrowseComp benchmark and became the first open-source model to crack 30 points on Humanity's Last Exam.
Both of the framework's synthesis methods, First-order Action Synthesis and Higher-order Action Synthesis, run offline without external API calls, which removes a major cost barrier to developing custom agents (a rough sketch follows below). Wang notes that enterprises can "perform light adaptation using their in-domain corpora and proprietary tools" rather than fighting a general-purpose model that lacks agentic instincts. The model also builds in self-correction: when a page is inaccessible, it reroutes; when evidence is thin, it flags uncertainty instead of guessing. For high-stakes work, Wang still recommends human review at decision points.
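To make the offline idea concrete, here is a minimal sketch, assuming a simple JSON corpus format, of what first-order-style action synthesis could look like: candidate tool calls are generated from documents already on disk and stored as supervised training targets, so no live search or browsing request is ever issued. Every function name, field name, and the tiny corpus itself are hypothetical illustrations, not Alibaba's implementation.

```python
import json
import random

# Hypothetical offline synthesis sketch: everything below is illustrative,
# not Alibaba's code. The key property is that no network or tool call
# is ever executed; actions exist only as training data.

CORPUS = [
    {
        "question": "Who founded the company that acquired DeepMind?",
        "entities": ["DeepMind", "Google", "Larry Page", "Sergey Brin"],
    },
]

def synthesize_first_order(example: dict, n_actions: int = 3) -> dict:
    """Turn one corpus entry into a (state, candidate-action) training pair."""
    picked = random.sample(
        example["entities"], min(n_actions, len(example["entities"]))
    )
    actions = [
        # A serialized tool call that is never run; it serves only as a
        # supervised label showing which action fits the current state.
        {"tool": "search", "query": f"{entity} background"}
        for entity in picked
    ]
    return {"state": example["question"], "candidate_actions": actions}

if __name__ == "__main__":
    for entry in CORPUS:
        print(json.dumps(synthesize_first_order(entry), indent=2))
```

The design point this illustrates is that the synthesized tool calls are labels, not executed requests, which is why the data-generation stage can scale to large token counts with zero API spend.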
https://venturebeat.com/ai/build-research-agents-without-api-costs-alibabas-offline-data-synthesis