Agentic AI doesn't need better models — it needs a data constitution

Agentic AI's real failure point isn't the model — it's the data pipeline. When agents act autonomously on corrupted data, output guardrails can't save you. Your data needs a constitution, not better prompts.


The industry consensus is that 2026 is the year of agentic AI. We're moving past chatbots that summarize text and into the era of autonomous agents that execute tasks — booking flights, diagnosing outages, managing infrastructure, personalizing content in real time.

But here's what keeps getting missed in the breathless coverage: the primary reason autonomous agents fail in production isn't the model. It's the data.

Manoj Yerrasani, writing in VentureBeat, makes this case forcefully — and from a position that carries weight. As a technology executive overseeing platforms serving 30 million concurrent users during events like the Olympics and the Super Bowl, he's seen what happens when agents encounter dirty data at scale. His proposed solution is what he calls a "data constitution" — a framework called Creed that enforces thousands of automated rules before any data touches an AI model.

The shift from nuisance to catastrophe

In the previous era of human-in-the-loop analytics, data quality was a manageable nuisance. A pipeline hiccups, a dashboard shows the wrong revenue number, an analyst spots it and flags it. The blast radius was contained.

With autonomous agents, that safety net disappears.

A drifting data pipeline no longer just produces a wrong number on a report. It causes an agent to take the wrong action — provisioning the wrong server type, recommending a horror movie to a child watching cartoons, hallucinating a customer service answer based on corrupted vector embeddings. The failure mode shifts from "someone sees a bad chart" to "the system confidently acts on bad information at scale, with no human in the loop to catch it."

This is the core insight, and it's one that product counsel and AI governance teams should internalize: the risk profile of bad data changes fundamentally when you remove human oversight from the decision chain.

Why vector databases make this worse

Yerrasani highlights a particularly insidious problem with RAG-based architectures. In traditional SQL databases, a null value is just a null value. In a vector database — which functions as an agent's long-term memory — a null value or schema mismatch can warp the semantic meaning of an entire embedding.

Consider the scenario he describes: metadata drifts due to a race condition, tagging a news clip as "live sports." When an agent queries for touchdown highlights, it retrieves the news clip because the vector similarity search is operating on a corrupted signal. At scale, that error propagates to millions of users before any monitoring system catches it.
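That failure mode is easy to sketch in miniature. In the toy retriever below, keyword overlap stands in for vector similarity, and the clip ids, tags, and texts are all invented for illustration; the point is that once the "live_sports" tag drifts onto a news clip, the clip enters the agent's candidate set and no downstream check knows it shouldn't be there:

```python
# Toy retrieval over tagged clips. The "live_sports" tag on clip n42 is the
# hypothetical race-condition drift: the clip should be tagged "news".
clips = [
    {"id": "s17", "tag": "live_sports", "text": "fourth quarter touchdown highlight"},
    {"id": "n42", "tag": "live_sports", "text": "evening news bulletin"},  # mistagged
]

def retrieve(query_terms: set[str], tag: str) -> list[str]:
    """Return clip ids under the tag, ranked by crude term overlap
    (a stand-in for real vector-similarity scoring)."""
    candidates = [c for c in clips if c["tag"] == tag]
    return [c["id"] for c in sorted(
        candidates,
        key=lambda c: len(query_terms & set(c["text"].split())),
        reverse=True)]

results = retrieve({"touchdown", "highlight"}, tag="live_sports")
# The mistagged news clip now appears in the agent's result set for a
# sports query, and nothing downstream flags it as wrong.
```

Ranking still puts the real highlight first here, but the corrupted record is already inside the retrieval surface, which is exactly the condition under which an agent acting autonomously can serve it.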

For legal and governance teams, this matters because it means traditional data quality monitoring — the kind that catches problems after they happen — is structurally insufficient for agentic systems. Quality controls have to shift all the way left in the pipeline, to the ingestion point, before data ever reaches the model.

The three principles of the Creed framework

Yerrasani's framework rests on three non-negotiable principles:

1. The quarantine pattern. If a data packet violates a contract, it gets immediately quarantined — it never reaches the vector database. The philosophy: it's far better for an agent to say "I don't know" due to missing data than to confidently lie due to bad data. This circuit-breaker pattern is essential for preventing high-profile hallucinations.
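A minimal sketch of the pattern, assuming a hypothetical contract (required fields plus an allowed-values check) rather than anything from the Creed implementation itself:

```python
from dataclasses import dataclass, field

# Hypothetical contract: required fields and allowed values for an ingest record.
REQUIRED_FIELDS = {"id", "text", "content_type"}
ALLOWED_CONTENT_TYPES = {"live_sports", "news", "entertainment"}

@dataclass
class Pipeline:
    vector_store: list = field(default_factory=list)   # stand-in for the real store
    quarantine: list = field(default_factory=list)     # violating records land here

    def ingest(self, record: dict) -> bool:
        """Admit a record only if it satisfies the contract; quarantine otherwise."""
        missing = REQUIRED_FIELDS - record.keys()
        if missing or record.get("content_type") not in ALLOWED_CONTENT_TYPES:
            self.quarantine.append(record)  # circuit breaker: never reaches the store
            return False
        self.vector_store.append(record)
        return True

pipe = Pipeline()
pipe.ingest({"id": 1, "text": "touchdown highlight", "content_type": "live_sports"})
pipe.ingest({"id": 2, "text": "evening bulletin", "content_type": None})  # quarantined
```

The design choice worth noting: the failure path is a hard stop, not a warning. The agent's worst case becomes a retrieval miss ("I don't know") rather than a confident answer built on the quarantined record.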

2. Schema is law. The industry spent years moving toward schemaless flexibility to move fast. For AI pipelines, that trend has to reverse. Yerrasani's implementation enforces more than 1,000 active rules running across real-time streams — not just null checks, but business logic consistency. Does the user segment in the event stream match the active taxonomy in the feature store? If not, block it.
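The segment-vs-taxonomy check reads like this in sketch form; the segment names and the idea of a static `ACTIVE_SEGMENTS` set are assumptions for illustration (a real feature store would serve the taxonomy dynamically):

```python
# Hypothetical business-logic rule: the user segment carried on an event must
# exist in the feature store's active taxonomy, or the event is blocked.
ACTIVE_SEGMENTS = {"sports_fan", "news_reader", "kids"}  # assumed taxonomy

def check_segment_consistency(event: dict) -> bool:
    """Return True only if the event's segment matches the active taxonomy."""
    return event.get("user_segment") in ACTIVE_SEGMENTS

events = [
    {"user_id": "u1", "user_segment": "sports_fan"},
    {"user_id": "u2", "user_segment": "vip_tier_9"},  # stale segment: blocked
]
admitted = [e for e in events if check_segment_consistency(e)]
```

This is the "business logic consistency" class of rule: the event is structurally valid, but it contradicts another system of record, so it fails the contract anyway.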

3. Vector consistency checks. This is the new frontier: automated checks to ensure that text chunks stored in a vector database actually match the embedding vectors associated with them. Silent failures in embedding model APIs can leave you with vectors that point to nothing, causing agents to retrieve pure noise.
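One way to sketch such a check, with a deterministic hash-based stub standing in for a real embedding model (the audit logic, row shapes, and exact-match comparison are all illustrative assumptions; a production check would re-embed a sample and compare cosine similarity against a threshold):

```python
import hashlib

def embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in embedder; a real system would call its
    embedding model API here."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

def audit(rows: list[dict]) -> list[str]:
    """Re-embed each stored chunk and flag rows whose stored vector no longer
    matches the text it claims to represent. A production check would sample
    rows and use a cosine-similarity threshold instead of exact equality."""
    return [r["id"] for r in rows if embed(r["text"]) != r["vector"]]

rows = [
    {"id": "a", "text": "touchdown replay", "vector": embed("touchdown replay")},
    {"id": "b", "text": "touchdown replay", "vector": embed("evening news")},  # drifted
]
stale = audit(rows)
```

Row "b" is the silent-failure case the principle targets: the text and its vector have quietly diverged, so similarity search returns results the stored text no longer justifies.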

The governance implication product counsel can't ignore

Here's what I think makes this piece essential reading for legal and AI governance teams: it reframes data quality from a technical hygiene issue into a governance architecture decision.

When we talk about AI guardrails, the conversation tends to center on model behavior — output filters, content policies, prompt engineering. But if Yerrasani is right (and the engineering logic is compelling), the more consequential governance layer sits upstream, at the data ingestion point.

For product counsel, that translates to several concrete questions:

  • Do your AI governance frameworks address data pipeline integrity, or only model outputs? If your AI risk assessments focus exclusively on what the model says and does, you're missing the layer where many production failures actually originate.
  • Are data contracts enforceable artifacts in your organization? Not as legal documents, but as automated rules that actually block bad data from reaching agents. The distinction between "we have data quality standards" and "we enforce data quality standards programmatically" is the difference between a policy and a constitution.
  • Does your incident response planning account for silent agent failures? A rogue agent operating on corrupted data doesn't throw errors. It confidently takes wrong actions. Your monitoring and escalation frameworks need to account for failure modes that don't look like failures.

The cultural dimension

Yerrasani's most practical insight might be about organizational dynamics. Engineers hate guardrails — they view strict schemas and data contracts as bureaucratic friction. His team flipped the incentive by demonstrating that the Creed framework was actually an accelerator: by guaranteeing input data purity, they eliminated weeks of debugging model hallucinations.

This mirrors a pattern I've seen in legal-product dynamics more broadly. Governance frameworks that present themselves as compliance overhead get resisted. Governance frameworks that demonstrably reduce debugging time, prevent production incidents, and accelerate deployment velocity get adopted. The framing matters as much as the substance.

The bottom line

If your organization is building an agentic AI strategy for 2026, the message here is direct: stop obsessing over which foundation model is slightly higher on this week's leaderboard. Start auditing your data contracts.

An AI agent is only as autonomous as its data is reliable. And for governance teams, that means the most consequential AI risk decisions you'll make this year might not be about model selection or output filtering — they might be about whether your data pipeline has a constitution or just a set of suggestions.

https://venturebeat.com/infrastructure/the-era-of-agentic-ai-demands-a-data-constitution-not-better-prompts