The Agent That Judges the Agent
Three-agent harnesses: Anthropic's blueprint for long-running full-stack AI development
Anthropic is publishing details on the architecture its internal teams use for long-running AI development (this is not the Claude Code leak). The core design decision: separate the agent doing the work from the agent judging it.
Three agents: planning, generation, evaluation. The evaluation agent is calibrated with specific scoring criteria before the harness runs. Five to fifteen iterative cycles. Context managed through structured handoff artifacts rather than compaction, so the decision record survives across context boundaries.
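A minimal sketch of how such a loop could be wired together. Everything here is a hypothetical stand-in — the agent functions, the rubric-based scorer, and the `HandoffArtifact` fields are illustrative, not Anthropic's implementation — but it shows the shape: a pre-calibrated evaluator gating each cycle, and a structured artifact carrying the decision record forward instead of compacted context.

```python
from dataclasses import dataclass, field

@dataclass
class HandoffArtifact:
    """Structured record passed between agents instead of raw chat context.

    The decision log survives context-window boundaries because it is
    explicit data, not conversation history subject to compaction.
    """
    plan: str
    decisions: list = field(default_factory=list)  # (cycle, score) entries

def planning_agent(task: str) -> HandoffArtifact:
    # Stand-in: a real planning agent would call a model here.
    return HandoffArtifact(plan=f"steps for: {task}")

def generation_agent(artifact: HandoffArtifact) -> str:
    # Stand-in: produces work based only on the handoff artifact.
    return f"implementation of [{artifact.plan}]"

def evaluation_agent(output: str, rubric: list[str]) -> float:
    # Calibrated before the harness runs: the rubric is fixed up front,
    # so the judge's criteria don't drift with the generator's output.
    hits = sum(1 for criterion in rubric if criterion in output)
    return hits / len(rubric)

def run_harness(task: str, rubric: list[str],
                threshold: float = 1.0, max_cycles: int = 15):
    """Iterate generation/evaluation (bounded, e.g. 5-15 cycles);
    output only passes downstream once the evaluator approves."""
    artifact = planning_agent(task)
    output = ""
    for cycle in range(max_cycles):
        output = generation_agent(artifact)
        score = evaluation_agent(output, rubric)
        artifact.decisions.append((cycle, score))  # record survives handoff
        if score >= threshold:
            break
    return output, artifact
```

The key structural point the sketch preserves: the evaluator is a separate function with criteria fixed before the loop starts, so the worker cannot grade its own output.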
Prithvi Rajasekaran from Anthropic Labs: "Separating the agent doing the work from the agent judging it proves to be a strong lever."
The pattern isn't new — it's how human organizations do code review, audit, and four-eyes approval. The harness implements the same principle at machine speed.
The governance implication is direct: most current agentic AI deployments don't have a systematic answer to the question "who evaluates the output before it ships?" The spot-check model works at low volume. It doesn't hold as volume grows.
Structural oversight — built into the architecture, calibrated in advance, required before output passes downstream — is a different category than optional spot-checking. The harness is a concrete example of what that looks like.
Source: https://www.infoq.com/news/2026/04/anthropic-three-agent-harness-ai/