Engineering precision to solve the AI governance gap

The paper treats AI agents as systems that can perform increasingly complex and impactful goal-directed actions across multiple domains with limited external control, and it develops a conceptual framework for mapping characteristics of AI agency along four dimensions.

Kasirzadeh, A., & Gabriel, I. (2025). Characterizing AI Agents for Alignment and Governance. arXiv preprint arXiv:2504.21848

The governance gap that's been keeping me up at night finally has a solution, and it comes from an unlikely place: the same engineering precision we use to classify autonomous vehicles. Atoosa Kasirzadeh from Carnegie Mellon and Iason Gabriel from Google DeepMind just published a framework that does what none of us have managed to do—create a structured way to assess AI agents that actually maps to legal risk and product decisions.

The core insight is deceptively simple: instead of treating "AI risk" as some monolithic threat, we can characterize any AI agent along four specific dimensions that each raise distinct governance questions. Think of it as building a risk profile that connects technical capabilities directly to compliance requirements, which means we can finally move beyond the binary thinking that's plagued this space.

Their autonomy dimension adapts the familiar SAE levels from self-driving cars, running from A.0 (no autonomy, entirely dependent on human direction) through A.5 (full autonomy, operating without human oversight). What makes this immediately useful is how it connects to oversight requirements. An A.1 system conducting single automated tasks needs only periodic review, while A.4 systems making autonomous decisions across domains require continuous monitoring with override protocols. The legal implications become clear once you realize that A.4 and A.5 systems can act independently enough to raise liability questions about whether the principal maintains sufficient control to claim safe harbor protections.
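To make that mapping concrete, here is a minimal sketch of how the autonomy scale could be encoded and tied to oversight requirements. The enum labels follow the gradations described above; the oversight strings and thresholds are my own illustrative choices, not prescriptions from the paper.

```python
from enum import IntEnum

class Autonomy(IntEnum):
    """A.0-A.5 autonomy gradations, labelled per the descriptions above."""
    A0 = 0  # no autonomy: entirely dependent on human direction
    A1 = 1  # single automated tasks
    A2 = 2  # partial autonomy
    A3 = 3  # intermediate autonomy
    A4 = 4  # high autonomy: autonomous decisions across domains
    A5 = 5  # full autonomy: operates without human oversight

# Illustrative oversight mapping (assumed, not the paper's prescriptions)
OVERSIGHT = {
    Autonomy.A1: "periodic review",
    Autonomy.A4: "continuous monitoring with override protocols",
    Autonomy.A5: "treat as out of scope absent contained efficacy and robust alignment",
}
```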

The efficacy dimension tackles something we've been dancing around in risk assessments—the actual impact an agent can have. They break this into environmental impact levels (observation only through comprehensive) crossed with environment types (simulated, mediated, physical). An agent with comprehensive impact in physical environments rates E.5, the highest risk category, while one with constrained capabilities in simulated environments might only rate E.1. For product teams, this translates directly to safety protocol requirements. An E.5 system controlling infrastructure needs comprehensive impact assessments and fail-safe mechanisms, while E.1 systems might need only basic controls.
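A rough way to see how the crossing works in practice is to treat the rating as a function of impact level and environment type. The combination rule below is purely my own assumption for illustration (the paper defines the gradations qualitatively), and the two middle impact labels are placeholders.

```python
# "observation_only", "constrained" and "comprehensive" come from the discussion
# above; the middle labels are assumed placeholders for illustration.
IMPACT_LEVELS = ["observation_only", "constrained", "moderate", "extensive", "comprehensive"]
ENV_WEIGHT = {"simulated": 0, "mediated": 1, "physical": 2}

def efficacy_rating(impact: str, environment: str) -> str:
    """Map (impact level, environment type) to a coarse E.1-E.5 band (illustrative rule)."""
    score = IMPACT_LEVELS.index(impact) + 1            # 1..5
    band = min(5, max(1, score + ENV_WEIGHT[environment] - 1))
    return f"E.{band}"

assert efficacy_rating("constrained", "simulated") == "E.1"
assert efficacy_rating("comprehensive", "physical") == "E.5"
```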

Goal complexity proves to be the trickiest dimension for governance, running from GC.1 (single unified goals) to GC.5 (unbounded goal generation). The challenge here is verification: simple goals can be validated through straightforward testing, but complex goals that involve long-term planning or sophisticated trade-offs between competing values demand advanced alignment approaches. From a legal perspective, this dimension determines how we approach contractual specifications and performance metrics. You can write clear SLAs for a GC.1 document summarization system, but a GC.4 system that breaks down complex objectives into dynamic subgoals requires entirely different contracting approaches.
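To see why contracting differs so much across this dimension, compare a flat GC.1 goal with a GC.4-style decomposition into dynamic subgoals. The toy structure below is my own illustration, and the example goals are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    """Toy goal node: GC.1 systems hold one flat goal; GC.4 systems break an
    objective into subgoals that can be generated and revised at run time."""
    description: str
    subgoals: list["Goal"] = field(default_factory=list)

# GC.1: a single unified goal you can test and write a clear SLA against
summarize = Goal("Summarize this document in under 300 words")

# GC.4-style decomposition (invented example): the shifting subgoal structure
# is what makes fixed performance metrics hard to specify in a contract
reduce_churn = Goal("Reduce customer churn this quarter", subgoals=[
    Goal("Identify at-risk accounts from usage data"),
    Goal("Draft tailored retention offers", subgoals=[
        Goal("Check each offer against pricing policy"),
    ]),
])

def depth(goal: Goal) -> int:
    """Nesting depth as a crude proxy for goal complexity."""
    return 1 + max((depth(s) for s in goal.subgoals), default=0)

print(depth(summarize), depth(reduce_churn))  # 1 vs 3
```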

The generality dimension captures something crucial for systemic risk assessment. Narrow systems (G.1) like AlphaGo pose significant challenges within specific domains but are unlikely to generate cross-sector effects. General-purpose systems (G.3-G.5) that operate across diverse domains present unique governance challenges because they can propagate risks across system boundaries and exhibit unexpected behaviors in unanticipated contexts. This directly affects how we think about economic impact and labor displacement liability.

What makes this framework immediately actionable is how they construct "agentic profiles" for real systems. AlphaGo rates A.3, E.1, GC.2, G.1—intermediate autonomy, low environmental impact, low goal complexity, single specialty. ChatGPT-3.5 rates A.2, E.2, GC.3, G.3—partial autonomy, intermediate efficacy through mediated environment, moderate goal complexity, high generality across multiple task domains. Claude 3.5 with tool use jumps to A.3, E.3, GC.4, G.3, showing how scaffolding dramatically changes risk profiles. Waymo autonomous vehicles rate A.4, E.4, GC.4, G.2—high autonomy and efficacy in physical environments with complex goal management but domain-specific generality.
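If you want to work with these profiles programmatically, a minimal encoding might look like this. The ratings are the ones reported above; the container itself is just a sketch.

```python
from typing import NamedTuple

class AgenticProfile(NamedTuple):
    autonomy: str
    efficacy: str
    goal_complexity: str
    generality: str

PROFILES = {
    "AlphaGo":               AgenticProfile("A.3", "E.1", "GC.2", "G.1"),
    "ChatGPT-3.5":           AgenticProfile("A.2", "E.2", "GC.3", "G.3"),
    "Claude 3.5 + tool use": AgenticProfile("A.3", "E.3", "GC.4", "G.3"),
    "Waymo":                 AgenticProfile("A.4", "E.4", "GC.4", "G.2"),
}
```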

The dynamic nature of these profiles creates both opportunities and challenges for governance. The same foundation model can have radically different agentic profiles depending on deployment context, tool access, and architectural additions. Adding browser control capabilities to Claude 3.5 increased its efficacy rating from E.2 to E.3 by enabling more direct environmental impact. This means our compliance frameworks need to be deployment-specific rather than model-specific, which complicates licensing and regulatory approaches but also provides more precise risk management.

For attorneys building AI governance programs, this framework offers several immediate applications. Risk assessment becomes more systematic—instead of subjective "high/medium/low" ratings, you can map specific capability combinations to appropriate safeguards. The autonomy dimension directly informs oversight requirements and delegation protocols. Efficacy levels determine environmental impact assessment needs and safety protocol requirements. Goal complexity guides alignment verification approaches and contractual specification strategies. Generality levels inform systemic risk analysis and cross-sector coordination needs.
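Concretely, a governance program could key controls off each dimension along these lines. The thresholds and control names below are my own assumptions, loosely guided by the examples earlier in this piece, not requirements set out in the paper.

```python
def required_controls(profile: dict[str, int]) -> list[str]:
    """Illustrative mapping from numeric gradations, e.g. {"A": 4, "E": 4, "GC": 4, "G": 2},
    to governance controls. Thresholds are assumed, not prescribed by the paper."""
    controls = []
    if profile["A"] >= 4:
        controls.append("continuous monitoring with human override protocols")
    elif profile["A"] >= 1:
        controls.append("periodic review of automated task logs")
    if profile["E"] >= 4:
        controls.append("comprehensive impact assessment and fail-safe mechanisms")
    if profile["GC"] >= 4:
        controls.append("advanced alignment verification (e.g. scalable oversight)")
    if profile["G"] >= 3:
        controls.append("systemic / cross-sector risk analysis")
    return controls

# A Waymo-like profile (A.4, E.4, GC.4, G.2) from the examples above:
print(required_controls({"A": 4, "E": 4, "GC": 4, "G": 2}))
```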

The framework also clarifies some thorny questions about liability and responsibility assignment. Higher autonomy levels (A.4-A.5) raise questions about principal control that affect vicarious liability analysis. Higher efficacy levels (E.4-E.5) in physical environments create direct causation pathways that simplify tort analysis but increase potential damages. Complex goal structures (GC.4-GC.5) make it harder to establish clear performance standards and open more room for dispute about whether specific outcomes were foreseeable.

From a product development perspective, this creates a roadmap for managing regulatory risk throughout the development lifecycle. Design choices directly affect agentic profiles, which determine governance requirements. Adding reasoning capabilities increases both autonomy and goal complexity ratings. Providing tool access increases efficacy ratings. Expanding domain applicability increases generality ratings. Each change potentially triggers different compliance obligations, which means governance considerations can be integrated into technical architecture decisions rather than bolted on afterward.
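One way to bake this into the lifecycle is to recompute the agentic profile whenever the architecture or deployment changes and diff it against the previous one; any dimension that moves flags a compliance review. A minimal sketch follows, where the pre-change values other than E.2 are assumed for illustration.

```python
def profile_diff(before: dict[str, int], after: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return the dimensions whose gradation changed between two deployments."""
    return {dim: (before[dim], after[dim]) for dim in before if before[dim] != after[dim]}

# Giving a chat-only model browser control and tool use (the Claude 3.5 case above):
# efficacy moves E.2 -> E.3, so the deployment-specific review needs to be re-run.
changed = profile_diff(
    {"A": 2, "E": 2, "GC": 3, "G": 3},   # before: assumed values, except the reported E.2
    {"A": 3, "E": 3, "GC": 4, "G": 3},   # after: the tool-use profile reported above
)
print(changed)  # {'A': (2, 3), 'E': (2, 3), 'GC': (3, 4)}
```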

The practical challenge is operationalizing these measurements. The authors acknowledge that developing consistent metrics for each dimension requires sustained interdisciplinary collaboration. Efficacy might be measured using empowerment metrics that assess environmental influence potential. Goal complexity could use hierarchical planning analysis to measure objective structure and depth. But selecting and validating specific metrics remains significant work.
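For concreteness, one standard formalization of empowerment from the intrinsic-motivation literature treats it as the maximal mutual information between an agent's n-step action sequence and the resulting state. I include it only as a candidate operationalization; the paper gestures at empowerment-style metrics without committing to a specific definition.

```latex
% n-step empowerment at state s_t: channel capacity from action sequences to
% future states (one candidate metric, not the paper's chosen formalization)
\mathfrak{E}_n(s_t) = \max_{p(a_t^{n})} I\!\left(A_t^{n};\, S_{t+n} \mid s_t\right)
```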

What's immediately useful is the framework's structure for comparative analysis and regulatory matching. Instead of treating AI governance as a generic compliance exercise, we can develop proportionate approaches that scale with actual capabilities. A GC.1 system needs different alignment verification than a GC.4 system. An E.1 system needs different safety protocols than an E.4 system. This precision helps avoid both over-regulation that stifles innovation and under-regulation that misses real risks.

The timing couldn't be better. With major companies announcing "one billion agents" deployment goals and autonomous capabilities being integrated into business processes, we need governance frameworks that can keep pace with technical development. This four-dimensional approach provides the conceptual infrastructure for building compliance programs that actually match the systems we're deploying rather than fighting yesterday's AI battles.

Looking ahead, the framework raises important questions about agent individuation—when does one agent become a different agent, and when should agentic profiles be revised? For ongoing compliance, this suggests regular deployment-specific reappraisal rather than one-time approvals. It also points toward governance approaches that can adapt to evolving capabilities rather than locking in static requirements.

The broader implication is that effective AI governance requires this kind of technical precision. Vague appeals to "responsible AI" don't provide actionable guidance for developers or regulators. But a framework that connects specific capabilities to specific governance requirements creates space for both innovation and appropriate oversight. That's the kind of clarity this space desperately needs.

Characterizing AI Agents for Alignment and Governance
The creation of effective governance mechanisms for AI agents requires a deeper understanding of their core properties and how these properties relate to questions surrounding the deployment and operation of agents in the world. This paper provides a characterization of AI agents that focuses on four dimensions: autonomy, efficacy, goal complexity, and generality. We propose different gradations for each dimension, and argue that each dimension raises unique questions about the design, operation, and governance of these systems. Moreover, we draw upon this framework to construct “agentic profiles” for different kinds of AI agents. These profiles help to illuminate cross-cutting technical and non-technical governance challenges posed by different classes of AI agents, ranging from narrow task-specific assistants to highly autonomous general-purpose systems. By mapping out key axes of variation and continuity, this framework provides developers, policymakers, and members of the public with the opportunity to develop governance approaches that better align with collective societal goals.

TLDR: This paper introduces a framework to characterize AI agents for improved governance and alignment. It posits that recent breakthroughs in foundation models, augmented with reasoning, memory, and tool use, are leading to a new class of AI agents capable of complex, impactful, goal-directed action with limited human control. This shift has profound implications for labor markets and human interaction, raising significant individual and systemic risks, including accidents, malicious use, coordination failures, and liability questions. Existing risk-proportionate and domain-specific regulatory frameworks can benefit from a deeper understanding of these novel AI agent properties.

The paper's key assertion is that understanding AI agents requires mapping them across four core dimensions, which collectively form an "agentic profile":

Autonomy: The capacity to perform actions without external direction. Graded from A.0 (no autonomy) to A.5 (full autonomy). A central finding is that full autonomy (A.5) is not a desirable goal unless an agent's efficacy is contained and its capabilities are robustly aligned and significantly limited, since full autonomy entails a loss of principal control and increased risk.

Efficacy: The ability to perceive and causally impact the environment. This depends on the agent's control over outcomes and the consequentiality of its environment (simulated, mediated, or physical). Higher efficacy, particularly in physical environments, directly correlates with increased risk, necessitating comprehensive safety measures.

Goal Complexity: The ability to form or pursue increasingly complex goals, including decomposing subgoals and adapting strategies. More complex goals are harder for humans to evaluate, requiring advanced alignment verification methods like scalable oversight or mechanistic interpretability.

Generality: The breadth of domains and tasks an agent can effectively operate across. Highly general agents pose unique governance challenges by potentially propagating risks across system boundaries and raising questions about economic disruption due to labor substitution.

The paper demonstrates these dimensions by creating "agentic profiles" for systems like AlphaGo, ChatGPT-3.5, Claude 3.5 Sonnet (with tool use), and Waymo. It asserts that these profiles are dynamic, significantly changing with additions like tool use, reasoning, or memory, or through real-world deployment. This dynamism highlights the need for continuous, deployment-specific reappraisal of AI agents. The paper concludes that tailored governance, safety protocols, and alignment strategies must be built upon this detailed understanding of AI agent properties.