A three-tier risk model for Agents based on production status and reversibility
Enterprises deploying autonomous AI agents face a practical governance challenge that most frameworks address only in abstract terms:
When should an agent act without approval, when should it route to human review, and when should it transfer control entirely?
IBM's framework for agentic observability proposes a concrete answer through a three-tier risk continuum that assigns automation levels based on task reversibility and production status, with mandatory MELT capture (metrics, events, logs, and traces) for every agent action, regardless of tier. The framework, described in "Agentic AI in Observability: Building Resilient, Accountable IT Systems" published by The New Stack, applies to enterprises where autonomous agents analyze telemetry and take corrective action. It translates the NIST AI Risk Management Framework's evidentiary requirements and the EU AI Act's continuous monitoring obligations into specific architectural controls and documentation practices that product and legal teams can implement today. The model reflects deployment data showing that approximately 60-70% of automation currently occurs in development and test environments, with 30-40% in production systems, where stakes and oversight requirements differ materially.
Risk classification based on environment and reversibility

IBM's framework begins with a reversibility assessment that determines which of three automation tiers applies to a given task.
Low-risk tasks that are fully reversible and occur in non-production environments receive automated execution with minimal oversight. These include log analysis, pattern recognition in test data, and configuration changes in development systems where mistakes create no customer impact and require minimal effort to undo. The automated tier assumes that telemetry collection continues but that agent actions proceed without human approval or routing.
Medium-risk processes that involve potentially impactful changes or operate closer to production boundaries receive supervised automation, where agents propose actions and humans review and approve before execution. This tier applies to scenarios where the agent's analysis might be correct but where verification adds a necessary safety layer before commitment.
High-risk operations that directly affect customers or production availability require human-in-the-loop execution, where agents assist with data gathering and recommendations, but humans retain full control over decisions and actions. Customer communications, production database changes, and customer-facing service modifications all fall into this category, regardless of the agent's confidence level or historical accuracy.
The risk continuum is based on two variables that interact to determine tier assignment: whether the task is reversible without significant cost or customer impact, and whether it occurs in a production environment where failures have immediate external consequences. A task that is reversible but occurs in production might still qualify for supervised automation rather than full HITL (human-in-the-loop) control if the review process can be completed within acceptable time windows.
Conversely, an irreversible task in a test environment might escalate to a supervised tier even though the environment is non-production, because the cost of incorrect execution outweighs the efficiency gain from full automation. The framework treats reversibility as a spectrum rather than a binary property, acknowledging that some tasks require more manual effort to undo than others, and that this effort affects the risk calculation. Production status similarly operates as a contextual factor that modifies base risk assessments rather than as an absolute barrier, though the framework implies that most customer-facing production changes default to HITL regardless of reversibility.
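To make the interaction of the two variables concrete, the following is a minimal sketch of a tier-assignment function. The class names, the customer_facing flag, and the branch ordering are illustrative assumptions drawn from the tier descriptions above, not rules stated in the article.
```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AUTOMATED = "automated"        # low risk: execute without approval
    SUPERVISED = "supervised"      # medium risk: human review before execution
    HITL = "human_in_the_loop"     # high risk: humans retain full control


@dataclass
class Task:
    reversible: bool        # can the change be undone without significant cost or customer impact?
    production: bool        # does the task touch a production environment?
    customer_facing: bool   # does it directly affect customers or availability?


def assign_tier(task: Task) -> Tier:
    """Illustrative tier assignment from the two variables the framework names."""
    # Customer-facing production changes default to HITL regardless of reversibility.
    if task.production and task.customer_facing:
        return Tier.HITL
    # Irreversible work escalates: supervised in test, HITL in production.
    if not task.reversible:
        return Tier.HITL if task.production else Tier.SUPERVISED
    # Reversible production changes get supervised automation when review fits the time window.
    if task.production:
        return Tier.SUPERVISED
    # Reversible, non-production tasks run fully automated (telemetry is still captured).
    return Tier.AUTOMATED


print(assign_tier(Task(reversible=True, production=False, customer_facing=False)))  # Tier.AUTOMATED
```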
IBM's deployment data provides empirical context for how these classifications distribute across real enterprise environments. The 60-70% automation rate in development and test environments suggests that organizations find most tasks in these contexts either fully reversible or low-impact enough to automate with confidence. The 30-40% production automation rate indicates more conservative risk tolerance when customer-facing systems are involved, with a higher proportion of tasks routed to supervised or HITL tiers. The framework anticipates that these ratios will shift over time as organizations build trust in agent behavior and as observability tooling improves detection of anomalous agent actions. The article suggests that tasks currently in the supervised tier could migrate to automated execution as confidence grows, particularly in non-production environments where the cost of error remains contained.
MELT capture as the foundation for explainability and audit

The framework mandates the collection of metrics, events, logs, and traces—collectively referred to as MELT—for every agent action, regardless of automation tier. This requirement applies to fully automated low-risk tasks, supervised medium-risk processes, and human-controlled high-risk operations, establishing telemetry capture as non-negotiable infrastructure rather than as an optional compliance add-on. MELT data serves multiple purposes that map to different accountability obligations: metrics track quantitative patterns in agent behavior over time, events capture discrete actions and state changes, logs preserve the reasoning pathway that led to each decision, and traces reconstruct the full sequence of operations from initial trigger through final execution. The EU AI Act's continuous monitoring requirements depend explicitly on MELT availability, making capture a prerequisite for operating AI systems under the regulation's scope.
Observability pipelines function as the technical mechanism for MELT collection, treating AI models and agents as first-class observable components alongside traditional application services and infrastructure. These pipelines capture data at multiple points in the agent lifecycle: when an agent receives a trigger or observes a condition, when it generates a proposed action, when that action moves through approval workflows if applicable, and when execution completes or fails. The pipeline architecture must handle high-volume telemetry without introducing latency that would undermine the speed advantages of autonomous operation, a constraint that pushes teams toward sampling strategies and prioritized capture rules rather than comprehensive logging of all internal state. The framework does not specify sampling rates or prioritization criteria, leaving teams to balance completeness against performance based on their risk tolerance and regulatory obligations.
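A minimal sketch of what capture at those lifecycle points might look like, assuming a simple structured-event emitter. The function name, stage labels, and field names are illustrative, and printing stands in for export to an observability backend.
```python
import json
import time
import uuid


def emit_melt_event(agent_id: str, stage: str, payload: dict, correlation_id: str | None = None) -> str:
    """Emit one structured record for a stage of the agent lifecycle.

    Stages might include: trigger_received, action_proposed, approval_requested,
    approval_granted, execution_started, execution_completed, execution_failed.
    """
    correlation_id = correlation_id or str(uuid.uuid4())
    record = {
        "correlation_id": correlation_id,  # threads all stages of one agent action together
        "agent_id": agent_id,
        "stage": stage,
        "timestamp": time.time(),
        "payload": payload,                # stage-specific detail: proposed action, result, error
    }
    print(json.dumps(record))              # stand-in for export to an observability backend
    return correlation_id


# One correlation identifier follows the action from trigger through execution.
cid = emit_melt_event("log-analyzer-01", "trigger_received", {"alert": "error-rate-spike"})
emit_melt_event("log-analyzer-01", "action_proposed", {"action": "restart-test-service"}, cid)
emit_melt_event("log-analyzer-01", "execution_completed", {"status": "ok"}, cid)
```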
Data lineage emerges as a critical component of MELT capture because explainability requirements under NIST AI RMF and EU AI Act demand not just records of what happened but evidence of why the agent chose a particular action. Lineage tracking must connect each agent decision back to the specific telemetry data that informed it, the model weights or parameters active at execution time, and any policy rules that constrained available options. This creates storage and indexing challenges because maintaining queryable lineage at scale requires retaining correlation identifiers across distributed system boundaries and potentially across organizational boundaries when agents consume data from external sources. The framework implies that organizations need to design lineage capture into their observability architecture from the start rather than retrofitting it after deployment, because reconstructing decision pathways after the fact becomes technically infeasible once telemetry data ages out of retention windows.
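One way to make that lineage queryable is a per-decision record stored alongside the MELT data, as in the sketch below. The schema and field names are assumptions for illustration; the article does not prescribe a record format.
```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DecisionLineage:
    """Illustrative per-decision lineage record, stored alongside MELT data for audit queries."""
    correlation_id: str                                            # same identifier used in the MELT events
    decision: str                                                  # the action the agent chose
    input_telemetry_ids: list[str] = field(default_factory=list)   # telemetry records that informed it
    model_version: str = ""                                        # model weights/parameters active at execution time
    policy_rules_applied: list[str] = field(default_factory=list)  # rules that constrained the available options
    environment: str = "test"                                      # production vs. non-production context


record = DecisionLineage(
    correlation_id="cid-0001",
    decision="restart-test-service",
    input_telemetry_ids=["metric-4481", "log-9902"],
    model_version="anomaly-detector-v3.2",
    policy_rules_applied=["non_prod_restart_allowed"],
    environment="test",
)
print(json.dumps(asdict(record)))
```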
Safeguards for autonomous action and drift detection

The framework identifies four primary risk categories that agentic observability systems must address through specific monitoring and control patterns. False positives and hallucinations occur when generative models misidentify patterns in telemetry data, potentially triggering unnecessary interventions or generating spurious alerts that consume human attention without cause.
Observability tools can detect these failures by tracking anomalous responses, repeated retries that suggest poor model grounding, and statistical patterns in agent output that deviate from expected distributions. When detection systems identify potential hallucinations, the recommended response is to prompt model retraining or parameter updates rather than simply suppressing the output, as addressing the underlying model behavior prevents recurrence across similar scenarios.
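The detection signals named here, repeated retries and anomalous responses, could be tracked with logic along the following lines. The thresholds and the idea of checking proposals against a validated action catalog are assumptions for illustration, not prescriptions from the article.
```python
from collections import defaultdict


class HallucinationSignals:
    """Track simple grounding signals: repeated retries and out-of-catalog proposals (illustrative)."""

    def __init__(self, max_retries: int = 3, max_unseen_ratio: float = 0.2):
        self.max_retries = max_retries
        self.max_unseen_ratio = max_unseen_ratio
        self.retries = defaultdict(int)

    def record_retry(self, agent_id: str) -> bool:
        """Return True when repeated retries suggest the model is poorly grounded."""
        self.retries[agent_id] += 1
        return self.retries[agent_id] > self.max_retries

    def too_many_unseen_actions(self, proposed: list[str], known: set[str]) -> bool:
        """Flag when too many proposed actions fall outside the validated action catalog."""
        unseen = sum(1 for action in proposed if action not in known)
        return unseen / max(len(proposed), 1) > self.max_unseen_ratio


signals = HallucinationSignals()
print(signals.too_many_unseen_actions(["restart", "delete-all-indices"], {"restart", "scale_up"}))  # True
```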
Loss of oversight emerges when teams become over-reliant on automation and stop monitoring underlying system drift, a failure mode that can remain invisible until catastrophic incidents surface accumulated problems. The framework recommends monitoring changes in agent response patterns and variations in output as leading indicators of drift, with alerting thresholds that flag deviations before they affect production behavior. This requires establishing baselines for normal agent behavior during controlled deployment phases and treating significant deviation from those baselines as potential evidence of model degradation or changing operating conditions that require human review. The challenge lies in distinguishing legitimate adaptation to evolving system conditions from problematic drift, a classification problem that itself may require machine learning approaches and human validation of edge cases.
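The article does not prescribe a statistical method; one common choice for comparing an agent's current output distribution against a baseline is the population stability index, sketched below. The rule-of-thumb thresholds and the example distributions are assumptions, not framework requirements.
```python
import math


def population_stability_index(baseline: dict, current: dict, eps: float = 1e-6) -> float:
    """Compare two categorical distributions of agent outputs (e.g., chosen action types).

    Both inputs map category -> proportion. PSI below ~0.1 is commonly read as stable,
    0.1-0.25 as moderate shift, above 0.25 as significant drift (rule-of-thumb thresholds).
    """
    categories = set(baseline) | set(current)
    psi = 0.0
    for c in categories:
        expected = max(baseline.get(c, 0.0), eps)
        actual = max(current.get(c, 0.0), eps)
        psi += (actual - expected) * math.log(actual / expected)
    return psi


baseline = {"noop": 0.70, "restart": 0.20, "scale_up": 0.10}   # established during controlled deployment
current = {"noop": 0.45, "restart": 0.35, "scale_up": 0.20}    # recent production behavior
if population_stability_index(baseline, current) > 0.25:
    print("drift alert: route recent agent actions to human review")
```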
Security exposure arises when agents gain excessive data access or invoke services beyond their authorized boundaries, transforming them into attack surfaces that adversaries can exploit through prompt injection or indirect manipulation of agent inputs. The framework calls for observability tools that identify boundary violations by tracking which services agents access, which data sources they query, and whether these patterns align with defined authorization scopes. When violations occur, the recommended response is to retrain the model to close security gaps rather than simply blocking access after the fact, though the article does not address how to prevent retraining from reintroducing vulnerabilities if the training process itself lacks adequate controls. Compliance risks materialize when unexplainable AI decisions trigger regulatory violations under frameworks that mandate transparency, creating liability exposure that persists even when agent actions prove operationally correct.
AI gateway controls function as a runtime enforcement layer that validates and authorizes agent actions before execution, ensuring compliance with security policies and change management requirements across all automation tiers. These gateways intercept proposed actions, verify them against policy rules that encode the organization's risk tolerance, and either permit execution or route them for human review based on predefined criteria. The gateway architecture assumes that policies can be expressed as machine-evaluable rules, a requirement that may prove challenging for subjective risk assessments or context-dependent judgments that resist formalization. The framework does not specify how gateways should handle cases where policy rules conflict or when a proposed action falls outside the scope of defined policies, leaving teams to design fallback behaviors that default to either conservative blocking or escalation for human judgment.
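A minimal sketch of gateway validation logic under these assumptions: authorization scopes are expressed as machine-evaluable rules, unknown agents are blocked conservatively, and requests outside defined policy escalate to human review. The verdict names and policy structure are illustrative.
```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"   # route to human review


# Illustrative authorization scopes: which operations each agent may perform on which services.
AUTHORIZED_SCOPES = {
    "log-analyzer-01": {"logging-service": {"read"}, "test-db": {"read", "write"}},
}


def validate_action(agent_id: str, service: str, operation: str) -> Verdict:
    """Gateway check run before any agent-initiated operation executes."""
    scope = AUTHORIZED_SCOPES.get(agent_id)
    if scope is None:
        return Verdict.BLOCK        # unknown agent: conservative default
    allowed_ops = scope.get(service)
    if allowed_ops is None:
        return Verdict.ESCALATE     # service not covered by policy: ask a human
    return Verdict.ALLOW if operation in allowed_ops else Verdict.BLOCK


print(validate_action("log-analyzer-01", "test-db", "write"))   # Verdict.ALLOW
print(validate_action("log-analyzer-01", "prod-db", "write"))   # Verdict.ESCALATE
```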
Regulation and evidence for the three-tier continuum
IBM's framework aligns with NIST AI RMF requirements by establishing controls and evidentiary standards for responsible AI operations, though the article does not cite specific NIST sections or control identifiers corresponding to each element of the three-tier model. NIST AI RMF emphasizes transparency, explainability, and risk management across the AI lifecycle, principles that map directly to IBM's insistence on MELT capture and tiered oversight. The EU AI Act's continuous monitoring requirements are explicitly addressed by the framework's call to observe metrics, events, logs, and traces, though the article does not address whether IBM's approach satisfies the Act's specific documentation and record-keeping obligations for high-risk AI systems as defined under the regulation's categorization scheme.
Google's Model Cards provide templates for documenting model provenance and behavior, offering a standardized format that teams can adopt to satisfy explainability requirements. The framework references Model Cards as complementary infrastructure that supports transparency but does not specify whether MELT data feeds into Model Card generation or how teams should update cards as agents evolve through retraining cycles. The interaction between operational telemetry capture and static documentation artifacts remains underspecified, creating a gap between real-time observability and compliance documentation that teams must bridge through custom integration work.
The framework's empirical foundation rests on IBM's deployment experience showing 60-70% automation in non-production environments and 30-40% in production systems, figures that suggest these ratios reflect current practice across IBM's customer base rather than aspirational targets. The article does not disclose sample size, industry distribution, or time period for this data, limiting the ability to assess whether these automation rates represent conservative early-stage adoption or mature steady-state deployment. The framework anticipates that automation rates will increase over time as trust grows and as observability tooling improves, but provides no timeline or target percentages for this evolution.
Definitions and applicability across enterprise operations
Automated execution applies to low-risk, reversible tasks where mistakes create minimal impact and corrections require minimal effort. The framework provides examples, including log analysis and test-environment operations, but does not establish formal criteria for determining whether a task qualifies as low-risk or what threshold defines "minimal effort" for reversal. This leaves teams to develop their own risk assessment methodologies, using IBM's examples as reference points rather than exhaustive specifications. The automated tier assumes continuous telemetry collection but permits agent actions to proceed without approval workflows or human checkpoints, making it suitable only for contexts where the cost of occasional errors remains acceptable and where detection systems can flag problems quickly enough to prevent cascading failures.
Supervised automation introduces a review-and-approve step before execution, positioning humans as validators who can accept or reject proposed actions based on their assessment of correctness and risk. This tier assumes that agents can present sufficient context to enable informed human judgment, a requirement that depends on the agent's ability to explain its reasoning in terms that non-technical reviewers can evaluate. The framework does not specify response time requirements for human review or how to handle cases where reviewers disagree with agent recommendations but cannot articulate specific concerns, gaps that could undermine the efficiency advantages of automation if review processes introduce excessive latency or become approval bottlenecks.
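A sketch of the propose-review-execute flow, assuming the agent attaches a reviewer-readable rationale and supporting evidence to each proposal. The types and the callback-based approval hook are illustrative stand-ins for whatever ticketing or chat-ops workflow an organization actually uses.
```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProposedAction:
    description: str                                   # what the agent wants to do
    rationale: str                                     # the agent's explanation, in reviewer-readable terms
    evidence: list[str] = field(default_factory=list)  # telemetry identifiers supporting the recommendation


def supervised_execute(action: ProposedAction,
                       review: Callable[[ProposedAction], bool],
                       execute: Callable[[ProposedAction], None]) -> bool:
    """Review-and-approve step: execution happens only after explicit human approval."""
    approved = review(action)   # e.g., a ticketing or chat-ops approval hook
    if approved:
        execute(action)
    return approved


action = ProposedAction("scale test cluster to 3 nodes", "sustained CPU above 85% for 30 minutes", ["metric-7710"])
supervised_execute(action, review=lambda a: True, execute=lambda a: print(f"executed: {a.description}"))
```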
Human-in-the-loop execution reserves full control for humans while allowing agents to gather data and generate recommendations that inform but do not determine final decisions. Customer communications and production changes explicitly fall into this tier regardless of agent confidence or historical accuracy, reflecting IBM's judgment that these operations carry reputational and availability risks that outweigh automation efficiency gains. The HITL tier implies that agents add value primarily through information synthesis and option generation rather than through direct action, positioning them as decision support tools rather than as autonomous executors. This framing may reduce adoption resistance from teams concerned about losing operational control but also limits the scope of automation benefits that organizations can capture.
The framework does not specify how to measure reversibility in quantitative terms or how to handle tasks that fall between clear low-risk and high-risk categories. A production configuration change that can be rolled back within minutes but that causes brief service degradation might qualify for supervised automation based on reversibility but for HITL execution based on production status and customer impact. The framework offers no tie-breaking criteria or weighting scheme to resolve such conflicts, leaving teams to develop their own classification methodologies while using IBM's broad categories as organizing principles. This flexibility allows adaptation to different risk cultures and regulatory environments but also creates implementation inconsistency that could complicate cross-organization benchmarking or regulatory examination.
The framework does not address how to handle cases where agents exceed authorized boundaries during execution, such as when an agent approved for database queries attempts to modify data or when an agent scoped to test environments attempts to access production systems. The recommendation to deploy AI gateways that validate actions before execution suggests runtime enforcement, but the article does not specify whether violations should immediately terminate agent execution, route to escalated human review, or trigger automated rollback of partial changes. The lack of specified enforcement mechanisms leaves teams to design their own incident response procedures, with the risk that inconsistent approaches across different agent deployments create security gaps or compliance exposure.
Drift detection appears as a monitoring priority but without formal thresholds or statistical methods to distinguish legitimate adaptation from problematic degradation. An agent that adjusts its anomaly detection criteria in response to genuine changes in system behavior is adapting; an agent whose outputs diverge from validated behavior patterns without a corresponding change in the environment is degrading. The framework recommends monitoring response patterns and output variations but does not specify how to compute baselines, how much deviation triggers review, or how frequently baselines should update to track evolving normal conditions. These technical gaps make drift detection aspirational rather than immediately actionable without additional methodology development.
Explainability documentation for MELT-based compliance
For legal teams, capture metrics, events, logs, and traces for every agent action to satisfy NIST AI RMF evidentiary standards and EU AI Act continuous monitoring requirements. MELT data provides the factual record that regulators and auditors will examine when assessing whether AI systems operate transparently and within defined bounds, making telemetry infrastructure a prerequisite for defensible compliance positions. Legal should work with engineering to ensure that retention policies for MELT data align with regulatory examination windows and litigation hold requirements, because telemetry that ages out before examination completes creates evidentiary gaps that undermine transparency claims. Documentation requirements under the framework include not just raw telemetry but also data lineage that connects agent decisions back to specific inputs, model versions, and policy rules that constrained available options. This lineage reconstruction depends on correlation identifiers that persist across system boundaries and that remain queryable throughout retention periods, technical requirements that legal should verify during architecture review rather than discovering gaps after deployment.
Risk tier classification creates documentation obligations that vary by automation level, with HITL operations requiring the most comprehensive records because they involve direct customer impact and manual decision-making authority. Legal should collaborate with product to establish tier assignment criteria that map to specific documentation templates, ensuring that teams capture appropriate evidence for each risk category without over-collecting telemetry for low-risk automated tasks. The supervised automation tier introduces approval workflow records as additional compliance artifacts that demonstrate human oversight before execution, though the framework does not specify what constitutes adequate approval documentation or how detailed reasoning must be to satisfy explainability requirements. Legal should establish documentation standards that capture not just approval decisions but also the rationale for those decisions, particularly in edge cases where reviewers override agent recommendations or where approval occurs despite uncertainty about correctness.
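One way to operationalize the tier-to-documentation mapping is a shared template definition along these lines. The artifact names and groupings are assumptions for illustration, since the article does not define documentation templates.
```python
# Illustrative mapping of automation tier to required compliance artifacts.
DOCUMENTATION_TEMPLATES = {
    "automated": [
        "melt_records",           # metrics, events, logs, traces for the action
        "lineage_record",         # inputs, model version, policy rules
    ],
    "supervised": [
        "melt_records",
        "lineage_record",
        "approval_record",        # who approved, when, and the stated rationale
        "override_rationale",     # required when the reviewer rejects or modifies the proposal
    ],
    "human_in_the_loop": [
        "melt_records",
        "lineage_record",
        "decision_record",        # the human decision and the agent recommendation it drew on
        "customer_impact_note",   # why the action was judged acceptable for customer-facing systems
    ],
}
```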
Guardrails for agent action

For product teams, deploy AI gateways that validate agent actions against authorized service boundaries before execution and alert on scope violations. Gateway controls enforce policy rules in real time rather than relying on post-execution monitoring to detect unauthorized behavior, preventing security exposure and compliance violations before they occur. Product should implement gateways as mandatory checkpoints that intercept all agent-initiated operations rather than as optional guardrails that agents can bypass, because architectural enforcement is more reliable than agent-level constraints that depend on model behavior. Gateway validation logic should encode clear authorization scopes that specify which services agents can invoke, which data sources they can query, and which operations they can perform within each service, with any request outside these boundaries either blocked immediately or routed to elevated review based on predefined escalation rules.
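A minimal sketch of that architectural enforcement, assuming each operation an agent can initiate is wrapped so it must pass a gateway check before running. The scope format, decorator, and block-on-violation fallback are illustrative assumptions.
```python
import functools

# Minimal illustrative policy: agent -> service -> allowed operations.
SCOPES = {"log-analyzer-01": {"test-db": {"read", "write"}}}


def gateway_checked(agent_id: str, service: str, operation: str):
    """Decorator that routes every agent-initiated call through a gateway check first."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            allowed = operation in SCOPES.get(agent_id, {}).get(service, set())
            if not allowed:
                # Conservative fallback: block and surface the violation for human review.
                raise PermissionError(f"{agent_id}: {operation} on {service} is outside the authorized scope")
            return fn(*args, **kwargs)
        return inner
    return wrap


@gateway_checked("log-analyzer-01", "test-db", "write")
def update_test_fixture(row_id: int) -> None:
    """Runs only if the gateway allows the operation."""
    print(f"updated fixture {row_id}")


update_test_fixture(42)
```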
Observability pipeline architecture must treat agents as first-class observable components alongside traditional application services, capturing telemetry at each stage of the agent lifecycle from initial trigger through final execution. Product should instrument agents to emit structured telemetry that includes correlation identifiers linking decisions to inputs, model versions active at execution time, and policy rules that constrained available options. This instrumentation must operate with minimal latency to avoid undermining the speed advantages of autonomous operation, pushing teams toward asynchronous telemetry emission and buffering strategies rather than synchronous logging that blocks agent execution. Pipeline design should anticipate high-volume telemetry generation and should implement sampling or prioritization rules that preserve critical explainability data while discarding lower-value internal state information.
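A sketch of asynchronous, buffered emission under these assumptions: telemetry is queued in memory, flushed by a background thread, and dropped when the queue is full as a crude stand-in for the sampling and prioritization rules the framework leaves open.
```python
import queue
import threading
import time


class TelemetryBuffer:
    """Buffer telemetry in memory and flush on a background thread so agent code never blocks."""

    def __init__(self, flush_interval: float = 1.0, max_size: int = 10_000):
        self.q = queue.Queue(maxsize=max_size)
        self.flush_interval = flush_interval
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def emit(self, record: dict) -> None:
        try:
            self.q.put_nowait(record)   # never block the agent's execution path
        except queue.Full:
            pass                        # under pressure, drop low-value records (sampling point)

    def _flush_loop(self) -> None:
        while True:
            time.sleep(self.flush_interval)
            batch = []
            while not self.q.empty():
                batch.append(self.q.get_nowait())
            if batch:
                # Stand-in for shipping the batch to an observability backend.
                print(f"flushed {len(batch)} telemetry records")


buffer = TelemetryBuffer()
buffer.emit({"agent_id": "log-analyzer-01", "stage": "action_proposed", "ts": time.time()})
time.sleep(1.5)   # give the background flush a chance to run in this sketch
```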
Drift detection requires establishing behavioral baselines during controlled deployment phases when agent behavior aligns with validated patterns, then monitoring for statistical deviation from those baselines during production operation. Product should implement alerting thresholds that flag significant changes in agent response patterns or output distributions before drift affects production behavior, giving teams time to investigate and retrain if necessary. The framework's emphasis on monitoring rather than prevention suggests that drift detection operates as an observability function rather than as a runtime control, meaning that teams accept some drift as inevitable and focus on early detection rather than attempting to prevent all model evolution. Product should design agent architectures that allow quick model updates or rollbacks when drift detection triggers review, minimizing the window between detection and remediation.

Several emerging frameworks beyond IBM's are actively shaping enterprise approaches to agentic AI governance. Notably, organizations like AIGN have introduced certifiable, end-to-end governance systems incorporating multi-stage maturity models, risk mapping tools, and agentic-specific controls such as goal alignment, incident escalation, and mitigation of emergent behaviors. Other frameworks from Okta, AWS, and Kore.ai emphasize human intervention points, transparent audit trails, and runtime enforcement of security boundaries to address the unique risks of autonomous agents. These alternatives reflect a growing consensus around the principles outlined in IBM's framework—especially the importance of MELT capture, tiered oversight, and dynamic risk classification—while expanding operational guidance for scalability, certification, and cross-industry benchmarking. While the landscape is evolving rapidly, IBM's approach offers grounded first principles that remain foundational as enterprises adopt more advanced and certifiable agent governance practices.
References
"Agentic AI in Observability: Building Resilient, Accountable IT Systems," The New Stack (accessed November 2025).
