Compliance used to mean documentation. For AI agents, it means something else entirely

Governance needs to be in the architecture conversation, not the incident response.

For most of the history of enterprise software, EU compliance worked like this: build the system, document what it does, assess the risks at deployment, get your certification, and move on. The documentation captured a fixed thing. The assessment was a snapshot. If nothing changed in the system, the snapshot stayed valid.

That model doesn't work for AI agents. A new paper released this week makes that concrete in a way that product and legal teams building agentic systems need to sit with: a high-risk agentic system with untraceable behavioral drift cannot currently be placed on the EU market consistent with the essential requirements of the EU AI Act. Not in six months when enforcement ramps up. Now. That's binding law.

Here's what makes this different from ordinary software compliance. The EU AI Act's Article 3(23) defines a "substantial modification" as a post-deployment change that affects compliance or alters the system's intended purpose. For traditional software, identifying a substantial modification is straightforward — you look at what changed in the code. For an agentic system, the model weights may be identical while the system's operational profile has shifted entirely. Persistent cross-session memory, continuous learning from user interactions, novel tool use patterns the system develops on its own — these can each move a system outside the boundaries of its original conformity assessment without anyone changing a single line of code.

The researchers behind this paper identified three manifestations of what they call runtime behavioral drift: semantic drift (the system's outputs diverge from baseline patterns), coordination drift (the way the system orchestrates tool use changes), and behavioral drift proper (the system develops action patterns that weren't present at deployment). All three follow non-linear degradation trajectories, so you don't get early warning signs. You get a system that was compliant at deployment and may not be now, with no clear inflection point where it crossed the line.
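To make the monitoring idea concrete, here is a minimal sketch of a semantic-drift check, assuming the agent's outputs are available as plain embedding vectors. The centroid-distance metric, the threshold value, and all names below are my illustrative assumptions, not the paper's definitions:

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def semantic_drift(baseline_embeddings, current_embeddings):
    """Mean distance of current output embeddings from the centroid
    of the conformity-assessment baseline (illustrative metric)."""
    n = len(baseline_embeddings)
    centroid = [sum(col) / n for col in zip(*baseline_embeddings)]
    distances = [cosine_distance(e, centroid) for e in current_embeddings]
    return sum(distances) / len(distances)

# Placeholder threshold: the provider would have to choose and
# document this value as part of the conformity baseline.
DRIFT_THRESHOLD = 0.15

baseline = [[1.0, 0.0], [0.9, 0.1]]  # outputs captured at assessment
current = [[0.2, 0.9], [0.1, 1.0]]   # outputs observed in production

print(semantic_drift(baseline, current) > DRIFT_THRESHOLD)  # → True
```

The point of the sketch is the shape of the obligation, not the metric: drift is measured against a frozen baseline captured at conformity assessment, and crossing a documented threshold is what triggers review.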

The practical implication for product teams is this: conformity assessment is no longer a one-time event. It's the starting point for an ongoing monitoring obligation. What the paper describes as the "minimum viable compliance posture" for a high-risk agentic system has four elements: versioned snapshots of operational state at defined intervals (tool catalogue, memory state, policy bindings); continuous monitoring of behavioral metrics against the conformity assessment baseline; automated detection when drift crosses defined thresholds; and documented internal procedures for determining whether a change meets the Article 3(23) threshold for substantial modification.
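As a rough sketch of what the snapshot-and-review loop might look like in practice: the field names, digest scheme, and review criteria below are my illustrative assumptions, not anything the paper or the Act prescribes.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class OperationalSnapshot:
    """Versioned record of an agent's operational state at one point
    in time. Field names are illustrative, not prescribed by the Act."""
    taken_at: str
    tool_catalogue: tuple   # tools the agent is permitted to invoke
    policy_bindings: tuple  # e.g. ("pii:redact", "spend:limit-100")
    memory_digest: str      # hash of persistent cross-session memory

def take_snapshot(tools, policies, memory_blobs):
    digest = hashlib.sha256(
        json.dumps(sorted(memory_blobs)).encode()).hexdigest()
    return OperationalSnapshot(
        taken_at=datetime.now(timezone.utc).isoformat(),
        tool_catalogue=tuple(sorted(tools)),
        policy_bindings=tuple(sorted(policies)),
        memory_digest=digest,
    )

def review_reasons(baseline, current):
    """Return reasons a change should enter the documented
    substantial-modification procedure. The criteria are placeholders:
    deciding what crosses the Article 3(23) threshold is the judgment
    call someone in the organization has to own."""
    reasons = []
    if set(current.tool_catalogue) - set(baseline.tool_catalogue):
        reasons.append("new tools outside assessed catalogue")
    if baseline.policy_bindings != current.policy_bindings:
        reasons.append("policy bindings changed")
    if baseline.memory_digest != current.memory_digest:
        reasons.append("persistent memory state diverged")
    return reasons

base = take_snapshot(["search"], ["pii:redact"], ["note-1"])
now = take_snapshot(["search", "email"], ["pii:redact"], ["note-1"])
print(review_reasons(base, now))  # → ['new tools outside assessed catalogue']
```

Note the design choice: the snapshot diff doesn't decide whether a change is a substantial modification; it only flags candidates for the documented human procedure, which is where the new operational function sits.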

That last piece is more consequential than it sounds. Right now, most organizations don't have an internal procedure for making that determination because they've never had to. For traditional software, the question answered itself — either the code changed or it didn't. For agentic systems, someone has to own the judgment call, with documented criteria, on an ongoing basis. That's a new operational function that doesn't map cleanly onto existing legal, engineering, or compliance teams.

There's a second layer to this worth understanding. The harmonised standards that providers will use to demonstrate compliance — the EN 18282 cybersecurity standard and the EN 18229 trustworthiness standard — address two of the five agentic threat categories that USENIX Security researchers documented this year: prompt injection mitigation and privilege management have standards guidance. Multi-agent protocol threats, interface exploitation, and governance-level autonomy concerns don't. For those three categories, providers are building compliance posture on their own, without standards scaffolding, while the law is already in effect.

This is what I mean when I say governance needs to be in the architecture conversation, not the incident response. The organizations that treat this as a documentation problem will discover it isn't one when they try to reconstruct whether their system was compliant at a specific point in time and realize they never captured the operational state needed to make that determination. The organizations that treat it as a monitoring and versioning problem — the ones that build the instrumentation before they deploy — will have something to show. That's the difference between a conformity assessment that holds up and one that doesn't.

The shift from documentation to continuous monitoring isn't optional. It's what the law now requires for high-risk systems that do things in the world.

AI Agents Under EU Law
AI agents (AI systems that autonomously plan, invoke external tools, and execute multi-step action chains with reduced human involvement) are being deployed at scale across enterprise functions ranging from customer service and recruitment to clinical decision support and critical infrastructure management. The EU AI Act (Regulation 2024/1689) regulates these systems through a risk-based framework, but it does not operate in isolation: providers face simultaneous obligations under the GDPR, the Cyber Resilience Act, the Digital Services Act, the Data Act, the Data Governance Act, sector-specific legislation, the NIS2 Directive, and the revised Product Liability Directive. This paper provides the first systematic regulatory mapping for AI agent providers integrating (a) draft harmonised standards under Standardisation Request M/613 to CEN/CENELEC JTC 21 as of January 2026, (b) the GPAI Code of Practice published in July 2025, (c) the CRA harmonised standards programme under Mandate M/606 accepted in April 2025, and (d) the Digital Omnibus proposals of November 2025. We present a practical taxonomy of nine agent deployment categories mapping concrete actions to regulatory triggers, and identify agent-specific compliance challenges in cybersecurity, human oversight, transparency across multi-party action chains, and runtime behavioral drift. We propose a twelve-step compliance architecture and a regulatory trigger mapping connecting agent actions to applicable legislation. We conclude that high-risk agentic systems with untraceable behavioral drift cannot currently satisfy the AI Act's essential requirements, and that the provider's foundational compliance task is an exhaustive inventory of the agent's external actions, data flows, connected systems, and affected persons.