Your AI reviewed the contract. But you can't cross-examine a log

As agents start doing real legal work, we have no clean way to prove what they actually did

That headline is the line at the top of my poster for the AI for Law workshop at ICML 2026.

AI agents are starting to do real legal work. They review contracts, triage compliance, screen decisions, route approvals. The capability is arriving faster than the accountability. When an agent does something that matters and someone later asks you to show what it did, under whose authority, and whether a human reviewed it, the honest answer today is a log. Logs are editable, provider-controlled, and easy to dispute. They were built to help engineers debug a system, not to hold up when a decision is challenged in a courtroom or a regulatory inquiry.

We solved a version of this problem once already. The value of an electronic signature was never the click. It was the record around the click: who signed, when, in what order, sealed so it holds up later when someone disputes it. That record is why a signed agreement survives scrutiny. My paper asks the same question one layer earlier. Before the document gets signed, an agent did the work that produced it. What is the record of that work, and will it hold up?

I call the answer the Certificate of Action. It's a tamper-evident record of what an agent did: the decision it made and the policy it operated under, whether its reasoning was checked across independent passes, who reviewed it and when, and the outcome it produced, all sealed together. The part I care most about is that it needs no new cryptography. It uses the same primitives that already make an electronic signature stand up in court: a hash chain, a trusted timestamp, and a publicly verifiable log. Modify any entry and the chain breaks. Only a fingerprint goes public, so a third party can confirm a record existed and is intact without ever reading its contents.
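To make the hash-chain idea concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not the format from the paper: the field names (`step`, `policy`, `reviewer`) and the helper functions `seal`, `verify`, and `fingerprint` are hypothetical, and it leaves out the trusted timestamp and the public log entirely. It only shows why changing one entry breaks everything after it, and how a single digest can stand in for the whole record.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry (an assumption of this sketch).
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    # Canonical JSON so the same payload always yields the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def seal(entries: list) -> list:
    """Link the entries into a chain: each hash covers the previous hash."""
    chain = []
    prev = GENESIS
    for payload in entries:
        digest = entry_hash(prev, payload)
        chain.append({"payload": payload, "prev": prev, "hash": digest})
        prev = digest
    return chain


def verify(chain: list) -> bool:
    """Recompute every hash; any edited entry makes the check fail."""
    prev = GENESIS
    for link in chain:
        if link["prev"] != prev:
            return False
        if entry_hash(prev, link["payload"]) != link["hash"]:
            return False
        prev = link["hash"]
    return True


def fingerprint(chain: list) -> str:
    """The only value that would go public: the final hash in the chain."""
    return chain[-1]["hash"] if chain else GENESIS


# Hypothetical record of one agent action and its human review.
record = seal([
    {"step": "decision", "policy": "contract-review-v1", "result": "flag clause 7"},
    {"step": "review", "reviewer": "attorney-123", "approved": True},
])
print(verify(record))        # True: the chain is intact

record[0]["payload"]["result"] = "no issues found"
print(verify(record))        # False: the edit no longer matches the sealed hash
```

In this sketch a third party holding only the value from `fingerprint` could later confirm that a disclosed record matches what was sealed, without having seen the contents beforehand. In a real deployment the fingerprint would also need a trusted timestamp and publication to a log the provider does not control.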

The Certificate of Action proves what the agent did. It does not prove the agent was right. Those are two different jobs. Whether a decision was correct belongs to evaluation and to human judgment. This record proves the process, so the person who is accountable can show it.

Tamper-evidence protects what was sealed, but it says nothing about what was never sealed, so completeness is a real question for any workflow one party runs unsupervised. There's also a genuine tension in how much reasoning to record without exposing privileged analysis. I'd rather work these out in the open with the people building this field than present a tidy answer that hasn't been stress-tested.

Delegating a task to an agent does not delegate the duty that comes with it. What's been missing is a way to prove the duty was met.

If you work on AI governance, evaluation, or legal AI, come find me at the poster session at AI for Law, ICML 2026, in Seoul on July 10.
