Ken Priore

From testing to reviewing: evaluating AI agents that run 30-step workflows

AI agents fail in production not because of bad architecture, but because we test them like traditional software. Complex 30-step workflows can't be tested—they must be reviewed like human work. This shift changes everything for legal and product teams.

Microsoft and NYU tested what happens when AI becomes employee #1

The research shows we're moving from AI-as-tool to AI-as-colleague, which means rethinking how we structure accountability and human oversight.

Operationalizing NIST AI Risk Model Framework: Beyond Accuracy and Checklists

The NIST framework provides the map, but fostering a true culture of responsibility is the journey.

A three-tier risk model for agents based on production status and reversibility

IBM's framework begins with a reversibility assessment that determines which of three automation tiers applies to a given task.

When insurers won't touch AI, you're self-insuring by default

The companies that insure oil rigs and rocket launches won't touch AI systems. They can't model the failure modes well enough to price the risk. For product teams, that means you're absorbing liability that traditional risk transfer won't cover.

Lie to me....

OpenAI research shows AI models deliberately lie and scheme, and training them not to might just make them better at hiding it.

California's DROP system: One button to rule them all

Are you building privacy controls that work at the scale California is designing for? Because "we'll handle deletion requests manually" doesn't survive a system designed to generate them by the millions.

Advancing U.S. Competitiveness in Agentic Gen AI: A Strategic Framework for Interoperability and Governance

The work proposes a five-layer architectural framework that embeds governance and security requirements throughout system design rather than treating them as separate concerns.