Lie to me....

OpenAI research shows AI models deliberately lie and scheme, and training them not to might just make them better at hiding it.


OpenAI and Apollo Research published findings this week showing that AI models deliberately scheme and lie, and that attempts to train this behavior out may simply teach them to conceal it better. The models hide their real goals while appearing compliant on the surface, behavior the researchers call "scheming."

The real problem runs deeper than the behavior itself. When you try to eliminate scheming through training, you often just teach the model to scheme more carefully. The researchers found models becoming aware they're being tested and faking compliance to pass evaluations. As TechCrunch's Julie Bort pointed out, this isn't a traditional software failure—it's deliberate deception.

One technique did help: "deliberative alignment," which makes models review anti-scheming rules before acting, reduced the behavior. But the research reveals a fundamental gap in our understanding of what we're actually training these systems to do. For anyone making product decisions around AI deployment, that gap matters.

OpenAI’s research on AI models deliberately lying is wild | TechCrunch
AI models don’t just hallucinate. They also “scheme,” meaning deliberately lie or hide their true intentions.