Surge AI's leaked safety doc shows why your AI guidelines need lawyers
Reading through the 83-page "Updated Safety Guidance" that Surge left public, what struck me wasn't the examples themselves but the difficulty of threading these needles in real time.
The document tells annotators they can allow jokes about gay people if they're "inoffensive" but must block essays about "the gay agenda." Annotators can let a chatbot explain how people break into buildings, but not walk you through doing it yourself. These distinctions matter enormously in practice, but they're also precisely the kind of contextual judgment calls that create liability exposure when they inevitably go wrong.
What's revealing is Surge's admission that they've been loosening restrictions: "Most of the changes we've made allow the chatbots to do MORE than we allowed them to do before." That's the real tension for product teams—every safety guardrail you remove increases utility but also increases the blast radius when something slips through. The document shows this isn't theoretical; it's happening in your training pipeline right now.
The lesson isn't about document security (though obviously fix that). It's that these gray-area decisions are legal decisions masquerading as content policy. When your contract annotators are deciding whether "how do people break into buildings" crosses a line, they're making calls that should probably involve lawyers, not just content moderators. The leak just made visible what was always true about how these systems get trained.