About

Poseidon Research is an independent AI safety lab based in New York City, focused on deception and hidden information. We do foundational science that advances the whole field, with benefits too diffuse for any one company to capture.

Our work sits at the intersection of control, mechanistic interpretability, information theory, and AI security. We develop evaluations with genuine predictive validity, particularly in settings where models may conceal or encode information; build open-source tools the broader research community can use; and publish findings that shape the field.

Our team has been heavily involved in the maintenance of TransformerLens, the community's primary open-source mechanistic interpretability framework.

FOCUS

The science of AI safety is nascent, and current results rarely generalize. We believe progress requires a rigorous science of what AI systems actually learn, with results you can stake real decisions on.

Our focus is deception in AI systems: the gap between what models internally represent and what they express, including steganographic and covert channels in outputs that can evade detection.

Understanding when and why evaluations fail under hidden information flow is critical for detecting deception and building reliable monitoring systems under realistic and adversarial conditions.

We develop standards for studying model behavior, including new measurements, evaluation methods and benchmarks, and model-organism-style approaches to controlled analysis, all designed to be robust across contexts rather than narrowly optimized for specific systems or tasks.

Our goal is to establish scientific foundations for safety claims about frontier AI that are genuinely trustworthy. We collaborate with leading labs and academic institutions, and our work has been cited by Anthropic, DeepMind, Meta, Microsoft, MILA, and the UK AISI.

Team

We're bringing together people from many disciplines to make advanced AI systems secure, safe, and trustworthy through deep technical research in interpretability, control, and secure monitoring.