Research Scientist (Machine Learning)

Location: Remote or New York City, US
Organization: Poseidon Research
Compensation: $100,000–$150,000 annually, depending on experience
Type: One-year contract

This position is funded through a charitable research grant and is open to applicants authorized to work in the United States.

About Poseidon Research

Poseidon Research is an independent AI safety laboratory based in New York City. Our mission is to make advanced AI systems transparent, trustworthy, and governable through deep technical research in interpretability, control, and secure monitoring.

We investigate how models think, hide, and reason: from understanding encoded reasoning and steganography in reasoning models to building open-source monitoring tools that preserve human oversight. Our research spans mechanistic interpretability, reinforcement learning, control, information theory, and cryptography, bridging the theoretical and the practical.

You could be a cog in a big lab and gamble with humanity’s future. Or you could own your entire research agenda at Poseidon Research, pioneering our understanding of AI’s inner workings to build a safe, secure, and prosperous future.

The Role

We are seeking a Research Scientist to help design, execute, and publish cutting-edge research on how advanced models represent, encode, and conceal information.

This is a high-autonomy position suited to those who want to pursue fundamental research with immediate practical implications, bridging theory, experiment, and deployment.

You will collaborate closely with research engineers to turn conceptual ideas into reproducible systems by building pipelines, datasets, and model organisms that make opaque behaviors measurable and controllable.

Responsibilities

We’re looking for a creative, rigorous scientist who thrives at the intersection of machine learning, theory, and safety. You will:

  • Design and conduct experiments on base LLMs and reasoning models (e.g., DeepSeek-R1 and V3, GPT-OSS, QwQ) to study phenomena like encoded reasoning, steganography, and reward hacking.
  • Develop and analyze model organisms: controlled, interpretable LLMs that exhibit key properties such as hidden communication or deceptive reasoning.
  • Contribute to interpretability tools and pipelines for whitebox monitoring using frameworks like TransformerLens.
  • Formalize security and information-theoretic bounds on steganographic or deceptive behaviors in LLMs.
  • Collaborate across domains, from RL and interpretability to cryptography and complexity theory, to unify empirical and theoretical insights.
  • Publish and communicate findings through open-source releases, benchmarks, and papers aimed at improving AI governance and safety evaluation.

Ideal Candidate

Core Technical Skills

  • Core ML / AI Safety: Reinforcement learning, interpretability, or model evaluations.
  • Theoretical Foundations: Information theory, cryptography, or complexity theory.
  • Applied Research: Developing reproducible ML experiments, model organisms, or interpretability pipelines.
  • Systems & Tools: PyTorch, Hugging Face, and TransformerLens.
  • Reproducibility & Engineering: Strong Python proficiency, Git, and experiment tracking (W&B, MLflow, etc.).
  • Prior publications or strong research engineering experience in interpretability or control.

Nice to Have

  • Familiarity with concepts like steganography, chain-of-thought faithfulness, or reward hacking.
  • Background in formal methods, information security, or RL-based training regimes.

Mindset

  • Excited by deep technical challenges with high safety implications.
  • Values open science, clarity, and reproducibility.
  • Comfortable working in a small, fast-moving research team with high autonomy.
  • Conscientious, honest, and agentic.

Why Join Poseidon Research?

  • Mission-Driven Research: Every project contributes directly to AI safety, transparency, and governance.
  • Ownership: Lead your own research agenda with mentorship, not micromanagement.
  • Interdisciplinary Collaboration: We regularly work with top researchers from DeepMind, Anthropic, other AI safety startups, and academic partners.
  • Impact: Develop techniques, open-source tools, and benchmarks that shape global standards for safe AI deployment. Work from our staff has already been cited by Anthropic, DeepMind, Meta, Microsoft, and MILA.
  • Lean, fast, and serious: We move quickly, publish openly, and care deeply about getting it right.

Application

Please Include

  • A short research statement (what problems you’d be most excited to work on and why).
  • A CV, plus a Google Scholar link if applicable.
  • Links to code or papers.

Max file size: 10 MB.