Research Engineer (Machine Learning)

Location: Remote or New York City, US
Organization: Poseidon Research
Compensation: $100,000–$150,000 annually, depending on experience
Type: One-year contract

This position is funded through a charitable research grant and is open to applicants authorized to work in the United States.

About Poseidon Research

Poseidon Research is an independent AI safety laboratory based in New York City. Our mission is to make advanced AI systems transparent, trustworthy, and governable through deep technical research in interpretability, control, and secure monitoring.

We investigate how models think, hide, and reason: from understanding encoded reasoning and steganography in reasoning models to building open-source monitoring tools that preserve human oversight. Our research spans mechanistic interpretability, reinforcement learning, control, information theory, and cryptography, bridging the theoretical and the practical.

You could be a cog in a big lab and gamble with humanity’s future. Or you could own your entire research platform at Poseidon Research, pioneering the infrastructure needed to accelerate AI safety and build a safe, secure, and prosperous future.

The Role

We are hiring a Research Engineer to implement and scale experiments studying encoded reasoning and steganography in modern reasoning models.

This is a hands-on, highly technical position focused on experiment design, model evaluation, and platform engineering.

You will collaborate closely with research scientists to turn conceptual ideas into reproducible systems by building pipelines, datasets, and model organisms that make opaque behaviors measurable and controllable.

Responsibilities

We’re looking for a creative, rigorous engineer who loves to build in order to understand how safety issues intersect with reality. You will:
  • Implement and reproduce prior work on encoded reasoning and steganography, extending it to current open-weight reasoning models (e.g., DeepSeek-R1 and V3, GPT-OSS, QwQ).
  • Develop and maintain modular experiment pipelines for evaluating steganography, encoded reasoning, and reward hacking.
  • Build and test finetuning workflows (SFT or RL-based) to study emergent encoded reasoning and reward hacking behaviors.
  • Collaborate with our research leads to design safety cases and control-agenda monitoring mechanisms suitable for countering various forms of unsafe chain-of-thought reasoning.
  • Extend interpretability infrastructure, including probing, feature ablation, and sparse autoencoder (SAE) analysis pipelines using frameworks like TransformerLens.
  • Engineer datasets and evaluation suites for robust paraphrasing, steganography cover tasks, and monitoring robustness metrics.
  • Collaborate with scientists to identify causal directions and larger-scale mechanisms underlying encoded reasoning (via standard interpretability methods, DAS, MELBO, targeted LAT, and related techniques).
  • Ensure reproducibility through clean code, experiment tracking, and open-source releases.
  • Contribute to research communication by preparing writeups, visualizations, and benchmark results for research vignettes and publications.
Ideal Candidate

Core Technical Skills

  • Strong Python and PyTorch experience.
  • Experience with LLM experimentation using frameworks like Hugging Face Transformers, TransformerLens, or equivalent.
  • Experience building reproducible ML pipelines, including data preprocessing, logging, visualization, and evaluation.
  • Experience with RL finetuning or training small-to-mid-scale models using frameworks like TRL, verl, OpenRLHF, or equivalents.
  • Proficiency with experiment tracking tools such as Weights & Biases or MLflow, and Git.
  • Active proficiency with, or intellectual curiosity about, AI-assisted coding and research tools such as Claude Code, Codex, Cursor, Roo, Cline, or equivalents.

Nice to have

  • Familiarity with interpretability methods such as probing, activation patching, or feature attribution.
  • Understanding of encoded reasoning, steganography, or information-theoretic approaches to model communication; or some background in formal cryptography, information theory, or offensive cybersecurity.
  • Experience with mechanistic interpretability techniques such as feature visualization, direction ablation, SAEs, crosscoders, and circuit tracing.
  • Background in information security, control, or formal verification.
  • Prior publications.
Mindset
  • Excited by deep technical challenges with high safety implications.
  • Values open science, clarity, and reproducibility.
  • Comfortable working in a small, fast-moving research team with high autonomy.
  • Conscientious, honest, and agentic.
Why Join Poseidon Research?
  • Mission-Driven Research: Every project contributes directly to AI safety, transparency, and governance.
  • Ownership: Lead your own research platform with mentorship, not micromanagement.
  • Interdisciplinary Collaboration: We regularly work with top researchers from DeepMind, Anthropic, other AI safety startups, and academic partners.
  • Impact: Develop techniques, open-source tools, and benchmarks that shape global standards for safe AI deployment. Work from our staff has already been cited by Anthropic, DeepMind, Meta, Microsoft, and MILA.
  • Lean, fast, and serious: We move quickly, publish openly, and care deeply about getting it right.
Application

Please Include

  • A short research statement describing which problems in AI safety interest you and how they intersect with Poseidon’s aims.
  • A CV, plus a Google Scholar link if applicable.
  • Links to code or papers not already included in your CV.