AfterQuery (afterquery.com)
Raised $150K

Research Scientist - Frontier Data

San Francisco, CA · In-person · Full-Time

About the role

This is a hands-on, high-leverage research role. You will design the datasets and evaluation frameworks that shape how frontier models are trained and measured. Working directly with research teams at the world's top AI labs, you will experiment with data collection strategies, diagnose model failure modes, and develop the metrics that determine whether a model is actually getting better. This is not a theorizing role. You will quickly move from hypothesis to a live experiment, and your output will directly influence model training runs at scale. The team is small, the impact is outsized, and individual contributors here have a direct line to how the next generation of models learns and improves.

  • Design data slices and explore data shapes that expose meaningful model failure modes across domains, including finance, code, and enterprise workflows
  • Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
  • Model annotator behavior and run experiments to improve different model capabilities
  • Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
  • Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications
  • Move fast from hypothesis to experiment, extract actionable insights from messy results, and iterate quickly

Requirements

Must-have

  • Strong quantitative instincts and familiarity with LLM training pipelines, RLHF or RLVR, or evaluation methodology. A PhD is not required, but you must have the research depth of a strong undergraduate or master's researcher
  • Genuine obsession with how data structure, selection, and quality drive model behavior. This is the core of the work, and the motivation must be intrinsic
  • Ability to design lightweight experiments, move fast, and extract actionable insights from messy and incomplete results
  • Comfort working across domains: the work touches finance, software engineering, policy, and more. You must be able to context-switch and reason clearly across all of them
  • Bias toward building over theorizing: you ship experiments and iterate rather than getting stuck in design

Nice-to-have

  • Prior work or internship at RL environment companies, AI safety organizations, or benchmarking organizations such as METR or Artificial Analysis
  • Background in evaluation methodology, benchmark design, or dataset curation at a lab or research organization
  • Exposure to annotator modeling, reward signal design, or alignment-related research

Benefits & perks

  • Equity
  • Bonus
  • Work with a founding team from top companies

Interview process

  1. Application Review
  2. Initial Screen
  3. Take Home
  4. Take Home Review
  5. Onsite
  6. Offer
  7. Hired

Drop your CV for this role.

One PDF and your email. We read it, score your fit for this role at AfterQuery, and route the introduction through us.

Free for engineers, always. By applying you agree to roles.cc holding your CV to match you. AfterQuery never sees your identity until you have agreed to an introduction.