AfterQuery (afterquery.com)
Raised $200K

Software Engineer - RL Environments

San Francisco, CA · In-person · Full-Time · 1–4 years

About the role

As a SWE (Environments), you will design the datasets and evaluation rubrics that directly influence how frontier models learn. You'll work hands-on with research teams at top AI labs, experimenting with data-collection strategies, diagnosing model failure modes, and developing metrics to determine whether a model is actually improving. You'll go from hypothesis to live experiment quickly, and your output will feed directly into model training runs at scale.

Day to day, you will:

  • Design data slices and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows
  • Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
  • Model annotator behavior and run experiments to improve different model capabilities
  • Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
  • Create and manage both real-world and synthetic data pipelines
  • Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications

Compensation is $200K base plus profit share of roughly 150% of base, bringing expected total cash to around $500K, plus competitive equity.

Requirements

Must-have

  • 1–4 years of software engineering experience with strong technical depth
  • Genuine obsession with how data structure, selection, and quality drive model behavior
  • Ability to design lightweight experiments, move fast, and extract actionable insights from messy results
  • Comfort working across domains: finance, software engineering, policy, and more
  • Track record of shipping; bias toward building, not theorizing

Nice-to-have

  • Prior work or internship at an RL environment company, AI safety org, or benchmarking org (METR, Artificial Analysis, or equivalent)
  • Former founder or early engineer at an early-stage startup
  • Experience building data pipelines (real-world + synthetic)
  • Familiarity with RLHF / RLVR training pipelines

Benefits & perks

  • Profit share around 150% of base
  • Competitive equity
  • Direct impact on frontier AI model development
  • Work with the world's leading AI labs

Interview process

  1. Application Review
  2. Initial Screen
  3. Second Round
  4. Take Home
  5. Post Take Home
  6. Work Trial
  7. Offer
  8. Hired

Drop your CV for this role.

One PDF and your email. We read it, score your fit for this role at AfterQuery, and route the introduction through us.

Free for engineers, always. By applying you agree to roles.cc holding your CV to match you. AfterQuery never sees your identity until you have agreed to an introduction.