Research Engineer — Judgment Labs

Technical Tools
ab-testing, machine-learning

Location: Chinatown, San Francisco, CA (Onsite, 5.5 days/week)
Compensation: $225,000 – $400,000 base + competitive equity
Visa Sponsorship: H-1B supported
Experience Level: 1–4 years
Employment Type: Full-Time

About Judgment Labs

Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM), helping organizations evaluate and monitor AI agent performance in production environments. Its platform identifies behavioral anomalies such as instruction drift, retrieval degradation, and reliability failures across complex workflows.

The company has raised more than $30M from investors including Lightspeed, SV Angel, Valor Equity Partners, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, and Kevin Hartz.

About the Role

Judgment Labs is seeking Research Engineers to build AI systems focused on analyzing agent interaction data, evaluating long-running agent behaviors, and improving autonomous systems through feedback and optimization workflows. This is a highly hands-on applied AI engineering role focused on production systems rather than pure academic research. Engineers will work directly with real-world agent data and deploy systems into production environments supporting finance, legal, operations, and other high-stakes domains.

  • Build systems to aggregate, index, and analyze large-scale agent interaction data
  • Develop agent-based systems for evaluating long-running agent behaviors
  • Design post-training and optimization workflows for AI agents
  • Build tooling and infrastructure for experimentation, analysis, and training
  • Work with retrieval systems, evaluation harnesses, and production AI infrastructure
  • Own projects end to end with significant autonomy
  • Collaborate closely with engineering and research teams

Requirements

  • 1–4 years of industry experience in applied AI or generative AI
  • Experience building and evaluating AI agents in production
  • Strong problem-solving ability with high agency and intellectual curiosity
  • Comfortable handling large-scale, messy, real-world datasets
  • Experience with retrieval systems, search algorithms, or evaluation harnesses
  • Ability to work onsite in San Francisco 5.5 days per week

Nice to Have

  • Experience with sandboxed or autonomous evaluation environments
  • Agent trajectory analysis or long-running behavior evaluation
  • Self-improving or continual learning systems
  • Experience at fast-moving AI startups or applied AI organizations
  • Reinforcement learning or machine learning systems expertise

Not a Fit

  • Those who require heavily structured task management
  • Profiles with limited production AI experience
  • Pure research backgrounds without shipped systems
  • Candidates who cannot relocate or work onsite in San Francisco

Interview Process

  1. Initial approval review
  2. Founder vibe check and technical discussion
  3. Technical interview and problem-solving round
  4. Work trial
  5. Offer stage

Notes

  • Role is onsite in San Francisco, 5.5 days/week; please apply only if you can commit to this
  • H-1B visa sponsorship available

Shortlisted candidates will be contacted by David Joseph & Co., the recruiting partner managing this search on behalf of Judgment Labs.

Location & Eligibility

Where is the job: San Francisco, United States (on-site at the office)
Who can apply: US

Listing Details

First seen: May 7, 2026
Last seen: May 8, 2026

Posting Health

Days active: 0
Repost count: 0
Trust Level: 51%
Scored at: May 7, 2026
