Research Engineer — Judgment Labs

United States · San Francisco · mid

Quick Summary

Requirements Summary

1–4 years of industry experience in applied AI or generative AI
Experience building and evaluating AI agents in production
Strong problem-solving ability with high agency and intellectual curiosity
Comfortable handling large-scale, messy, real-world…

Technical Tools

ab-testing, machine-learning

Location: Chinatown, San Francisco, CA (Onsite, 5.5 days/week)
Compensation: $225,000 – $400,000 base + competitive equity
Visa Sponsorship: H-1B supported
Experience Level: 1–4 years
Employment Type: Full-Time

About Judgment Labs

Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM), helping organizations evaluate and monitor AI agent performance in production environments. Its platform identifies behavioral anomalies such as instruction drift, retrieval degradation, and reliability failures across complex workflows.

The company has raised more than $30M from investors including Lightspeed, SV Angel, Valor Equity Partners, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, and Kevin Hartz.

About the Role

Judgment Labs is seeking Research Engineers to build AI systems focused on analyzing agent interaction data, evaluating long-running agent behaviors, and improving autonomous systems through feedback and optimization workflows. This is a highly hands-on applied AI engineering role focused on production systems rather than pure academic research. Engineers will work directly with real-world agent data and deploy systems into production environments supporting finance, legal, operations, and other high-stakes domains.

Build systems to aggregate, index, and analyze large-scale agent interaction data
Develop agent-based systems for evaluating long-running agent behaviors
Design post-training and optimization workflows for AI agents
Build tooling and infrastructure for experimentation, analysis, and training
Work with retrieval systems, evaluation harnesses, and production AI infrastructure
Own projects end to end with significant autonomy
Collaborate closely with engineering and research teams