USD 150,000–450,000/yr

Eval360 - Error Analysis Engineer

United States · Sunnyvale · Full-time · Mid
Other · Engineer

Overview

About the Institute of Foundation Models

The Institute of Foundation Models is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mission is to advance AI research, support the next generation of AI builders, and develop impactful systems that improve how frontier models are trained, evaluated, deployed, and governed.

As part of our team, you will work closely with researchers, machine learning engineers, data scientists, software engineers, and product teams on some of the most important challenges in AI development. You will contribute to systems that help measure model quality, identify failure modes, and improve the reliability, safety, and readiness of model releases.

We are looking for an Eval360 - Error Analysis Engineer to help build, improve, and operate Eval360, an evaluation service that serves as a quality gate for AI models. This person will focus specifically on error analysis: understanding where models fail, why they fail, how those failures should be categorized, and how evaluation systems can better detect, measure, and prevent these issues before models are released.

You will collaborate with researchers, machine learning engineers, product managers, data scientists, and platform teams to develop AI evaluation applications and internal tools based on next-generation AI research. You will be part of a cross-functional team responsible for the full software development lifecycle, from requirements gathering and system design to implementation, deployment, monitoring, debugging, documentation, and continuous improvement.

The ideal candidate is comfortable working across the stack, including front-end interfaces for reviewing errors, back-end evaluation pipelines, data analysis workflows, model evaluation infrastructure, databases, dashboards, and APIs. This person should have strong software engineering skills, excellent analytical judgment, and the ability to turn ambiguous model failures into structured insights that improve evaluation quality.

Responsibilities

  Collaborate with researchers, machine learning engineers, data scientists, product managers, and internal stakeholders to implement innovative software solutions for Eval360 and related model evaluation workflows.

  Build and improve Eval360 as an evaluation service that acts as a quality gate for model development, model comparison, and model release decisions.

  Perform deep error analysis on model outputs, including identifying failure patterns, categorizing issues, tracing root causes, and proposing improvements to evaluation methodology.

  Develop tools, workflows, and dashboards that make it easier for researchers and engineers to inspect model failures, compare model behavior, and understand quality regressions.

  Design and implement client-side and server-side architecture for evaluation review systems, error analysis interfaces, reporting tools, and internal evaluation applications.

  Develop responsive, usable interfaces that support error triage, annotation review, evaluation debugging, and model quality investigation.

  Build and maintain back-end services, APIs, data pipelines, and integrations that support evaluation execution, results storage, analysis, and reporting.

  Test software to ensure responsiveness, correctness, reliability, and efficiency across evaluation workflows.

  Troubleshoot, debug, and upgrade evaluation systems, including identifying issues in data processing, evaluation metrics, model output handling, job orchestration, and user-facing analysis tools.

  Create and maintain security, access control, and data protection settings for evaluation data, model outputs, annotations, and internal tooling.

  Write clear technical documentation for Eval360 systems, error taxonomies, evaluation workflows, debugging procedures, and user-facing tools.

  Work with researchers, data scientists, analysts, and machine learning engineers to improve evaluation quality, model diagnostics, and failure-mode visibility.

  Keep track of new development tools, evaluation frameworks, model analysis methods, data quality techniques, and architectures relevant to AI evaluation systems.

  Contribute to the design of error taxonomies, evaluation rubrics, quality thresholds, regression detection methods, and model readiness criteria.

  Help ensure Eval360 produces reliable, interpretable, and actionable signals for model quality gates.

  Contribute to research publications, technical reports, internal knowledge sharing, and external presentations where appropriate.

  Contribute to intellectual property and thought leadership in AI evaluation, error analysis, model quality measurement, and evaluation infrastructure.

  Perform all other duties as reasonably directed by the line manager that are aligned with these functional objectives.

Qualifications

  Bachelor's degree in Computer Science, Machine Learning, Data Science, Software Engineering, Statistics, or a related technical field required.

  Master's or Ph.D. in Computer Science, Machine Learning, Artificial Intelligence, Data Science, or a related field preferred.

  Proven experience as a Software Engineer, Full Stack Developer, Machine Learning Evaluation Engineer, Data Scientist, AI Engineer, or similar role.

  Experience building software systems for AI, machine learning, data analysis, evaluation, annotation, experimentation, or model monitoring.

  Experience working with AI algorithms and the ability to develop systems that accommodate AI-related requirements.

  Experience performing error analysis, model evaluation, data quality analysis, or failure-mode investigation for machine learning or language model systems.

  Experience developing internal applications, dashboards, review tools, or web-based workflows for technical users.

  Familiarity with common software stacks, including front-end frameworks, back-end services, databases, APIs, and cloud or internal infrastructure.

  Familiarity with GitHub, Git, CI/CD workflows, and collaborative software development practices.

  Knowledge of front-end languages and libraries such as HTML, CSS, JavaScript, TypeScript, React, Angular, or similar technologies.

  Knowledge of back-end languages and frameworks such as Python, Java, C#, Node.js, FastAPI, Flask, Django, or similar technologies.

  Familiarity with databases such as MySQL, PostgreSQL, MongoDB, or other structured and unstructured data stores.

  Familiarity with evaluation frameworks, experiment tracking systems, data pipelines, or machine learning infrastructure is strongly preferred.

  Ability to analyze complex model outputs and translate qualitative failures into structured, measurable categories.

  Strong problem-solving and troubleshooting skills, especially for ambiguous technical issues involving models, data, metrics, and software systems.

  Effective communication and collaboration skills, with the ability to work across research, engineering, data, and product teams.

  Strong attention to detail and a high bar for evaluation quality, reliability, and interpretability.

Preferred Experience

  Experience with large language models, foundation models, multimodal models, or model evaluation systems.

  Experience designing or using error taxonomies, evaluation rubrics, benchmark datasets, human evaluation workflows, or automated grading systems.

  Experience with Python-based data analysis tools such as pandas, NumPy, Jupyter, or similar.

  Experience with visualization or dashboarding tools for model quality analysis.

  Experience with distributed systems, job queues, workflow orchestration, or large-scale data processing.

  Experience working in a research environment or with fast-moving AI product and model teams.

This position is eligible for visa sponsorship.

What We Offer

  Comprehensive medical, dental, and vision benefits

  Bonus

  401(k) plan

  Generous paid time off, sick leave, and holidays

  Paid parental leave

  Employee assistance program

  Life insurance and disability insurance

Location & Eligibility

Where is the job
Sunnyvale, United States
On-site at the office
Who can apply
US

Listing Details

Posted
June 19, 2026
First seen
June 19, 2026
Last seen
June 20, 2026

Posting Health

Days active
0
Repost count
0
Trust Level
79%
Scored at
June 19, 2026
