Engineering Lead - QA Systems

San Francisco Office · full-time · lead

Other · Engineering Lead

Overview

Think Different. Build the Future. 🚀

Our Mission

Build everyday AGI. Trustworthy, consumer-grade agents that redefine human–AI collaboration for millions. Software shouldn’t wait for commands; it should partner with you, amplifying what you can do every single day.

We’re a stealth team of elite founders and AI researchers, with backgrounds spanning Stanford, OpenAI, and DeepMind. We’re industry leaders in mobile and computer-use agents, bringing these capabilities to consumer scale.

Grounded in years of agent research, our AI is designed with trustworthiness and reliability as core pillars, not afterthoughts.

We are supported by tier-1 investors who funded the first generation of AI giants; now they’re backing us to build the next: everyday AGI.

If you see possibility where others see limits, read on.

You'll own quality for an AI product that is non-deterministic, runs on hardware you don't control, and ships into partner builds with hard launch dates. This is for the engineer who finds existential satisfaction in catching the bug before a user does, and in making sure the partner exec finds out from your dashboard, not from their inbox.

What you'll own
The testing systems that gate every release: automated agent test suites, on-device regression harnesses, model version compatibility matrices, and the device farm that runs them
The bug pipeline: triage, repro, root-cause, and the post-mortems that keep the same bug from shipping twice
The dashboards and SLAs that tell the team, in real time, whether what we shipped yesterday still works today

Who you'll work with
Research, on what to test about model behavior
Product engineers, on what to test about agent reliability
Forward-deployed engineers, on what partners actually care about in their environment

How to test a system that gives a different answer every time
How to build test infrastructure that scales from one shipped device to millions

Eval drift, locale-specific failures, hardware-class regressions, and the rest of the long tail of QA-ing AI in production
What shipping consumer AI at OEM scale actually requires
Reliable agentic systems from the people who published the canonical papers on it

After 30 days — You've audited every test we run today and produced a sharp doc on what's automated, what's manual, and what's nothing at all. You've stood up at least one piece of regression coverage that should have existed already.

After 60 days — You've shipped a real testing system — automated agent regressions, an on-device test farm, or a partner build verification harness — that the team relies on. Bug triage runs on rails you set up.

After 90 days — Your systems have caught real regressions before they shipped. Engineers across research, product, and FDE write code differently because of the harness you built. You're shaping the next quarter's quality roadmap.