Engineering Lead - QA Systems
Think Different. Build the Future. 🚀
Our Mission
Build everyday AGI. Trustworthy, consumer-grade agents that redefine human–AI collaboration for millions. Software shouldn’t wait for commands; it should partner with you, amplifying what you can do every single day.
We’re a stealth team of elite founders and AI researchers, with backgrounds spanning Stanford, OpenAI, and DeepMind. We’re industry leaders in mobile and computer-use agents, bringing these capabilities to consumer scale.
Grounded in years of agent research, our AI is designed with trustworthiness and reliability as core pillars, not afterthoughts.
We are supported by tier-1 investors who funded the first generation of AI giants; now they’re backing us to build the next: everyday AGI.
If you see possibility where others see limits, read on.
You'll own quality for an AI product that is non-deterministic, runs on hardware you don't control, and ships into partner builds with hard launch dates. This is for the engineer who finds existential satisfaction in catching the bug before a user does, so that the partner exec hears about it from your dashboard, not from their inbox.
The testing systems that gate every release — automated agent test suites, on-device regression harnesses, model version compatibility matrices, and the device farm that runs them
The bug pipeline — triage, repro, root-cause, and the post-mortems that keep the same bug from shipping twice
The dashboards and SLAs that tell the team, in real time, whether what we shipped yesterday still works today
Research, on what to test about model behavior
Product engineers, on what to test about agent reliability
Forward-deployed engineers, on what partners actually care about in their environment
How to test a system that gives a different answer every time
How to build test infrastructure that scales from one shipped device to millions
Eval drift, locale-specific failures, hardware-class regressions, and the rest of the long tail of QA-ing AI in production
What shipping consumer AI at OEM scale actually requires
Reliable agentic systems, from the people who published the canonical papers on them
After 30 days — You've audited every test we run today and produced a sharp doc on what's automated, what's manual, and what's nothing at all. You've stood up at least one piece of regression coverage that should have existed already.
After 60 days — You've shipped a real testing system — automated agent regressions, an on-device test farm, or a partner build verification harness — that the team relies on. Bug triage runs on rails you set up.
After 90 days — Your systems have caught real regressions before they shipped. Engineers across research, product, and FDE write code differently because of the harness you built. You're shaping the next quarter's quality roadmap.
What We Offer
Competitive cash and meaningful equity. Top-tier relocation and immigration support. SF, in person.
Listing Details
- Posted: May 27, 2026
- First seen: May 27, 2026
- Last seen: May 27, 2026