Jailbreaking Lead (Red Team)
About FAR.AI
FAR.AI is a non-profit AI research institute dedicated to ensuring advanced AI is safe and beneficial for everyone. Our mission is to facilitate breakthrough AI safety research, advance global understanding of AI risks and solutions, and foster a coordinated global response.
Since our founding in July 2022, we've grown quickly to 45+ staff, producing over 40 influential academic papers, and establishing leading AI Safety events. Our work is recognized globally, with publications at premier venues such as NeurIPS, ICML, and ICLR, and features in the Financial Times, Nature News, and MIT Technology Review. Additionally, we help steer and grow the AI safety field through developing research roadmaps with renowned researchers such as Yoshua Bengio; running FAR.Labs, an AI safety-focused co-working space in Berkeley housing 40 members; and supporting the community through targeted grants to technical researchers.
FAR.AI’s red team is building toward a simple outcome: materially raising the bar for safety and security of the most widely deployed and capable AI systems in the world. We intend to be the tip of the spear in AI safety: the team that consistently finds the failures others miss, drives real mitigations, and sets the standard that labs and governments converge on. We also leverage our in-depth understanding of weaknesses in frontier models to advise frontier developers on mitigations, to guide our own research and grantmaking for improving model security, and to inform the public of key AI risks.
We are already one of the leading independent red-teaming organizations. Our work has helped most Western frontier model developers improve safeguards through pre- and post-deployment testing (e.g., we have directly influenced safeguards at major frontier developers like OpenAI and Anthropic), and we are increasingly embedded in high-leverage government efforts (e.g., leading a consortium building CBRN evaluations for the European Commission/EU AI Office, and collaborating with the UK AI Security Institute).
"FAR.AI's pre-deployment testing of GPT-5 series models identified failure modes and mitigations, improving the security of our model releases." – Senior Technical Program Manager, OpenAI
“FAR.AI have been a trusted and thoughtful collaborator for us, and they have progressed the state of frontier red-teaming through research like STACK. We expect this to be a high impact role and are excited to explore collaborations with the successful candidate.” – Xander Davies, Technical Lead, Red Team at UK AISI
You will be the senior technical owner of our jailbreaking practice, reporting to Kellin Pelrine with a dotted line to Edward Yee. In 2026, we are scaling from a strong team with standout wins to a level of impact unmatched by any AI red team globally:
Red-teaming all major frontier model releases (closed and open-weight) within days/weeks of release;
Expanding strategic engagements with governments and conducting pre-deployment testing with most frontier labs;
Deepening our testing of key risk areas like CBRN, cyber, and agents, and exploring new ones like AI control and alignment;
Building tools, agents, and insights that raise the global standard for red-teaming.
About the Role
Jailbreaking is the core technical engine of the red team. As Jailbreaking Lead, you own that engine. You are the person who personally breaks the hardest targets, sets the bar the rest of the team pushes toward, and makes sure we keep discovering the highest-severity, universal vulnerabilities (the most important vulnerabilities to fix) in the most heavily defended frontier models on the planet, faster than anyone else.
We expect you to spend 50-70% of your time hands-on throughout 2026: breaking models, chaining novel attack classes through defense-in-depth stacks, helping to invent new techniques when existing ones fail, and setting the standard for what constitutes a significant vulnerability and a credible mitigation. The remaining time will go to managing/mentoring ICs, helping to shape the jailbreaking research agenda with Kellin, and making sure our findings land with frontier labs, governments, and the broader field. The rest of the red team will empower your work, whether through direct collaboration and support, novel research and red-teaming infrastructure, or toolkits and agent build-outs.
This is a senior IC role by default, intended to attract a world-class jailbreaker whose personal mission is to find critical jailbreaks in the most heavily defended domains of the leading frontier AI models, and who has a track record of repeatedly doing so. We are open to a management track for candidates who want to hire and lead a jailbreaking team over time. We will not water down the IC bar to support the management track: both versions of this role require you to be, or be on a clear trajectory to being, one of the best jailbreakers in the world.
In practice, this role spans:
Lead jailbreaking on the highest-stakes engagements:
Personally develop universal and near-universal jailbreaks against frontier closed- and open-weight models, in CBRNE, cyber, agentic security, extreme persuasion, and emerging risk domains;
Systematically dismantle defense-in-depth stacks (input filters, model-level refusal and safe completion, reasoning monitors, output filters, account-level moderation), chaining novel and established techniques;
Escalate initial vulnerabilities to expose their most severe form, maximising universality, success rate, and capability of elicited output;
Own the technical bar for vulnerability severity and generality on every major engagement.
Push the frontier of jailbreaking techniques:
Invent new attack classes when existing techniques fail (e.g., we have recently shipped novel attacks against Constitutional Classifiers and fine-tuning APIs);
Monitor and rapidly incorporate state-of-the-art methods from the literature, and build our own proprietary portfolio;
Shape the jailbreaking research agenda in partnership with Kellin, ensuring our toolkit stays ahead as defences evolve;
Stress-test novel affordances (innovations in agents, tool use, long context, multimodal, reasoning, etc.) as frontier systems evolve.
Raise the technical bar across the team:
Set the standard for rigour, creativity, and precision in jailbreaking across the red team;
Mentor ICs on attack craft through pairing sessions, post-engagement retros, and internal write-ups that turn your craft into team capability;
Review major red-teaming deliverables for technical quality, severity judgment, and clarity;
If on the management track: hire, manage, and grow a jailbreaking team without sacrificing your personal technical edge.
Translate jailbreaks into real-world impact:
Work directly with frontier labs and government agencies so that findings lead to real mitigations, not just disclosed vulnerabilities;
Contribute to public reports, benchmarks, and the FAR.AI safety leaderboard that shape industry norms;
Make precise, calibrated technical judgments about what is universal, what is reliable, and what a capable threat actor could actually do with a finding.
This role would be a great fit if you:
Obsess over frontier model jailbreaks the way elite security researchers obsess over zero-days. If breaking the newest, most heavily defended models is already what you do for fun, you are the person we are looking for;
Have a track record (public, private, or both) of finding non-obvious, high-severity vulnerabilities in frontier AI systems, including universal or near-universal jailbreaks in the most heavily defended risk domains;
Combine deep technical craft with the judgment to know which vulnerabilities actually matter and how defences are put together across different frontier models, plus the communication skills to make frontier labs and governments act on them;
Are excited by high-stakes, real-world technical work where success is measured by mitigations adopted and standards shifted, not papers published;
Want to work with leading AI companies, governments, and academics. We're a lean organisation and leverage impact through strategic partnerships;
Value independence and the ability to publish and speak honestly about risks;
Care deeply about AI safety and impacting how advanced AI systems are deployed;
Have a “get shit done” attitude and are willing to do whatever it takes to change the world;
Thrive in fast-moving, ambiguous environments with shifting threat models and defences.
This role would be a poor fit if you:
Prefer narrowly scoped research problems with clear academic metrics of success;
Want to prioritise foundational research disconnected from red-teaming outcomes. Our Research Scientist position may be a better fit;
Are primarily motivated by equity upside or compensation;
Wish to operate within clearly defined bounds. FAR.AI is planning to double in size in the next 12-18 months, and the red team will grow even faster. A lot will change and navigating uncertainty is core to the role;
Are not willing to move at the velocity we need or want a slow-moving, highly structured environment with fixed problem definitions;
Are looking for a pure management role where you stop shipping jailbreaks yourself. Even on the management track, this role requires you to remain a top-tier hands-on jailbreaker;
Are not willing to be relentless.
Strong candidates for this role typically have many (but not necessarily all) of the following:
Personally developed universal or near-universal jailbreaks against at least one leading frontier model;
Demonstrated ability to discover non-obvious, high-severity vulnerabilities in frontier AI systems, complex software systems, or other hardened adversarial targets;
Deep, hands-on jailbreaking experience with demonstrated success against modern frontier models with layered defences, including chaining multiple attack techniques through defense-in-depth stacks;
Experience with black-box optimisation methods, multimodal attacks, and/or agentic red-teaming;
Deep understanding of large language model architectures, training processes, and failure modes, including how these factors influence model behavior under adversarial conditions;
Strong existing track record in AI, adversarial ML, security, or another highly technical subject (e.g. CS, cybersecurity, math, physics);
Thrived in rapidly evolving environments where techniques go obsolete fast and you have to invent your way forward;
Invented novel attack classes;
Demonstrated drive for mission/impact and desire to create real impact on frontier AI systems;
Demonstrated relentlessness in achieving ambitious goals.
It is a strong plus (but not required) if you have:
Prior collaboration with AI labs, security teams, or government safety institutes;
A track record in top CTF teams, offensive security research, or adversarial ML research;
Published work in AI safety, security, or robustness;
The ability to communicate technical findings and recommended mitigations to both technical and non-technical audiences, including frontier lab safety teams and senior policymakers;
Prior experience mentoring technical ICs or leading a small technical team (required only for the management track).
If you are earlier in your career but have a standout jailbreaking track record, we still encourage you to apply as we are also hiring for less senior positions on the team. If you are missing the hands-on jailbreaking depth but have adjacent strengths, we encourage you to consider our other open roles on the red team.
Location & Eligibility
If based in the USA or Singapore, you will be an employee of FAR.AI (501(c)(3) research non-profit / non-profit CLG). Outside the USA or Singapore, you will be employed via an EOR organisation on behalf of FAR.AI.
Location: Remote globally. We can sponsor US or Singapore visas.
Hours: Full-time. Expect up to one trip per month for convenings, government meetings, or team gatherings.
Compensation: USD 170,000–250,000, depending on experience. Exceptional candidates may be offered more.
We know these roles are rare and the skill combination is unusual. If you're uncertain whether your background fits but are excited by the mission and challenges, we encourage you to apply – we're looking for excellence and potential, not a perfect resume match.