Manager, Software Engineering (Resilience Engineering)
Quick Summary
Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest.
We are seeking a seasoned Engineering Manager to lead our Resilience Engineering team. This role is critical in ensuring the safety and reliability of our production systems through proactive validation techniques, including production load testing and chaos engineering.
You will lead the development of systems and practices that allow engineers to safely test system behavior under stress and failure conditions in production, ensuring issues are discovered and mitigated before they impact real users.
Responsibilities
- Define and drive the vision for resilience engineering at Affirm, with a focus on production load testing and chaos engineering as first-class engineering practices.
- Lead and mentor a team of engineers building platforms and tooling for safe production experimentation.
- Partner with infrastructure, product, and security leadership to embed resilience validation into the software development lifecycle.
- Establish best practices for safely testing system limits and failure scenarios in production.
- Own the design and evolution of platforms that enable safe, controlled production load testing and fault injection.
- Ensure strong safeguards are in place, including isolation boundaries, approval workflows, and automated rollback mechanisms to protect real users.
- Build systems that provide end-to-end observability, traceability, and auditability for all resilience experiments.
- Drive reliability improvements by systematically identifying weaknesses through load testing and chaos experiments.
- Establish monitoring, alerting, and incident response practices tailored to proactive resilience validation.
- Work closely with engineering teams to design and execute production load tests and chaos experiments safely.
- Partner with infrastructure teams to build guardrails around tests and experiments.
- Enable teams to adopt resilience practices by providing reusable tooling, frameworks, and standardized workflows.
- Identify systemic weaknesses and lead cross-functional efforts to improve reliability and fault tolerance.
- Evangelize a culture of “test failure before failure tests you” across the organization.
Qualifications
- Proven experience leading engineering teams in reliability, infrastructure, or distributed systems.
- Hands-on experience with production load testing, chaos engineering, or large-scale system validation.
- Experience using a chaos engineering platform such as Gremlin or Harness, or a similar tool.
- Strong understanding of failure modes in distributed systems, including latency, partial failure, and cascading outages.
- Experience building or operating systems with strong safety guarantees (isolation, rate limiting, guardrails, auditability).
- Familiarity with cloud-native environments (AWS, Kubernetes) and observability tooling.
- Strong programming background (e.g., Python, Kotlin, Java, or similar).
- Excellent problem-solving skills and the ability to balance long-term resilience investments with immediate business needs.
- Strong communication and leadership skills, with a track record of influencing engineering practices across teams.
- This position requires either equivalent practical experience or a Bachelor’s degree in a related field.
Listing Details
- Posted: April 30, 2026
- First seen: April 30, 2026
- Last seen: May 4, 2026
Posting Health
- Days active: 4
- Repost count: 0
- Trust level: 76%
- Scored at: May 4, 2026

We’re excited to announce that Affirm is now a remote-first company!