Staff Site Reliability Engineer
Quick Summary
Founded in 2017, Obsidian Security was created to close a critical gap: securing the SaaS applications where modern business happens—platforms like Microsoft 365, Salesforce, and hundreds more.
As a Staff SRE at Obsidian, you will define and drive the company-wide reliability vision for a complex, multi-tenant SaaS platform serving enterprise and financial customers. You will operate as a strategic partner to DevOps and Platform Engineering leadership, shaping a unified reliability strategy that scales across the organization.
Your core mandate: ensure Obsidian detects, diagnoses, and communicates system issues before customers are impacted—consistently and predictably.
This is a hands-on technical role that involves architecting and leading the implementation of systems that handle real-world complexity, including upstream SaaS dependencies, sparse and noisy signals, and mission-critical enterprise workloads.
Responsibilities
- Reliability Strategy & Architecture: Define and lead long-term reliability strategy across services. Establish end-to-end system visibility frameworks and guide architecture for observability, detection, and resilience.
- Cross-Org Leadership: Partner across teams to embed reliability, standardize SLIs/SLOs, and serve as a technical escalation expert.
- Detection & Observability: Build intelligent detection systems (anomaly detection, connector health models) and enable self-service observability.
- Incident Management: Define and evolve a tiered incident communication strategy, improve response practices, and lead postmortems to strengthen reliability and customer trust.
- Execution: Contribute hands-on to system design, monitoring, and debugging across distributed systems and data pipelines.
Requirements
- 5+ years in SRE, Production Engineering, or related roles
- 3+ years operating at a senior or technical leadership level (Staff or equivalent scope)
- Deep expertise in:
  - AWS and/or GCP
  - Kubernetes and Helm
  - Observability stacks (Prometheus, Grafana, or equivalent)
  - CI/CD systems (GitLab CI/CD, ArgoCD, etc.)
- Proven experience designing and scaling reliability systems for multi-tenant SaaS platforms
- Strong debugging and systems thinking across distributed microservices and legacy systems
- Demonstrated ability to lead initiatives that improve incident detection, response, and system resilience
- Hands-on engineering approach with a track record of building—not just configuring—reliability systems
Requirements
- Experience in B2B SaaS serving enterprise or financial customers
- Familiarity with third-party SaaS connector architectures and ingestion patterns
- Experience building anomaly detection or intelligent alerting systems
- Experience designing customer-facing status pages and incident communication frameworks
- Drive org-wide reliability strategy
- Own and build new detection & observability systems
- Tackle complex distributed systems challenges
- Safeguard critical infrastructure for financial customers
- Issues caught and resolved before customer impact
- Reliability is measurable and continuously improving
- Teams self-serve observability with scalable tools
- Clear, proactive incident communication builds trust
- Reliability becomes a competitive advantage
What We Offer
Please note that the base pay range is a guideline. For candidates who receive an offer, base pay will vary based on factors such as work location and the candidate's knowledge, skills, and experience. In addition to a competitive base salary, this position is eligible for equity awards and may be eligible for sales commission or incentive compensation based on the role or function within the company.
At Obsidian, we are proud to be an equal-opportunity employer. We value diversity and hire for talent, passion, and compassion. In compliance with federal law, all persons hired will be required to submit satisfactory proof of identity and legal authorization. If you have a need that requires accommodation, please contact accommodations@obsidiansecurity.com.
Information collected and processed as part of any job applications you choose to submit is subject to Obsidian’s Applicant Privacy Policy.
Listing Details
- Posted: May 6, 2026
- First seen: May 6, 2026
- Last seen: May 8, 2026