USD 210,000-270,000/yr

Staff Site Reliability Engineer

United States | Silicon Valley | Lead

Our Mission

Healthcare should work for patients, but it doesn’t. In their time of need, they call down outdated insurance directories. Then wait on hold. Then wait weeks for the privilege of a visit. Then wait in a room solely designed for waiting. Then wait for a surprise bill. In any other consumer industry, the companies delivering such a poor customer experience would not survive. But in healthcare, patients lack market power. Which means they are expected to accept the unacceptable.

 

Zocdoc’s mission is to give power to the patient. To do that, we’ve built the leading healthcare marketplace that makes it easy to find and book in-person or virtual care in all 50 states, across 200+ specialties and 12k+ insurance plans. By giving patients the ability to see and choose, we give them power. In doing so, we can make healthcare work like every other consumer sector, where businesses compete for customers, not the other way around. In time, this will drive quality up and prices down.

 

We’re 18 years old and the leader in our space, but we are still just getting started. If you like solving important, complex problems alongside deeply thoughtful, driven, and collaborative teammates, read on.

 

As a Staff Site Reliability Engineer (SRE) at Zocdoc, you will shape how we operate safe, observable, and scalable systems across the company. You’ll lead initiatives that improve incident response, define reliability patterns, and drive organization-wide operational excellence—helping us build systems that fail gracefully, recover quickly, and scale efficiently.

You won’t just respond to incidents—you’ll help design the systems, tools, and practices that teams rely on to avoid them. Your work will clarify ownership, improve on-call quality, and strengthen our observability posture. By embedding best practices into how we build and run services, you’ll enable every engineering team at Zocdoc to move faster, safer, and with greater confidence.

You’ll enjoy this role if you:

  • Stay composed and clear during incidents, and use them as catalysts for systemic improvement
  • Treat observability as a strategic capability that enables better decisions, not just better dashboards
  • Build scalable, default-safe patterns and tools that support resiliency and reliability
  • Build strong cross-functional relationships and navigate complex systems to drive scalable, reliable outcomes
  • Are endlessly curious—about how systems fail, how teams operate, and how to make both better
  • Share knowledge generously and help others build with confidence and operational rigor

In this role, you will:

  • Participate in and influence high-impact incident response efforts, contributing calm decision-making and retrospective-driven learning
  • Define and evolve org-wide incident practices, retrospectives, and reliability tooling 
  • Architect and evolve observability platforms that offer actionable insight into system health, business-critical paths, and failure modes
  • Lead the development of reliability and observability practices, including alerting hygiene, SLOs, and deployment safeguards
  • Guide teams in building resilient, fault-tolerant services through consultative design, operational reviews, and safety-focused defaults
  • Partner with Product, Platform, and Security teams to ensure new systems are operable and scalable from day one
  • Design and implement internal tools that improve deployment safety, incident coordination, and production readiness
  • Mentor engineers across teams in operational rigor, reliability principles, and system debugging

You’ll bring:

  • 8+ years of experience operating and scaling production infrastructure in cloud-native environments
  • Deep expertise in incident response, debugging distributed systems, and driving reliability improvements
  • Strong working knowledge of observability stacks (metrics, logs, traces), alerting strategy, and SLO design
  • Experience implementing fault isolation, graceful degradation, and chaos engineering practices
  • Proficiency with infrastructure-as-code and config management (e.g., Terraform, CDK)
  • A proven ability to influence teams through standards, tooling, and culture—not just code
  • A growth mindset and strong communication skills for mentoring, influencing, and aligning across teams

What We Offer

Flexible, hybrid work environment
Unlimited Vacation
100% paid employee health benefit options (including medical, dental, and vision)
Commuter Benefits
401(k) with employer-funded match
Corporate wellness program with Wellhub
Sabbatical leave (for employees with 5+ years of service)
Competitive paid parental leave and fertility/family planning reimbursement
Cell phone reimbursement
Catered lunch every day, along with beverages and snacks
Employee Resource Groups and ZocClubs to promote shared community and belonging
Great Place to Work Certified

Listing Details

First seen: April 3, 2026
Last seen: April 27, 2026

Zocdoc

Zocdoc is an online medical care scheduling service that allows people to find and book in-person or telemedicine appointments for medical or dental care. It also functions as a physician and dentist rating and comparison database.

Employees: 750
Founded: 2007