Staff Site Reliability Engineer
Quick Summary
SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, our platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial. Become a Smartian and help optimize the way the healthcare system works for everyone. Learn more at smarterdx.com/careers.
We are seeking a Staff Site Reliability Engineer (SRE) to lead the reliability, scalability, and operational excellence of our production systems. This role is responsible for defining and driving SRE practices across the organization, including SLIs/SLOs, incident management, capacity planning, and resilience engineering. You will design and implement automation that reduces toil, improve observability and performance across our Kubernetes and AWS environments, and ensure our systems are highly available and fault-tolerant.
The ideal candidate is a deeply technical engineer with strong distributed systems expertise, a passion for operational rigor, and a track record of improving reliability through thoughtful engineering, automation, and data-driven decision-making.
**This role is fully remote within the US**
Responsibilities
- Define and evolve reliability standards for the SmarterDx platform, including SLIs, SLOs, and error budgets that align engineering work with customer impact.
- Implement a “reliability” platform using Terraform and infrastructure-as-code best practices.
- Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
- Lead incident response, drive blameless postmortems, and implement systemic improvements to prevent recurrence.
- Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
- Provide production support for the SmarterDx platform, applying SRE principles to ensure availability, performance, and data durability.
- Research, prototype, and advocate for new reliability practices, tooling, and architectural improvements across the engineering organization.
Requirements
- 10+ years of software and software reliability engineering experience, with significant time spent operating and scaling distributed systems in production environments.
- 3+ years of hands-on experience running cloud-native infrastructure in AWS, including deep familiarity with containers, Kubernetes, monitoring, and alerting in live production systems.
- Proven experience defining and managing SLIs/SLOs, leading incident response, and driving postmortems and systemic reliability improvements.
- Strong expertise with Terraform and infrastructure-as-code practices for managing production infrastructure safely and reproducibly.
- Deep experience with Kubernetes architecture and operations, including workload reliability, cluster scaling, networking, and failure modes.
- Experience working in security-conscious, compliance-oriented environments where reliability and data protection are first-class concerns.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field — or equivalent practical experience operating large-scale systems.
Nice to Have
- Reliability engineering experience with production database systems (e.g. Postgres)
- AWS
- Terraform
- Kubernetes
- Go, Python, TypeScript
- Postgres
What We Offer
$230K to $250K base salary
Listing Details
- Posted: April 6, 2026
- First seen: March 26, 2026
- Last seen: April 19, 2026

SmarterDx leverages AI to enhance hospital revenue integrity by accurately analyzing patient data and uncovering missed revenue opportunities.