Site Reliability Engineer

Spain · Remote · Full-time · Mid-level
Engineering · DevOps Engineer

About HappyRobot

HappyRobot is the infrastructure for enterprises to build and orchestrate AI workforces. Our AI workers don't just communicate - they make decisions, take action, and run operations autonomously across voice, email, and enterprise systems. Born in Y Combinator (S23) and backed by a16z and Base10 with over $60M raised, we power critical operations for enterprises worldwide.

Our platform is battle-tested in the most demanding environments - where AI has real consequences. We started in logistics, built our own voice stack, models, and orchestration layer from the ground up, and are now bringing that infrastructure to every enterprise that runs the real economy. Learn more about our vision in our manifesto.

 

About the Role


We're looking for a Site Reliability Engineer to take the lead on scaling our operational resilience as we grow. You’ll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.

This is a high-impact, high-trust role where you’ll shape how reliability is done - reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.

 
  • 4+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)

  • Strong problem-solving skills and ability to dive into unfamiliar backend codebases

  • Strong Go and Kubernetes experience

  • Familiarity with observability and monitoring tools (e.g., Grafana, Prometheus, Sentry)

  • Clear, calm communication under pressure, especially during live incidents

     
  • Experience working with distributed systems or services at scale

  • Built or maintained internal tooling for on-call teams or reliability workflows

  • Familiarity with deployment pipelines, CI/CD, or infra-as-code

  • Experience improving system observability (e.g., custom metrics, traces, log pipelines)

     

What We Offer

Opportunity to work at a high-growth AI startup, backed by top investors.

Location & Eligibility

Where is the job: Spain (remote within one country)
Who can apply: Open to applicants worldwide

Listing Details

Posted: May 20, 2026
First seen: May 21, 2026
Last seen: June 21, 2026

Posting Health

Days active: 31
Repost count: 0
Trust level: 24%
Scored at: June 22, 2026
