Avalara

Lead Site Reliability Engineer

Romania, Remote, Lead
Engineering, DevOps Engineer

Quick Summary

Key Responsibilities

Role Summary: As Avalara continues to scale its global SaaS platform and accelerate toward an AI-first operating model, we must fundamentally transform how reliability, deployment, and operational excellence are engineered.

Requirements Summary

  • B.S. in Computer Science or Engineering
  • 10+ years of experience in SaaS, distributed systems, or reliability engineering
  • Strong programming experience in Go, Java and Python
  • Deep expertise in observability tools (Prometheus, Grafana, OpenTelemetry,…

Technical Tools
aws, azure, docker, gcp, go, grafana, java, kubernetes, prometheus, pulumi, python, terraform, ci-cd, distributed-systems, linux, mentoring, networking, performance-optimization, saas

Responsibilities


You will lead how reliability is engineered across Avalara's global SaaS platform as we scale and move toward an AI-first operating model. You will focus on building a modern, automation-first reliability ecosystem that improves system stability, reduces operational risk, and enables faster, safer product delivery. You will work across multi-cloud environments to design self-healing systems, advance observability, and modernise deployment practices. As a senior individual contributor, you will also raise the technical bar by shaping standards, mentoring engineers, and driving measurable improvements in reliability and performance.


  • Own and evolve the reliability strategy for distributed SaaS systems across multi-cloud platforms
  • Design and implement AI-driven operations, including predictive monitoring, anomaly detection, and automated root cause analysis
  • Build and scale observability solutions using tools such as Prometheus, Grafana, and OpenTelemetry
  • Create self-healing systems and automation frameworks that reduce manual operational work
  • Improve deployment practices using feature flags, progressive delivery, and safe rollout strategies
  • Ensure reliability and performance of CI/CD pipelines and infrastructure as code environments
  • Strengthen system availability, scalability, and fault tolerance across Kubernetes-based platforms
  • Lead incident response, improve recovery times, and implement lasting fixes through post-incident reviews
  • Integrate AI-driven workflows into incident detection, triage, and resolution to improve operational efficiency
  • Mentor engineers and drive adoption of automation-first and AI-first reliability practices

Qualifications

  • 10+ years of experience in SaaS, distributed systems, or site reliability engineering
  • Programming skills in Go, Java, or Python
  • Deep experience with observability tools such as Prometheus, Grafana, and OpenTelemetry
  • Hands-on experience with Kubernetes, containerisation, and multi-cloud platforms (AWS, GCP, Azure, or OCI)
  • Strong understanding of Linux systems, networking, and cloud-native architectures
  • Proven ability to design automation, improve system reliability, and apply AI or machine learning to operational workflows

AI is embedded in our workflows, decision-making, and products. Success here requires embracing AI as an essential capability.

  • You’ll bring experience using AI and AI-related technologies and be ready to thrive here.

  • You’ll apply AI every day to business challenges, improving efficiency, contributing solutions, and driving results for your team, our company, and our customers.

  • You’ll grow with AI by staying curious about new trends and best practices, and by sharing what you learn so others can benefit too.

Total Rewards

In addition to a great compensation package, paid time off, and paid parental leave, many Avalara employees are eligible for bonuses.

Health & Wellness
Benefits vary by location but generally include private medical, life, and disability insurance.

Inclusive culture and diversity
Avalara strongly supports diversity, equity, and inclusion, and is committed to integrating them into our business practices and our organizational culture. We also have 8 employee-run resource groups, each with senior leadership and executive sponsorship.

Requirements


We’re defining the relationship between tax and tech.

We’ve already built an industry-leading cloud compliance platform, processing over 54 billion customer API calls and over 6.6 million tax returns a year. Our growth is real (we're a billion-dollar business), and we’re not slowing down until we’ve achieved our mission: to be part of every transaction in the world.

We’re bright, innovative, and disruptive, like the orange we love to wear. It captures our quirky spirit and optimistic mindset. It shows off the culture we’ve designed, one that empowers our people to win. We’ve been different from day one. Join us, and your career will be too.

Supporting diversity and inclusion is a cornerstone of our company: we don’t want people to fit into our culture, but to enrich it. All qualified candidates will receive consideration for employment without regard to race, color, creed, religion, age, gender, national origin, disability, sexual orientation, US Veteran status, or any other factor protected by law. If you require any reasonable adjustments during the recruitment process, please let us know.

Location & Eligibility

Where is the job: Romania (remote within one country)
Who can apply: RO

Listing Details

Posted: May 12, 2026
First seen: May 12, 2026
Last seen: June 27, 2026

Posting Health

Days active: 44
Repost count: 0
Trust Level: 24%
Scored at: June 26, 2026
