Kaseya

Site Reliability Engineer

Canada · Toronto · Mid-level
Engineering · DevOps Engineer

Technical Tools
Ansible, Anthropic, AWS, ChatGPT, Datadog, Elasticsearch, Kubernetes, MySQL, PostgreSQL, Terraform, Cybersecurity

Kaseya is the leading provider of AI-powered IT management and cybersecurity software, serving Managed Service Providers (MSPs) and internal IT organizations worldwide. Our comprehensive platform helps organizations efficiently manage, secure, and automate their IT environments, driving operational efficiency and long-term business success.

Backed by Insight Partners, a leading global software investor, Kaseya has experienced sustained double-digit growth and continues to expand its global footprint. Today, Kaseya supports customers in more than 20 countries and manages over 15 million endpoints worldwide.

Founded in 2000, Kaseya has built a culture centered around innovation, accountability, and results. We are a high-growth, high-performance organization that values individuals who are driven, adaptable, and committed to delivering exceptional outcomes for our customers and teammates alike.

At Kaseya, success comes from embracing challenges, moving with urgency, and continuously raising the bar. 

Kaseya is hiring a Site Reliability Engineer to keep our production systems healthy as we scale. You'll own the reliability of services that thousands of MSPs depend on every day. That means defining the SLOs we hold ourselves to, leading incidents when they happen, and building the automation that keeps things stable as we ship. The work is hands-on, the on-call rotation is real, and the environment runs heavily on AWS. If you treat reliability as a product instead of a chore, you'll fit in well here.

Responsibilities

  • Set, monitor, and enforce SLOs, SLIs, and error budgets that keep our systems reliable
  • Lead incident response, troubleshooting, and blameless postmortems that produce real fixes
  • Build and maintain automated deployment, configuration management, and infrastructure provisioning using Infrastructure as Code
  • Manage cloud and hybrid infrastructure with Terraform or CloudFormation, balancing cost, scalability, and resilience
  • Improve observability across systems through proactive monitoring, alerting, and dashboards that surface issues early
  • Partner with development teams to bake reliability into the SDLC, including deployment automation, capacity planning, and chaos engineering
  • Cut operational toil through automation, systems that recover themselves, and engineering solutions that scale
  • Support containerized and serverless workloads so they stay highly available and fault tolerant in production
  • Stay current on SRE, cloud, and observability practices and bring what works back to the team

Requirements

  • 4 to 5 years of AWS production experience
  • IaC ownership with Terraform or CloudFormation, including state management
  • AWS ECS production experience (or strong Kubernetes background willing to ramp)
  • Active on-call rotation experience, with incidents led and postmortems written
  • Working fluency with SLOs, SLIs, and error budgets in production
  • Kubernetes production experience
  • Broader observability tooling (Datadog, Dynatrace, CloudWatch, Elasticsearch/Kibana)
  • Chaos engineering
  • AWS Lambda or serverless workloads
  • Ansible, Chef, or Puppet
  • DevSecOps work (vulnerability scanning, secrets management, SOC 2 or ISO 27001)
  • Production database support (RDS, PostgreSQL, MySQL)
  • Open source contributions or public technical portfolio

The expected annual base salary for this role is CAD $115,000 to CAD $130,000. Final offer will depend on experience, skills, and internal equity. This posting is for an existing vacancy.

 

Additional information
Kaseya provides equal employment opportunity to all employees and applicants without regard to race, religion, age, ancestry, gender, sex, sexual orientation, national origin, citizenship status, physical or mental disability, veteran status, marital status, or any other characteristic protected by applicable law.

Location & Eligibility

Where is the job: Toronto, Canada (on-site at the office)
Who can apply: Open to applicants worldwide

Listing Details

Posted: May 18, 2026
First seen: May 18, 2026
Last seen: May 26, 2026

Posting Health

Days active: 1
Repost count: 0
Trust Level: 67%
Scored at: May 20, 2026

Kaseya
greenhouse

Kaseya is a leading provider of comprehensive IT management and cybersecurity solutions for businesses seeking to optimize their IT infrastructure and security practices.

Employees: 5k+
Founded: 2000