CertifID

Senior Site Reliability Engineer

Remote · Full-time · Senior
Engineering · DevOps & Infrastructure · Site Reliability Engineer · DevOps Engineer · Infrastructure & Cloud

Overview

Cybercrime reached record highs in 2024. According to the FBI's Internet Crime Complaint Center (IC3) report, reported losses exceeded $16 billion. With investment fraud and business email compromise (BEC) scams at the forefront, the message is clear: the real estate sector, where large wire transfers are routine, remains a lucrative target for cybercriminals. At CertifID, we take this threat seriously and provide a secure platform that verifies the identities of parties involved in transactions, authenticates wire transfer instructions, and detects potential fraud attempts. Our technology is designed to mitigate risk and ensure that every transaction is conducted with confidence and peace of mind.

We know we couldn’t take on this challenge without our incredible team. We have been recognized as one of the Best Startups to Work for in Austin, made the Inc. 5000 list, and won Best Culture by Purpose Jobs three years in a row. We are guided by our core values and our vision of a world without wire fraud. We offer a dynamic work environment where you can contribute to meaningful impact and be part of a team dedicated to enhancing security and fighting fraud.

We are seeking a Senior Site Reliability Engineer (Senior SRE) to drive reliability improvements across our production SaaS environment. You’ll play a critical role in building scalable infrastructure patterns, advancing observability, improving incident response, and partnering with engineering teams to embed reliability into system design and delivery.
 
This role is ideal for an experienced Sr. SRE who enjoys solving complex operational problems, building automation, and mentoring others.

What You’ll Do

  • Reliability & Platform Operations: Own and improve the reliability, availability, and performance of production systems while defining and operationalizing SLIs/SLOs and error budgets.
  • AI Agent Enablement: Design and implement autonomous and semi-autonomous AI agents for monitoring distributed systems and applications. Build agents capable of consuming multi-source observability data (metrics, logs, traces, etc.).
  • Incident Response: Participate in and help lead an on-call rotation, serving as an escalation point for major incidents and facilitating blameless postmortems.
  • Automation & Infrastructure: Build automated workflows to eliminate manual work and design/maintain Infrastructure-as-Code with Terraform.
  • Observability: Improve metrics, logs, traces, and alerting using tools like Datadog or Prometheus to reduce noise and increase signal.
  • Collaboration & Mentorship: Partner with application teams to implement reliability best practices and mentor junior engineers to foster a culture of knowledge sharing.

Who You Are

  • Strategic Architect: You look beyond the "what" to understand the "why," providing insights that influence our go-to-market (GTM) and technical direction.
  • Startup Veteran: You are comfortable moving fast and staying proactive in an environment where the playbook is still being written.
  • Relatable & Adaptable: You can navigate different personalities across the organization, from high-energy sales teams to analytical engineering partners.
  • Lifelong Learner: You have a thirst for learning, keeping up with emerging technologies and industry trends.

What You Bring

  • Experience: 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
  • Cloud Expertise: Proven experience supporting production SaaS systems in Azure (preferred), AWS, or GCP.
  • Technical Stack: Strong Linux, networking, and distributed systems troubleshooting skills.
  • Containers: Strong experience with containers and orchestration (Kubernetes/EKS/AKS).
  • IaC & Tooling: Expertise with Infrastructure-as-Code (Terraform strongly preferred).
  • Programming: Strong scripting/programming skills in Python, Go, Bash, or C#/.NET.
  • Observability: Hands-on experience with Datadog, Prometheus/Grafana, or OpenTelemetry.

Benefits

  • Flexible vacation
  • 12 company-paid holidays
  • 10 paid sick days
  • No work on your birthday
  • Health, dental, and vision insurance (including a $0 option)
  • 401(k) with matching and no waiting period
  • Equity
  • Life insurance
  • Generous paid parental leave
  • Wellness reimbursement of $300/year
  • Remote worker reimbursement of $300/year
  • Professional development reimbursement
  • Competitive pay
  • An award-winning culture
Location & Eligibility

Location: Fully remote, anywhere in the world
Who can apply: Same as job location (worldwide)

Listing Details

Posted: February 9, 2026

Company: CertifID
Employees: 125
Founded: 2017