Site Reliability Engineer (SRE) / DevOps Engineer
We help companies go from myth to reality with custom technology. Handling billions of transactions and terabytes of data, our technology solutions have enabled companies like The New York Times, AppNexus, and NYU to achieve their goals for over a decade. Whether we’re rolling out a custom CMS that populates over 2,000 websites, providing complex loyalty offers or creating something completely custom, our expertise falls into three main categories:
- Building the tech solutions companies need to achieve their goals
- Improving and optimizing the technology companies already have
- Supporting in-house development teams with our own experts & resources
We work with companies across all verticals, including martech, ad tech, fintech, ecommerce, and more.
Responsibilities
- Apply Site Reliability Engineering (SRE) principles across availability, performance, latency, efficiency, change management, monitoring, emergency response, and capacity planning.
- Act as a lead for service reliability, scalability, and performance for a set of products.
- Define and monitor SLAs, SLOs, and SLIs for each service and component.
- Troubleshoot and resolve moderate to high-complexity issues independently.
- Integrate and leverage AI tools and generative models to enhance SRE productivity, including automated runbook generation, intelligent log analysis, and predictive incident management.
- Build, maintain, and optimize fully automated CI/CD pipelines (Jenkins, GitHub Actions, Azure DevOps, or similar) for code deployment, test automation, code quality, and telemetry.
- Proactively identify and address vulnerabilities at the application layer to prevent security incidents.
- Implement continuous deployment solutions and ensure seamless integration with development workflows.
- Design, build, and maintain reliable and scalable cloud-native infrastructure and platforms.
- Manage and optimize containerized environments using Docker and guide teams on orchestration using Kubernetes/OpenShift (including K8s cluster management).
- Automate provisioning and configuration management using Infrastructure as Code tools such as Terraform and Ansible.
- Manage, improve, and monitor cloud infrastructure including shared services and landing zones across multi-cloud environments (AWS / GCP).
- Design effective monitoring, alerting, and log aggregation solutions using modern observability tools (Grafana, Prometheus, Loki, Splunk, DataDog, or similar).
- Build and improve monitoring and observability systems, including metrics, logs, and alerting.
- Collaborate with engineering teams to improve system performance and reliability.
- Coach software engineers on DevSecOps best practices, automation, and cloud adoption.
- Participate in incident response and root cause analysis.
Requirements
- 4+ years of commercial experience in DevSecOps / SRE implementation and/or cloud computing/migration (AWS, Azure, GCP).
- Experience at superscale, supporting production systems in high-availability, resilient, and fault-tolerant environments.
- Bachelor’s degree in Computer Science, Software Engineering, or a similar technology degree.
- Containers & Cloud Platforms: Docker, Kubernetes, AWS, and/or GCP. (AWS / GCP certifications are preferred).
- Infrastructure as Code (IaC) & Automation: Terraform, Ansible, and Shell scripting (Bash).
- CI/CD & Build Automation: Extensive experience with CI/CD pipelines (Jenkins, GitHub Actions, etc.).
- Monitoring & Analytics: Experience with platforms and tools such as Grafana, Prometheus, Loki, Splunk, and DataDog, as well as tracing.
- Security: Familiarity with infrastructure and application security best practices and vulnerability management.
- Scripting & Operating Systems: Unix administration and basic backend development skills in Java and/or Python.
- Experience using and integrating AI/ML and AIOps platforms for automation, predictive analytics, or productivity enhancement in SRE/DevSecOps workflows (e.g., smart alerting, anomaly detection, LLM-assisted scripting).
- Strong troubleshooting skills and understanding of infrastructure and system design principles.
Listing Details
- Posted: March 27, 2026
- First seen: March 26, 2026
- Last seen: April 20, 2026
Please let Lineate know you found this job on Jobera.
