Lineate · 24d ago

Site Reliability Engineer (SRE) / DevOps Engineer

Georgian Office · Mid

We help companies go from myth to reality with custom technology. Handling billions of transactions and terabytes of data, our technology solutions have enabled companies like The New York Times, AppNexus, and NYU to achieve their goals for over a decade. Whether we’re rolling out a custom CMS that populates over 2,000 websites, providing complex loyalty offers or creating something completely custom, our expertise falls into three main categories:

Building the tech solutions companies need to achieve their goals
Improving and optimizing the technology companies already have
Supporting in-house development teams with our own experts & resources

We work with companies across all verticals, including martech, ad tech, fintech, ecommerce, and more.

Responsibilities

Apply Site Reliability Engineering (SRE) principles to ensure availability, performance, latency, efficiency, change management, monitoring, emergency response, and capacity planning.
Act as a lead for service reliability, scalability, and performance for a set of products.
Define and monitor SLAs, SLOs, and SLIs for each service and component.
Troubleshoot and resolve moderate to high-complexity issues independently.
Integrate and leverage AI tools and generative models to enhance SRE productivity, including automated runbook generation, intelligent log analysis, and predictive incident management.

Build, maintain, and optimize fully automated CI/CD pipelines (Jenkins, GitHub Actions, Azure DevOps, or similar) for code deployment, test automation, code quality, and telemetry.
Proactively identify and address vulnerabilities at the application layer to prevent security incidents.
Implement continuous deployment solutions and ensure seamless integration with development workflows.

Design, build, and maintain reliable and scalable cloud-native infrastructure and platforms.
Manage and optimize containerized environments using Docker and guide teams on orchestration using Kubernetes/OpenShift (including K8s cluster management).
Automate provisioning and configuration management using Infrastructure as Code tools such as Terraform and Ansible.
Manage, improve, and monitor cloud infrastructure including shared services and landing zones across multi-cloud environments (AWS / GCP).

Design effective monitoring, alerting, and log aggregation solutions using modern observability tools (Grafana, Prometheus, Loki, Splunk, Datadog, or similar).
Build and improve monitoring and observability systems, including metrics, logs, and alerting.

Collaborate with engineering teams to improve system performance and reliability.
Coach software engineers on DevSecOps best practices, automation, and cloud adoption.
Participate in incident response and root cause analysis.

Requirements

4+ years of commercial experience in DevSecOps / SRE implementation and/or cloud computing/migration (AWS, Azure, GCP).
Experience supporting production systems at very large scale in high-availability, resilient, and fault-tolerant environments.
Bachelor’s degree in Computer Science, Software Engineering, or a similar technology degree.
Containers & Cloud Platforms: Docker, Kubernetes, AWS, and/or GCP. (AWS / GCP certifications are preferred).
Infrastructure as Code (IaC) & Automation: Terraform, Ansible, and Shell scripting (Bash).
CI/CD & Build Automation: Extensive experience with CI/CD pipelines (Jenkins, GitHub Actions, etc.).
Monitoring & Analytics: Experience with platforms and tools such as Grafana, Prometheus, Loki, tracing tools, Splunk, Datadog, etc.
Security: Familiarity with infrastructure and application security best practices and vulnerability management.
Scripting & Operating Systems: Unix administration and basic backend development skills in Java and/or Python.
Experience using and integrating AI/ML and AIOps platforms for automation, predictive analytics, or productivity enhancement in SRE/DevSecOps workflows (e.g., smart alerting, anomaly detection, LLM-assisted scripting).
Strong troubleshooting skills and understanding of infrastructure and system design principles.