Capital · 2mo ago

Head of SRE and Infrastructure

Poland · Warsaw · Hybrid · Executive

Technical Tools

argocd, aws, kubernetes, terraform, ci-cd, fintech, mentoring, performance-management

We are a leading trading platform that is ambitiously expanding to the four corners of the globe. Our top-rated products have won prestigious industry awards for their cutting-edge technology and seamless client experience. We deliver only the best, so we are always in search of the best people to join our ever-growing team.

The Head of SRE and Infrastructure will play a critical role in shaping the reliability, scalability, and resilience of our infrastructure as we continue to grow globally. This is a senior technology leadership role: you will be responsible for an organisation of approximately 40 people spread across SRE, DevOps, DBA, developer experience and technical support teams. You will own the development and execution of our SRE and infrastructure strategy, and the build-out of reliable high-load systems. This role combines strategic leadership with a deep technical understanding of modern cloud infrastructure, DevOps practices, observability, and operational excellence.

Leadership and Strategy: Develop and execute the SRE and infrastructure strategy to support the organisation’s technology roadmap, product growth, and global expansion. Lead the continued evolution of existing DevOps and infrastructure capabilities into a mature SRE framework with documented SLOs, error budgets, and operating standards adopted across every engineering tribe.

Cloud Infrastructure and Automation: Oversee the design, automation, and optimisation of our cloud infrastructure (AWS, Kubernetes/EKS, Terraform, Helm, infrastructure-as-code). Drive the migration of remaining on-premise workloads into the cloud and the build-out of a multi-cloud disaster recovery footprint, with backups retained on on-premise servers.

GitOps and Continuous Delivery Platform: Own the GitOps platform end-to-end. Consolidate the existing FluxCD estate, evaluate and execute the move to ArgoCD with progressive (canary) delivery, and ensure that secrets, image signing, environment promotion, and policy enforcement are uniform across all tribes.

Platform Reliability and Resilience: Build and maintain a reliable, scalable platform for regulated, multi-jurisdiction trading. Define and enforce reliability standards (SLIs, SLOs, SLAs, error budgets). Own the firm-wide disaster recovery strategy, including recovery-site selection, RTO/RPO targets per service tier, regular DR drills with business and risk stakeholders, and the playbooks that turn DR from theory into a tested capability.

Monitoring and Observability: Define and operate a single observability standard (metrics, logs, traces) that every engineering team consumes - including SLO instrumentation, golden signals, alerting hygiene, and on-call ergonomics. Make observability a product, not a side-effect of deployment.

Incident Management and Continuous Improvement: Work closely with incident, problem management, engineering, and operations teams to improve incident response, post-incident analysis, and long-term prevention. This includes clear escalation criteria, P0/P1 acknowledgement SLAs, change-quality gates inside CI/CD pipelines, and DR readiness tracked with clear DORA metrics. Drive a learning culture that turns recurring incident themes into systemic prevention.

Team Leadership and Development: Lead, hire, and develop SRE, DevOps, DBA, developer experience and technical support teams. Foster a strong engineering culture based on accountability, ownership, technical excellence, and continuous improvement.

Cross-functional Collaboration: Partner with development, security, compliance, risk, release, and business teams to ensure infrastructure and reliability priorities are aligned with product delivery, client experience, and regulatory obligations across all our operating jurisdictions.

Demonstrated experience as a Head of SRE, SRE Director, Infrastructure Director, Engineering Director, or similar senior leadership role in a major technology, fintech, or financial services company, or equivalent experience with high-load platform environments (low latency, high throughput, in-memory systems).

A strong background in SRE, DevOps, infrastructure engineering, cloud platforms, and operating complex, high-availability systems.

Hands-on technical understanding of modern infrastructure technologies, including AWS, Kubernetes, Terraform, FluxCD/ArgoCD, CI/CD tools, monitoring and alerting systems, and infrastructure-as-code practices.

Deep understanding of SRE principles, including SLOs, SLIs, SLAs, error budgets, incident management, observability, automation, and resilience engineering.

Experience as a manager of managers, including hiring, mentoring, performance management, and building a strong engineering culture, with the ability to inspire people and hold them to account in equal measure.

A proven ability to work collaboratively with various teams, to discuss technical details in depth with engineering teams, and to translate those details into actionable language for non-technical stakeholders.

Strong analytical skills and the ability to use metrics and analytics to guide technical decisions and improvements.

A pragmatic approach and the ability to prioritise outcomes over process when necessary to drive effective, actionable results.