Quick Summary
We are a leading trading platform expanding into new markets worldwide.
Leadership and Strategy: Develop and execute the SRE and infrastructure strategy to support the organisation’s technology roadmap, product growth, and global expansion. Lead the continued evolution of existing DevOps and infrastructure capabilities into a mature SRE framework with documented SLOs, error budgets, and operating standards adopted across every engineering tribe.
Cloud Infrastructure and Automation: Oversee the design, automation, and optimisation of our cloud infrastructure (AWS, Kubernetes/EKS, Terraform, Helm, infrastructure-as-code). Drive the migration of remaining on-premises workloads into the cloud and the build-out of a multi-cloud disaster recovery footprint with on-premises backup.
GitOps and Continuous Delivery Platform: Own the GitOps platform end-to-end. Consolidate the existing FluxCD estate, evaluate and execute the move to ArgoCD with progressive / canary delivery, and ensure secrets, image signing, environment promotion, and policy enforcement are uniform across all tribes.
Platform Reliability and Resilience: Build and maintain a reliable, scalable platform for regulated, multi-jurisdiction trading. Define and enforce reliability standards (SLIs, SLOs, SLAs, error budgets). Own the firm-wide disaster recovery strategy, including recovery-site selection, RTO/RPO targets per service tier, regular DR drills with business and risk stakeholders, and the playbooks that turn DR from theory into a tested capability.
Monitoring and Observability: Define and operate a single observability standard (metrics, logs, traces) that every engineering team consumes - including SLO instrumentation, golden signals, alerting hygiene, and on-call ergonomics. Make observability a product, not a side-effect of deployment.
Incident Management and Continuous Improvement: Work closely with incident, problem management, engineering, and operations teams to improve incident response, post-incident analysis, and long-term prevention. Establish clear escalation criteria, P0/P1 acknowledgement SLAs, change-quality gates inside CI/CD pipelines, and DR readiness, supported by clear DORA metrics. Drive a learning culture that turns recurring incident themes into systemic prevention.
Team Leadership and Development: Lead, hire, and develop SRE, DevOps, DBA, developer experience and technical support teams. Foster a strong engineering culture based on accountability, ownership, technical excellence, and continuous improvement.
Cross-functional Collaboration: Partner with development, security, compliance, risk, release, and business teams to ensure infrastructure and reliability priorities are aligned with product delivery, client experience, and regulatory obligations across all our operating jurisdictions.
Demonstrated experience as a Head of SRE, SRE Director, Infrastructure Director, Engineering Director, or similar senior leadership role in a major technology, fintech, or financial services company, or equivalent experience with high-load platform environments (low latency, high throughput, in-memory systems).
A strong background in SRE, DevOps, infrastructure engineering, cloud platforms, and operating complex, high-availability systems.
Hands-on technical understanding of modern infrastructure technologies, including AWS, Kubernetes, Terraform, FluxCD/ArgoCD, CI/CD tools, monitoring and alerting systems, and infrastructure-as-code practices.
Deep understanding of SRE principles, including SLOs, SLIs, SLAs, error budgets, incident management, observability, automation, and resilience engineering.
Experience as a manager of managers, including hiring, mentoring, performance management, and building a strong engineering culture, with the ability to inspire people and hold them to account in equal measure.
A proven ability to collaborate across teams, discussing technical details with engineering teams and translating them into actionable language for non-technical stakeholders.
Strong analytical skills and the ability to use metrics and analytics to guide technical decisions and improvements.
A pragmatic approach and the ability to prioritise outcomes over process when necessary to deliver effective, actionable results.
Listing Details
- Posted: May 15, 2026
- First seen: May 15, 2026
- Last seen: May 15, 2026
Please let Capital know you found this job on Jobera.