GoDaddy

Software Development Engineer

Quick Summary

Overview

Location Details: At GoDaddy the future of work looks different for each team. Some teams work in the office full-time, others have a hybrid arrangement (they work remotely some days and in the office some days) and some work entirely remotely.

Technical Tools
ansible, grafana, javascript, prometheus, python, ci-cd, linux, performance-optimization

This is a remote position, so you’ll be working from home. You may occasionally visit a GoDaddy office to meet with your team for events or meetings.

GoDaddy is looking for a Software Development Engineer / Site Reliability Engineer to join our Monitoring and Observability team. In this hybrid SDE+SRE role, you'll design and build scalable software solutions while also owning the reliability, performance, and availability of systems serving millions of customers worldwide. You'll focus on developing high-quality applications and platforms that enable proactive monitoring, deep insights, and rapid troubleshooting — and you'll go a step further by operating those platforms, responding to incidents, and driving continuous reliability improvements across cloud and on-prem environments.

Responsibilities

  • Design, develop, and maintain scalable observability and monitoring platforms using Python and modern software engineering practices, including systems for metrics, logging, tracing, and visualization (e.g., the LGTM stack (Loki, Grafana, Tempo, and Mimir), Prometheus, Icinga 2, Site24x7, and BigPanda).
  • Build and enhance production-grade software services, APIs, and tooling that improve system visibility, reliability, and developer experience.
  • Collaborate with cross-functional teams to define requirements, architect solutions, and deliver robust, maintainable code.
  • Develop automation and self-service tools to streamline workflows and improve engineering productivity.
  • Implement and evolve infrastructure-as-code and configuration management using tools such as Terraform, Ansible, Puppet, or Chef.
  • Manage and troubleshoot containerized workloads across Docker, Kubernetes (including EKS/ECS), and Fargate, ensuring configuration consistency and operational reliability.
  • Contribute to system design, code reviews, testing strategies, and performance optimization for large-scale distributed systems.
  • Support and enhance CI/CD pipelines, ensuring efficient, high-quality software delivery.
  • Implement SLIs, SLOs, and error budgets to define and track service health and reliability targets, balancing reliability with feature velocity.
  • Build and maintain dashboards and alerting that provide actionable insights and minimize alert fatigue; champion SLO-based alerting and noise reduction.
  • Respond to automated alerts and production incidents, participating in on-call rotations supporting global operations.
  • Partner with engineering teams to resolve availability, performance, and security issues.
  • Lead blameless postmortems and root cause analysis (RCA), converting findings into durable fixes, runbooks, and repeatable automation.
  • Troubleshoot complex system issues using advanced diagnostics (e.g., strace, tcpdump, systemd) and partner with reliability and infrastructure teams to improve application resilience and performance.

Requirements

  • 3+ years of professional experience in software development, building and delivering scalable, production-grade applications or platforms.
  • 3+ years of experience with observability platforms (metrics, logging, tracing, and visualization).
  • 3+ years of experience with event correlation or incident management platforms (e.g., BigPanda, Site24x7, ServiceNow, PagerDuty).
  • 2+ years of hands-on incident response experience, including on-call participation and postmortem facilitation.
  • 2+ years of professional experience with containerization and orchestration technologies in a production SRE context.
  • Strong programming experience in Python (and/or JavaScript, Go, or similar languages) with a focus on writing clean, maintainable, and testable code.
  • Experience designing and building distributed systems, APIs, or developer platforms.
  • Familiarity with observability concepts (metrics, logging, tracing) and tools such as OpenTelemetry (OTel), LGTM, Prometheus, or similar.
  • Solid understanding of Linux/Unix environments and full-stack engineering, including debugging and performance optimization from an application perspective.
  • Experience with CI/CD pipelines, version control systems, and modern development workflows.
  • Exposure to containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Experience building internal tools, platforms, or services that improve developer productivity or system reliability.
  • Strong problem-solving skills and ability to debug complex issues in distributed systems.
  • Configuration management experience with tools such as Ansible, Puppet, Chef, or SaltStack.
  • Practical understanding of SLIs, SLOs, SLAs, and error budgets as reliability engineering concepts.
  • Experience writing and maintaining runbooks, SOPs, and operational documentation to ensure knowledge continuity.

Nice to Have

  • Experience building platforms or SDKs for observability or monitoring.
  • Deep, hands-on expertise with cloud platforms (AWS, Azure, GCP) and cloud-native application design.
  • Familiarity with infrastructure-as-code practices or DevOps tooling.
  • Experience with capacity planning, forecasting, and cost governance for large-scale cloud infrastructure.
  • Familiarity with compliance and audit-ready operations (e.g., PCI-DSS, WebTrust).
  • Passion for mentoring junior engineers and driving a culture of reliability and continuous improvement.

Location & Eligibility

Where is the job: Bulgaria (on-site within the country)
Who can apply: Open to applicants worldwide
Listed under: Bulgaria

Listing Details

Posted: April 3, 2026
First seen: April 3, 2026
Last seen: June 10, 2026

Posting Health

Days active: 68
Repost count: 0
Trust Level: 29%
Scored at: June 10, 2026

GoDaddy
greenhouse

GoDaddy helps the world easily start, confidently grow, and successfully run an online presence.

Employees: 5k+
Founded: 1997