Kobie

Lead Observability Engineer

Bangalore | Full-Time | Lead
Observability Engineer | Infrastructure & Cloud

Join our India Tech Hub – Be among the first hires! 

Kobie, a 35-year veteran of the loyalty industry, a multi-year Forrester Leader, and a USA Top Workplace, is expanding its global footprint by establishing a Tech Hub in India. Kobie partners with global brands to build deep connections with their customers through personalized, data-driven loyalty experiences and has a mission of growing enterprise value through loyalty. The Tech Hub will serve as a Global Capabilities Center for a broad range of technology roles, and this is your chance to play a pivotal role in shaping our presence in India. Join us as we continue to lead in loyalty, delivering innovative customer experiences for some of the world’s most recognized brands while working alongside some of the best and brightest in loyalty.

About the Team and What We’ll Build Together

You are a Lead Observability Engineer who will drive the strategy, adoption, and evolution of observability across all production and delivery environments. You will play a critical role in ensuring system reliability, performance visibility, and proactive issue resolution across our platforms.

You will operate at the intersection of Engineering, DevOps, and Production Support, bringing structure, standardization, and intelligence to how we monitor and manage systems. You will lead the shift from reactive operations to proactive, AI-driven observability and automated reliability.

In this role, you will:

  • Own and evolve the observability platform (e.g., New Relic) to provide end-to-end visibility across applications and infrastructure
  • Establish standards for monitoring, alerting, dashboards, and telemetry (logs, metrics, traces)
  • Leverage AIOps capabilities to improve anomaly detection, reduce noise, and accelerate root cause analysis
  • Drive automation and self-healing workflows to minimize manual intervention and improve system resilience
  • Collaborate across teams to ensure systems are observable by design and aligned with reliability goals
  • Continuously analyze system behavior and incident patterns to improve performance, scalability, and uptime

You will be part of a team focused on building a highly reliable, data-driven, and scalable operational ecosystem, where observability is a core foundation for engineering excellence.

Responsibilities

· Lead the observability strategy and execution, ensuring comprehensive visibility across all production and delivery environments.

· Own and govern the enterprise observability platform (New Relic or equivalent tools such as Datadog or Dynatrace) and ensure consistent monitoring standards across systems.

· Explore and adopt AI-driven monitoring capabilities (AIOps) to automate anomaly detection, reduce alert fatigue, and enable predictive problem management.

· Collaborate closely with Production Support (L1/L2), DevOps, CloudOps, Software Engineering, and Database teams to triage complex production issues and accelerate incident resolution.

· Act as the operational coordinator during service-impacting events, organizing workflows, managing cross-team dependencies, and providing structured updates to leadership.

· Design and implement automated remediation workflows and self-healing mechanisms for recurring incidents.

· Analyze telemetry data (logs, metrics, traces) to identify incident patterns and systemic anomalies, and continuously refine alert thresholds and routing logic.

· Develop and maintain dynamic dashboards that reflect real-time system health, application performance, and infrastructure behavior.

· Define and track reliability metrics such as SLOs, SLIs, MTTD, and MTTR to improve service reliability.

· Ensure clear, timely communication with stakeholders during incidents and operational events.

· Drive organization-wide adoption of observability best practices through documentation, training, and knowledge sharing.

Requirements

· 8–10+ years of experience in observability, site reliability engineering (SRE), DevOps, or advanced production operations in large-scale enterprise environments.

· Expert-level hands-on experience implementing and optimizing observability platforms such as New Relic, Datadog, Dynatrace, or Splunk.

· Strong understanding of monitoring fundamentals including logs, metrics, traces, and alerting strategies.

· Experience working with cloud-native architectures (AWS preferred).

· Familiarity with containerized environments and orchestration platforms such as Kubernetes.

· Experience integrating observability practices into CI/CD pipelines to ensure applications are observable by design.

· Strong understanding of incident management, problem management, and change management practices (ITIL concepts).

· Demonstrated ability to analyze telemetry data to identify patterns, detect anomalies, and improve operational reliability.

· Strong leadership and collaboration skills with the ability to coordinate across engineering, DevOps, and operations teams.

· Excellent communication skills and a strong focus on operational excellence and continuous improvement.

Nice to Have

· Experience implementing AI/ML capabilities within observability tools for anomaly detection and predictive monitoring.

· Familiarity with AIOps platforms and automated remediation workflows.

· Experience with event streaming platforms such as Kafka for telemetry ingestion or real-time data processing.

· Basic understanding of application architecture and troubleshooting distributed systems.

· Experience with automation frameworks or serverless workflows (e.g., AWS Lambda, scripting, or infrastructure automation).

Listing Details

Posted: March 24, 2026
First seen: March 26, 2026
Last seen: April 22, 2026

Posting Health

Days active: 27
Repost count: 0
Trust Level: 33%
Scored at: April 22, 2026

Kobie
lever
Employees: 350
Founded: 1990
Domain: kobie.com