Datadog Administration and Operations (ServiceNow)
Description -
We’re seeking a Datadog administration and operations expert who will be responsible for managing our observability platform to ensure comprehensive monitoring, alerting, and performance analytics across infrastructure and applications. This role is critical for maintaining system reliability, improving incident response, and supporting DevOps and engineering teams with actionable insights. This is an exciting opportunity to get in on the ground floor implementing and scaling the tools, processes, and governance at HP.
Responsibilities
* Observability Architecture & Ownership
  * Design and implement an enterprise‑grade observability strategy spanning Datadog (metrics, logs, traces/APM, synthetics, RUM, network performance, cloud cost) and integrations with ServiceNow.
  * Define monitoring standards, tagging conventions, dashboards, SLOs/SLIs, and alerting policies for infra and apps (on‑prem, cloud, containers).
* Datadog Implementation & Scale
  * Deploy and manage Datadog agents, integrations (AWS/Azure/GCP, Kubernetes, NGINX, DBs, messaging), and service catalog coverage.
  * Build golden dashboards, standardized monitors, and runbooks for infra components (compute, storage, network), platforms (Kubernetes), and critical apps.
* ServiceNow Integration & Event Management
  * Implement and optimize Datadog → ServiceNow event routing, correlation rules, deduplication, and Incident/Problem auto‑creation with enriched context.
  * Maintain CI relationships in ServiceNow CMDB, drive discovery mapping, and align alerts with CI ownership and support groups.
  * Enable closed‑loop remediation using IntegrationHub, workflows, and change controls; contribute to Change Advisory Board (CAB) standards.
* Reliability Engineering & Operational Excellence
  * Maintain SLOs, error budgets, and escalation policies. Reduce alert noise; drive actionable, tiered alerts.
  * Partner with App, Infra, SecOps, and NOC teams to improve MTTR and post‑incident reviews with telemetry‑backed corrective actions.
* Automation & IaC
  * Automate provisioning of monitors, dashboards, synthetics, tags, and service owner mapping.
  * Build runbooks, remediation scripts, and service workflows; integrate with CI/CD to promote consistent monitoring across environments.
* Governance, Compliance & Cost Optimization
  * Implement data retention policies, access controls, RBAC, and tagging for chargeback/showback.
  * Optimize Datadog usage (APM sampling, log pipelines/archives, metric volumes) while protecting critical visibility.
Preferred Education & Experience
* Bachelor’s degree in Computer Science, Engineering, Information Systems, or equivalent experience.
* 5–8+ years in Infrastructure/Platform/SRE/Observability roles for enterprise environments.
* Expert hands‑on Datadog experience: agents, integrations, log pipelines, APM/tracing (including OpenTelemetry), RUM, synthetics, dashboards, monitors, service catalogs, tagging strategies.
* ServiceNow: Event Management, Incident/Problem/Change, CMDB design, Discovery, integration patterns (webhooks, APIs, IntegrationHub), event correlation and enrichment.
* Strong experience across Linux/Windows/Unix (cluster and workload monitoring).
* Proficiency with scripting (Python/PowerShell/Bash), Datadog/ServiceNow APIs, and Git‑based workflows.
* Demonstrated capability to design SLOs/SLIs, reduce false positives, and measurably improve MTTR and service reliability.
* Excellent communication; able to drive standards across multiple engineering teams.
Additional Qualifications
* Experience across AWS/Azure/GCP, Kubernetes, and Terraform.
* Prior ownership of enterprise observability programs (>500 nodes/services; multi‑account/multi‑subscription cloud).
* Network (e.g., NPM/NTA) and database (e.g., Postgres/SQL Server/Oracle/MySQL) monitoring expertise.
* Experience with message brokers (TIBCO), API gateways, and distributed tracing for microservices.
* Basic experience in administering and maintaining relational and/or non-relational databases.
* Security/Compliance awareness (SOX, HIPAA, PCI), log retention/archival strategies.
* Experience with cost governance in Datadog (metrics vs. logs vs. traces), custom metrics, and sampling strategies.
* ITIL v4 Foundation, Datadog Certifications, and ServiceNow Admin/Developer certifications.
Knowledge & Skills
* Systems thinking, reliability engineering mindset, data‑driven decision making.
* Strong stakeholder collaboration (Infra, AppDev, SecOps, NOC).
* Documentation and enablement: clear runbooks, patterns, standards.
* Bias for automation, consistency, and measurable outcomes.
Job -
Software
Schedule -
Full time
Shift -
No shift premium (India)
Travel -
Relocation -
Equal Opportunity Employer (EEO) -
HP, Inc. provides equal employment opportunity to all employees and prospective employees, without regard to race, color, religion, sex, national origin, ancestry, citizenship, sexual orientation, age, disability, or status as a protected veteran, marital status, familial status, physical or mental disability, medical condition, pregnancy, genetic predisposition or carrier status, uniformed service status, political affiliation or any other characteristic protected by applicable national, federal, state, and local law(s).
Please be assured that you will not be subject to any adverse treatment if you choose to disclose the information requested. This information is provided voluntarily. The information obtained will be kept in strict confidence.
For more information, review HP’s EEO Policy or read about your rights as an applicant under the law here: “Know Your Rights: Workplace Discrimination is Illegal”
Listing Details
- Posted: June 25, 2026
- First seen: June 25, 2026
- Last seen: June 25, 2026