
Site Reliability Engineer (5009) (US, DC, Tampa, San Antonio) (Secret)
Quick Summary
The SMX Space and Intelligence (S&I) Business Unit (BU) is on the ground floor across the future remote sensing ecosystem for all orbital regimes (LEO, MEO, HEO, and GEO). We build, integrate,
Responsibilities
- →Support the availability, reliability, and performance of IaaS services supporting mission systems
- →Monitor infrastructure health using metrics, logs, and alerts; respond to and resolve incidents
- →Perform root-cause analysis for infrastructure and service outages; implement corrective and preventative actions
- →Improve system reliability through automation, standardization, and proactive engineering
- →Support capacity planning, performance analysis, and scaling of infrastructure services
- →Maintain and enhance monitoring, logging, and alerting solutions
- →Participate in incident response, on-call rotations (as required), and post-incident reviews
- →Collaborate with network, systems, platform, and application teams to resolve cross-stack issues
- →Support infrastructure lifecycle activities including upgrades, patches, and configuration changes
- →Apply security best practices and support compliance requirements in a regulated environment
- →Develop and maintain runbooks, procedures, and operational documentation
- →Contribute to CI/CD and Infrastructure-as-Code workflows supporting IaaS services
- →Participate in Agile ceremonies and operational planning activities
- →Perform other duties as assigned
Required Skills & Experience
- →Secret clearance
- →5+ years of professional experience in systems engineering, SRE, DevOps, or infrastructure operations
- →Strong experience administering Linux systems
- →Experience supporting on-prem, cloud, or hybrid infrastructure environments
- →Hands-on experience with monitoring, logging, and alerting systems
- →Strong troubleshooting skills across compute, storage, networking, and OS layers
- →Experience scripting or automating tasks using Bash, Python, or similar languages
- →Familiarity with Infrastructure as Code concepts and tooling
- →Strong verbal and written communication skills
- →Detail-oriented, self-motivated, and able to own issues through resolution
- →Ability to work on-site at the customer location
Desired Skills & Experience
- →Experience working on an IaaS or platform operations team
- →Experience with virtualization platforms (e.g., VMware vSphere)
- →Experience supporting container platforms (Kubernetes, OpenShift)
- →Experience with cloud environments (AWS, Azure, or GovCloud)
- →Familiarity with SRE concepts such as SLIs, SLOs, error budgets, and toil reduction
- →Experience with configuration management or automation tools (Ansible, Terraform)
- →Experience with CI/CD pipelines (GitLab CI, Jenkins, or similar)
- →Experience operating systems in government or secure environments
- →Experience with incident management and operational readiness reviews
Application Deadline: June 22, 2026
The SMX salary determination process takes into account a number of factors, including, but not limited to, geographic location, Federal Government contract labor categories, relevant prior work experience, specific skills, education, and certifications. At SMX, one of our Core Values is to Invest in Our People, so we offer a competitive mix of compensation, learning & development opportunities, and benefits. Key components of our benefits include health insurance, paid leave, and retirement.
Listing Details
- Posted: March 30, 2026
- First seen: March 25, 2026
- Last seen: April 12, 2026