Nebius · ~18d ago
Senior HPC Cluster Engineer
Other · Engineer
Quick Summary
Overview
Why work at Nebius
Nebius is leading a new era in cloud computing to serve the global AI economy.
✓ Tuning the performance of GPU clusters and InfiniBand networks to ensure optimal operation in HPC and GPU-based environments.
✓ Analyzing and troubleshooting the root cause of issues related to GPUs and InfiniBand networks, and proposing corrective actions.
✓ Integrating new hardware into the existing infrastructure, including support for new GPU hardware through software stacks like Kubernetes, QEMU, and KVM.
✓ Enhancing automation systems for proactive monitoring, detecting, and resolving issues in GPU and InfiniBand environments.
✓ Configuring and managing GPU devices and InfiniBand fabrics, ensuring efficient and reliable operation.
- 5+ years of professional experience in system-level software development (focused on performance optimization, low-level programming).
- 3+ years of hands-on experience with Linux systems (administration, troubleshooting, and performance tuning).
- In-depth understanding of server architecture, including PCIe devices, NICs, Linux OS/Kernel, and high-performance computing (HPC) systems.
- Strong proficiency in one or more performance-oriented programming languages (C/C++, Go, Python).
Nice to Have
- Experience with GPU end-to-end testing in a cluster environment using InfiniBand networking.
- Proven track record of analyzing and optimizing the performance of HPC workloads (e.g., simulations, data analysis, AI/ML workloads).
- Familiarity with RDMA, RoCE, and InfiniBand protocols for high-performance communication.
- Background in Software-Defined Networking (SDN) and experience with HPC cluster networking.
- Understanding of QEMU/KVM virtualization and managing virtualized environments.
- Experience with deep learning frameworks such as PyTorch and TensorFlow, and their integration with HPC systems.
- Familiarity with collective communication libraries like MPI and NCCL for distributed computing.
We offer competitive salaries ranging from $170k-$300k + equity based on your experience.
We conduct coding interviews as part of the process.
What We Offer
✓ Competitive salary and comprehensive benefits package.
✓ Opportunities for professional growth within Nebius.
✓ Flexible working arrangements.
✓ A dynamic and collaborative work environment that values initiative and innovation.
Location & Eligibility
Where is the job
United States
Remote within one country
Who can apply
US
Listed under
United States
Listing Details
- First seen: April 9, 2026
- Last seen: April 28, 2026
Posting Health
- Days active: 18
- Repost count: 0
- Trust Level: 44%
- Scored at: April 28, 2026
Signal breakdown: freshness, source trust, content trust, employer trust
Nebius
greenhouse
Nebius is a cutting-edge AI cloud platform that offers scalable infrastructure for developing and deploying AI solutions.