GPU Cluster Architect
Responsibilities
- Cluster Design: Architect scalable GPU cluster topologies, including compute nodes, interconnects (InfiniBand, Ethernet), storage, and control planes.
- Performance Modeling: Analyze AI/ML workloads (e.g., LLM training and inference) to inform design tradeoffs across latency, bandwidth, and GPU density.
- Network Architecture: Align designs with the network architects and validate low-latency, high-throughput interconnects (e.g., InfiniBand HDR/NDR, RoCEv2) at POD and DC scale.
- Storage Integration: Work with storage teams to optimize performance for training datasets, checkpointing, and other workloads.
- Reliability & Monitoring: Understand and analyze signals from monitoring systems to detect flaws in the design.
- Collaboration: Partner with site reliability, networking, storage, and DC engineering teams to operationalize and scale your architecture.
Requirements
- 5+ years of experience designing clusters.
- Deep understanding of modern GPU architectures (NVIDIA, AMD, etc.).
- Experience with HPC interconnects (InfiniBand and RoCE).
- Solid background in systems architecture, networking, and hardware reliability.
- Experience with scripting for automation and telemetry pipelines (Python, Go, etc.).
What We Offer
We offer competitive salaries ranging from $184K to $318K OTE, which includes base salary and performance bonus. Equity in the form of RSUs may be available at certain salary grades.
Listing Details
- First seen: April 3, 2026
- Last seen: April 26, 2026
Posting Health
- Days active: 23
- Repost count: 0
- Trust level: 51%
- Scored at: April 26, 2026
Nebius is a cutting-edge AI cloud platform that offers scalable infrastructure for developing and deploying AI solutions.