AI Platform Engineer, Training and Inference
Quick Summary
Manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store,
Responsibilities
• Experience in ML engineering with time in an ML platform or MLOps role
• Production Ray depth: Ray Train, Serve, Core, and Data — debugged real production failures including NCCL timeouts, Plasma OOM, and Serve autoscaling lag
• LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton — PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
• Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
• RL working knowledge: PPO, policy gradient, or RLHF — able to translate an algorithm into distributed compute primitives
• Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden signal degradation
• Vector databases: pgvector or Qdrant — ANN index strategies, embedding upsert, and query latency tuning under inference load
• Strong Python and PyTorch; Flyte or equivalent ML orchestrator
• Quantization (nice to have): INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)
• Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience
We offer you a competitive total rewards package, learning, and tremendous opportunities to grow and advance in your career. At Saviynt, it is not typical for an individual to be hired at or near the top of the range for their role, and final compensation decisions depend on many factors, including but not limited to location; skill sets; experience and training; licensure and certifications; and other relevant business and organizational needs.
You may also be eligible to participate in a Saviynt discretionary bonus plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.
Listing Details
- Posted: May 18, 2026

Saviynt is a leading provider of cloud-native identity and governance platform solutions, empowering enterprises to secure their digital transformation, safeguard critical assets, and meet regulatory compliance.