Zoox
USD 242,000–290,000/yr

Senior AI Inference Engineer - Model Optimization & Deployment


Overview

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.

As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience compressing, accelerating, and deploying complex models (LLMs, VLMs, or other foundation models) for power- and thermal-constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

Responsibilities

  • Optimize large-scale models (LLMs, VLMs) using advanced quantization (PTQ, QAT), mixed-precision inference workflows, and parameter-efficient fine-tuning (LoRA, QLoRA).
  • Architect and implement model conversion and compilation pipelines using TensorRT and TensorRT-LLM for edge deployment.
  • Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch models and compiled edge binaries.
  • Write and optimize custom CUDA kernels and TensorRT plugins to maximize memory bandwidth utilization and minimize latency on AI accelerators.
  • Write production-level, highly concurrent, memory-safe C++ and Python code for real-time inference on vehicle SoCs.


Qualifications

  • Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference workflows (INT8, FP8, INT4, BF16/FP16).
  • Proven experience optimizing large-scale models (LLMs, VLMs, or VLAs) using KV-cache optimization (e.g., PagedAttention), speculative decoding, and efficient attention mechanisms (FlashAttention, linear attention).
  • Extensive experience with model conversion and compilation pipelines (TensorRT, TensorRT-LLM) and with rigorous parity and latency benchmarking.
  • Proficiency in low-level programming for AI accelerators, specifically writing and optimizing custom CUDA kernels and TensorRT plugins.
  • Production-level C++ (14/17/20) and Python programming skills, with experience writing concurrent, memory-safe, real-time inference code for edge devices.

Bonus Qualifications

  • Experience with distributed training pipelines and model/tensor parallelism (PyTorch Distributed, Ray, DeepSpeed, Megatron-LM) and runtime efficiency optimization for GPU clusters.
  • Familiarity with autonomous driving perception stacks (temporal 3D object detection, BEV, 3D occupancy networks) and processing multi-modal sensor streams (vision, LiDAR, radar).
  • Understanding of end-to-end autonomous driving paradigms (VLA models, closed-loop simulation validation).

Location & Eligibility

    Location: Foster City, United States
    Work arrangement: Hybrid (some on-site time required)
    Who can apply: US
    Listed under: United States

    Listing Details

    Posted: April 11, 2026
    First seen: April 11, 2026
    Last seen: April 29, 2026

    Posting Health

    Days active: 18
    Repost count: 0
    Trust level: 49%
    Scored at: April 29, 2026

    Zoox, a subsidiary of Amazon, designs fully autonomous vehicles with the aim of making urban transportation safer and more efficient.

    Employees: 3k+
    Founded: 2014
    Domain: zoox.com