Staff AI Inference and Acceleration Engineer

United States · San Jose · Lead

Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company's goal is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks for the home and commercial markets. Figure is headquartered in San Jose, CA.

We are looking for a Staff AI Inference & Acceleration Engineer to join the Platform Software team and own the on-board inference architecture for Figure’s humanoid robots. You will be the technical authority on how AI workloads are mapped, optimized, and executed across the robot’s compute hardware — driving down power consumption and cost while meeting the strict latency and reliability demands of a real-time autonomous system.

Responsibilities

→ Own the on-board inference architecture, mapping models to available accelerators (NPU, GPU, DSP, CPU) based on latency, power, and memory budgets.
→ Partition inference workloads across heterogeneous compute resources, balancing real-time performance with power and thermal constraints.
→ Define and maintain a system-level compute budget across all inference tasks running on the robot.
→ Evaluate next-generation acceleration hardware and contribute to the definition of future compute platform requirements.
→ Optimize inference toolchains end to end, from model export through runtime execution, for target hardware.
→ Apply quantization (INT8, INT4, mixed-precision), pruning, operator fusion, and other compression techniques to reduce compute, memory, and power footprint.
→ Profile inference pipelines to identify and eliminate bottlenecks in latency, memory bandwidth, and power consumption.
→ Optimize kernel scheduling, memory layout, and data movement across the compute hierarchy.
→ Partner closely with the AI/ML team to define model architecture constraints that are hardware-friendly from the outset.
→ Work with the Platform Software team on runtime integration, scheduling, and power management.
→ Engage with silicon vendors and research teams to track the accelerator landscape and influence hardware roadmaps.

Requirements

M.S. or Ph.D. in Computer Engineering, Electrical Engineering, Computer Science, or a related field — or equivalent industry experience.
At least 8 years of industry experience in hardware acceleration, ML systems, or compute architecture.
Deep understanding of AI/ML inference — model formats (ONNX, TFLite, etc.), inference runtimes, and deployment pipelines.
Hands-on experience optimizing models for edge or embedded hardware using quantization, pruning, and operator-level tuning.
Strong understanding of computer architecture — memory hierarchies, data movement, and heterogeneous compute.
Experience profiling and benchmarking inference workloads across CPU, GPU, NPU, and DSP.
Familiarity with low-level toolchains and compilation frameworks (e.g. TVM, MLIR, TensorRT, Torch, SNPE/QNN, JAX, CUDA, ROCm).
Solid software engineering skills in C++ and Python.
Strong cross-functional communication skills — able to work effectively across hardware, software, and AI/ML teams.

Knowledge of real-time operating constraints and their impact on inference scheduling.
Track record of co-designing model architectures with ML teams to meet hardware constraints.

The US base salary range for this full-time position is $180,000 to $275,000 annually.

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.