Lead Research Engineer
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems—designed to take ideas from research to production with less friction.
Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.
We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
- Move Fast: We act with speed and precision, breaking down big challenges into achievable steps.
- Focus: We complete one goal at a time with care, collaborating as a team to deliver features with precision.
- Balance: Sustained performance comes from rest and recovery. We ensure a healthy work-life balance to keep you at your best.
- Craftsmanship: Innovation through excellence. Every detail matters, and we take pride in mastering our craft.
- Minimal: Simplicity drives our innovation. We eliminate complexity through discipline and focus on what truly matters.
We are seeking a highly skilled Lead Research Engineer to work on optimizing training and inference workloads on compute accelerators and clusters, through the Lightning Thunder compiler and the broader PyTorch Lightning ecosystem. This role sits at the intersection of deep learning research, compiler development, and large-scale system optimization. You’ll be shaping technology that pushes the boundaries of model performance and efficiency, creating foundational software that will impact the entire machine learning ecosystem.
You will be joining the Engineering Team and report to our Tech Lead. This is a hybrid role based in our New York City, San Francisco, or London office, with an in-office requirement of two days per week. The salary range for this role is $225,000-$275,000.
Responsibilities
- Develop performance-oriented model optimizations at multiple levels:
  - Graph-level (e.g., operator fusion, kernel scheduling, memory planning)
  - Kernel-level (CUDA, Triton, custom operators for specialized hardware)
  - System-level (distributed training across GPUs/TPUs, inference serving at scale)
- Advance the Thunder compiler by building optimization passes, graph transformations, and integration hooks that accelerate training and inference workloads.
- Work across the software stack to make optimizations accessible to end users through clean APIs, automated tooling, and seamless integration with PyTorch Lightning.
- Design and implement profiling and debugging tools to analyze model execution, identify bottlenecks, and guide optimization strategies.
- Collaborate with hardware vendors and ecosystem partners to ensure Thunder runs efficiently across diverse backends (NVIDIA, AMD, TPU, specialized accelerators).
- Contribute to open-source projects by developing new features, improving documentation, and supporting community adoption.
- Engage with researchers and engineers in the community, providing guidance on performance tuning and advocating for Thunder as the go-to optimization layer in ML workflows.
- Work cross-functionally with Lightning's product and engineering teams to keep compiler and optimization improvements aligned with the broader product vision.
Requirements
- Strong expertise with deep learning frameworks such as PyTorch.
- Hands-on experience with model optimization techniques, including graph-level optimizations, quantization, pruning, mixed precision, or memory-efficient training.
- Knowledge of distributed systems and parallelism strategies (data/model/pipeline parallelism, checkpointing, elastic scaling).
- Familiarity with software engineering practices: designing APIs, building robust tooling, testing, and CI/CD for performance-sensitive systems.
- Excellent collaboration and communication skills, with the ability to partner across research, engineering, and external contributors.
- Bachelor's degree in Computer Science, Engineering, or a related field.
Preferred Qualifications
- Experience with CUDA, Triton, or other GPU programming models for developing custom kernels.
- Deep understanding of deep learning compiler internals (IR design, operator fusion, scheduling, optimization passes), or proven work in performance-critical software.
- Proven track record of contributing to open-source projects in ML, HPC, or compiler domains.
- Advanced degree (Master's or PhD) in machine learning, compilers, or systems highly preferred.
What We Offer
Listing Details
- Posted: April 17, 2026
- First seen: March 26, 2026
- Last seen: April 17, 2026