Research Scientist: Pretraining
About the Role
You will build the base intelligence layer for robotics. We train large-scale robot foundation models from massive multimodal datasets spanning video, proprioception, action traces, language, and more. You will design and run the core large-scale training efforts that give our models fundamentally new general capabilities across embodiments, tasks, and environments. You will “live and breathe” all forms of robot data.
This role involves:
Designing and executing large-scale pretraining runs for robot foundation models (transformer- and diffusion-based architectures)
Defining model architectures, objectives, and training curricula across multimodal robotic data (vision, action, state, language)
Developing scalable data mixtures and sampling strategies across petabyte-scale datasets
Guiding data collection operations towards new directions, as well as sourcing new datasets
Running ablations to understand scaling laws, data quality effects, and architecture tradeoffs
Collaborating closely with ML Infra and Systems to push cluster utilization, throughput, and reliability
Turning raw robotic interaction data into generalizable model capabilities
You may be a good fit if you:
Have deep experience training large transformer or diffusion models at scale (generative models such as language, audio, or video models)
Have led or significantly contributed to multi-node, multi-GPU distributed training efforts
Have worked on scaling laws, optimization dynamics, and large-model failure modes
Have strong PyTorch fundamentals and comfort debugging at every layer of the stack
Care about both empirical rigor and raw iteration speed
Are excited about building general-purpose robot intelligence from first principles
At Generalist, we are on a mission to make general-purpose robots a reality. We believe the industries and homes of the future will depend on humans and machines working together in new ways. Robots can help us build more and get more done.
We build embodied foundation models, starting with a focus on dexterity. This requires advancing the frontiers of data, models, and hardware, to enable robots to intelligently interact with the physical world.
The company embraces both large-scale AI and robotics as core to its DNA. Our team of researchers, roboticists, and company builders comes from OpenAI, Boston Dynamics, Google DeepMind, and other frontier labs, with a track record of shipping AI breakthroughs. Before Generalist, we pioneered large embodied multimodal models and vision-language-action models (PaLM-E, RT-2, Gemini Robotics), launched and scaled ChatGPT and GPT-4 to hundreds of millions of users, engineered the foundations of autonomous driving, built next-generation robots (Atlas, Spot, Stretch), and pushed the limits of what they can do, from parkour to manipulation to robustness testing.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.
Listing Details
- Posted: February 12, 2026