CuspAI · 1mo ago

Data Engineer

United Kingdom · London · full-time · mid

Technical Tools

Airflow, Docker, Kubernetes, Python, CI/CD, ETL, machine learning

CuspAI is the frontier AI company on a mission to solve the breakthrough materials needed to power human progress. While nature took billions of years to perfect molecules, we are harnessing AI to unlock trillion-dollar materials breakthroughs in months, not millennia. Our founding team is the most cited in the world and is made up of world-class researchers in AI, chemistry and engineering.

We are working on some of the hardest and most important challenges, including energy, clean water, the future of compute, and carbon capture, and this is just the start of what our 'search engine' for next-generation materials will unlock.

We invite you to be part of a diverse, innovative team at the intersection of AI and materials science, working to create impactful partnerships that drive innovation, scalability, and industry collaboration. This work matters. Your work matters.

We’re on the cusp of the on-demand materials era. Join us.

As we grow, we are seeking a Data Engineer to play a crucial part in driving our research and development efforts forward.

As a Data Engineer, you will be part of the new team building the infrastructure that underpins, and acts as the critical bridge between, raw chemical data and our machine learning models.

Responsibilities

Design and build robust data pipelines for materials science datasets, experimental results, and computational chemistry outputs.
Develop processes to integrate diverse data sources including materials databases, literature, patent filings, and laboratory instruments.
Create automated workflows for processing crystallographic data, molecular structures, and materials properties (you don’t need to have direct domain experience - we can help bring you up to speed!).
Build scalable systems to handle high-throughput computational chemistry calculations and experimental data.

Partner closely with the scientific and research teams to implement automated quality checks for crystal structure data, chemical compositions, and experimental measurements.
Create standardisation protocols for materials nomenclature, units, and measurement conditions.
Build monitoring systems to ensure data integrity across all pipelines.

You will also be working hand in hand with ML researchers to understand data requirements for model training and inference.
Partner with materials scientists to ensure accurate representation of domain knowledge in data schemas.
Integrate with laboratory automation systems and computational chemistry software.
Support real-time data needs for AI-driven materials discovery experiments.

Requirements

You are someone who gets excited about the opportunity to enable scientists to work on world-changing challenges in this domain, with a personal interest in the potential applications of the technology that CuspAI is building.
You’re a builder of tools and infrastructure who enjoys making life as easy as possible for the teams you support by providing self-serve, reliable and scalable ingestion pipelines.
You have at least 3 years' experience in data engineering roles, preferably in scientific or research environments - you would be joining as a data engineering subject matter expert who can not only work autonomously but also provide guidance on best practice.
High level of proficiency in Python and databases with experience in large-scale data processing - as part of our engineering team you’ll be programming regularly, not just scripting.
You’re an advanced user of workflow orchestration tools (e.g. Airflow, Prefect, Dagster, Flyte or similar).
Solid experience with containerisation (Docker, Kubernetes) and CI/CD practices.

You have direct experience handling large/complex datasets and are interested in working with scientific packages.
You’re a fast learner when it comes to new tools/systems.
You enjoy (and have experience in) designing systems that scale with growing data volumes and user demands.
Understanding and appreciation of DevOps practices is also important.

Nice to Have

You’ve worked with data from scientific computing (simulations or experiments).
Knowledge of machine learning data requirements and MLOps practices, including pre-processing/processing as part of model training.
An academic background in Materials Science, Chemistry, Chemical Engineering, or related field.

Even more bonus points if you have an understanding of crystallography, materials properties, and computational chemistry concepts!