Founding Computational Protein Scientist (gn) @ Biotech Venture, Cambridge (UK)

United Kingdom · Cambridge · mid

Other · Scientist

Technical Tools

deep-learning, machine-learning

DropCode is building the data engine for protein function. Starting with enzymes, we use our patented droplet microfluidics platform to capture exponentially more data on protein function than conventional methods. The platform links genotype to phenotype at per-droplet resolution, making every droplet a micro test tube. This data fuels machine learning models that learn in ever greater detail how sequence determines function. Our wedge is enzyme engineering for biocatalysis and industrial biotechnology, but our ambition is to make DropCode the definitive platform for protein function prediction.

We are Cambridge PhDs with deep expertise across microfluidics, biochemistry, machine learning, optics, and engineering. We believe the language of biology is machine learning, and that the fastest path to transformative models is not just better AI but better inputs.

We are looking for an exceptional founding computational scientist to lead our machine learning and protein modelling efforts. You will own the sequence–function modelling stack end to end: from processing large-scale functional datasets generated in our microfluidic runs, to training and deploying generative and predictive models that drive the next round of experiments. You will work in a tight loop with the biology and engineering teams, turning quantitative phenotypic data into closed-loop active learning systems that continuously improve our models.

This is a foundational role. You will be building the ML infrastructure from the ground up, and your architectural choices will shape DropCode for years.

Responsibilities

→ Design and train sequence–function models on deep mutational scanning datasets and high-throughput screening outputs from our microfluidics platform
→ Develop and iterate generative models (transformers, diffusion models, or equivalent) for enzyme sequence design and optimisation
→ Build closed-loop active learning pipelines that couple ML predictions with experimental design, shortening the design–build–test–learn cycle
→ Model protein fitness landscapes, including epistatic interactions, to navigate high-dimensional sequence space intelligently
→ Partner with the biology team to define the data collection strategy and ensure experimental outputs are ML-ready
→ Establish best practices for model evaluation, benchmarking, and uncertainty quantification in the context of functional prediction
→ Own and grow the computational stack as the team scales

Requirements

Undergraduate grounding in hard science (mathematics, physics, or computer science), with that rigour subsequently applied to biological problems
PhD in machine learning, deep learning, or a closely related computational discipline
A track record of designing and building custom model architectures from scratch, not just fine-tuning or deploying off-the-shelf systems; ideally applied to biology, but strong work in any demanding applied domain is relevant
Demonstrated contribution to a meaningful breakthrough in protein design or sequence–function modelling
Proven hands-on experience with protein language models or generative models applied to biological sequences
Deep familiarity with deep mutational scanning, large-scale functional datasets, or comparable high-throughput data modalities
Strong understanding of fitness landscape theory and epistasis in the context of sequence optimisation
Experience building active learning or Bayesian optimisation systems that integrate ML with experimental feedback
Excitement at the prospect of working with large volumes of proprietary, quantitative functional data unavailable anywhere else
Comfort with the ambiguity of early-stage R&D and motivation to take on the challenge of building foundational infrastructure

You are frustrated by the slow, artisanal nature of current biological engineering and believe the field needs a step-change in data scale and quality. You think quantitatively, treat every experiment as a data point for a model, and have strong opinions about what it takes to build the best protein design systems in the world. You thrive in collaborative, fast-moving environments where the pace is set by scientific urgency, not process.