My Research

While my current work is at the intersection of computational biology, statistics and machine learning, I have also spent some time doing research in theoretical physics.

Biology

Together with Mafalda Dias, I lead a research group at the Centre for Genomic Regulation in Barcelona. Our research focuses on developing generative models of the effects of genetic variation, with an emphasis on identifying variation in humans that impacts the risk of disease. Our goals are to develop methods that will enable us to effectively utilise whole genome sequencing for diagnosis and preventative care, and to maximise the benefit to human health of sequencing all life on Earth. I am fascinated by how much we can learn about human health from the diversity of life on Earth, and by the complementary information that can be extracted from genetic variation within and across species. While it has long been appreciated that conservation patterns in the genetic sequences of diverse organisms can inform us about disease, my view is that we have barely scratched the surface of what this genetic diversity can teach us. More information about our group can be found here.

Cosmology and String Theory

Prior to my research in biology, most of my work was on inflation. This is a theory of the very early universe involving a period of accelerated expansion, during which the quantum fluctuations of one or more scalar fields give rise to an initial distribution of gravitational wells. Matter sinking into these wells is thought to be the seeding process for the formation of all the structure we observe today (galaxies, stars, people, etc.). In my view, the most exciting consequence of this is that we can use the statistics of the distribution of matter in the universe to learn about particle physics at extremely high energies, possibly much higher than we will ever be able to probe with terrestrial experiments. Indeed, cosmology may be one of our best hopes for testing string theory. I enjoyed working at the interface between the string theory and cosmology communities, developing methods for computing predictions from string theory embeddings of inflation.

Publications

To see my publication record please check out my Google Scholar profile.

Software and Data

ProteinGym: ProteinGym is an extensive set of Deep Mutational Scanning (DMS) assays curated to assess the ability of mutation effect predictors to predict the fitness of mutated proteins.

EVE predictions: Predictions of the clinical significance of 36 million variants across 3,219 disease genes, made using variational autoencoders trained on protein sequences from diverse organisms, available at evemodel.org.

EVE code: Code repository for the paper “Large-scale clinical interpretation of genetic variants using evolutionary data and deep learning”.

The Transport Method: The transport method is a technique for computing the two-point and three-point correlation functions of fluctuations generated during an inflationary epoch in the early universe. Our site transportmethod.com collects a number of tools written in Mathematica, C++ and Python.

MultiModeCode: MultiModeCode is a Fortran 95/2000 package for the numerical exploration of multifield inflation models, available from modecode.org.