Theoretical physicist learning from the large scale structure of the universe and the large scale structure of the genome.

While my current work is at the border between computational biology and statistics, I have also spent considerable time doing research in theoretical physics. I recently noticed a theme common to much of my work, both present and past: developing probabilistic models capable of making robust predictions. In my work in cosmology and string theory, this took the form of using universality results, such as those found in random matrix theory, to make robust predictions for cosmological observables despite significant holes in our knowledge of the underlying theory (string theory). In my present work, this interest is resurfacing in the form of robustness to model misspecification and Bayesian inference more generally.

I’m currently a postdoctoral researcher in Debbie Marks' Lab, in the Systems Biology department of Harvard Medical School. My work centers on developing probabilistic models of biological sequences. A fundamental question in biology is how to extract the key properties of a biological sequence that determine its function. If we understood how changing amino acids affects the function of a protein *in vivo*, we could answer central questions in both basic and applied biological research. This includes making *protein design* a reality for a slew of applications, not least in accelerating biotherapeutic development. There are endless ways to change the sequence of a protein, but our ability to identify the few that optimize a desired function currently relies on blindly testing each possibility in turn. Statistical models of biological sequences are fast becoming powerful tools for overcoming this problem. I am particularly interested in the complementary information we can glean from modern experiments and from statistical models built from databases of natural biological sequences, which are the end result of billions of years of real-life experiments. The key challenge in building these models is understanding how to extract the information from evolution that is relevant to a particular problem, such as knowing how to modify a given protein to gain new properties without spoiling its original properties.

Prior to my research in biology, most of my work was on inflation. This is a theory of the very early universe involving a period of accelerated expansion, where the quantum fluctuations of one or more scalar fields give rise to an initial distribution of gravitational wells. Matter sinking into these wells is thought to be the seeding process for the formation of all the structure we observe today (galaxies, stars, people, etc.). In my view, the most exciting consequence of this is that we can use the statistics of the distribution of matter in the universe to learn about particle physics at extremely high energies, possibly much higher than we will ever be able to probe with terrestrial experiments. Indeed, cosmology may be one of our best hopes for testing string theory. As such, I enjoyed sitting between the string theory and cosmology communities, developing methods for computing predictions from string theory embeddings of inflation.

To see my publication record, please check out my INSPIRE profile or my Google Scholar profile.

**The Transport Method:**
The transport method is a technique for computing the two-point and three-point correlation functions of fluctuations generated during an inflationary epoch in the early universe. Our site transportmethod.com collects a number of tools written in Mathematica, C++ and Python.

**MultiModeCode:**
MultiModeCode is a Fortran 95/2000 package for the numerical exploration of multifield inflation models, available from modecode.org.

I am trying to get into the habit of posting my slides from previous presentations on GitHub’s Speaker Deck. My “Decks” can be found here.