Mootha Lab – Mitochondrial Proteomes & Pathway Mapping
Summer 2023 – Present
Over the past two summers, I have conducted computational biology research with the Mootha Lab at the Broad Institute of MIT and Harvard and at Massachusetts General Hospital (~40 hours per week, 10 weeks per summer). I approached the lab in the spring of my sophomore year hoping to apply my coding skills in a scientific setting and gain academic research experience. The lab is renowned for its research on mitochondria, most recently the mitochondrial tree of life (MitoTOL) project, which mapped the mitochondrial proteomes of six eukaryotes. Professor Mootha connected me with an MD-PhD student who aimed to use these newly mapped proteomes, through computational approaches, to identify drug targets, characterize novel proteins and assign them to pathways, and infer the proteome of the last eukaryotic common ancestor (LECA).
My specific contributions are described in the "Coevolutionary analysis of mitochondrial pathways and functions" section of the paper. I focused on the proteomes of the parasitic species, analyzing them with CLIME, a phylogenetic profiling method that groups proteins into evolutionarily conserved modules (ECMs). To better capture distant evolutionary relationships, we ran CLIME on phylogenetically resolved orthogroups (PhROGs), which are sets of homologous proteins designed to retrieve remote homologs with added phylogenetic resolution, rather than on individual proteins. Unlike the MitoTOL proteomes, the human mitochondrial proteome is already well studied: of the 854 orthogroups containing a human mitochondrial protein, 93% have a Pfam domain shared by most members, and 77% appear in at least one other experimentally defined mitochondrial proteome. We could therefore use the pathway annotations of human proteins to assign pathways to, and characterize, novel mitochondrial proteins from the MitoTOL species.
I created a statistical method that overlays human mitochondrial pathways from the existing curated MitoPathways dataset onto ECMs. I developed and implemented this pathway-mapping system in R (using RStudio), writing custom scripts to process large protein datasets and refine pathway assignment. After benchmarking against known human pathways, I chose the F-score (the harmonic mean of protein recall and precision for a given pathway) as the assignment criterion. The system's effectiveness is supported by its assignment of pathways that are present and relatively well studied across all six species, such as iron-sulfur cluster biogenesis and the TCA cycle (among others; Figure 4B-K).
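The F-score criterion can be illustrated with a minimal Python sketch. This is not the lab's R implementation; it assumes (hypothetically) that each ECM is a set of orthogroup IDs, that each pathway is a set of orthogroup IDs carrying that human MitoPathways label, and that precision is computed over the ECM members that carry any pathway label. The function names `f_score` and `assign_pathway` and the `Assignment` record are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    pathway: str
    precision: float
    recall: float
    f_score: float


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def assign_pathway(ecm, pathways):
    """Return the highest-F-score pathway for one ECM, or None.

    ecm: set of orthogroup IDs in the module (hypothetical format).
    pathways: dict mapping pathway name -> set of orthogroup IDs
        annotated to that pathway via human orthologs (assumed input).
    """
    # Assumption: precision is measured over ECM members with any pathway label,
    # so members lacking human annotation do not penalize the score.
    annotated = set().union(*pathways.values()) & ecm
    if not annotated:
        return None
    best = None
    for name, members in pathways.items():
        overlap = len(ecm & members)
        if overlap == 0:
            continue
        precision = overlap / len(annotated)  # share of labeled ECM members in this pathway
        recall = overlap / len(members)       # share of the pathway captured by the ECM
        score = f_score(precision, recall)
        # Strict ">" keeps the first pathway encountered on ties.
        if best is None or score > best.f_score:
            best = Assignment(name, precision, recall, score)
    return best
```

With toy (hypothetical) inputs `pathways = {"TCA cycle": {"OG1", "OG2", "OG3", "OG4"}, "Fe-S cluster biogenesis": {"OG5", "OG6"}}` and `ecm = {"OG1", "OG2", "OG3", "OG5", "OG9"}`, the TCA cycle scores precision 0.75 and recall 0.75 (F = 0.75), beating Fe-S cluster biogenesis (F ≈ 0.33). Using the harmonic mean rewards pathways that are both concentrated in the module and largely captured by it, rather than either alone.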
Paired with CLIME results, the pathway-mapping system can help characterize proteins in eukaryotes that have limited Pfam domain information. The combination of CLIME and my pathway-mapping method is particularly powerful for identifying proteins that lack human orthologs but cluster within ECMs strongly associated with specific pathways. Such uncharacterized proteins can then be predicted with reasonable confidence to participate in those pathways, offering functional insight into poorly understood mitochondrial systems.
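The label-transfer idea can be sketched as follows. This is a hypothetical illustration, not the lab's code: it assumes each ECM already has a (pathway, F-score) assignment, that "strongly associated" means an F-score at or above an illustrative cutoff `min_f_score`, and that proteins with a human ortholog are listed in a set. All names and the 0.5 default are assumptions.

```python
def transfer_pathway_labels(ecm_members, ecm_assignments, has_human_ortholog,
                            min_f_score=0.5):
    """Predict pathways for ECM members that lack a human ortholog.

    ecm_members: dict ECM ID -> set of protein IDs in that module.
    ecm_assignments: dict ECM ID -> (pathway name, F-score) from pathway mapping.
    has_human_ortholog: set of protein IDs that have a human ortholog.
    min_f_score: hypothetical cutoff for a "strong" ECM-pathway association.
    Returns dict protein ID -> (predicted pathway, F-score of the source ECM).
    """
    predictions = {}
    for ecm_id, members in ecm_members.items():
        if ecm_id not in ecm_assignments:
            continue  # ECM received no pathway assignment
        pathway, score = ecm_assignments[ecm_id]
        if score < min_f_score:
            continue  # association too weak to transfer a label
        for protein in members - has_human_ortholog:
            # If a protein appears in more than one ECM, keep the strongest association.
            if protein not in predictions or score > predictions[protein][1]:
                predictions[protein] = (pathway, score)
    return predictions
```

For example, with `ecm_members = {"ECM1": {"p1", "p2", "p3"}, "ECM2": {"p4", "p5"}}`, assignments `{"ECM1": ("TCA cycle", 0.75), "ECM2": ("Fe-S cluster biogenesis", 0.3)}`, and human orthologs for `p1`, `p2`, and `p4`, only `p3` receives a prediction (TCA cycle), because ECM2 falls below the cutoff. Carrying the source F-score alongside each prediction lets downstream users rank candidates by the strength of the supporting association.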
The lab continues to mine these pathway-mapped CLIME results for new biology to report in species-specific papers. I packaged the code to accept flexible inputs, so users can substitute alternative pathway mappings (such as non-human datasets) or run new CLIME iterations on individual species' proteomes.
I continue to work on the project through its final analyses and manuscript preparation; the paper will be submitted to Cell this month.