Research Interests

Machine Learning in Structural Bioinformatics

We have been exploring the application of machine learning to prediction problems in structural biology, particularly the interactions of proteins with other proteins and small ligands.

Machine Learning in Phylogenetics

Probabilistic models of protein evolution generally assume that amino acid substitutions occur independently at each site, despite experimental and computational evidence that physical interactions introduce dependencies between sites. We recently introduced a new model of protein evolution, based on probabilistic graphical models, that accounts for site-site correlations. Significantly, phylogenetic likelihoods can be calculated efficiently in this model using approximate inference methods such as belief propagation. When tested on sequence data for a number of protein families, the new model fit the data better than traditional site-independent rate matrix models. Interestingly, the model also supported the significance of amino acid interactions across protein-protein interfaces in determining the evolutionary history of a family of multimeric enzymes. One potential application is as an improved null model for detecting evolutionary selection, which could aid in identifying disease-associated single nucleotide variants.
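As a rough illustration of the inference step mentioned above, the following minimal sketch runs sum-product belief propagation on a small chain of coupled sites and checks the result against brute-force enumeration. It is not the model described here: the function names, the chain topology, the two-state alphabet, and the potential values are all assumptions chosen for illustration. On a chain (or any tree) belief propagation is exact; on graphs with loops, as would arise from general site-site interactions, the same message updates give only an approximation.

```python
from itertools import product


def chain_partition_bp(unary, pairwise):
    """Sum-product message passing on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i][a]       : non-negative potential for site i in state a
    pairwise[i][a][b] : potential coupling site i (state a) to site i+1 (state b)
    Returns the partition function Z, the sum of weights over all joint states.
    """
    n = len(unary)
    k = len(unary[0])
    # The first site has no left neighbour, so its incoming message is all ones.
    msg = [1.0] * k
    for i in range(n - 1):
        # Combine the incoming message with the local potential at site i ...
        belief = [msg[a] * unary[i][a] for a in range(k)]
        # ... then sum out site i to obtain the message passed to site i+1.
        msg = [sum(belief[a] * pairwise[i][a][b] for a in range(k))
               for b in range(k)]
    # Absorb the local potential of the last site and sum over its states.
    return sum(msg[b] * unary[n - 1][b] for b in range(k))


def chain_partition_brute_force(unary, pairwise):
    """Same quantity by explicit enumeration; cost grows as k**n."""
    n = len(unary)
    k = len(unary[0])
    total = 0.0
    for states in product(range(k), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i][s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i]][states[i + 1]]
        total += weight
    return total


# Hypothetical toy inputs: three sites, two states, couplings that favour
# neighbouring sites being in the same state.
EXAMPLE_UNARY = [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]]
EXAMPLE_PAIRWISE = [[[2.0, 1.0], [1.0, 2.0]], [[2.0, 1.0], [1.0, 2.0]]]

if __name__ == "__main__":
    print(chain_partition_bp(EXAMPLE_UNARY, EXAMPLE_PAIRWISE))
    print(chain_partition_brute_force(EXAMPLE_UNARY, EXAMPLE_PAIRWISE))
```

The message-passing version costs on the order of n times k squared operations, whereas enumeration costs k to the power n, which is why this style of inference keeps likelihood calculations tractable as the number of coupled sites grows.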

Clinical Informatics

Graft-versus-host disease is a potentially serious complication following transplant surgery. We are working with an interdisciplinary team of transplant surgeons and bioinformaticians to develop machine learning algorithms that predict transplant outcomes from diverse data, including the degree of donor-patient immune system match (HLA typing), demographics, previous disease history, and pre-operative lab results. Preliminary analyses of a large-scale clinical study of liver transplant patients revealed which factors were most closely correlated with graft rejection and suggested that the prediction approach could become accurate enough to guide clinical decisions in the future.

Peptide-Protein Docking

The binding of protein fragments to class I and II MHC molecules is an essential step in immune surveillance by the adaptive immune system. The peptide-MHC complexes are exposed on the cell surface, where they can activate a T-cell immune response by binding to T-cell receptors and associated co-receptors. Both classes of MHC are highly polymorphic, with hundreds of alleles in humans. Each MHC type generally binds a different set of peptides, so the number of possible peptide-MHC combinations is enormous. Knowledge of which peptides bind to which MHCs has potential applications in vaccine discovery and in understanding autoimmune disorders.

Sequence-based computational methods can rapidly predict which peptides bind a particular MHC type; however, they require large amounts of experimental data, which are expensive to obtain and unavailable for most MHC types. Structure-based computational methods are slower but potentially more general, because they predict binding affinities from universal physical or statistical properties of the modeled peptide-MHC complex and thus are not limited to a single MHC type. We have previously demonstrated that all-atom docking of peptides to class I MHC yields accurate structures, which can be used to predict binding affinities. We also found that a prediction model fitted to experimental data for one MHC type gives comparable accuracy when predicting peptide binding affinities for a different MHC type. This indicates that the method can generalize to MHC types for which sufficient experimental data are unavailable.

Membrane Proteins

An estimated 25-30% of the proteins in a variety of organisms span a lipid membrane. Knowledge of such proteins in humans is important for drug discovery since ~40% of all current drugs target membrane proteins. Most of these drug targets are G protein-coupled receptors (GPCRs).

Computational methods are useful for modeling GPCR structures because only a few high-resolution experimental structures are available. We are working closely with experimental collaborators to understand the function of GPCRs at the structural level. We are particularly interested in GPCR dimerization, which recent experiments show occurs for many GPCRs.