Project no. 3

Functional predictions

One of the ambitious goals of modern biology is to disentangle the links between genome and phenome in living organisms. In other words, how does the information encoded in an organism's DNA relate to the forms that organism takes and the functions it performs? Answering that question involves many complexities. One successful approach was simply to look at patterns of gene presence and absence, sampled in a balanced way across the tree of life, and to use supervised clustering to define sets of genes that link distantly related organisms to a common function (see here). We are looking to extend these models to metagenomic inference and to go beyond proteins to understand gene regulation across diversity.

The computational tool "PredictTrophicMode" was developed as one aspect of this project.

A heatmap of subfunction presence/absence

Heatmaps like this form the core of the predictive models. Each row is a cluster of proteins that act in a similar functional process, and each column is an organism. The colors represent a weighted score reflecting the number of proteins each organism has for that process. Predictions are based on machine learning inference from the patterns found in heatmaps like this.
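To make the structure of such a heatmap concrete, here is a minimal sketch of how a cluster-by-organism score matrix could be assembled. The organism names, protein labels, and clusters are hypothetical toy data, and the weighting (the fraction of a cluster's proteins that an organism encodes) is an assumption made for illustration; the actual models may weight scores differently.

```python
# Hypothetical toy data: protein families detected in each organism's genome.
genomes = {
    "organism_A": {"p1", "p2", "p3", "p5"},
    "organism_B": {"p1", "p4"},
    "organism_C": set(),
}

# Hypothetical functional clusters: each groups proteins that act in one process.
clusters = {
    "process_1": {"p1", "p2", "p3"},
    "process_2": {"p4", "p5"},
}


def score_matrix(genomes, clusters):
    """Return the organism order and a dict of rows (one per cluster).

    Assumed weighting: the fraction of a cluster's proteins that the
    organism encodes, so every score lies between 0 and 1.
    """
    organisms = sorted(genomes)
    rows = {}
    for name, members in sorted(clusters.items()):
        rows[name] = [len(members & genomes[org]) / len(members) for org in organisms]
    return organisms, rows


organisms, rows = score_matrix(genomes, clusters)
print("columns:", organisms)
for name, scores in rows.items():
    print(name, [round(s, 2) for s in scores])
```

Each row of the result corresponds to one row of the heatmap, and a classifier would be trained on the columns (one feature vector per organism).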

Principal component analysis plot clustering organisms by genes related to their capacity for phagocytosis

Another way to visualize the multidimensional data displayed in heatmaps is to reduce its dimensionality using principal component analysis. When this is done, organisms cluster according to their capacity for a functional process.
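The dimensionality reduction described here can be sketched with NumPy as follows. The score table is made-up toy data (rows are organisms, columns are functional processes), and the function name `pca` is illustrative; this is a generic principal component analysis via singular value decomposition, not necessarily the procedure used to produce the plot.

```python
import numpy as np

# Hypothetical scores: 6 organisms (rows) x 3 functional processes (columns).
# The first three organisms score high for every process, the last three low.
scores = [
    [0.9, 0.8, 1.0],
    [1.0, 0.9, 0.8],
    [0.8, 1.0, 0.9],
    [0.1, 0.0, 0.2],
    [0.0, 0.2, 0.1],
    [0.2, 0.1, 0.0],
]


def pca(data, n_components=2):
    """Project rows of `data` onto their leading principal components."""
    X = np.asarray(data, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column (process) at zero
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]  # organism coordinates
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return coords, explained[:n_components]


coords, explained = pca(scores)
for i, (pc1, pc2) in enumerate(coords):
    print(f"organism {i}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
print("variance explained:", np.round(explained, 2))
```

In this toy example the two groups of organisms fall on opposite sides of the first principal component. The sign of each component is arbitrary, so only the separation between groups is meaningful, not which side a group lands on.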

A model for the origins of phagocytosis

We found that the genes that are predictive of phagocytosis in eukaryotes do not have a single evolutionary origin. Some derive from archaea, some from bacteria, and others appear to be innovations within the eukaryote lineage.

Image credit: Stephen Thurston, AMNH

We linked cell eating (phagocytosis) in eukaryotes to its prokaryote roots by searching across diversity.
