Condition-Specific Transcription Factor Binding Footprints

We are extending the CENTIPEDE mixture model approach to combine data across tissues and to model binding sites that appear to be actively used together, as modules, by a complex of multiple factors. Paired-end data from new DNase-seq and ATAC-seq protocols, together with new single-cell approaches, provide additional information that can be exploited toward this end. The predicted modules, combined with gene expression levels along the genome and across tissues, can be used to identify common regulatory programs that dictate how genes are activated in a cell-type-specific manner. Projects such as ENCODE and Roadmap Epigenomics have been generating large amounts of data, using different types of functional assays on a large collection of tissues. These projects are a unique resource for developing new statistical tools and for assessing which experimental assays are best suited to answering a given biological question. The data collected by the Genotype-Tissue Expression (GTEx) project on genetic variants affecting gene expression levels across multiple tissues is also very useful for assessing the learned regulatory programs and for suggesting mechanisms by which genetic variants can affect gene expression and may ultimately lead to a disease phenotype. We have started two collaborative projects, with Luis Barreiro and Francesca Luca, in which we are collecting ATAC-seq data using different in vitro systems to measure chromatin changes caused by environmental and genetic perturbations over time.
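To make the mixture model idea concrete, the sketch below fits a two-component mixture to per-site cut counts and returns, for each candidate site, a posterior probability of belonging to the high-signal ("bound") component. It is a deliberately simplified illustration and not the CENTIPEDE model itself: it assumes a Poisson likelihood on total counts only, ignores the sequence-based prior and the spatial footprint profile, and every name in it (`fit_two_component_mixture`, `poisson_logpmf`, the toy counts) is hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_two_component_mixture(counts, n_iter=100):
    """EM for a two-component Poisson mixture over per-site cut counts.

    Assumption (illustrative only): each candidate site is either
    "unbound" (low mean count) or "bound" (high mean count).
    Returns (pi_bound, lam_unbound, lam_bound, posteriors).
    """
    n = len(counts)
    mean = sum(counts) / n
    # Initialise the bound component above the unbound one.
    pi, lam0, lam1 = 0.5, max(0.5 * mean, 1e-3), max(2.0 * mean, 1e-2)
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each site is bound,
        # computed in log space for numerical stability.
        post = []
        for k in counts:
            log_b = math.log(pi) + poisson_logpmf(k, lam1)
            log_u = math.log(1.0 - pi) + poisson_logpmf(k, lam0)
            m = max(log_b, log_u)
            pb, pu = math.exp(log_b - m), math.exp(log_u - m)
            post.append(pb / (pb + pu))
        # M-step: re-estimate the mixing weight and the component means,
        # clamping values away from 0 and 1 so the logs stay finite.
        w1 = sum(post)
        w0 = n - w1
        pi = min(max(w1 / n, 1e-6), 1.0 - 1e-6)
        lam1 = max(sum(p * k for p, k in zip(post, counts)) / max(w1, 1e-9), 1e-6)
        lam0 = max(sum((1.0 - p) * k for p, k in zip(post, counts)) / max(w0, 1e-9), 1e-6)
    return pi, lam0, lam1, post


# Toy data (made up for illustration): eight low-count and four high-count sites.
toy_counts = [0, 1, 2, 1, 0, 3, 2, 1, 25, 30, 28, 22]
pi_bound, lam_unbound, lam_bound, posteriors = fit_two_component_mixture(toy_counts)
```

Extending a model of this kind across tissues or to modules of co-bound factors would require additional structure, such as shared parameters or joint latent states, that this sketch does not attempt to represent.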