CENTIPEDE and dsQTLs

Many of our current projects build on Roger’s postdoctoral work at the University of Chicago. He developed a novel probabilistic framework, called CENTIPEDE, to predict tissue-specific regulatory sites for DNA-binding proteins using DNase-seq footprinting (Pique-Regi et al., 2011 Genome Res). A systematic comparison of CENTIPEDE predictions with ChIP-seq data (publicly available from ENCODE) on LCLs and K562 cells demonstrated remarkable agreement in classifying motif instances as bound or unbound. This work received two recommendations from Faculty of 1000 post-publication peer review (Faculty of 1000: 2011. F1000.com/9138956), which highlight: “In this fashion, hundreds of thousands of TF binding sites can be inferred from a single DNase data set, as compared to hundreds of ChIP experiments that would be required for measuring each TF individually”.

In subsequent work (Degner, Pai, Pique-Regi et al. 2012 Nature), Roger and colleagues sought to answer the following question: Is disruption of transcription factor binding a major mechanism of gene expression regulation? To address this question, DNase I sequencing was used to measure genome-wide chromatin accessibility in 70 Yoruba lymphoblastoid cell lines (LCLs), for which genome-wide genotypes and estimates of gene expression levels based on RNA-sequencing are also available. Quantitative trait locus (QTL) analysis was then performed for both DNase I sensitivity (dsQTLs) and gene expression (eQTLs). A substantial fraction (16%) of dsQTLs were also significantly associated with variation in the expression levels of nearby genes (i.e., they were also eQTLs), suggesting that changes in transcription factor binding frequently lead to gene expression changes. Conversely, 22% of eQTL SNPs were also classified as dsQTLs and, after accounting for incomplete power, the true fraction is estimated to be at least 52%. This work provided the first direct results linking a large fraction of eQTLs to a single class of underlying mechanisms.
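
To make the idea of a QTL test concrete, the sketch below regresses a quantitative phenotype (for example, DNase I sensitivity at one site, or expression of one gene) on genotype dosage at a single SNP. It is only a generic illustration of single-SNP association testing, not the statistical pipeline used in the study, which is not described here. The function names, the permutation scheme, and the toy data are all hypothetical.

```python
import math
import random


def linear_association(genotypes, phenotype):
    """Least-squares regression of a phenotype on genotype dosage (0, 1, 2).

    Returns (slope, t_statistic). Generic illustration only; not the
    method of the original study.
    """
    n = len(genotypes)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype does not vary across individuals")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Residual sum of squares and standard error of the slope.
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, t_stat


def permutation_pvalue(genotypes, phenotype, n_perm=1000, seed=0):
    """Two-sided empirical p-value from shuffling phenotype labels (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(linear_association(genotypes, phenotype)[0])
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(linear_association(genotypes, shuffled)[0]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): six individuals, phenotype increasing with dosage.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_phenotype = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
toy_slope, toy_t = linear_association(toy_genotypes, toy_phenotype)
toy_p = permutation_pvalue(toy_genotypes, toy_phenotype, n_perm=200)
```

In a genome-wide setting the same test would be repeated for every SNP near every DNase I sensitive site or gene, followed by a multiple-testing correction; a SNP significant for both phenotypes would count as both a dsQTL and an eQTL.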