From Random Forest to Regulation: Interpreting Supervised Learners to Guide Biological Discovery

Date:

December 16, 2020

Time:

3:00 - 4:00pm

Place:

Zoom - Registration Link Below

Individual genomic assays measure elements that interact in vivo as components of larger molecular machines. Understanding how these high-order interactions drive biological processes presents a substantial statistical challenge. Sequentially testing multiplicative interaction terms in a regression model is intractable for high-order interactions in genome-scale data. Building on three fundamental principles of data science (predictability, computability, and stability), we developed iterative random forests (iRF). iRF trains a feature-weighted ensemble of decision trees to detect stable, high-order, rule-based interactions at the same order of computational cost as a random forest (RF). We demonstrate the utility of iRF in two prediction problems: enhancer activity in the Drosophila embryo and red hair in the UK Biobank cohort. In the UK Biobank cohort, we identify both previously reported and novel interactions associated with hair color that represent forms of non-linearity not captured by logistic regression models. By decoupling the order of interactions from the computational cost of identifying them, iRF opens additional avenues of inquiry into the molecular mechanisms underlying genome biology.
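To give a rough sense of why exhaustively testing interaction terms becomes intractable, the sketch below counts the candidate multiplicative terms of a given order among p features. The feature count of 1,000 and the function name are illustrative assumptions, not figures or code from the talk; genome-scale data sets typically have far more features.

```python
from math import comb


def count_interaction_terms(num_features: int, order: int) -> int:
    """Number of distinct multiplicative interaction terms of a given order.

    Each term is a product of `order` distinct features, so the count is
    the binomial coefficient C(num_features, order).
    """
    return comb(num_features, order)


if __name__ == "__main__":
    p = 1000  # hypothetical feature count, chosen only for illustration
    for k in (2, 3, 4):
        print(f"order {k}: {count_interaction_terms(p, k):,} candidate terms")
```

With these assumed numbers the count grows from about half a million pairwise terms to tens of billions of order-4 terms, which is the combinatorial growth that makes term-by-term regression testing impractical.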

Speaker: Karl Kumbier, PhD, Postdoctoral Researcher, UCSF

Register: http://eepurl.com/g1X35P

Event Type:

Biostatistics and Bioinformatics Seminar
