Meta-Clustering of Genomic Data

Date: April 4, 2019
Time: 3:00 to 4:00 pm
Place: MH-2700

Yingying Wei, PhD, Assistant Professor, Department of Statistics, The Chinese University of Hong Kong

Just as traditional meta-analysis pools effect sizes across studies to improve statistical power, there is increasing interest in clustering jointly across datasets to identify disease subtypes for bulk genomic data and to discover cell types for single-cell RNA-sequencing (scRNA-seq) data.  Unfortunately, because technical batch effects are prevalent in high-throughput experiments, directly clustering samples from multiple datasets would lead to wrong results.  Recently emerging meta-clustering approaches require all datasets to contain all subtypes, which is not feasible for many experimental designs.

In this talk, I will present our Batch-effects-correction-with-Unknown-Subtypes (BUS) framework.  BUS corrects batch effects explicitly, groups samples that share similar characteristics into subtypes, identifies features that distinguish subtypes, and enjoys linear-order computational complexity.  We prove the identifiability of BUS not only for bulk data but also for scRNA-seq data, whose dropout events are missing not at random.  We mathematically show that under two very flexible and realistic experimental designs--the "reference panel" and the "chain-type" designs--true biological variability can also be separated from batch effects.  BUS outperforms existing methods on real data.

Event Type: Biostatistics and Bioinformatics Seminar