How to Tell the Difference Between Machine Learning and (Bio)Statistics

Date: January 8, 2025
Time: 3 to 4 p.m. PT
Place: MH-2700 or via Zoom

Michael Baiocchi, PhD
Associate Professor, Epidemiology and Population Health, Stanford University
Jordan Rodu, PhD
Assistant Professor, Statistics, University of Virginia

We'll start this talk by discussing two studies: (i) a randomized trial to evaluate a sexual assault prevention program in Nairobi, Kenya, and (ii) a remote detection operation to find and disrupt labor trafficking in the Amazon rainforest. Both are "data science" projects, but they work in wildly different ways. What makes them so different? For a long time in (bio)statistics, we had only two fundamental ways of reasoning with data: warranted reasoning (e.g., randomized trials) and model reasoning (e.g., linear models). In the 1980s, a new, extraordinarily productive way of reasoning about algorithms emerged: "outcome reasoning." Outcome reasoning has come to dominate areas of data science, but it has been under-discussed and its impact under-appreciated. In this talk, we will discuss its current use (i.e., as "the common task framework") and its limitations. We will then discuss a way to extend this type of reasoning to assess algorithms for deployment (i.e., "in the real world"). We developed this new framework so that both technical and non-technical people can discuss and identify key features of their prediction problem.

Event Type: Biostatistics and Bioinformatics Seminar