Opportunities and Challenges of Complex Biomedical Data: Introduction to the Science of "Big Data" (DATASCI 202)

Summer 2026 (3 units)

This is an introduction to the opportunities and challenges of using large datasets for biomedical research. Topics to be covered include: what makes big data different; what big data can and cannot do; the phases of data science (getting data, merging and cleaning data, storing and accessing data, visualizing or telling stories with data, and drawing conclusions from data); and an introduction to supervised and unsupervised machine learning, including detailed discussion of algorithms and model fitting.

Online Syllabus

Objectives

At the conclusion of this course, students will be able to:

  • Use public-use (and non-public) data sources such as NHANES and social media data.
  • Use software to manipulate and clean big data.
  • Generate effective graphical displays of data.
  • Describe the advantages and disadvantages of different approaches to both supervised (classification and regression) and unsupervised modeling (clustering and data reduction).
  • Describe challenges to fitting complex models on big data, particularly the risk of overfitting in the context of model generalization/transportability.
  • Describe the issues that arise when trying to use "big data"-based observational studies to derive causal conclusions.

Prerequisites

None

Faculty
Format

Twice-weekly pre-recorded lectures introduce the substantive content for each module, which is then reinforced in weekly applied homework problem sets. Weekly computer lab sessions give students guided problems to work through and the opportunity to learn the software, ask questions, and interact more with faculty.

Lectures: Mondays and Thursdays, 1:10 to 2:00 PM, July 16 to August 27, 2026 (first session Thursday, July 16)
Formal review of the recorded lecture, followed by application of the lecture material and question-and-answer discussion.

Computer Laboratories: Thursdays, 2:10 PM to 3:30 PM, July 16 to August 27, 2026
Students have access to course faculty for questions on current or prior curriculum, assignments, and software implementation.

In addition, all students must submit a final project in which they manipulate, clean, and analyze data drawn from a large data source. Students will be given a choice of datasets and guidelines for performing the project.

All course materials and handouts will be posted on the course's online syllabus.

Materials

The free software suite Orange will be used throughout. Orange is a comprehensive, component-based software package with strengths in data visualization, data mining and machine learning.

Grading

Grades will be based on the homework problem sets assigned in the Computer Labs and on the Final Project. Problem sets are due by the start of lecture the following week. Homework problem sets will account for 70% of the points for the course. The final project, based on course-supplied datasets, will account for the remaining 30%.

Students must hand in all homework problem sets (even if late), complete a satisfactory Final Project, and receive at least 80% of the total number of points assigned during the quarter to receive a Satisfactory (if taking Satisfactory/Unsatisfactory) or B (if taking for a letter grade) in the course.

Official UCSF transcripts are not available for individual courses taken within the Department of Epidemiology and Biostatistics. Students not in full-year TICR Programs who satisfactorily pass all course requirements will, upon request, receive a Certificate of Course Completion.

Only UCSF students (defined as individuals enrolled in UCSF degree or certificate programs) will receive academic credit for courses. Official transcripts are available to UCSF students only. A Certificate of Course Completion will be available upon request to individuals who are not UCSF students and satisfactorily pass all course requirements.

UCSF Graduate Division Policy on Disabilities

To Enroll