Programming for Health Data Science in R II
(BIOSTAT 214)

Fall 2023 (2-3 units)

Health-related data is generated daily at an increasing velocity, from a multitude of sources. Our ability to extract insights to advance basic biomedical science and clinical practice depends on our ability to effectively curate, transform, and analyze data as well as present and communicate findings. This course builds on students' core R language knowledge to cover skills in advanced data transformations, visualization, working with big (in-memory) data, automated and reproducible report-writing, and core statistical procedures.

Online Syllabus

Objectives

At the conclusion of this course, students will be able to:

  • Import, query, clean, transform, and analyze large datasets in R (using base R and data.table).
  • Write custom data processing pipelines to address their individual analytical needs.
  • Perform common statistical procedures.
  • Apply skills developed in class to complete a project using their own data/data of their choosing.

Prerequisites

BIOSTAT 213. Exceptions to this prerequisite may be made with the consent of the Course Director, space permitting. Before requesting an exception, please review the BIOSTAT 213 syllabus to ensure that you are fully conversant with all BIOSTAT 213 materials.

Faculty

Course Director: Stathis Gennatas, MBBS, PhD

Assistant Professor of Epidemiology & Biostatistics
email: [email protected]

Format

Weekly lectures with demonstrations and hands-on exercises. Sessions will be held on Thursdays, 8:15 to 10:20 AM, September 14 through November 30. Sessions are interactive and begin with a review of the prior week's material and exercises. New material is introduced through code review and live demonstration in R. Participation is key to maximizing learning for all students. Sessions end with labs where students can work on their weekly assignments, which are due at 5 PM two days before the following class.

All course materials and handouts will be posted on the course's online syllabus.

Students join the class Discord server, where they can interact with each other, the TAs, and the instructor.

Materials

  • Programming for Data Science in R by E.D. Gennatas (2022): https://class.lambdamd.org/pdsr/
  • R version 4.2.1 or higher
  • RStudio Desktop version 2022.02.3-492 or higher (free Open Source License version); or
  • VS Code with vscode-R extension.

Prior to class, please review chapters 1-14 and 38-39 of Programming for Data Science in R (PDSR, https://class.lambdamd.org/pdsr/):

  1. Introduction: Ch. 1-5
  2. Data Types & Data Structures: Ch. 6-7
  3. Indexing: Ch. 8
  4. Factors & Data I/O: Ch. 9-10
  5. Vectorization & control flow: Ch. 11-12
  6. Summarizing & aggregating data: Ch. 13-14
  7. Visualization I: Ch. 38-39

Grading

Final grades will be based on the weekly assignments (60%) and the final project (40%). The final project is a brief article, written in R Markdown, on a dataset of your choice.

Students not in full-year TICR Programs who satisfactorily pass all course requirements will, upon request, receive a Certificate of Course Completion.

Only UCSF students (defined as individuals enrolled in UCSF degree or certificate programs) will receive academic credit for courses. Official transcripts are available to UCSF students only. A Certificate of Course Completion will be available upon request to individuals who are not UCSF students and satisfactorily pass all course requirements.

UCSF Graduate Division Policy on Disabilities

To Enroll

ATCR and MAS students use the Student Portal

Students taking individual courses:

Course Fees
How to pay (please read before applying)
Fall 2023 Course Schedule

Apply by September 15, 2023 for Fall quarter

Only one application needs to be completed for all courses desired during the quarter.