Programming for Health Data Science in R II
(DATASCI 214)
Formerly known as BIOSTAT 214
Fall 2025 (2-3 units)
Health-related data is generated daily at an increasing velocity, from a multitude of sources. Our ability to extract insights to advance basic biomedical science and clinical practice depends on our ability to effectively curate, transform, and analyze data as well as present and communicate findings. This course builds on students' core R language knowledge to cover skills in advanced data transformations, visualization, working with big (in-memory) data, automated and reproducible report-writing, and core statistical procedures.
Objectives
At the conclusion of this course, students will be able to:
- Import, query, clean, transform, and analyze large datasets in R (using base R and data.table).
- Write custom data processing pipelines to address their individual analytical needs.
- Perform common statistical procedures.
- Apply skills developed in class to complete a project using their own data or data of their choosing.
- Use AI coding assistants to generate, fix, and explain code (3-unit course).
Prerequisites
DATASCI 213. Exceptions to this prerequisite may be made with the consent of the Course Director, space permitting. Please review the DATASCI 213 syllabus to ensure that you are fully comfortable with all the material covered in DATASCI 213 before requesting an exception on that basis.
Faculty
Course Director: Stathis Gennatas, MBBS, PhD
Assistant Professor of Epidemiology & Biostatistics and Medicine (DoC-IT)
email: [email protected]
Format
Weekly lectures with demonstrations and hands-on exercises. Sessions will be held on Thursdays from 12:45 to 3:00 PM, September 18 through December 4. Sessions are interactive and begin with a review of the prior week's material and exercises. New material is introduced by reviewing code and through live demonstration in R. Participation is key to maximizing learning for all students. Sessions end with labs where students can work on their weekly assignments, which are due at 5 PM two days before the following class.
All course materials and handouts will be posted on the course's online syllabus.
Students join the class GitHub repository, where assignments and solutions are posted, and can access the course forum.
2-unit option: A 2-unit option is provided for students who cannot accommodate the full 3-unit course. The 2-unit option is 2 weeks shorter and does not include a final project.
Materials
- Programming for Data Science in R by E.D. Gennatas (Updated 2025): https://rtemis.org
- The latest version of R (https://www.r-project.org)
- The latest version of Positron IDE (https://positron.posit.co/)
Prior to class, please review chapters 1-14, 17-20, and 43 of Programming for Data Science in R (PDSR; https://pdsr.rtemis.org).
Grading
Final grades will be based on the weekly assignments (60%) and the final project (40%). The final project is a brief article, written in Quarto, on a dataset of your choice.
Students not in full-year TICR Programs who satisfactorily pass all course requirements will, upon request, receive a Certificate of Course Completion.
Only UCSF students (defined as individuals enrolled in UCSF degree or certificate programs) will receive academic credit for courses. Official transcripts are available to UCSF students only. A Certificate of Course Completion will be available upon request to individuals who are not UCSF students and who satisfactorily pass all course requirements.
To Enroll
Health Data Science/Clinical Research (ATCR/MS) and PhD students use the Student Portal to add the course to their study list.
Students taking individual courses:
- Fall 2025 Course Fees
- Fall 2025 Course Schedule
- Apply by September 12
Only one application needs to be completed for all courses desired during the quarter.