Use of Electronic Health Record Data for Research

Spring 2023 (3 units)

Electronic Health Record (EHR) data can be used for a wide variety of clinical, epidemiologic, and translational research, and these data are becoming increasingly accessible. This course introduces students to concepts, methods, and pitfalls related to the extraction, manipulation, and analysis of data from EHRs. The course covers common EHR data structures and vocabularies, and how to use that knowledge to inform research study design and to create patient cohorts and analytic extracts. We will cover both ambulatory and inpatient use cases. Students will have the opportunity to design their own research projects during the course.


The objectives for this course are for participants to understand:

  • Relational database and data warehouse models as they pertain to EHR data;
  • Medical vocabularies and ontologies used in EHRs;
  • Construction of patient cohorts based on structured data, such as diagnosis codes, encounters, and procedures;
  • Extraction of relevant associated data for a specified patient cohort, including history, medical orders, laboratory tests, and medications;
  • Summarization and description of patient cohorts using analytic files; and
  • Formulation of research questions that leverage the strengths and mitigate the weaknesses of EHR data.


The course presumes students enter with familiarity and direct experience with a) relational databases; and b) manipulation of data with either Stata or R software. This experience can be gained through:

a) Data Collection and Management Systems for Clinical Research (EPI 218) (or equivalent experience); and
b) Introduction to Statistical Computing in Clinical Research (BIOSTAT 212) or Introduction to Computing in the R Software Environment (BIOSTAT 213) (or equivalent experience).

If you do not have the specific prerequisites or equivalent experience, please reach out to the course director to discuss options. The course neither presumes familiarity with epidemiologic or biostatistical methods nor teaches these methods, but students with this background will be able to connect the curriculum to the decisions made in constructing patient cohorts and extracting relevant associated EHR data.


Course Director:

Anobel Odisho, MD, MPH

Assistant Professor, Department of Urology
email: [email protected]


Each week, new material is introduced via a recorded lecture and recommended readings. After studying the lecture and readings on their own, students gather for a Large Group Discussion, in which the lecture is briefly reviewed and students have the opportunity to pose questions to course faculty or prompt discussion on any aspect of the material. A Computer Laboratory session immediately follows, providing students with time to work on their problem sets with supervision and assistance from course leaders. A second Computer Laboratory occurs at the end of the weekly cycle, allowing students to finish their assignments, resolve outstanding questions, and receive advice about their own ongoing projects.

Large Group Discussion:
Brief review of the lecture, followed by question-and-answer discussion. The recorded lecture should be viewed prior to this session.
Time: Tuesdays, 8:45 to 9:15 AM, beginning April 4

Computer Laboratory A:
Course faculty are available to address questions regarding the weekly assignment. Students may participate either in the larger forum or in smaller group discussions.
Time: Tuesdays, 9:15 to 10:15 AM, beginning April 4

Computer Laboratory B:
Additional time for students to work on weekly assignments and ongoing projects with group-based and one-on-one assistance from course faculty.
Time: Fridays, 3:15 to 5:00 PM, beginning April 7

The weekly learning cycle, therefore,
a) begins on a Monday when the recorded lecture, detailing the weekly content, is released on video;
b) has curricular emphases noted and clarifications provided on Tuesday with a large group discussion and class time to begin work on the weekly assignment;
c) reinforces learning for the next several days as students continue to work on the assignment and their individual project via self-study and group study;
d) has additional faculty availability on Friday for questions and discussion; and
e) ends on the following Monday with the submission of the completed weekly assignment.


Before the course begins, students should obtain access to the UCSF Research Analysis Environment (RAE), the UCSF de-identified Clinical Data Warehouse, and UCSF Git (a system for sharing and collaborating on source code). Please begin the process of obtaining access at least one week before the course starts.

RAE also provides R and Stata for manipulating and analyzing data obtained from the de-identified EHR warehouse, although students may prefer to use their own copies of this software outside of RAE.

All course materials and handouts will be posted on the course's online syllabus.


Grades will be based on total points achieved on the weekly problem set homework assignments (~70%) and the final project (~30%). Late assignments are not accepted. Answer keys to problem sets will be posted shortly after the turn-in deadline.

Students not in full-year TICR Programs who satisfactorily pass all course requirements will, upon request, receive a Certificate of Course Completion.

UCSF Graduate Division Policy on Disabilities

To Enroll

ATCR and MAS students use the Student Portal

Students taking individual courses:

Course Fees
How to pay (please read before applying)
Only one application needs to be completed for all courses desired during the quarter.

Spring Course Schedule (available January 15, 2023)

Apply (the Spring application will be available January 15, 2023)