Advanced computational methods with UCSF clinical data on Information Commons

Place: Byers Hall 215
Time: 3-5pm
Date: February 5, 2020

Getting ready to apply machine learning and other advanced computational methods to your research? You can do it with UCSF Information Commons, a high-performance compute environment powered by an Apache Spark cluster on AWS. In this hands-on workshop, we will work through a real case study, exploring de-identified UCSF Electronic Health Records on Information Commons. You will learn how to query UCSF clinical data and gain some of the skills needed to build your own computational models in this environment.

Learning Objectives
In this workshop, you will learn how to do the following on Information Commons:

Run SQL queries to extract de-identified clinical data of interest
Manage your files on the cluster
Launch JupyterHub and run Jupyter notebooks with Python, R or SparkSQL code
Train a machine learning model using Spark-based tools

Prerequisites
To benefit from this workshop, you must have an Information Commons account (see Accessing Information Commons) and permission to access UCSF de-identified clinical data (see Research Data and Tools Access Request). Please complete both requests by January 20, as this process can take up to 2 weeks. We also strongly advise that you be comfortable with Unix shell scripting, SQL, and Jupyter notebooks. Familiarity with AWS S3 commands, Python, and machine learning concepts will also be helpful. Tutorials are available on the Information Commons Wiki.

Be sure to bring your laptop to the workshop!

Instructors
Geoff Boushey is an Application Developer for the Data Science Initiative and Center for Knowledge Management in the UCSF Library

Angelo Pelonero is an Instructional Designer for the Data Science Initiative in the UCSF Library and for the Bakar Computational Health Sciences Institute at UCSF

Event Type

Conference