Data scientist and statistician Katie Pollard, PhD, director of the Gladstone Institute of Data Science and Biotechnology and chief of the bioinformatics division in the Department of Epidemiology and Biostatistics, has been elected to the National Academy of Medicine (NAM), one of the highest honors in health and medicine. Through its election process, the Academy recognizes individuals who have demonstrated outstanding professional achievement and commitment to service.
Pollard is perhaps best known for developing a novel statistical approach to identify human accelerated regions (HARs), which are stretches of DNA that changed rapidly as humans evolved from their primate ancestors. Many of these regions of the human genome help determine when and where important genes, including those associated with diseases, are turned on or off.
Pollard is also being recognized for the creation of statistical models and open-source bioinformatics software, which are used by researchers worldwide to investigate gene activity, genome evolution, and the microbiome (the communities of microbes that live in and on the human body, such as those in the gut).
“As a statistician, I am honored that the National Academy of Medicine and my nominators value our contributions—and the contributions of data scientists more broadly—to biomedical research and medicine,” says Pollard. “I love coding and math, but what really motivates me is using these methods to understand how our bodies work and how they break in disease.”
Pollard, who is also an investigator at the Chan Zuckerberg Biohub, entered graduate school at the University of California, Berkeley, interested in using math and statistics for public health applications. She was moving from classwork to research when the human genome was sequenced for the first time.
“I immediately became interested in using the genome sequence to measure differences in gene activity between tissues and disease states, such as in tumors versus nearby healthy tissue,” she recalls. “I also wanted to develop statistical methods that could help me, and other researchers, get reliable results from the unprecedentedly large arrays of genomic data being produced.”
Since then, Pollard and her lab have made critical contributions to several other research areas, including decoding how genomes work by using comparative genomics; creating statistical models, open-source bioinformatics software, and machine-learning frameworks to better understand the human genome; and developing analytical tools to study the human microbiome.
Driving medical research with bioinformatic approaches
As Pollard started her postdoctoral work, the chimpanzee genome was sequenced. Because she had studied anthropology (including primatology) as an undergraduate, she understood the importance and potential applications of the new information and performed one of the first genome-wide comparisons of human and chimpanzee DNA. That work led to the discovery of HARs.
“HARs are short pieces of DNA where chimpanzees and other non-human mammals have nearly identical sequences,” she explains. “But the human HARs are very different from the chimp’s, which makes HARs exciting candidates for understanding traits that are unique to humans, such as spoken language, HIV susceptibility, and psychiatric diseases.”
After nearly a decade in which scientists had tried to figure out the function of HARs, Pollard and her team made a significant breakthrough using an innovative approach that drew on bioinformatics, stem cell biology, and genomics.
They discovered that the vast majority of HARs are not genes, but rather “enhancers” that turn the activity of nearby genes up or down. They also found that many HARs control genes involved in brain development and in psychiatric diseases that are uniquely human, such as autism and schizophrenia.
In parallel, for the past 15 years, Pollard’s team has been developing new ways to analyze the hundreds of microbial species that live in the human gut and play many roles in health and disease. Their breakthroughs could lead to the development of therapies to maintain or improve gut health. They are also helping set the stage for using the microbiome in precision medicine.
“To make these discoveries, we first had to create the right bioinformatics tools to tackle the questions we wanted to answer,” says Pollard. “We then applied our tools to massive analyses of terabytes of publicly available data, bringing together datasets that were not originally collected for the same purpose. And we used these datasets to ask new questions beyond what was analyzed in the original studies.”
She helped create several computational methods to analyze these datasets more effectively, including an approach that lets researchers carry out larger and more precise analyses of the microbiome than ever before. These approaches are also faster and cheaper than previous technologies, which makes them accessible to most labs, not only those that can afford high-performance computing.
To Pollard, this is one of the most crucial aspects of technology development: creating tools that can be shared with, and used by, as many scientists and students as possible. That’s why she’s such a strong advocate of open science, and a world leader in open-source bioinformatics software.
“The machine-learning tools and statistical methods we develop can be used to study a wide range of diseases,” says Pollard. “It’s important to me that they can be made available to anyone who needs them, so that we can open the door to important discoveries by researchers all around the world, across a variety of fields.”
Expanding the role of data science
Looking ahead, Pollard would like to help expand the role of data science in modern biomedical research. Today, data science mainly supports the analysis of experiments that have already been conducted; she would like to see it set the direction of experimentation and technology development.
“What I’m most excited about is using predictive models to drive experiments and the development of new tools and technologies,” she says. “Data scientists being in the driver’s seat will also ensure that we are designing the experiments and machines that best address the questions we want to ask down the line.”
Pollard earned her BA at Pomona College and her master’s degree and PhD in biostatistics from UC Berkeley. She is a Fellow of the American Institute for Medical and Biological Engineering, the California Academy of Sciences, and the International Society for Computational Biology. She is also a member of the American Society of Human Genetics and the American Statistical Association.
Pollard’s election was announced on October 17, 2022, by the NAM, which is part of the congressionally chartered National Academies of Sciences, Engineering, and Medicine, a group of private, nonprofit institutions that provide objective advice on matters of science, technology, and health.