Dr. Charles McCulloch received his PhD in Statistics from Cornell University and returned to the faculty there after an appointment at Florida State University. Before coming to UCSF, he spent 18 years at Cornell, eventually becoming Professor and the founding Chair of the Department of Statistical Science. His primary research areas are longitudinal data analysis, generalized linear mixed models, and latent class models. He is the co-author of four textbooks and the author of the Institute of Mathematical Statistics monograph "Generalized Linear Mixed Models." He is a fellow of the American Statistical Association and an elected member of the International Statistical Institute. He was the primary lecturer for an NSF-CBMS Regional Research Conference on generalized linear mixed models in 1999. He has over 30 years of experience in statistical consulting and collaboration.
Title: small thoughts on Big Data
Abstract: I will begin my talk by describing some of my history with “big data,” much of it at Cornell, and what I think is and is not new about big data. In the second part of the talk I will discuss recent research motivated by one type of big data: electronic medical records. The hope is that the wealth of clinical data and the realistic setting (compared to information derived from highly controlled and often artificial experiments such as randomized trials) will aid in investigating the determinants of disease and in understanding which treatments are effective for which patients. The hype is that the availability of information in such databases is often driven by how a patient feels, is therefore associated with the outcomes being analyzed, and can lead to severe bias. My goal is to understand the utility (or disutility) of research conducted using such data. I will describe diagnostic methods for uncovering this outcome dependence, along with theoretical results on bias that help separate the hype from the hope.