Dan Kowal, Ph.D. ’17, is an associate professor of statistics and data science in the Cornell Ann S. Bowers College of Computing and Information Science. His appointment in Cornell Bowers CIS comes via Cornell’s College of Agriculture and Life Sciences (CALS), one of the two other colleges that jointly house the Department of Statistics and Data Science.
Kowal received his Ph.D. in statistics from Cornell, where he was advised by David Ruppert, the Andrew Schultz Jr. Professor of Engineering in the School of Operations Research and Information Engineering (ORIE) and professor of statistics and data science, and by David Matteson, associate department chair and professor of statistics and data science and of social statistics.
Before arriving at Cornell, Kowal was the Dobelman Family Assistant Professor in the Department of Statistics at Rice University.
What is your academic focus?
Bayesian statistics, dependent data (time series, spatial, functional, etc.), and missing data.
Could you describe your research?
My research aims to provide reliable, scalable, and interpretable statistical inference for a variety of “messy” data. Such data might include many variables, temporal or spatial dependencies, missing values, nonlinear associations, irregular distributions, or some combination of these. I design Bayesian models and algorithms to handle these complexities, quantify uncertainties, and make predictions and decisions.
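For readers new to the Bayesian approach, here is a minimal sketch of what “quantify uncertainties” looks like in practice. It is a generic textbook example (a Beta-Binomial model), not drawn from Kowal’s work: the posterior distribution directly yields both a point estimate and a credible interval.

```python
import scipy.stats as stats

# Toy Bayesian inference: estimate a success probability from binary data.
# With a conjugate Beta(1, 1) prior, the posterior is Beta in closed form,
# so uncertainty quantification is just reading off posterior quantiles.
successes, trials = 37, 50
posterior = stats.beta(1 + successes, 1 + trials - successes)

print("posterior mean:", posterior.mean())             # point estimate
print("95% credible interval:", posterior.interval(0.95))
```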
What past professional work are you most proud of and why?
I’m most proud of the work done by my Ph.D. students, but since I can’t choose among their projects, I’ll say this paper on semiparametric Bayesian regression. It’s a classic problem: we often transform our data to improve the model fit, but which transformation should we use? This work shows how to learn that transformation, account for its uncertainty, and provide (posterior) inference and prediction across a variety of models and data settings. Perhaps most surprisingly, we circumvent the usual computational bottleneck for Bayesian models, namely Markov chain Monte Carlo (MCMC). For broad accessibility, we designed software and a website with examples and documentation. Finally, this work began when my coauthor, Bohan Wu, was an undergraduate, and he was instrumental in establishing our theoretical results.
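The paper’s methodology is far more general, but the core idea (place a prior over a family of transformations, weight each candidate by its closed-form marginal likelihood, and obtain posterior uncertainty about the transformation without MCMC) can be sketched in a few lines. The sketch below is my own simplification, restricted to the Box-Cox family with a conjugate linear regression; it is not the method of the paper:

```python
import numpy as np
from scipy.special import gammaln

def boxcox(y, lam):
    """Box-Cox transformation; requires y > 0."""
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def log_marginal(X, z, a0=2.0, b0=1.0, tau2=100.0):
    """Closed-form log marginal likelihood of a conjugate
    Normal-Inverse-Gamma linear regression (no MCMC needed)."""
    n, p = X.shape
    Vninv = np.eye(p) / tau2 + X.T @ X
    Vn = np.linalg.inv(Vninv)
    mn = Vn @ (X.T @ z)                      # prior mean is zero
    an = a0 + 0.5 * n
    bn = b0 + 0.5 * (z @ z - mn @ Vninv @ mn)
    _, logdet_Vn = np.linalg.slogdet(Vn)
    logdet_V0 = p * np.log(tau2)
    return (-0.5 * n * np.log(2 * np.pi)
            + 0.5 * (logdet_Vn - logdet_V0)
            + a0 * np.log(b0) - an * np.log(bn)
            + gammaln(an) - gammaln(a0))

# Simulated positive data whose "right" transformation is the log (lam = 0)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.exp(0.5 + X[:, 1] + 0.3 * rng.normal(size=n))

# Posterior over a grid of Box-Cox parameters; the Jacobian term makes
# the marginal likelihoods comparable on the original scale of y.
lams = np.linspace(-1.0, 2.0, 31)
logp = np.array([log_marginal(X, boxcox(y, lam))
                 + (lam - 1.0) * np.sum(np.log(y)) for lam in lams])
post = np.exp(logp - logp.max())
post /= post.sum()
print("posterior mode for lambda:", lams[np.argmax(post)])
```

Because every quantity above is available in closed form, the posterior over the transformation comes from a single pass over the grid rather than an MCMC run; the paper itself handles far more general transformations and models.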
What courses are you most looking forward to teaching?
I’m excited to teach a course on Bayesian statistics. Not only is this topic central to my research and incredibly useful across so many academic fields and industries, but the course also serves to unify and emphasize several central themes within a statistics curriculum: general probability theory and modeling, likelihood analysis, statistical computing and simulation, and interpretable uncertainty quantification, among others. It presents a different paradigm for statistical inference and uses many advanced tools, yet it still revisits and reinforces the foundational concepts and techniques of statistics and data science.
What scientific questions are you looking to answer next or areas you plan to explore?
While it has become increasingly convenient to record large amounts of data, a persistent challenge is that modern datasets are often not representative of the target population. For instance, there may be systemic biases about who is or is not included in a dataset, which variables are or are not measured for each individual, and so forth. Even (or especially) with sophisticated models, such datasets can lead to misguided and potentially harmful conclusions. I am interested in building flexible Bayesian models for this setting that (1) explicitly provide estimates, uncertainty quantification, and predictions/recommendations that are specific to the target population and (2) properly incorporate external data sources such as surveys, census data, or other expert information that can reconcile these sample-target incongruities.
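As one concrete and standard device for reconciling a non-representative sample with its target population, poststratification reweights group-level estimates by known population shares from an external source such as a census. The sketch below uses made-up groups and numbers and is purely illustrative; it is not a description of Kowal’s models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Biased sample: group B is heavily overrepresented relative to the
# population (hypothetical groups and numbers, for illustration only).
sample = {
    "A": rng.normal(loc=2.0, scale=1.0, size=30),    # true mean 2.0
    "B": rng.normal(loc=5.0, scale=1.0, size=300),   # true mean 5.0
}
census_shares = {"A": 0.7, "B": 0.3}   # external (census) information

# The naive pooled mean ignores the sampling bias and is pulled toward B.
pooled = np.concatenate(list(sample.values()))
print("naive sample mean:", round(pooled.mean(), 2))

# Poststratification reweights each group's estimate by its known
# population share, targeting the population mean (about 2.9 here).
post_strat = sum(census_shares[g] * sample[g].mean() for g in sample)
print("poststratified mean:", round(post_strat, 2))
```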