The Celebration of Statistics and Data Science conference, hosted by the Department of Statistics and Data Science, is a day-long event that celebrates Cornell’s rich history in statistics, data science, and mathematics, and showcases leading researchers who are driving innovation in the field.
The 2025 event will be held on Friday, September 5, at the Statler Hotel in the J. Willard Marriott Executive Education Center on the Cornell University campus in Ithaca, NY.
Please be aware that seats are limited and available on a first-come, first-served basis.
Schedule of Events:
9:00 – 9:30 a.m. - Breakfast / Registration — Statler Hotel, Ballroom Foyer
9:30 – 9:35 a.m. - Opening remarks from Thorsten Joachims, Bowers Interim Dean
9:35 – 10:30 a.m. - Presentation of Distinguished Alumni Award by James Booth to Chris Jennison '82, University of Bath, followed by talk
Morning Session
See below for more information regarding each speaker and their talk.
10:30 – 11:05 a.m. - William Rosenberger, George Mason University
11:05 – 11:20 a.m. - Morning Break - Refreshments
11:20 – 11:55 a.m. - Judy Zhong, Weill Cornell Medicine
11:55 a.m. – 12:30 p.m. - Ajit Tamhane, Northwestern University (Ph.D. Cornell 1975)
12:30 – 2:00 p.m. - Lunch / Poster Session — Statler Hotel, Ballroom
Afternoon Session
See below for more information regarding each speaker and their talk.
2:05 – 2:35 p.m. - Iain Johnstone, Stanford University (Ph.D. Cornell 1981)
2:35 – 3:10 p.m. - Dan Kowal, Cornell University (Ph.D. Cornell 2017)
3:10 – 3:25 p.m. - Afternoon Break - Refreshments
3:25 – 4:00 p.m. - Cyrus Mehta, Harvard University
4:00 – 4:35 p.m. - Vadim Zipunnikov, Johns Hopkins Bloomberg School of Public Health (Ph.D. Cornell 2009)
4:35 p.m. - Closing Remarks

Distinguished Alumni: Chris Jennison '82
Bio: Christopher Jennison is Professor of Statistics at the University of Bath, UK. His PhD research at Cornell University concerned the sequential analysis of clinical trials, and he has continued to work in this area for over 40 years.
His book with Professor Bruce Turnbull, "Group Sequential Methods with Applications to Clinical Trials", is a standard text and is widely used by practising statisticians. The second edition of this book, "Group Sequential and Adaptive Methods for Clinical Trials", has been extended to cover adaptive designs. This book is currently with the publishers and should be available to readers by the end of the year.
Professor Jennison's research is informed by experience of clinical trial analysis at the Dana Farber Cancer Institute, Boston, by service on data monitoring committees, and by a broad range of consultancy with pharmaceutical companies.
Title: Group Sequential and Adaptive Clinical Trial Designs: Achievements and Challenges
Abstract: I have worked in the area of clinical trial design since my time as a PhD student at Cornell. In a sequential trial, accumulating data are reviewed at interim analyses with a view to terminating the trial early if there are clear answers to the key questions. I shall describe methods for deriving stopping rules that minimize expected sample size while protecting type 1 and type 2 error rates and discuss the role of such optimised procedures in guiding the design of real clinical trials.
However, the goal of a study is not simply to answer the question “Is the new treatment better than the current treatment?” It is also important to have a reliable estimate of the improvement offered by the new treatment so that payers may carry out cost-benefit analyses. The maximum likelihood estimate of the treatment effect after a sequential trial is typically biased. It is possible to apply the Rao-Blackwell method to construct a minimum variance unbiased estimate – but the resulting estimate can have unappealing properties. In a simple group sequential trial, more attractive “almost unbiased” estimates are available. In adaptive trials with multiple treatments or multiple patient subgroups, the Rao-Blackwell method has been used to derive minimum variance unbiased estimates of treatment effects: given the shortcomings of minimum variance unbiased estimates in simpler settings, it is timely to investigate these methods and explore “almost unbiased” alternatives.
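The bias of the MLE after early stopping can be seen in a few lines of simulation. The sketch below uses a hypothetical two-look design (50 patients per stage, a flat z-boundary of 2), chosen only to illustrate the phenomenon, not a design from the talk.

```python
# Monte Carlo sketch: the MLE of a treatment effect is biased when a trial
# may stop early at an interim analysis. Design and boundary are hypothetical.
import random
import statistics

def simulate_trial(theta, n_per_stage=50, boundary=2.0):
    """Two-look trial with unit-variance responses: stop at the interim
    analysis if the z-statistic crosses the boundary, otherwise continue to
    the full sample. Returns the MLE (the overall sample mean)."""
    stage1 = [random.gauss(theta, 1.0) for _ in range(n_per_stage)]
    z1 = statistics.mean(stage1) * n_per_stage ** 0.5
    if abs(z1) >= boundary:                      # stop early
        return statistics.mean(stage1)
    stage2 = [random.gauss(theta, 1.0) for _ in range(n_per_stage)]
    return statistics.mean(stage1 + stage2)

random.seed(1)
theta = 0.2                                      # true treatment effect
estimates = [simulate_trial(theta) for _ in range(20000)]
bias = statistics.mean(estimates) - theta
print(f"average MLE = {statistics.mean(estimates):.3f} (true effect {theta}, bias {bias:+.3f})")
```

Trials that stop early tend to be those whose interim data overstate the effect, so the average MLE exceeds the true effect; the Rao-Blackwell and “almost unbiased” constructions mentioned in the abstract are designed to correct exactly this.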
Speakers
William F. Rosenberger, George Mason University

Bio: William F. Rosenberger is Distinguished University Professor at George Mason University. He received his Ph.D. in mathematical statistics from George Washington University in 1992 and since then has spent much of his career developing statistical methodology for randomized clinical trials. He has written two books on the subject: Randomization in Clinical Trials: Theory and Practice (Wiley, 2002), which won the Association of American Publishers Award for the best mathematics/statistics book published that year and has since been issued in a second edition (Wiley, 2016); and The Theory of Response-Adaptive Randomization in Clinical Trials (Wiley, 2006). He is a Fellow of the American Statistical Association (2005) and of the Institute of Mathematical Statistics (2011). An author of over 110 refereed papers, Prof. Rosenberger was named the 2012 Outstanding Research Faculty by the Volgenau School of Engineering, George Mason University, where he also served as Chairman of its Department of Statistics for 13 years, hiring 16 faculty and developing programs at the B.S., M.S., and Ph.D. levels. In 2014, he received a prestigious Fulbright scholarship to support his sabbatical at RWTH Aachen University in Germany. That same year he was promoted to the rank of University Professor (Distinguished University Professor, 2023), a rank reserved for “eminent” individuals on the faculty “of great national or international reputation.” Only 32 out of 1,400 faculty at George Mason have this distinction. In 2017 he was named the 15th Armitage Lecturer at the University of Cambridge, UK. He was elected North American Editor of the tier-1 biostatistical methodology journal Biometrics for 2021–2024. In 2024 he was named the 41st Fisher Memorial Lecturer by the Fisher Memorial Trust. He has supervised 20 doctoral students who are now leaders in academia, industry, and government.
Title: Sequential Design and Analysis in the Randomized Clinical Trial: A Historical Perspective
Abstract: Sequential analysis, as invented by Wald, was not targeted to clinical trials; in fact, the randomized clinical trial was being invented nearly simultaneously by Hill on the other side of the Atlantic. The connection to clinical trials was established by Bross and Armitage in the early 1950s, and applied in a number of trials during that decade. Restrictions in its applicability in practice led to a dry spell until its resurgence with group sequential methods.
I review the historical context of the use of sequential analysis in actual randomized clinical trials. I do not review methodological developments, except as they relate to the historical and philosophical setting. (The reader interested in methodological developments is referred to Jennison and Turnbull (1990).)
Dan Kowal, Cornell University

Bio: Dan Kowal is an associate professor of statistics and data science. His research primarily revolves around three themes: (1) Bayesian models and algorithms for large and dependent (e.g., time series, spatial, functional) data, (2) modeling, generation, and imputation of mixed data, and (3) predictive inference for actionable and interpretable uncertainty quantification. He directs his research toward open questions in public health, epidemiology, physical activity data, economics, and finance. Recently, he has worked on addressing urgent issues related to racial inequities and biases in statistical modeling.
Title: Facilitating heterogeneous effect estimation via statistically efficient categorical modifiers
Abstract: Categorical covariates such as race, sex, or group are ubiquitous in regression analysis. While main-only (or ANCOVA) linear models are predominant, cat-modified linear models that include categorical-continuous or categorical-categorical interactions are increasingly important and allow heterogeneous, group-specific effects. However, with standard approaches, the addition of cat-modifiers fundamentally alters the estimates and interpretations of the main effects, often inflates their standard errors, and introduces significant concerns about group (e.g., racial) biases. We advocate an alternative parametrization and estimation scheme using abundance-based constraints (ABCs). Crucially, we show that with ABCs, the addition of cat-modifiers (1) leaves main effect estimates unchanged and (2) enhances their statistical power, under reasonable conditions. Thus, analysts can, and arguably should, include cat-modifiers in linear regression models to discover potential heterogeneous effects without compromising estimation, inference, or interpretability for the main effects. Using simulated data, we verify these invariance properties for estimation and inference and showcase the capabilities of ABCs to increase statistical power. We apply these tools to study demographic heterogeneities among the effects of social and environmental factors on STEM educational outcomes for children in North Carolina.
Ajit Tamhane, Northwestern University

Bio: Ajit Tamhane is Professor Emeritus of Industrial Engineering & Management Sciences (IEMS) at Northwestern University. He was the Chair of the IEMS Department from 2003 to 2010 and Senior Associate Dean of the McCormick School of Engineering from 2010 to 2018. He retired in 2022.
He has published many papers on multiple testing in clinical trials, design of experiments, ranking and selection procedures, chemometrics, and other areas. He has authored or coauthored four books and co-edited two volumes of research papers.
He is an elected fellow of the American Statistical Association (1991), the Institute of Mathematical Statistics (2010), and the American Association for the Advancement of Science (2013), and an elected member of the International Statistical Institute (2015).
Title: Testing Primary and Secondary Endpoints in a Two-Stage Group Sequential Clinical Trial
Abstract: We study the problem of testing two secondary endpoints conditional on a primary endpoint being significant in a two-stage group sequential procedure. Application of the Bonferroni test to test the intersection of the secondary hypotheses results in the Holm procedure while application of the Simes test results in the Hochberg procedure. We develop normal theory analogs of the abovementioned p-value based tests for this problem that take into account (i) the gatekeeping effect of the test on the primary endpoint and (ii) correlations between the endpoints. The normal theory boundaries are determined by finding the least favorable configuration of the correlations and so their knowledge is not needed to apply these procedures. The p-value based procedures are easy to apply and readily extend to multiple secondary endpoints and multiple stages, but they are less powerful, partly because they do not take into account (i) and (ii) given above. Comparisons between the two types of procedures are given in terms of secondary powers.
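As a concrete illustration of the p-value based procedures named above, here is a generic sketch of the Holm (step-down, built from the Bonferroni test) and Hochberg (step-up, built from the Simes test) procedures. This is standard textbook material, not the normal theory analogs developed in the talk.

```python
# Generic Holm and Hochberg procedures for m hypotheses. Both control the
# familywise error rate at level alpha; Hochberg rejects at least as many
# hypotheses as Holm but requires a positive-dependence condition.

def holm(pvalues, alpha=0.05):
    """Indices rejected by the Holm step-down procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = set()
    for rank, i in enumerate(order):             # rank = 0, 1, ..., m-1
        if pvalues[i] <= alpha / (m - rank):     # Bonferroni-style threshold
            rejected.add(i)
        else:
            break                                # stop at the first failure
    return rejected

def hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    for rank in reversed(range(m)):              # start from the largest p
        if pvalues[order[rank]] <= alpha / (m - rank):
            return set(order[:rank + 1])         # reject this and all smaller
    return set()

p = [0.009, 0.026, 0.04]
print(sorted(holm(p)))      # → [0]        (0.026 > 0.05/2, so stepping stops)
print(sorted(hochberg(p)))  # → [0, 1, 2]  (0.04 <= 0.05/1, so all rejected)
```

The example shows the power gap the abstract alludes to: on the same p-values, Hochberg rejects all three hypotheses while Holm rejects only one.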
Cyrus Mehta, Harvard University

Bio: Cyrus Mehta is President and co-founder of Cytel Corporation and Adjunct Professor of Biostatistics, Harvard University. Cytel (www.cytel.com) is a leading provider of software, clinical services and strategic consulting on the design, interim monitoring and implementation of adaptive clinical trials, with offices in the United States, Europe and India. Its software and services have won many industry awards. Dr Mehta’s research encompasses group sequential and adaptive design of clinical trials that include multiple treatment arms and multiple endpoints. Dr. Mehta consults extensively with the biopharmaceutical industry on these topics, offers workshops, and serves on data monitoring and steering committees for trials in many therapeutic areas.
Title: Graph Based, Adaptive, Multi Arm, Multiple Endpoint, Two Stage Design
Abstract: The graph-based approach to multiple testing is an intuitive method that enables a study team to represent clearly, through a directed graph, its priorities for hierarchical testing of multiple hypotheses, and for propagating the available type-1 error from rejected or dropped hypotheses to hypotheses yet to be tested. Although originally developed for single-stage non-adaptive designs, we show how it may be extended to two-stage designs that permit early identification of efficacious treatments, adaptive sample size re-estimation, dropping of hypotheses, and changes in the hierarchical testing strategy at the end of stage one. Two approaches are available for preserving the familywise error rate in the presence of these adaptive changes: the p-value combination method and the conditional error rate method. In this investigation we will present the statistical methodology underlying each approach and compare the operating characteristics of the two methods in a large simulation experiment.
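The graphical approach the abstract starts from (due to Bretz and colleagues) has a compact algorithmic core. The sketch below implements only the single-stage, non-adaptive version: each hypothesis carries a fraction of alpha, and rejecting one hypothesis propagates its weight along the directed edges. The two-stage adaptive extensions discussed in the talk are not reproduced here.

```python
# Single-stage graphical multiple testing (Bretz et al.'s update rules).
# weights[i]: fraction of alpha initially assigned to hypothesis i.
# G[i][j]:    fraction of i's weight passed to j when i is rejected.

def graphical_test(pvalues, weights, G, alpha=0.05):
    """Return the set of hypothesis indices rejected by the graphical procedure."""
    m = len(pvalues)
    active = set(range(m))
    w = list(weights)
    g = [row[:] for row in G]
    rejected = set()
    while True:
        testable = [i for i in active if pvalues[i] <= w[i] * alpha]
        if not testable:
            return rejected
        i = testable[0]
        active.discard(i)
        rejected.add(i)
        for j in active:                     # propagate i's weight
            w[j] += w[i] * g[i][j]
        new_g = [row[:] for row in g]        # rewire edges around the removed node
        for j in active:
            for k in active:
                if j != k:
                    denom = 1.0 - g[j][i] * g[i][j]
                    new_g[j][k] = (g[j][k] + g[j][i] * g[i][k]) / denom if denom > 0 else 0.0
        g = new_g

# Two hypotheses splitting alpha equally, each passing its full weight to the
# other on rejection: this graph reproduces the Holm procedure for m = 2.
print(sorted(graphical_test([0.02, 0.03], [0.5, 0.5], [[0, 1], [1, 0]])))  # → [0, 1]
```

Changing the weights to [1.0, 0.0] with an edge from hypothesis 0 to 1 encodes the hierarchical gatekeeping the abstract mentions: the secondary hypothesis can only be tested once the primary is rejected.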
Judy Zhong, Weill Cornell Medicine

Bio: Judy Zhong, Ph.D., is Professor and Division Chief of Biostatistics in the Department of Population Health Sciences at Weill Cornell Medicine. She has over a decade of experience leading the design, coordination, and analysis of multi-center clinical trials and large-scale observational studies, with continuous NIH R01 support. Her research spans statistical methods for dynamic risk prediction, adaptive trial design, algorithmic fairness, and bias correction in electronic health records, as well as innovative uses of AI and machine learning to improve clinical decision-making.
Title: Uniting Methodological Rigor and Automation: Advancing Clinical Trial Methods and Operations
Abstract: Modern clinical trials face dual challenges: maintaining validity in the presence of intercurrent events and managing the growing operational complexity of multisite coordination. In the first part of this talk, we examine evidence from a systematic review of large antithrombotic trials and results from targeted simulations showing how treatment discontinuation, often triggered by safety or tolerability concerns, can bias efficacy estimates under both treatment policy and while-on-treatment strategies. We highlight the limitations of current reporting practices and demonstrate how estimand-aligned analyses, supported by stratified outcome curves and other graphical diagnostics, can improve the interpretability of results. In the second part, we introduce ASTRA (Agent System for TRial Automation), a conceptual framework for a multi-agent Clinical Trial Center (CTC) integrating Clinical and Data Coordinating Center functions. Together, these perspectives underscore the need for methodological rigor in handling intercurrent events and for innovative, AI-enabled operational models to meet the scientific and regulatory demands of contemporary clinical trials.
Iain Johnstone, Stanford University

Bio: Iain M. Johnstone is Marjorie Mhoon Fair Professor in the Department of Statistics at Stanford University with a joint appointment in the Department of Biomedical Data Science in Stanford’s School of Medicine. A native of Australia, he received his Ph.D. in Statistics from Cornell in 1981. His work in theoretical statistics has used ideas from harmonic analysis, such as wavelets, to understand estimation methods in statistical signal and image processing. More recently, he has applied random matrix theory to the study of high-dimensional multivariate statistical methods, such as principal components, canonical correlation analysis and multivariate components of variance. In biostatistics, he collaborated extensively with investigators in cardiology and prostate cancer. He is a Fellow of the American Statistical Association, a member of the National Academy of Sciences and a former president of the Institute of Mathematical Statistics.
Title: Complex variables and the CLT for generalized linear mixed models
Abstract: In the classical approach to showing asymptotic normality of maximum likelihood, often the real challenge is to control the third derivative of the log-likelihood near the true value. When the log-likelihood can be treated as a function of a complex variable, the difficulty can sometimes be overcome. Our motivation comes from a class of generalized linear mixed models in which both the number of groups and the number of observations within each group are large.
Vadim Zipunnikov, Johns Hopkins Bloomberg School of Public Health

Bio: Vadim Zipunnikov, Ph.D., is an Associate Professor of Biostatistics at Johns Hopkins Bloomberg School of Public Health and co-leads the Wearable and Implantable Technology (WIT) group. His research focuses on developing statistical methods for analyzing digital health data from wearables, smartphones, and implantable devices to support clinical trials and regulatory science. He collaborates with the FDA, NIH, and industry partners to standardize digital biomarker development and integration into drug development. Dr. Zipunnikov has authored over 100 peer-reviewed publications and leads large-scale projects advancing digital endpoints in neurology, psychiatry, and aging.
Title: Developing more sensitive endpoints by leveraging novel statistical methods for Digital Health Technologies (DHTs) data
Abstract: Digital Health Technologies (DHTs) are now used to continuously track physical activity and sleep in many clinical studies. These data provide tremendous opportunities to develop novel, more sensitive clinical trial endpoints. There is, however, a large gap between the complexity of DHT data and the statistical methodology available to fully leverage its potential. This talk will discuss recent developments in DHT-centric statistical methods that can provide more sensitive endpoints by extracting and fusing information from the temporal, distributional, and time-series aspects of DHT data.