The Statistics Seminar speaker for Wednesday, November 1, 2017, is Victoria Stodden, an associate professor in the School of Information Sciences at the University of Illinois at Urbana-Champaign, with affiliate appointments in the School of Law, the Department of Computer Science, the Department of Statistics, the Coordinated Science Laboratory, and the National Center for Supercomputing Applications. Stodden is a data scientist working on open data and its implications. Her research group focuses on understanding the effect of big data and computation on scientific inference, for example by studying adequacy and robustness in replicated results, designing and implementing validation systems, developing standards of openness for data and code sharing, and resolving legal and policy barriers to disseminating reproducible research.
Talk: Structuring Machine Learning Research in Data Driven Science
Abstract: Statistical discovery is increasingly taking place using data not collected by the discoverers and often completely in silico. This calls for new consideration of the methods and computational infrastructure that support statistical pipelines. In this talk I present CompareML, a novel framework for statistical analysis of "organic data" as opposed to "designed data" (Kreuter & Peng 2014) that permits the direct comparison of findings that purport to answer the same statistical question. I will argue that such computational frameworks are crucial to reproducible science by way of an example from genomics (acute leukemia; Golub et al. 1999) where traditional approaches (surprisingly) fail at scale.