Statistics Seminar Speaker: Dianne Cook, 11/4/2015

The Statistics Seminar Speaker for November 4 is Dianne Cook, professor of Econometrics and Business Statistics at Monash University. From her webpage: "I am a Fellow of the American Statistical Association. My research is in data visualization, exploratory data analysis, multivariate methods, data mining and statistical computing. I have experimented with visualizing data in virtual environments, participated in producing software including xgobi, ggobi, cranvas and several R packages. Methods development include tours, projection pursuit, manual controls for tours, pipelines for interactive graphics, a grammar of graphics for biological data, and visualizing boundaries in high-d classifiers. My current work is focusing on bridging the gap between statistical inference and exploratory graphics. We are doing experiments using Amazon's Mechanical Turk, and eye-tracking equipment. Some of the applications that I have worked on include backhoes, drug studies, mud crab growth, climate change, gene expression analysis, butterfly populations in Yellowstone, stimulus funds spending, NRC rankings of graduate programs, technology boom and bust, election polls, soybean breeding, common crop population structures, insect gall to plant host interactions, soccer and tennis statistics."

Title: Statistical Inference by Crowd-Sourcing

Abstract: Plots of data often provoke the response "Is what we see really there?" In this talk we will discuss ways to give visual statistical methods an inferential framework. Statistical significance of "graphical discoveries" is measured by having the human viewer compare the plot of the real dataset with collections of plots of null datasets: plots take on the role of test statistics, and human cognition the role of statistical tests, in a process modeled after the "lineup", popular from criminal legal procedures. This is a simple but rigorous protocol that provides valid inference, yielding p-values and estimates of the test power, for graphical findings.
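As a rough illustration of how a lineup can yield a p-value, the sketch below assumes a simplified setting that the abstract does not spell out: each of several independent observers sees a lineup of `num_plots` plots (one real, the rest null), and under the null hypothesis each observer picks the real plot with probability 1/`num_plots`. The function name, its parameters, and the default of 20 plots are illustrative assumptions, not the speaker's implementation.

```python
from math import comb


def lineup_p_value(num_observers, num_correct, num_plots=20):
    """Binomial tail probability for a lineup experiment (simplified sketch).

    Assumes independent observers who, under the null hypothesis, each
    pick the real plot with probability 1 / num_plots. Returns
    P(X >= num_correct) for X ~ Binomial(num_observers, 1 / num_plots).
    """
    if num_plots < 2:
        raise ValueError("a lineup needs at least two plots")
    if not 0 <= num_correct <= num_observers:
        raise ValueError("num_correct must be between 0 and num_observers")

    p = 1.0 / num_plots
    # Sum the binomial probabilities from num_correct up to num_observers.
    return sum(
        comb(num_observers, k) * p**k * (1 - p) ** (num_observers - k)
        for k in range(num_correct, num_observers + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 10 observers, 3 of whom pick the real plot.
    print(lineup_p_value(num_observers=10, num_correct=3))
```

With a single observer and 20 plots, a correct pick gives a p-value of 1/20 = 0.05 under these assumptions. Real experiments may need to account for dependence between observers who view the same lineup, which this sketch ignores.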

Amazon's Mechanical Turk is used to implement the lineup protocol and crowd-source the inference. Turk is a resource where people are employed to do tasks that are difficult for a computer, in this case, evaluating structure in plots of data. With a suite of experiments, the lineup protocol was run head-to-head against the equivalent conventional test, yielding results that mirror those produced by classical inference. This talk will describe these results, and show how the lineup protocol is used for assessing graphical findings and designing good data plots.

Joint work with Heike Hofmann, Mahbubul Majumder, Niladri Roy Chowdhury, Lendie Follett, Susan Vanderplas, Adam Loy, Yifan Zhao, and Nathaniel Tomasetti.

Refreshments will be served after the seminar in 1181 Comstock Hall.
