Genevera Allen is an Associate Professor at Rice University in the Departments of Statistics, Computer Science (by courtesy), and Electrical and Computer Engineering (by courtesy) and at Baylor College of Medicine, where she is an investigator in the Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital. Dr. Allen received her PhD in statistics from Stanford University (2010), under the mentorship of Prof. Robert Tibshirani, and her bachelor's degree, also in statistics, from Rice University (2006).
Dr. Allen's research focuses on developing statistical methods to help scientists make sense of their 'Big Data' in applications such as high-throughput genomics and neuroimaging. Her work lies in the areas of modern multivariate analysis, graphical models, statistical machine learning, and data integration or data fusion. The recipient of several honors including a National Science Foundation CAREER award and the International Biometric Society's Young Statistician Showcase award, she also represented the American Statistical Association (ASA) at the Coalition for National Science Funding on Capitol Hill in 2013 and 2014, and has had her research highlighted on the House floor in a speech by Congressman McNerney (D-CA). In 2014, Dr. Allen was named to the "Forbes '30 under 30': Science and Healthcare" list. She is also the recipient of research grant awards from the Ken Kennedy Institute for Information Technology, the National Science Foundation (NSF), and joint initiatives between NSF and the National Institutes of Health. Dr. Allen currently serves as an Associate Editor for Biometrics, the Secretary/Treasurer for the ASA Section on Statistical Computing, and the Program Chair for the ASA Section on Statistical Learning and Data Science.
Outside of work, Dr. Allen is a patron of the Houston Symphony and Houston Grand Opera and is involved with several arts organizations throughout Houston. She also enjoys traveling, Texas craft beers, and playing viola.
Title: Inference, Computation, and Visualization for Convex Clustering and Biclustering
Abstract: Hierarchical clustering enjoys wide popularity because of its fast computation, ease of interpretation, and appealing visualizations via the dendrogram and cluster heatmap. Recently, several authors have proposed and studied convex clustering and biclustering, which, similar in spirit to hierarchical clustering, achieve cluster merges via convex fusion penalties. While these techniques enjoy superior statistical performance, they suffer from slower computation and are not generally conducive to representation as a dendrogram. In the first part of the talk, we present new convex (bi)clustering methods and fast algorithms that inherit all of the advantages of hierarchical clustering. Specifically, we develop a new fast approximation and variation of the convex (bi)clustering solution path that can be represented as a dendrogram or cluster heatmap. Also, as one tuning parameter indexes the sequence of convex (bi)clustering solutions, we can use these to develop interactive and dynamic visualization strategies that allow one to watch data form groups as the tuning parameter varies. In the second part of this talk, we consider how to conduct inference for convex clustering solutions that addresses questions like: Are there clusters in my data set? Or, should two clusters be merged into one?
To achieve this, we develop a new geometric representation of Hotelling's T^2-test that allows us to use the selective inference paradigm to test multivariate hypotheses for the first time. We can use this approach to test hypotheses and calculate confidence ellipsoids on the cluster means resulting from convex clustering. We apply these techniques to examples from text mining and cancer genomics. This is joint work with John Nagorski and Frederick Campbell.