Explore the range of research areas within Statistics and Data Science and the leading faculty researchers guiding these areas.

Asymptotic Statistics | Bayesian Analysis | Causal Inference | Clinical Trials | Econometrics | Empirical Processes | Functional Data Analysis | Graphical Models | High-dimensional Statistics | Machine Learning | Model Selection | Spatial Analysis or Spatial Statistics | Statistical Genetics | Stochastic Processes | Statistical Optimal Transport in High Dimensions

Asymptotic Statistics

In statistics, asymptotic theory, or large sample theory, is a framework for assessing properties of estimators and statistical tests. Within this framework, it is often assumed that the sample size n may grow indefinitely; the properties of estimators and tests are then evaluated under the limit of n → ∞ (Wikipedia).
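
As a small illustration of large-sample behavior, the sketch below simulates the sample mean of independent Uniform(0, 1) draws and prints its distance from the true mean 0.5 as n grows. The function name, seed, and sample sizes are assumptions chosen for this example only.

```python
import random

def sample_mean(n, seed=0):
    """Mean of n independent Uniform(0, 1) draws (the true mean is 0.5)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return sum(rng.random() for _ in range(n)) / n

# The estimation error typically shrinks as the sample size n increases.
for n in (10, 1_000, 100_000):
    print(n, abs(sample_mean(n) - 0.5))
```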

Faculty Researchers: Sumanta Basu, Florentina Bunea, Ahmed El Alaoui, Ziv Goldfeld, Thorsten Joachims, Amy Kuceyeski, Yang Ning, Karthik Sridharan, Y. Samuel Wang, Kilian Weinberger, and Dana Yang.

Bayesian Analysis

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian methods are important for modeling and inference in statistical analysis (Wikipedia).
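
One minimal way to see this updating in action is the conjugate Beta-Binomial model, sketched below. The uniform Beta(1, 1) prior and the counts of 7 successes and 3 failures are hypothetical values picked for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data under a Beta prior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (1.0, 1.0)  # assumed uniform prior on the success probability
posterior = update_beta(*prior, successes=7, failures=3)  # hypothetical data
print(posterior, beta_mean(*posterior))
```

Each new batch of evidence can be folded in by calling `update_beta` again with the current posterior as the prior.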

Faculty Researchers: Tom Loredo, David Matteson, David Ruppert, and Martin Wells. 

Causal Inference

The goal of causal inference is to develop a formal statistical framework that answers causal questions from real-world data. Examples include the identification and estimation of causal effects from observational studies, optimal treatment decisions for patients, and causal structure learning in networks.

Faculty Researchers: Yang Ning

Clinical Trials

Clinical trials are prospective biomedical or behavioral research studies on human participants designed to answer specific questions about biomedical or behavioral interventions, including new treatments and known interventions that warrant further study and comparison (Wikipedia).

Faculty Researchers: Karla Ballman


Econometrics

Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships (Wikipedia).

Faculty Researchers: Sumanta Basu, Kengo Kato, Nicholas Kiefer, David Matteson, and Francesca Molinari.

Empirical Processes

In probability theory, an empirical process is a stochastic process that describes the proportion of objects in a system in a given state. For a process in a discrete state space, a population continuous-time Markov chain, or Markov population model, is a process that counts the number of objects in a given state (Wikipedia).

Faculty Researchers: Kengo Kato and Marten Wegkamp.

Functional Data Analysis

Functional data analysis (FDA) is a branch of statistics that analyzes data providing information about curves, surfaces, or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function (Wikipedia). Many of the tools of FDA were developed by generalizing multivariate statistical analysis from finite- to infinite-dimensional spaces using mathematical theory taken from functional analysis and operator theory.

Faculty Researchers: James Booth, David Matteson, and David Ruppert. 

Graphical Models

A graphical model, or probabilistic graphical model, is a statistical model that can be represented by a graph: the vertices correspond to random variables, and edges between vertices indicate a conditional dependence relationship. Often, the problem of interest is estimating the graph structure from data. When using directed edges, graphical models can be used to express causal relationships. The area draws contributions from a variety of disciplines including statistics, computer science, mathematics, philosophy, and biology.

Faculty Researchers: Ahmed El Alaoui, Yang Ning, Felix Thoemmes, Y. Samuel Wang, Martin Wells, and Dana Yang.

High-dimensional Statistics

In statistical theory, the field of high-dimensional statistics studies data whose dimension is larger than typically considered in classical multivariate analysis (Wikipedia).

Faculty Researchers: Marten Wegkamp

Machine Learning

Machine learning is a field of inquiry devoted to understanding and building methods that 'learn,' that is, methods that leverage data to improve performance on some set of tasks. It is at the intersection of Statistics and Computer Science, and is seen as a part of Artificial Intelligence.

Faculty Researchers: Sumanta Basu, Florentina Bunea, Ahmed El Alaoui, Ziv Goldfeld, Thorsten Joachims, Kengo Kato, Amy Kuceyeski, Yang Ning, Karthik Sridharan, Y. Samuel Wang, Marten Wegkamp, Kilian Weinberger, and Dana Yang.

Model Selection

Model selection is the task of selecting a statistical model from a set of candidate models, given data. Instances include tuning parameter selection, feature selection in regression and classification, pattern recovery, and nonparametric estimation. Model selection can also be viewed as a particular case of model aggregation.

Faculty Researchers: Sumanta Basu, James Booth, Florentina Bunea, Yang Ning, Marten Wegkamp, and Martin Wells.

Spatial Analysis or Spatial Statistics

Spatial statistics is the study of models, methods, and computational techniques for spatially-referenced data, with the goal of making predictions of and drawing inferences about spatial processes.

Faculty Researchers: Joe Guinness

Statistical Genetics

The field of statistical genetics focuses on the development and application of quantitative methods for drawing inferences from genetic data. Using techniques from statistics, computer science, and bioinformatics, statistical geneticists help gain insight into the genetic basis of phenotypes and diseases.

Faculty Researchers: Sumanta Basu, James Booth, Jason Mezey, and Yang Ning.

Stochastic Processes

A stochastic process is a family of random variables, where each member is associated with an index from an index set. The type of variable is general, but common specifications are scalar-, vector-, matrix-, or function-valued random variables. The index set is also general, but common specifications are the natural numbers or the real numbers, which define discretely indexed and continuously indexed processes, respectively. For the former, the indexed family is considered a sequence, and the index is often associated with a time ordering. For the latter, the index may be associated with continuous time, but it is also commonly associated with continuous space (location) in one, two, or three dimensions. Through such associations, the study and application of stochastic processes is frequently linked to both time series analysis and spatial statistics.
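
A discretely indexed, scalar-valued process can be sketched with a simple symmetric random walk, in which the index is a time step and each member of the family is the walker's position. The function name, seed, and step count below are assumptions made for this illustration.

```python
import random

def random_walk(n_steps, seed=0):
    """Positions X_0, X_1, ..., X_n of a symmetric random walk started at 0."""
    rng = random.Random(seed)  # fixed seed so the path is reproducible
    path = [0]
    for _ in range(n_steps):
        # Each step moves up or down by one with equal probability.
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

path = random_walk(10)
print(path)  # a sequence indexed by the time steps 0 through 10
```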

Faculty Researchers: Ahmed El Alaoui, Ziv Goldfeld, Kengo Kato, David Matteson, and Gennady Samorodnitsky.

Statistical Optimal Transport in High Dimensions

Statistical optimal transport is the area of statistics and machine learning devoted to coupling two separate distributions in an optimal way, with applications including nonparametric inference, goodness-of-fit testing, domain adaptation, and data alignment, among many others. The study of its high-dimensional aspects is born from the need to address modern challenges in this area, and involves developing new notions of optimal transport that avoid the curse of dimensionality and the computational burden associated with classical approaches.
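
For intuition only, the sketch below covers the classical one-dimensional case rather than the high-dimensional methods described above. For two samples of equal size on the real line, an optimal coupling under absolute-difference cost pairs the sorted values in order, which gives the empirical 1-Wasserstein distance. The function name and the sample values are assumptions for this example.

```python
def wasserstein1_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size samples on the real line."""
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension, matching order statistics is an optimal coupling.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)

# Hypothetical samples: the second is the first shifted by one unit.
print(wasserstein1_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))  # → 1.0
```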

Faculty Researchers: Florentina Bunea and Kengo Kato.