Explore the range of research areas within Statistics and Data Science and the leading faculty researchers guiding these areas.

## Asymptotic Statistics

Asymptotic statistics studies the properties of statistical estimators, tests, and procedures as the sample size tends to infinity and finds approximations that can be used in practical applications when the sample size is finite.

Faculty Researchers: Sumanta Basu, Florentina Bunea, Ahmed El Alaoui, Ziv Goldfeld, Thorsten Joachims, Amy Kuceyeski, Yang Ning, Karthik Sridharan, Y. Samuel Wang, Kilian Weinberger, and Dana Yang.

## Bayesian Analysis

Bayesian statistics provides a mathematical data analysis framework for representing uncertainty and incorporating prior knowledge into statistical inference. In Bayesian statistics, probabilities are used to represent the uncertainty in parameters rather than the data itself. This approach allows for the incorporation of prior information and the use of subjective and objective beliefs about the parameters.
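As a small illustration of how prior knowledge and data combine, the sketch below updates a Beta prior on a success probability after observing binomial data. The prior parameters and the counts are made-up values chosen for illustration, and the function names are hypothetical.

```python
def beta_binomial_update(prior_alpha, prior_beta, successes, failures):
    """Return the Beta posterior parameters after observing binomial data.

    A Beta(alpha, beta) prior is conjugate to the binomial likelihood, so the
    posterior is Beta(alpha + successes, beta + failures).
    """
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Illustrative values only: a Beta(2, 2) prior and 7 successes in 10 trials.
posterior = beta_binomial_update(2.0, 2.0, successes=7, failures=3)
posterior_mean = beta_mean(*posterior)
```

The posterior mean lies between the prior mean (0.5) and the observed success rate (0.7), which reflects how the prior belief is revised by the data.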

Faculty Researchers: Tom Loredo, David Matteson, David Ruppert, and Martin Wells.

## Causal Inference

The goal of causal inference is to develop a formal statistical framework that answers causal questions from real-world data. Examples include the identification and estimation of causal effects from observational studies, optimal treatment decisions for patients, and causal structure learning in networks.

Faculty Researchers: Yang Ning.

## Clinical Trials

Clinical trials are research studies designed to test the safety and effectiveness of medical treatments, drugs, or medical devices. Clinical trials aim to determine whether a new intervention is effective and safe. The results of clinical trials provide important information for regulatory agencies, healthcare providers, and patients in making decisions about the use of new medical interventions.

Faculty Researchers: Karla Ballman.

## Econometrics

Econometrics applies statistical methods to analyze and model economic data. It provides a way to test economic theories and make predictions about economic events. Econometric research extends methods from regression, time series, panel data, and multivariate analysis.

Faculty Researchers: Sumanta Basu, Kengo Kato, Nicholas Kiefer, David Matteson, and Francesca Molinari.

## Empirical Processes

Empirical process theory provides a rigorous mathematical basis for central limit theorems, large deviation theory, weak convergence, and convergence rates. Empirical process theory is widely used in many areas of statistics and has applications in fields such as machine learning, probability theory, and mathematical finance.

Faculty Researchers: Kengo Kato and Marten Wegkamp.

## Functional Data Analysis

Functional data analysis (FDA) is a branch of statistics that analyzes data providing information about curves, surfaces, or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function (Wikipedia). Many of the tools of FDA were developed by generalizing multivariate statistical analysis from finite- to infinite-dimensional spaces using mathematical theory taken from functional analysis and operator theory.

Faculty Researchers: James Booth, David Matteson, and David Ruppert.

## Graphical Models

A graphical model, or probabilistic graphical model, is a statistical model that can be represented by a graph in which the vertices correspond to random variables and the edges between vertices indicate conditional dependence relationships. Often, the problem of interest is estimating the graph structure from data. When using directed edges, graphical models can be used to express causal relationships. The area draws contributions from a variety of disciplines including statistics, computer science, mathematics, philosophy, and biology.

Faculty Researchers: Ahmed El Alaoui, Yang Ning, Felix Thoemmes, Y. Samuel Wang, Martin Wells, and Dana Yang.

## High-dimensional Statistics

In statistical theory, the field of high-dimensional statistics studies data whose dimension is larger than typically considered in classical multivariate analysis (Wikipedia).

Faculty Researchers: Marten Wegkamp.

## Machine Learning

Machine learning is a field of inquiry devoted to understanding and building methods that 'learn,' that is, methods that leverage data to improve performance on some set of tasks. It is at the intersection of Statistics and Computer Science, and is seen as a part of Artificial Intelligence.

Faculty Researchers: Sumanta Basu, Florentina Bunea, Ahmed El Alaoui, Ziv Goldfeld, Thorsten Joachims, Kengo Kato, Amy Kuceyeski, Yang Ning, Karthik Sridharan, Y. Samuel Wang, Marten Wegkamp, Kilian Weinberger, and Dana Yang.

## Model Selection

Model selection is the task of selecting a statistical model from a set of candidate models, given data. Instances include tuning parameter selection, feature selection in regression and classification, pattern recovery, and nonparametric estimation. Model selection can also be viewed as a particular case of model aggregation.

Faculty Researchers: Sumanta Basu, James Booth, Florentina Bunea, Yang Ning, Marten Wegkamp, and Martin Wells.

## Spatial Analysis or Spatial Statistics

Spatial statistics is the study of models, methods, and computational techniques for spatially-referenced data, with the goal of making predictions of and drawing inferences about spatial processes.

Faculty Researchers: Joe Guinness.

## Statistical Genetics

The field of statistical genetics focuses on the development and application of quantitative methods for drawing inferences from genetic data. Using techniques from statistics, computer science, and bioinformatics, statistical geneticists help gain insight into the genetic basis of phenotypes and diseases.

Faculty Researchers: Sumanta Basu, James Booth, Jason Mezey, and Yang Ning.

## Stochastic Processes

A stochastic process is a family of random variables, where each member is associated with an index from an index set. The type of variable is general, but common specifications are scalar-, vector-, matrix-, or function-valued random variables. The index set is also general, but common specifications are the natural numbers or the real numbers, which define discretely indexed and continuously indexed processes, respectively. For the former, the indexed family is considered a sequence, and the index is often associated with a time ordering. For the latter, the index may be associated with continuous time, but it is also commonly associated with continuous space (location), in one, two, or three dimensions. Through such associations, the study and application of stochastic processes is frequently linked to both time series analysis and spatial statistics.
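To make the discretely indexed case concrete, here is a minimal sketch of a simple symmetric random walk, a scalar-valued process indexed by the natural numbers. The function name, step count, and seed are illustrative choices, not part of any particular course or research code.

```python
import random


def simple_random_walk(n_steps, seed=0):
    """Simulate a simple symmetric random walk started at 0.

    The returned list is the sequence X_0, X_1, ..., X_n, where each
    increment is +1 or -1 with equal probability.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path


# Illustrative run: 10 steps gives 11 indexed values, X_0 through X_10.
walk = simple_random_walk(10)
```

Here the list position plays the role of the index, and it is naturally read as a time ordering.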

Faculty Researchers: Ahmed El Alaoui, Ziv Goldfeld, Kengo Kato, David Matteson, and Gennady Samorodnitsky.

## Statistical Optimal Transport in High Dimensions

Statistical optimal transport is the area of statistics and machine learning devoted to coupling two separate distributions in an optimal way, with applications including nonparametric inference, goodness-of-fit testing, domain adaptation, and data alignment, among many others. The study of its high-dimensional aspects is born from the need to address modern challenges in this area, and involves developing new notions of optimal transport that avoid the curse of dimensionality and the computational burden associated with classical approaches.
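For intuition about what an optimal coupling is, the sketch below covers only the simplest classical case: two one-dimensional samples of equal size, where the optimal coupling for the absolute-difference cost pairs the sorted values in order. It does not illustrate the high-dimensional methods described above, and the function name and sample values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size one-dimensional samples.

    In one dimension the optimal coupling matches order statistics, so the
    distance is the average absolute difference between sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


# Illustrative values: the second sample is the first shifted by 5,
# so every matched pair is 5 apart.
distance = wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0])
```

In higher dimensions no such sorting shortcut exists, which is one source of the computational burden mentioned above.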

Faculty Researchers: Florentina Bunea and Kengo Kato.