The Statistics Seminar speaker for September 24th, 2014 will be Raazesh Sainudiin, from the University of Canterbury in New Zealand.
Talk Title: Statistical Regular Pavings for Bayesian Non-parametric Density Estimation
Abstract:
We present a novel method for averaging a sequence of histogram states visited by a Metropolis-Hastings Markov chain whose stationary distribution is the posterior distribution over a dense space of tree-based histograms. The computational efficiency of our posterior mean histogram estimate relies on a statistical data-structure that is sufficient for non-parametric density estimation of massive, multi-dimensional metric data. This data-structure is formalized as a statistical regular paving (SRP). A regular paving (RP) is a binary tree obtained by selectively bisecting boxes along their first widest side. An SRP augments an RP by mutably caching the recursively computable sufficient statistics of the data. The base Markov chain used to propose moves for the Metropolis-Hastings chain is a random walk that data-adaptively prunes and grows the SRP histogram tree. We use a prior distribution based on Catalan numbers and detect convergence heuristically. The L1-consistency of the initializing strategy over SRP histograms using a data-driven randomized priority queue based on a generalized statistically equivalent blocks principle is proved by bounding the Vapnik-Chervonenkis shatter coefficients of the class of SRP histogram partitions. The performance of our posterior mean SRP histogram is empirically assessed for large sample sizes simulated from several multivariate distributions that belong to the space of SRP histograms.
We also present arithmetical capabilities of the SRPs, including tree-based algorithms and structures for marginalization, conditional density extraction, fast look-up of product likelihood in validation, and uniform approximation of other efficient density estimates such as the dual-tree KDE of Gray and Moore. These operations have been used for fast cross-validation in prior-selection and subsequent anomaly detection in graph-valued time series (current work with Priebe and Lee), conditional density regression in up to 5 dimensions (current work with Tucker and Harlow), and collision-free co-trajectory arithmetic for air-traffic management (with Teng and Kuhn, Journal of Aerospace Computing, Information, and Communication, Vol. 9, No. 1, pp. 14-25, 2012).
This is joint work with Dominic Lee, Jennifer Harlow and Gloria Teng.
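For readers unfamiliar with the data structure, the following is a minimal Python sketch of the idea described in the abstract: a binary tree of boxes, each bisected along its first widest side, with every node caching sufficient statistics (here a sample count and per-coordinate sums) that a parent can recover by adding those of its children. The names SRPNode, bisect, insert and density are hypothetical and chosen for illustration only; this is not the API of the MRS C++ library, and it omits the Metropolis-Hastings averaging and the priority-queue initialization entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Box = List[Tuple[float, float]]  # one (low, high) interval per dimension


@dataclass
class SRPNode:
    """Hypothetical node of a statistical regular paving (illustration only)."""
    box: Box
    count: int = 0                                   # cached sample count
    sums: List[float] = field(default_factory=list)  # cached per-coordinate sums
    left: Optional["SRPNode"] = None
    right: Optional["SRPNode"] = None

    def __post_init__(self) -> None:
        if not self.sums:
            self.sums = [0.0] * len(self.box)

    def is_leaf(self) -> bool:
        return self.left is None

    def split_dim(self) -> int:
        widths = [hi - lo for lo, hi in self.box]
        return widths.index(max(widths))  # index() picks the FIRST widest side

    def bisect(self) -> None:
        # This sketch only bisects empty leaves; splitting a populated leaf
        # would also require redistributing its points to the children.
        assert self.is_leaf() and self.count == 0
        d = self.split_dim()
        lo, hi = self.box[d]
        mid = (lo + hi) / 2.0
        left_box, right_box = list(self.box), list(self.box)
        left_box[d], right_box[d] = (lo, mid), (mid, hi)
        self.left, self.right = SRPNode(left_box), SRPNode(right_box)

    def insert(self, x: Sequence[float]) -> None:
        # Update the cached statistics on every node along the path to the leaf.
        self.count += 1
        self.sums = [s + xi for s, xi in zip(self.sums, x)]
        if not self.is_leaf():
            d = self.split_dim()
            mid = sum(self.box[d]) / 2.0
            (self.left if x[d] < mid else self.right).insert(x)


def density(root: SRPNode, x: Sequence[float]) -> float:
    """Histogram density at x: leaf count / (total count * leaf volume)."""
    node = root
    while not node.is_leaf():
        d = node.split_dim()
        mid = sum(node.box[d]) / 2.0
        node = node.left if x[d] < mid else node.right
    volume = 1.0
    for lo, hi in node.box:
        volume *= hi - lo
    return node.count / (root.count * volume)


# Usage: pave the unit square, refine the left half, then add four points.
root = SRPNode([(0.0, 1.0), (0.0, 1.0)])
root.bisect()        # both sides tie, so the first coordinate is bisected
root.left.bisect()   # the left box is now widest in the second coordinate
for point in [(0.1, 0.1), (0.2, 0.7), (0.8, 0.3), (0.9, 0.9)]:
    root.insert(point)
```

Because the statistics are cached at every node rather than only at the leaves, pruning a pair of sibling leaves in this sketch needs no recomputation: the parent already holds their combined count and sums.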
Relevant Papers:
MCMC paper: Posterior expectation of regularly paved random histograms, Raazesh Sainudiin, Gloria Teng, Jennifer Harlow and Dominic Lee, ACM Trans. Model. Comput. Simulat. (Special issue on Monte Carlo Methods), 23, 1, Article 6, 20 pages, 2013, http://dx.doi.org/10.1145/2414416.2414422
Arithmetic paper: Mapped Regular Pavings, Jennifer Harlow, Raazesh Sainudiin and Warwick Tucker, Reliable Computing, vol. 16, pp. 252-282, 2012, http://interval.louisiana.edu/reliable-computing-journal/volume-16/reliable-computing-16-pp-252-282.pdf
MRS 1.0: A C++ Class Library for Statistical Set Processing is available under GPL from: http://www.math.canterbury.ac.nz/~r.sainudiin/codes/mrs/
Refreshments will be served after the seminar in 1181 Comstock Hall.