This week's Joint BSCB/Statistics Seminar Speaker will be Terry Speed from the University of California-Berkeley.
Removing Unwanted Variation: from Principal Components to Random Effects
Ordinary least squares is a venerable tool for the analysis of scientific data, originating in the work of A. M. Legendre and C. F. Gauss around 1800. Gauss used the method extensively in astronomy and geodesy. Generalized least squares is more recent, originating with A. C. Aitken in 1934, though weighted least squares was widely used long before that. At around the same time (1933), H. Hotelling introduced principal components analysis to psychology. Its modern form is the singular value decomposition. In 1907, motivated by social science, G. U. Yule presented a new notation and derived some identities for linear regression and correlation. Random effects models date back to astronomical work in the mid-19th century, but it was through the work of C. R. Henderson and others in animal science in the 1950s that their connexion with generalized least squares was firmly made.
These are the diverse origins of our story, which concerns the removal of unwanted variation in high-dimensional genomic and other “omic” data using negative controls. We start with a linear model that Gauss would recognize, with ordinary least squares in mind, but we add unobserved terms to deal with unwanted variation. A singular value decomposition, one of Yule’s identities, and negative control measurements (here genes) permit the identification of our model. In a surprising twist, our initial solution turns out to be equivalent to a form of generalized least squares. This is the starting point for much of our recent work. In this talk I will try to explain how a rather eclectic mix of familiar statistical ideas can combine with equally familiar notions from biology (negative and positive controls) to give a useful new set of tools for omic data analysis. Other statisticians have come close to the same endpoint from different perspectives, including Bayesian, sparse linear and random effects models.
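For readers who would like a concrete picture before the talk, here is a minimal Python sketch of a two-step procedure of the kind the abstract describes. It may differ from the speaker's actual method. It assumes a model of the form Y = X beta + W alpha + noise, where W is unobserved and the negative control genes have beta equal to zero. The function name `ruv2_sketch`, its arguments, and the number of unwanted factors `k` are all hypothetical choices made for illustration.

```python
import numpy as np

def ruv2_sketch(Y, X, ctl, k):
    """Illustrative two-step adjustment using negative controls.

    Y   : (m samples x n genes) data matrix, assumed centered or with an
          intercept column included in X
    X   : (m x p) observed factor(s) of interest
    ctl : boolean mask of length n marking negative control genes,
          assumed unaffected by X
    k   : assumed number of unwanted factors (a tuning choice)
    """
    # Step 1: variation in the control genes is taken to be unwanted only,
    # so an SVD of that submatrix gives an estimate of the unobserved W.
    U, s, _ = np.linalg.svd(Y[:, ctl], full_matrices=False)
    W_hat = U[:, :k] * s[:k]

    # Step 2: ordinary least squares of every gene on [X, W_hat];
    # the first p rows of the coefficients are the adjusted estimates.
    design = np.hstack([X, W_hat])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[: X.shape[1]], W_hat
```

The sketch returns both the adjusted coefficients and the estimated unwanted factors, so that the latter can be inspected. How to choose k and how to select the control genes are left open here.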
This is joint work with Johann Gagnon-Bartsch and Laurent Jacob.
Refreshments will be served in ILR Conference Center 429.