To find out what research topics our department is currently working on, you can browse this list of recently published papers by our Department of Statistical Sciences faculty and graduate students. Some titles may be truncated in the header field; click on a header to expand the drop-down description of the paper, which gives the complete title, all authors, the revision date, and the abstract. Department of Statistical Sciences faculty and graduate students can submit recent paper information to dssgoodnews@cornell.edu for inclusion here.

Asymptotic total variation tests for copulas

Jean-David Fermanian, Dragan Radulović, and Marten Wegkamp

Received: November 2012

Revised: April 2014

First available in Project Euclid: 27 May 2015

#### Abstract

We propose a new platform of goodness-of-fit tests for copulas, based on empirical copula processes and nonparametric bootstrap counterparts. The standard Kolmogorov–Smirnov type test for copulas that takes the supremum of the empirical copula process indexed by orthants is extended by test statistics based on the empirical copula process indexed by families of L_n disjoint boxes, with L_n slowly tending to infinity. Although the underlying empirical process does not converge, the critical values of our new test statistics can be consistently estimated by nonparametric bootstrap techniques, under simple or composite null assumptions. We implemented a particular example of these tests and our simulations confirm that the power of the new procedure is oftentimes higher than the power of the standard Kolmogorov–Smirnov or the Cramér–von Mises tests for copulas.
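As a rough illustration of the objects involved, the snippet below computes pseudo-observations and a Kolmogorov–Smirnov-type statistic for the empirical copula. It tests against the independence copula on a fixed grid, not the general parametric families and bootstrap calibration the paper develops; the grid and sample sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

def pseudo_obs(x):
    """Normalized ranks in (0, 1): the usual pseudo-observations."""
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)

def ks_statistic(x, y):
    """sqrt(n) * sup over a grid of orthant corners of |C_n(u,v) - uv|,
    a Kolmogorov-Smirnov-type statistic against the independence copula."""
    u, v = pseudo_obs(x), pseudo_obs(y)
    grid = np.linspace(0.05, 0.95, 19)
    # empirical copula evaluated at each grid corner
    C_n = np.array([[np.mean((u <= a) & (v <= b)) for b in grid] for a in grid])
    return np.sqrt(len(x)) * np.max(np.abs(C_n - np.outer(grid, grid)))

x = rng.normal(size=n)
y_dep = x + 0.5 * rng.normal(size=n)   # strongly dependent with x
y_ind = rng.normal(size=n)             # independent of x

stat_dep = ks_statistic(x, y_dep)
stat_ind = ks_statistic(x, y_ind)
```

In practice the critical value for such a statistic would come from a bootstrap, as the abstract describes; here the dependent pair simply produces a much larger statistic than the independent one.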

Permanent link to this document

http://projecteuclid.org/euclid.bj/1432732042

Digital Object Identifier

doi:10.3150/14-BEJ632

Mathematical Reviews number (MathSciNet)

MR3352066

Citation

Fermanian, Jean-David; Radulović, Dragan; Wegkamp, Marten. Asymptotic total variation tests for copulas. Bernoulli 21 (2015), no. 3, 1911–1945. doi:10.3150/14-BEJ632. http://projecteuclid.org/euclid.bj/1432732042.

ENCAPP: elastic-net-based prognosis prediction and biomarker discovery for human cancers

Jishnu Das, Kaitlyn M Gayvert, Florentina Bunea, Marten H Wegkamp and Haiyuan Yu

Published in BMC Genomics on April 3, 2015

**Abstract**

Background

With the explosion of genomic data over the last decade, there has been a tremendous amount of effort to understand the molecular basis of cancer using informatics approaches. However, this has proven to be extremely difficult primarily because of the varied etiology and vast genetic heterogeneity of different cancers and even within the same cancer. One particularly challenging problem is to predict prognostic outcome of the disease for different patients.

Results

Here, we present ENCAPP, an elastic-net-based approach that combines the reference human protein interactome network with gene expression data to accurately predict prognosis for different human cancers. Our method identifies functional modules that are differentially expressed between patients with good and bad prognosis and uses these to fit a regression model that can be used to predict prognosis for breast, colon, rectal, and ovarian cancers. Using this model, ENCAPP can also identify prognostic biomarkers with a high degree of confidence, which can be used to generate downstream mechanistic and therapeutic insights.

Conclusion

ENCAPP is a robust method that can accurately predict prognostic outcome and identify biomarkers for different human cancers.

Access link: http://www.biomedcentral.com/1471-2164/16/263
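ENCAPP itself combines the reference protein interactome with expression data; the sketch below only illustrates the elastic-net regression at its core, via a naive cyclic coordinate descent on simulated data. The glmnet-style penalty parameterization and all parameter values are assumptions for the illustration, not ENCAPP's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(0)) / X.std(0)          # standardize columns
beta_true = np.zeros(p)
beta_true[:2] = [3.0, 2.0]              # two informative features
y = X @ beta_true + 0.5 * rng.normal(size=n)
y = y - y.mean()

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent for
    (1/2n)||y - Xb||^2 + alpha*(l1_ratio*||b||_1 + 0.5*(1-l1_ratio)*||b||_2^2)."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            z = np.mean(X[:, j] ** 2) + alpha * (1 - l1_ratio)
            # soft-threshold update
            b[j] = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0) / z
    return b

beta_hat = elastic_net(X, y)
```

The l1 part of the penalty zeroes out uninformative coefficients, which is what makes the elastic net useful for biomarker selection in settings like the one the paper studies.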

Adapted Variational Bayes for Functional Data Registration, Smoothing, and Prediction

Cecilia Earls, Giles Hooker (Submitted on 2 Feb 2015)

We propose a model for functional data registration that compares favorably to the best methods of functional data registration currently available. It also extends current inferential capabilities for unregistered data by providing a flexible probabilistic framework that 1) allows for functional prediction in the context of registration and 2) can be adapted to include smoothing and registration in one model. The proposed inferential framework is a Bayesian hierarchical model where the registered functions are modeled as Gaussian processes. To address the computational demands of inference in high-dimensional Bayesian models, we propose an adapted form of the variational Bayes algorithm for approximate inference that performs similarly to MCMC sampling methods for well-defined problems. The efficiency of the adapted variational Bayes (AVB) algorithm allows variability in a predicted registered, warping, and unregistered function to be depicted separately via bootstrapping. Temperature data related to the El Niño phenomenon is used to demonstrate the unique inferential capabilities for prediction provided by this model.

Subjects: Methodology (stat.ME). Cite as: arXiv:1502.00552 [stat.ME] (or arXiv:1502.00552v1 [stat.ME] for this version)

Combining Functional Data Registration and Factor Analysis

Cecilia Earls, Giles Hooker (Submitted on 2 Feb 2015)

We extend the definition of functional data registration to encompass a larger class of registered functions. In contrast to traditional registration models, we allow for registered functions that have more than one primary direction of variation. The proposed Bayesian hierarchical model simultaneously registers the observed functions and estimates the two primary factors that characterize variation in the registered functions. Each registered function is assumed to be predominantly composed of a linear combination of these two primary factors, and the function-specific weights for each observation are estimated within the registration model. We show how these estimated weights can easily be used to classify functions after registration using both simulated data and a juggling data set.

Subjects: Methodology (stat.ME). Cite as: arXiv:1502.00587 [stat.ME] (or arXiv:1502.00587v1 [stat.ME] for this version)

Hierarchical Vector Autoregression

William B. Nicholson, Jacob Bien, David S. Matteson (Submitted on 17 Dec 2014)

Vector autoregression (VAR) is a fundamental tool for modeling the joint dynamics of multivariate time series. However, as the number of component series increases, the VAR model quickly becomes overparameterized, making reliable estimation difficult and impeding its adoption as a forecasting tool in high dimensional settings. A number of authors have sought to address this issue by incorporating regularized approaches, such as the lasso, that impose sparse or low-rank structures on the estimated coefficient parameters of the VAR. More traditional approaches attempt to address overparameterization by selecting a low lag order, based on the assumption that dynamic dependence among components is short-range. However, these methods typically assume a single, universal lag order that applies across all components, unnecessarily constraining the dynamic relationship between the components and impeding forecast performance. The lasso-based approaches are more flexible but do not incorporate the notion of lag order selection. We propose a new class of regularized VAR models, called hierarchical vector autoregression (HVAR), that embed the notion of lag selection into a convex regularizer. The key convex modeling tool is a group lasso with nested groups, which ensures that the sparsity pattern of the autoregressive lag coefficients honors the ordered structure inherent to the VAR. We provide computationally efficient algorithms for solving HVAR problems that can be parallelized across the components. A simulation study shows the improved performance in forecasting and lag order selection over previous approaches, and a macroeconomic application further highlights forecasting improvements as well as the convenient, interpretable output of an HVAR model.
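For readers unfamiliar with the baseline model, here is a minimal least-squares fit of an unregularized VAR(1) on simulated data. This is the classical estimator the paper improves upon; the HVAR hierarchical group-lasso penalty is not implemented here, and the coefficient matrix and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 500, 2
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])              # stable VAR(1) coefficient matrix

# simulate y_t = A y_{t-1} + e_t
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = A @ y[t - 1] + 0.3 * rng.normal(size=k)

# stack lagged regressors and solve the per-equation least-squares problem
Y = y[1:]                               # responses, t = 1..T-1
Z = y[:-1]                              # lag-1 regressors
A_hat = np.linalg.lstsq(Z, Y, rcond=None)[0].T
```

With many component series or longer maximal lags, this design matrix grows quadratically, which is exactly the overparameterization that motivates the regularized HVAR formulation.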

Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML). Cite as: arXiv:1412.5250 [stat.ME] (or arXiv:1412.5250v1 [stat.ME] for this version)

Functional Principal Components Analysis of Spatially Correlated Data

Chong Liu, Surajit Ray, Giles Hooker (Submitted on 17 Nov 2014)

This paper focuses on the analysis of spatially correlated functional data. The between-curve correlation is modeled by correlating functional principal component scores of the functional data. We propose a Spatial Principal Analysis by Conditional Expectation framework to explicitly estimate spatial correlations and reconstruct individual curves. This approach works even when the observed data per curve are sparse. Assuming spatial stationarity, empirical spatial correlations are calculated as the ratio of eigenvalues of the smoothed covariance surface Cov(Xi(s),Xi(t)) and cross-covariance surface Cov(Xi(s),Xj(t)) at locations indexed by i and j. Then an anisotropic Matérn spatial correlation model is fit to the empirical correlations. Finally, principal component scores are estimated to reconstruct the sparsely observed curves. This framework can naturally accommodate arbitrary covariance structures, but there is an enormous reduction in computation if one can assume the separability of temporal and spatial components. We propose hypothesis tests to examine the separability as well as the isotropy effect of spatial correlation. Simulation studies and applications of empirical data show improvements in the curve reconstruction using our framework over the method where curves are assumed to be independent. In addition, we show that the asymptotic properties of estimates in the uncorrelated case still hold in our case if 'mild' spatial correlation is assumed.

Subjects: Statistics Theory (math.ST). Cite as: arXiv:1411.4681 [math.ST] (or arXiv:1411.4681v1 [math.ST] for this version)

A Novel Test for Additivity in Supervised Ensemble Learners

Lucas Mentch, Giles Hooker (Submitted on 7 Jun 2014 (v1), last revised 11 Nov 2014 (this version, v2))

Additive models remain popular statistical tools due to their ease of interpretation and as a result, hypothesis tests for additivity have been developed to assess the appropriateness of these models. However, as data grows in size and complexity, learning algorithms continue to gain popularity due to their exceptional predictive performance. Due to the black-box nature of these learning methods, the increase in predictive power is assumed to come at the cost of interpretability and inference. However, recent work suggests that many popular learning techniques, such as bagged trees and random forests, have desirable asymptotic properties which allow for formal statistical inference when base learners are built with proper subsamples. This work extends hypothesis tests previously developed and demonstrates that by enforcing a grid structure on an appropriate test set, we may perform formal hypothesis tests for additivity among features. We develop notions of total and partial additivity and demonstrate that both tests can be carried out at no additional computational cost. We also suggest a new testing procedure based on random projections that allows for testing on larger grids, even when the grid size is larger than that of the training set. Simulations and demonstrations on real data are provided.

Subjects: Machine Learning (stat.ML); Applications (stat.AP). Cite as: arXiv:1406.1845 [stat.ML] (or arXiv:1406.1845v2 [stat.ML] for this version)

Simultaneous sparse estimation of canonical vectors in the p>>N setting

Irina Gaynanova, James G. Booth, Martin T. Wells (Submitted on 24 Mar 2014 (v1), last revised 4 Nov 2014 (this version, v3))

This article considers the problem of sparse estimation of canonical vectors in linear discriminant analysis when p≫N. Several methods have been proposed in the literature that estimate one canonical vector in the two-group case. However, G−1 canonical vectors can be considered if the number of groups is G. In the multi-group context, it is common to estimate canonical vectors in a sequential fashion. Moreover, separate prior estimation of the covariance structure is often required. We propose a novel methodology for direct estimation of canonical vectors. In contrast to existing techniques, the proposed method estimates all canonical vectors at once, performs variable selection across all the vectors and comes with theoretical guarantees on the variable selection and classification consistency. First, we highlight the fact that in the N>p setting the canonical vectors can be expressed in a closed form up to an orthogonal transformation. Secondly, we propose an extension of this form to the p≫N setting and achieve feature selection by using a group penalty. The resulting optimization problem is convex and can be solved using a block-coordinate descent algorithm. The practical performance of the method is evaluated through simulation studies as well as real data applications.
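In the N > p case the abstract mentions, the canonical vectors have a closed form as eigenvectors of Sw^{-1} Sb, where Sw and Sb are the within- and between-group covariance matrices. The sketch below computes all G - 1 of them at once with numpy on simulated three-group data; the paper's actual contribution, the sparse p ≫ N extension with a group penalty, is not shown.

```python
import numpy as np

rng = np.random.default_rng(3)
n_per, p, G = 100, 5, 3
means = np.zeros((G, p))
means[1, 0], means[2, 1] = 3.0, 3.0     # well-separated group means
X = np.vstack([m + rng.normal(size=(n_per, p)) for m in means])
labels = np.repeat(np.arange(G), n_per)

# within- and between-group covariance matrices
mu = X.mean(0)
Sw = np.zeros((p, p))
Sb = np.zeros((p, p))
for g in range(G):
    Xg = X[labels == g]
    d = Xg - Xg.mean(0)
    Sw += d.T @ d / len(X)
    m = Xg.mean(0) - mu
    Sb += len(Xg) / len(X) * np.outer(m, m)

# canonical vectors: leading eigenvectors of Sw^{-1} Sb (rank <= G-1)
evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(evals.real)[::-1]
evals = evals.real[order]
canon = evecs.real[:, order[:G - 1]]    # the G-1 canonical vectors
```

Since Sb has rank at most G - 1, only the first G - 1 eigenvalues are nonzero, which is why exactly G - 1 canonical vectors exist, as the abstract notes.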

Comments: Added classification consistency. Subjects: Methodology (stat.ME); Machine Learning (stat.ML). Cite as: arXiv:1403.6095 [stat.ME] (or arXiv:1403.6095v3 [stat.ME] for this version)

Topology Adaptive Graph Estimation in High Dimensions

Johannes Lederer, Christian Müller (Submitted on 27 Oct 2014)

We introduce Graphical TREX (GTREX), a novel method for graph estimation in high-dimensional Gaussian graphical models. By conducting neighborhood selection with TREX, GTREX avoids tuning parameters and is adaptive to the graph topology. We compare GTREX with standard methods on a new simulation set-up that is designed to assess accurately the strengths and shortcomings of different methods. These simulations show that a neighborhood selection scheme based on Lasso and an optimal (in practice unknown) tuning parameter outperforms other standard methods over a large spectrum of scenarios. Moreover, we show that GTREX can rival this scheme and, therefore, can provide competitive graph estimation without the need for tuning parameter calibration.
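GTREX builds on neighborhood selection; the sketch below implements the classical lasso-based neighborhood selection scheme it is compared against, with a fixed, hand-picked tuning parameter (exactly the calibration burden GTREX is designed to avoid). The three-node chain graph and the "or" rule for symmetrizing are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 1000, 3
# chain graph 0 - 1 - 2 encoded in the precision matrix
Theta = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.5, 0.5],
                  [0.0, 0.5, 1.0]])
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
X = (X - X.mean(0)) / X.std(0)

def lasso_cd(X, y, lam=0.1, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (y - X @ b + X[:, j] * b[j]) / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0) / np.mean(X[:, j] ** 2)
    return b

# regress each node on all others; nonzero coefficients define its neighbors
adj = np.zeros((p, p), dtype=bool)
for j in range(p):
    others = [k for k in range(p) if k != j]
    b = lasso_cd(X[:, others], X[:, j])
    adj[j, others] = np.abs(b) > 1e-8
adj = adj | adj.T                       # "or" rule to symmetrize
```

With a well-chosen lam this recovers the chain; the point of GTREX is to get comparable recovery without having to pick lam at all.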

Subjects: Machine Learning (stat.ML); Methodology (stat.ME). Cite as: arXiv:1410.7279 [stat.ML] (or arXiv:1410.7279v1 [stat.ML] for this version)

Penalized versus constrained generalized eigenvalue problems

Irina Gaynanova, James Booth, Martin T. Wells (Submitted on 22 Oct 2014 (v1), last revised 23 Oct 2014 (this version, v2))

We investigate the difference between using an ℓ1 penalty versus an ℓ1 constraint in generalized eigenvalue problems, such as principal component analysis and discriminant analysis. Our main finding is that an ℓ1 penalty may fail to provide very sparse solutions, a severe disadvantage for variable selection that can be remedied by using an ℓ1 constraint. Our claims are supported both by empirical evidence and theoretical analysis. Finally, we illustrate the advantages of an ℓ1 constraint in the context of discriminant analysis.

Comments: 13 pages, 4 figures. Subjects: Computation (stat.CO); Machine Learning (stat.ML). Cite as: arXiv:1410.6131 [stat.CO] (or arXiv:1410.6131v2 [stat.CO] for this version)

Optimal two-step prediction in regression

Didier Chételat, Johannes Lederer, Joseph Salmon (Submitted on 18 Oct 2014 (v1), last revised 6 Nov 2014 (this version, v2))

High-dimensional prediction typically comprises variable selection followed by least-squares refitting on the selected variables. However, the standard variable selection procedures, such as the lasso and thresholded ridge regression, hinge on tuning parameters that need to be calibrated. Cross-validation, the most popular calibration scheme, is computationally costly and does not provide theoretical guarantees for high-dimensional prediction. In this paper, we introduce an alternative scheme that is computationally more efficient than cross-validation and, in addition, provides optimal finite sample guarantees. While our scheme allows for a range of variable selection procedures, we provide explicit numerical and theoretical results for least-squares refitting on variables selected by the lasso and by thresholded ridge regression. These results demonstrate that our calibration scheme can outperform cross-validation in terms of speed, accuracy, and theoretical guarantees.
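The two-step pipeline described above can be sketched in a few lines. Here variable selection is done by thresholded ridge regression, one of the two selectors the paper analyzes, followed by least-squares refitting; the threshold and ridge parameter are fixed arbitrarily for the simulation rather than chosen by the paper's calibration scheme.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0                     # three active variables
y = X @ beta_true + 0.5 * rng.normal(size=n)

# step 1: ridge estimate, then threshold to select variables
lam, tau = 1.0, 0.5
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
selected = np.flatnonzero(np.abs(beta_ridge) > tau)

# step 2: least-squares refit on the selected variables only
beta_refit = np.zeros(p)
beta_refit[selected] = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
```

The refit removes the shrinkage bias of the first-stage estimator on the selected support, which is why the two-step procedure can outperform the selector alone.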

Subjects: Methodology (stat.ME); Statistics Theory (math.ST). Cite as: arXiv:1410.5014 [stat.ME] (or arXiv:1410.5014v2 [stat.ME] for this version)

Tuning Lasso for sup-norm optimality

Michaël Chichignoud, Johannes Lederer, Martin Wainwright (Submitted on 1 Oct 2014)

We introduce novel schemes for tuning parameter calibration in high-dimensional linear regression with Lasso. These calibration schemes are inspired by Lepski's method for bandwidth adaptation in non-parametric regression and are the first calibration schemes that are equipped with both theoretical guarantees and fast algorithms. In particular, we develop optimal finite sample guarantees for sup-norm performance and give algorithms that consist of simple tests along a single Lasso path. Moreover, we show that false positives can be safely reduced without increasing the number of false negatives. Applying Lasso to synthetic data and to real data, we finally demonstrate that the novel schemes can rival standard schemes such as Cross-Validation in speed as well as in sup-norm and variable selection performance.

Subjects: Methodology (stat.ME); Statistics Theory (math.ST). Cite as: arXiv:1410.0247 [stat.ME] (or arXiv:1410.0247v1 [stat.ME] for this version)

Compute Less to Get More: Using ORC to Improve Sparse Filtering

Johannes Lederer, Sergio Guadarrama (Submitted on 16 Sep 2014)

Sparse Filtering is a popular feature learning algorithm for image classification pipelines. In this paper, we connect the performance of Sparse Filtering in image classification pipelines to spectral properties of the corresponding feature matrices. This connection provides new insights into Sparse Filtering; in particular, it suggests stopping Sparse Filtering early. We therefore introduce the Optimal Roundness Criterion (ORC), a novel stopping criterion for Sparse Filtering. We show that this stopping criterion is related to pre-processing procedures such as Statistical Whitening and that it can make image classification with Sparse Filtering considerably faster and more accurate.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG). Cite as: arXiv:1409.4689 [cs.CV] (or arXiv:1409.4689v1 [cs.CV] for this version)

Maximal Autocorrelation Functions in Functional Data Analysis

Giles Hooker, Steven Roberts (Submitted on 17 Jul 2014)

This paper proposes a new factor rotation for the context of functional principal components analysis. This rotation seeks to re-represent a functional subspace in terms of directions of decreasing smoothness as represented by a generalized smoothing metric. The rotation can be implemented simply and we show on two examples that this rotation can improve the interpretability of the leading components.

Comments: 10 pages, 2 figures. Subjects: Methodology (stat.ME). Cite as: arXiv:1407.4578 [stat.ME] (or arXiv:1407.4578v1 [stat.ME] for this version)

Sparse Partially Linear Additive Models

Yin Lou, Jacob Bien, Rich Caruana, Johannes Gehrke

(Submitted on 17 Jul 2014)

The generalized partially linear additive model (GPLAM) is a flexible and interpretable approach to building predictive models. It combines features in an additive manner, allowing them to have either a linear or nonlinear effect on the response. However, the assignment of features to the linear and nonlinear groups is typically assumed known. Thus, to make a GPLAM a viable approach in situations in which little is known a priori about the features, one must overcome two primary model selection challenges: deciding which features to include in the model and determining which features to treat nonlinearly. We introduce sparse partially linear additive models (SPLAMs), which combine model fitting and both of these model selection challenges into a single convex optimization problem. SPLAM provides a bridge between the Lasso and sparse additive models. Through a statistical oracle inequality and thorough simulation, we demonstrate that SPLAM can outperform other methods across a broad spectrum of statistical regimes, including the high-dimensional (p≫N) setting. We develop efficient algorithms that are applied to real data sets with half a million samples and over 45,000 features, with excellent predictive performance.

Subjects: Methodology (stat.ME); Learning (cs.LG); Machine Learning (stat.ML)

Cite as: arXiv:1407.4729 [stat.ME]

(or arXiv:1407.4729v1 [stat.ME] for this version)

Truncated Linear Models for Functional Data

Peter Hall, Giles Hooker (Submitted on 30 Jun 2014)

A conventional linear model for functional data involves expressing a response variable Y in terms of the explanatory function X(t), via the model Y = a + ∫_I b(t) X(t) dt + error, where a is a scalar, b is an unknown function and I = [0, α] is a compact interval. However, in some problems the support of b or X, I_1 say, is a proper and unknown subset of I, and is a quantity of particular practical interest. In this paper, motivated by a real-data example involving particulate emissions, we develop methods for estimating I_1. We give particular emphasis to the case I_1 = [0, θ], where θ ∈ (0, α], and suggest two methods for estimating a, b and θ jointly; we introduce techniques for selecting tuning parameters; and we explore properties of our methodology using both simulation and the real-data example mentioned above. Additionally, we derive theoretical properties of the methodology, and discuss implications of the theory. Our theoretical arguments give particular emphasis to the problem of identifiability.

Subjects: Methodology (stat.ME). Cite as: arXiv:1406.7732 [stat.ME] (or arXiv:1406.7732v1 [stat.ME] for this version)

Convex Banding of the Covariance Matrix

Jacob Bien, Florentina Bunea, Luo Xiao (Submitted on 23 May 2014)

We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings.

Subjects: Statistics Theory (math.ST); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML). Cite as: arXiv:1405.6210 [math.ST] (or arXiv:1405.6210v1 [math.ST] for this version)

Ensemble Trees and CLTs: Statistical Inference for Supervised Learning

Lucas Mentch, Giles Hooker (Submitted on 25 Apr 2014)

This paper develops formal statistical inference procedures for machine learning ensemble methods. Ensemble methods based on bootstrapping, such as bagging and random forests, have improved the predictive accuracy of individual trees, but fail to provide a framework in which distributional results can be easily determined. Instead of aggregating full bootstrap samples, we consider predicting by averaging over trees built on subsamples of the training set and demonstrate that the resulting estimator takes the form of a U-statistic. As such, predictions for individual feature vectors are asymptotically normal, allowing for confidence intervals to accompany predictions. In practice, a subset of subsamples is used for computational speed; here our estimators take the form of incomplete U-statistics and equivalent results are derived. We further demonstrate that this setup provides a framework for testing the significance of features. Moreover, the internal estimation method we develop allows us to estimate the variance parameters and perform these inference procedures at no additional computational cost. Simulations and illustrations on a real dataset are provided.
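A toy version of the subsampling construction: many base learners are fit on subsamples of the training set and averaged. To keep the sketch short, the base learner is a k-nearest-neighbour average rather than a tree, and the confidence interval is a naive stand-in computed from the spread of the subsample predictions; the paper's internal variance estimate, which properly accounts for overlap between subsamples, is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
x = rng.uniform(-1, 1, size=n)
y = x ** 2 + 0.1 * rng.normal(size=n)
x0 = 0.5                                # query point, truth f(x0) = 0.25

def knn_predict(xs, ys, x0, k=5):
    """Base learner: k-nearest-neighbour average at x0."""
    idx = np.argsort(np.abs(xs - x0))[:k]
    return ys[idx].mean()

B, m = 500, 100                         # number of subsamples, subsample size
preds = np.empty(B)
for b in range(B):
    idx = rng.choice(n, size=m, replace=False)
    preds[b] = knn_predict(x[idx], y[idx], x0)

estimate = preds.mean()                 # the subsampled-ensemble prediction
# naive normal interval from the spread of the subsample predictions
half_width = 1.96 * preds.std() / np.sqrt(B)
ci = (estimate - half_width, estimate + half_width)
```

The key structural point carries over from this sketch: the average over subsample-built predictors is a (possibly incomplete) U-statistic, which is what yields the asymptotic normality the paper exploits.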

Subjects: Machine Learning (stat.ML); Applications (stat.AP); Computation (stat.CO); Methodology (stat.ME). Cite as: arXiv:1404.6473 [stat.ML] (or arXiv:1404.6473v1 [stat.ML] for this version)

Consistency, Efficiency and Robustness of Conditional Disparity Methods

Giles Hooker (Submitted on 14 Jul 2013 (v1), last revised 16 Apr 2014 (this version, v2))

This paper considers extensions of minimum-disparity estimators to the problem of estimating parameters in a regression model that is conditionally specified, i.e., where the model gives the distribution of a response y conditional on covariates x but does not specify the distribution of x. The consistency and asymptotic normality of such estimators is demonstrated for a broad class of models that incorporates both discrete and continuous response and covariate values and is based on a wide set of choices for kernel-based conditional density estimation. It also establishes the robustness of these estimators for a wide class of disparities. As has been observed in Tamura and Boos (1986), kernel density estimates of more than one dimension can result in an asymptotic bias that is larger than n^{-1/2} in minimum disparity estimators; we characterize a similar bias in our results and show that in specialized cases it can be eliminated by appropriately centering the kernel density estimate. In order to demonstrate these results, we establish a set of L1-consistency results for kernel-based estimates of centered conditional densities.

Subjects: Statistics Theory (math.ST). Cite as: arXiv:1307.3730 [math.ST] (or arXiv:1307.3730v2 [math.ST] for this version)

On the Prediction Performance of the Lasso

Arnak S. Dalalyan, Mohamed Hebiri, Johannes Lederer (Submitted on 7 Feb 2014)

Although the Lasso has been extensively studied, the relationship between its prediction performance and the correlations of the covariates is not fully understood. In this paper, we give new insights into this relationship in the context of multiple linear regression. We show, in particular, that the incorporation of a simple correlation measure into the tuning parameter leads to a nearly optimal prediction performance of the Lasso even for highly correlated covariates. However, we also reveal that for moderately correlated covariates, the prediction performance of the Lasso can be mediocre irrespective of the choice of the tuning parameter. For the illustration of our approach with an important application, we deduce nearly optimal rates for the least-squares estimator with total variation penalty.

Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML). Cite as: arXiv:1402.1700 [math.ST] (or arXiv:1402.1700v1 [math.ST] for this version)

On the theoretic and practical merits of the banding estimator for large covariance matrices

Luo Xiao, Florentina Bunea (Submitted on 4 Feb 2014)

This paper considers the banding estimator proposed in Bickel and Levina (2008) for estimation of large covariance matrices. We prove that the banding estimator achieves rate-optimality under the operator norm, for a class of approximately banded covariance matrices, improving the existing results in Bickel and Levina (2008). In addition, we propose a Stein's unbiased risk estimate (SURE)-type approach for selecting the bandwidth for the banding estimator. Simulations indicate that the SURE-tuned banding estimator outperforms competing estimators.
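The Bickel–Levina banding estimator under discussion is simple to state: keep the entries of the sample covariance within k of the diagonal and zero the rest. The sketch below applies it with a fixed bandwidth k = 3 (rather than the paper's SURE-tuned choice) to an approximately banded AR(1)-type covariance; the dimensions and decay rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 60, 40
ii = np.arange(p)
# AR(1)-type covariance: entries decay geometrically off the diagonal,
# so the matrix is approximately banded
Sigma = 0.5 ** np.abs(ii[:, None] - ii[None, :])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False)             # sample covariance, p > n here

def band(S, k):
    """Bickel-Levina banding: keep entries within k of the diagonal."""
    p = S.shape[0]
    mask = np.abs(np.arange(p)[:, None] - np.arange(p)[None, :]) <= k
    return S * mask

op = lambda M: np.linalg.norm(M, 2)     # operator (spectral) norm
err_sample = op(S - Sigma)
err_banded = op(band(S, 3) - Sigma)
```

Zeroing the noisy far-off-diagonal entries trades a small bias (the truncated geometric tail) for a large variance reduction, so the banded estimate beats the raw sample covariance in operator norm when p is comparable to or larger than n.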

Comments: 19 pages, 1 figure. Subjects: Statistics Theory (math.ST); Methodology (stat.ME). Cite as: arXiv:1402.0844 [math.ST] (or arXiv:1402.0844v1 [math.ST] for this version)

Goodness of Fit in Nonlinear Dynamics: Mis-specified Rates or Mis-specified States?

Giles Hooker, Stephen P. Ellner (Submitted on 2 Dec 2013)

This paper introduces tests to uncover the nature of lack of fit in ordinary differential equation models (ODEs) proposed for data. We present a hierarchy of three possible sources of lack of fit: unaccounted-for stochastic variation, mis-specification of functional forms in the rate equations, and missing dynamical variables in the description of the system. We represent lack of fit by allowing some parameters to vary over time, and propose generic testing procedures that do not rely on specific alternative models. Our hypotheses are expressed in terms of nonparametric relationships among latent variables, and the tests are carried out through combined residual bootstrap and permutation methods. We demonstrate the effectiveness of these tests on simulated data, and on real data from laboratory ecological experiments and electrocardiogram data.

Comments: 20 pages, 5 figures. Subjects: Methodology (stat.ME). Cite as: arXiv:1312.0294 [stat.ME] (or arXiv:1312.0294v1 [stat.ME] for this version)

Restricted Likelihood Ratio Tests for Linearity in Scalar-on-Function Regression

Mathew W. McLean, Giles Hooker, David Ruppert (Submitted on 22 Oct 2013)

We propose a procedure for testing the linearity of a scalar-on-function regression relationship. To do so, we use the functional generalized additive model (FGAM), a recently developed extension of the functional linear model. For a functional covariate X(t), the FGAM models the mean response as the integral with respect to t of F{X(t),t} where F is an unknown bivariate function. The FGAM can be viewed as the natural functional extension of generalized additive models. We show how the functional linear model can be represented as a simple mixed model nested within the FGAM. Using this representation, we then consider restricted likelihood ratio tests for zero variance components in mixed models to test the null hypothesis that the functional linear model holds. The methods are general and can also be applied to testing for interactions in a multivariate additive model or for testing for no effect in the functional linear model. The performance of the proposed tests is assessed on simulated data and in an application to measuring diesel truck emissions, where strong evidence of nonlinearity in the relationship between the functional predictor and the response is found.

Subjects: Methodology (stat.ME). DOI: 10.1007/s11222-014-9473-1. Cite as: arXiv:1310.5811 [stat.ME]

Hellinger Distance and Bayesian Non-Parametrics: Hierarchical Models for Robust and Efficient Bayesian Inference

Yuefeng Wu, Giles Hooker (Submitted on 26 Sep 2013)

This paper introduces a hierarchical framework to incorporate Hellinger distance methods into Bayesian analysis. We propose to modify a prior over non-parametric densities with the exponential of twice the Hellinger distance between a candidate and a parametric density. By incorporating a prior over the parameters of the second density, we arrive at a hierarchical model in which a non-parametric model is placed between parameters and the data. The parameters of the family can then be estimated as hyperparameters in the model. In frequentist estimation, minimizing the Hellinger distance between a kernel density estimate and a parametric family has been shown to produce estimators that are both robust to outliers and statistically efficient when the parametric model is correct. In this paper, we demonstrate that the same results are applicable when a non-parametric Bayes density estimate replaces the kernel density estimate. We then demonstrate that robustness and efficiency also hold for the proposed hierarchical model. The finite-sample behavior of the resulting estimates is investigated by simulation and on real world data.

Subjects: Methodology (stat.ME)
MSC classes: 62F35, 62F12, 62G07
Cite as: arXiv:1309.6906 [stat.ME] (or arXiv:1309.6906v1 [stat.ME] for this version)
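The Hellinger distance between a density estimate and a parametric candidate, which drives the method above, is H(f,g) = sqrt(1 - integral of sqrt(f*g)). A numerical sketch using a kernel density estimate (the frequentist stand-in the abstract mentions; the data, grid, and candidate densities are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=500)

# Kernel density estimate of the data
kde = stats.gaussian_kde(data)

def hellinger(f, g, grid):
    # H(f, g)^2 = 1 - Bhattacharyya coefficient; H lies in [0, 1]
    fg = np.sqrt(f(grid) * g(grid))
    bc = float(np.sum((fg[1:] + fg[:-1]) * np.diff(grid)) / 2.0)
    return np.sqrt(max(0.0, 1.0 - bc))

grid = np.linspace(-10.0, 12.0, 2001)
# A well-specified parametric candidate vs. a badly misspecified one
h_good = hellinger(kde, stats.norm(1.0, 2.0).pdf, grid)
h_bad = hellinger(kde, stats.norm(5.0, 0.5).pdf, grid)
print(h_good, h_bad)
```

As expected, the distance to the correctly specified candidate is much smaller than to the misspecified one, which is what makes minimum-Hellinger procedures informative about the parametric family.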

On the Identifiability of the Functional Convolution Model

Giles Hooker (Submitted on 9 Sep 2013)

This report details conditions under which the Functional Convolution Model described in [AHG13] can be identified from Ordinary Least Squares estimates without either dimension reduction or smoothing penalties. We demonstrate that if the covariate functions are not spanned by the space of solutions to linear differential equations, the functional coefficients in the model are uniquely determined in the Sobolev space of functions with absolutely continuous second derivatives.

Subjects: Statistics Theory (math.ST)
Cite as: arXiv:1309.2178 [math.ST] (or arXiv:1309.2178v1 [math.ST] for this version)

A loss function approach to model specification testing and its relative efficiency

Yongmiao Hong, Yoon-Jin Lee (Submitted on 20 Jun 2013)

The generalized likelihood ratio (GLR) test proposed by Fan, Zhang and Zhang [Ann. Statist. 29 (2001) 153-193] and Fan and Yao [Nonlinear Time Series: Nonparametric and Parametric Methods (2003) Springer] is a generally applicable nonparametric inference procedure. In this paper, we show that although it inherits many advantages of the parametric maximum likelihood ratio (LR) test, the GLR test does not have the optimal power property. We propose a generally applicable test based on loss functions, which measure discrepancies between the null and nonparametric alternative models and are more relevant to decision-making under uncertainty. The new test is asymptotically more powerful than the GLR test in terms of Pitman's efficiency criterion. This efficiency gain holds no matter what smoothing parameter and kernel function are used and even when the true likelihood function is available for the GLR test.

Comments: Published in the Annals of Statistics by the Institute of Mathematical Statistics
Subjects: Statistics Theory (math.ST)
Journal reference: Annals of Statistics 2013, Vol. 41, No. 3, 1166-1203
DOI: 10.1214/13-AOS1099
Report number: IMS-AOS-AOS1099
Cite as: arXiv:1306.4864 [math.ST] (or arXiv:1306.4864v1 [math.ST] for this version)

A lasso for hierarchical interactions

Jacob Bien, Jonathan Taylor, Robert Tibshirani (Submitted on 22 May 2012 (v1), last revised 19 Jun 2013 (this version, v3))

We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity - the number of nonzero coefficients - and practical sparsity - the number of raw variables one must measure to make a new prediction. Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.

Comments: Published in the Annals of Statistics by the Institute of Mathematical Statistics
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
Journal reference: Annals of Statistics 2013, Vol. 41, No. 3, 1111-1141
DOI: 10.1214/13-AOS1096
Report number: IMS-AOS-AOS1096
Cite as: arXiv:1205.5050 [stat.ME] (or arXiv:1205.5050v3 [stat.ME] for this version)
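The abstract's distinction between parameter sparsity (nonzero coefficients) and practical sparsity (raw variables needed for a new prediction) can be illustrated on a toy coefficient set. The variables and values below are hypothetical; the paper's own implementation is the R package hierNet.

```python
import numpy as np

# Hypothetical fitted interaction model on p = 5 raw variables:
# main effects beta and a symmetric interaction matrix theta.
p = 5
beta = np.array([1.2, 0.0, -0.7, 0.0, 0.0])
theta = np.zeros((p, p))
theta[0, 2] = theta[2, 0] = 0.5   # interaction between variables 1 and 3

# Parameter sparsity: count of nonzero coefficients
# (upper triangle only, so each interaction is counted once)
param_sparsity = np.count_nonzero(beta) + np.count_nonzero(np.triu(theta))

# Practical sparsity: raw variables one must measure to predict
used = (beta != 0) | (theta != 0).any(axis=1)
practical_sparsity = int(used.sum())
print(param_sparsity, practical_sparsity)  # 3 parameters, 2 raw variables
```

Note that this fit honors strong hierarchy: the single interaction involves only variables whose main effects are nonzero, so practical sparsity stays low.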

Trust, but verify: benefits and pitfalls of least-squares refitting in high dimensions

Johannes Lederer (Submitted on 1 Jun 2013)

Least-squares refitting is widely used in high dimensional regression to reduce the prediction bias of l1-penalized estimators (e.g., Lasso and Square-Root Lasso). We present theoretical and numerical results that provide new insights into the benefits and pitfalls of least-squares refitting. In particular, we consider both prediction and estimation, and we pay close attention to the effects of correlations in the design matrices of linear regression models, since these correlations - although often neglected - are crucial in linear regression, especially in high dimensions. First, we demonstrate that the benefit of least-squares refitting strongly depends on the setting and task under consideration: least-squares refitting can be beneficial even for settings with highly correlated design matrices but is not advisable for all settings, and least-squares refitting can be beneficial for estimation but performs better for prediction. Finally, we introduce a criterion that indicates whether least-squares refitting is advisable for a specific setting and task under consideration, and we conduct a thorough simulation study involving the Lasso to show the usefulness of this criterion.

Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Cite as: arXiv:1306.0113 [stat.ME] (or arXiv:1306.0113v1 [stat.ME] for this version)
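The two-step scheme discussed above, an l1-penalized fit followed by ordinary least squares on the selected support, can be sketched in a few lines. The simulated design, sparsity pattern, and tuning parameter below are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                  # sparse truth
y = X @ beta + 0.5 * rng.standard_normal(n)

# Step 1: l1-penalized fit (coefficients are shrunk toward zero)
lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Step 2: unpenalized least-squares refit on the selected support only
refit = LinearRegression().fit(X[:, support], y)
coef_refit = np.zeros(p)
coef_refit[support] = refit.coef_

# Refitting removes the shrinkage bias on the selected coefficients
print(lasso.coef_[:3], coef_refit[:3])
```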

Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas

Marten Wegkamp, Yue Zhao (Submitted on 28 May 2013 (v1), last revised 6 Jan 2014 (this version, v2))

We study the adaptive estimation of the copula correlation matrix Σ for elliptical copulas. In this context, the correlations are connected to Kendall's tau through a sine function transformation. Hence, a natural estimate for Σ is the plug-in estimator Σ̂ with Kendall's tau statistic. We first obtain a sharp bound for the operator norm of Σ̂ − Σ. Then, we study a factor model for Σ, for which we propose a refined estimator Σ̃ by fitting a low-rank matrix plus a diagonal matrix to Σ̂ using least squares with a nuclear norm penalty on the low-rank matrix. The bound for the operator norm of Σ̂ − Σ serves to scale the penalty term, and we obtain finite sample oracle inequalities for Σ̃. We also consider an elementary factor model of Σ, for which we propose closed-form estimators. We provide data-driven versions for all our estimation procedures and performance bounds.

Subjects: Machine Learning (stat.ML)
Cite as: arXiv:1305.6526 [stat.ML] (or arXiv:1305.6526v2 [stat.ML] for this version)
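The sine function transformation above is Σ_ij = sin(π τ_ij / 2), so the plug-in estimator is immediate from the sample Kendall's tau. A minimal bivariate sketch; the Gaussian copula, true correlation, and sample size are chosen for illustration:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)
# Sample from a Gaussian copula (a member of the elliptical family)
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), cov, size=2000)

# Kendall's tau is rank-based, hence invariant to the unknown margins
tau, _ = kendalltau(Z[:, 0], Z[:, 1])

# Plug-in estimate of the copula correlation: sine transformation
sigma_hat = np.sin(np.pi * tau / 2.0)
print(tau, sigma_hat)
```

For this sample size the plug-in estimate lands close to the true correlation 0.6, since for elliptical copulas the transformation inverts the relation τ = (2/π) arcsin(ρ) exactly.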

Bayesian Functional Generalized Additive Models with Sparsely Observed Covariates

Mathew W. McLean, Fabian Scheipl, Giles Hooker, Sonja Greven, David Ruppert (Submitted on 15 May 2013)

The functional generalized additive model (FGAM) was recently proposed in McLean et al. (2012) as a more flexible alternative to the common functional linear model (FLM) for regressing a scalar on functional covariates. In this paper, we develop a Bayesian version of FGAM for the case of Gaussian errors with identity link function. Our approach allows the functional covariates to be sparsely observed and measured with error, whereas the estimation procedure of McLean et al. (2012) required that they be noiselessly observed on a regular grid. We consider both Monte Carlo and variational Bayes methods for fitting the FGAM with sparsely observed covariates. Due to the complicated form of the model posterior distribution and full conditional distributions, standard Monte Carlo and variational Bayes algorithms cannot be used. The strategies we use to handle the updating of parameters without closed-form full conditionals should be of independent interest to applied Bayesian statisticians working with nonconjugate models. Our numerical studies demonstrate the benefits of our algorithms over a two-step approach of first recovering the complete trajectories using standard techniques and then fitting a functional regression model. In a real data analysis, our methods are applied to forecasting the closing price of items up for auction on the online auction website eBay.

Comments: 36 pages, 5 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
Cite as: arXiv:1305.3585 [stat.ME] (or arXiv:1305.3585v1 [stat.ME] for this version)

The Group Square-Root Lasso: Theoretical Properties and Fast Algorithms

Florentina Bunea, Johannes Lederer, Yiyuan She (Submitted on 1 Feb 2013 (v1), last revised 31 Jul 2013 (this version, v2))

We introduce and study the Group Square-Root Lasso (GSRL) method for estimation in high dimensional sparse regression models with group structure. The new estimator minimizes the square root of the residual sum of squares plus a penalty term proportional to the sum of the Euclidean norms of groups of the regression parameter vector. The net advantage of the method over the existing Group Lasso (GL)-type procedures consists in the form of the proportionality factor used in the penalty term, which for GSRL is independent of the variance of the error terms. This is of crucial importance in models with more parameters than the sample size, when estimating the variance of the noise becomes as difficult as the original problem. We show that the GSRL estimator adapts to the unknown sparsity of the regression vector, and has the same optimal estimation and prediction accuracy as the GL estimators, under the same minimal conditions on the model. This extends the results recently established for the Square-Root Lasso, for sparse regression without group structure. Moreover, as a new type of result for Square-Root Lasso methods, with or without groups, we study correct pattern recovery, and show that it can be achieved under conditions similar to those needed by the Lasso or Group-Lasso-type methods, but with a simplified tuning strategy. We implement our method via a new algorithm, with proven convergence properties, which, unlike existing methods, scales well with the dimension of the problem. Our simulation studies strongly support our theoretical findings.

Subjects: Statistics Theory (math.ST); Computation (stat.CO)
Cite as: arXiv:1302.0261 [math.ST] (or arXiv:1302.0261v2 [math.ST] for this version)
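The GSRL criterion described above, the square root of the residual sum of squares plus a penalty proportional to the sum of groupwise Euclidean norms, can be written down directly. The normalization by sqrt(n) and all data below are illustrative assumptions; scaling conventions vary.

```python
import numpy as np

def gsrl_objective(beta, X, y, groups, lam):
    """One common normalization of the Group Square-Root Lasso criterion:
    sqrt(RSS / n) + lam * sum over groups g of ||beta_g||_2.
    Note lam multiplies a term on the same scale as the noise level,
    which is why the tuning parameter need not depend on the variance."""
    n = X.shape[0]
    resid = y - X @ beta
    sqrt_loss = np.linalg.norm(resid) / np.sqrt(n)
    penalty = sum(np.linalg.norm(beta[g]) for g in groups)
    return sqrt_loss + lam * penalty

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 6))
y = rng.standard_normal(50)
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]  # two groups of three

beta_zero = np.zeros(6)
# At beta = 0 the penalty vanishes, so the criterion equals sqrt(RSS / n)
obj = gsrl_objective(beta_zero, X, y, groups, lam=0.5)
print(obj)
```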

Supervised Classification Using Sparse Fisher's LDA

Irina Gaynanova, James G. Booth, Martin T. Wells (Submitted on 21 Jan 2013 (v1), last revised 16 Sep 2014 (this version, v2))

It is well known that in a supervised classification setting, when the number of features is smaller than the number of observations, Fisher's linear discriminant rule is asymptotically Bayes. However, there are numerous modern applications where classification is needed in the high-dimensional setting. Naive implementation of Fisher's rule in this case fails to provide good results because the sample covariance matrix is singular. Moreover, a classifier that relies on all features is difficult to interpret. Our goal is to provide robust classification that relies only on a small subset of important features and accounts for the underlying correlation structure. We apply a lasso-type penalty to the discriminant vector to ensure sparsity of the solution and use a shrinkage-type estimator for the covariance matrix. The resulting optimization problem is solved using an iterative coordinate ascent algorithm. Furthermore, we analyze the effect of nonconvexity on the sparsity level of the solution and highlight the difference between the penalized and the constrained versions of the problem. The simulation results show that the proposed method performs favorably in comparison to alternatives. The method is used to classify leukemia patients based on DNA methylation features.

Subjects: Machine Learning (stat.ML); Computation (stat.CO)
Cite as: arXiv:1301.4976 [stat.ML] (or arXiv:1301.4976v2 [stat.ML] for this version)
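A minimal sketch of the two ingredients described above: a shrinkage covariance estimator and lasso-type sparsification of the discriminant vector. A single soft-thresholding step stands in for the paper's iterative coordinate ascent, and the shrinkage target, dimensions, and tuning values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 100                              # high-dimensional: p > n
mu = np.zeros(p)
mu[:5] = 1.5                                # only 5 informative features
X1 = rng.standard_normal((n, p))            # class 1
X2 = rng.standard_normal((n, p)) + mu       # class 2

# The pooled within-class covariance is singular when p > n, so shrink
# it toward the identity (a simple stand-in for the paper's estimator).
S = 0.5 * (np.cov(X1.T) + np.cov(X2.T))
alpha = 0.5
S_shrunk = (1 - alpha) * S + alpha * np.eye(p)

delta = X2.mean(axis=0) - X1.mean(axis=0)
v = np.linalg.solve(S_shrunk, delta)        # regularized Fisher direction

# One lasso-type soft-thresholding step to sparsify the discriminant
# vector; the paper instead optimizes a penalized criterion iteratively.
t = 0.3
v_sparse = np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
print(np.count_nonzero(v_sparse), "of", p, "features retained")
```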