Optimal two-step prediction in regression, Didier Chételat, Johannes Lederer, Joseph Salmon, 10-18-14

Optimal two-step prediction in regression

Didier Chételat, Johannes Lederer, Joseph Salmon

(Submitted on 18 Oct 2014 (v1), last revised 6 Nov 2014 (this version, v2))

High-dimensional prediction typically comprises variable selection followed by least-squares refitting on the selected variables. However, the standard variable selection procedures, such as the lasso and thresholded ridge regression, hinge on tuning parameters that need to be calibrated. Cross-validation, the most popular calibration scheme, is computationally costly and does not provide theoretical guarantees for high-dimensional prediction. In this paper, we introduce an alternative scheme that is computationally more efficient than cross-validation and, in addition, provides optimal finite sample guarantees. While our scheme allows for a range of variable selection procedures, we provide explicit numerical and theoretical results for least-squares refitting on variables selected by the lasso and by thresholded ridge regression. These results demonstrate that our calibration scheme can outperform cross-validation in terms of speed, accuracy, and theoretical guarantees.
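To make the two-step structure concrete, here is a minimal sketch of variable selection by thresholded ridge regression followed by least-squares refitting on the selected variables. It is an illustration only: the function names, the parameters `ridge_penalty` and `threshold`, and the synthetic data are assumptions introduced here, and the tuning parameters are fixed by hand rather than chosen by the calibration scheme the paper proposes, which the abstract does not spell out.

```python
import numpy as np


def thresholded_ridge_support(X, y, ridge_penalty, threshold):
    """Step 1 (selection): fit ridge regression, keep the large coefficients.

    Both `ridge_penalty` and `threshold` are tuning parameters that would
    need calibration in practice; here they are hypothetical fixed values.
    """
    n_features = X.shape[1]
    gram = X.T @ X + ridge_penalty * np.eye(n_features)
    ridge_coef = np.linalg.solve(gram, X.T @ y)
    # Indices of the variables whose ridge coefficient exceeds the threshold.
    return np.flatnonzero(np.abs(ridge_coef) > threshold)


def least_squares_refit(X, y, support):
    """Step 2 (refitting): ordinary least squares on the selected columns."""
    coef = np.zeros(X.shape[1])
    if support.size > 0:
        solution, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        coef[support] = solution
    return coef


def two_step_predictor(X, y, ridge_penalty=1.0, threshold=0.5):
    """Run selection, then refitting; return the coefficients and the support."""
    support = thresholded_ridge_support(X, y, ridge_penalty, threshold)
    return least_squares_refit(X, y, support), support


# Synthetic high-dimensional example (assumed data, not from the paper):
# more variables than observations, only three of them relevant.
rng = np.random.default_rng(0)
n_samples, n_features = 50, 200
X = rng.standard_normal((n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)

coef, support = two_step_predictor(X, y)
print("number of selected variables:", support.size)
```

The quality of the final predictor depends heavily on which `ridge_penalty` and `threshold` are used, which is exactly the calibration problem the abstract describes. Swapping the first function for a lasso-based selector would give the other two-step procedure the paper studies.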

Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Cite as: arXiv:1410.5014 [stat.ME] (or arXiv:1410.5014v2 [stat.ME] for this version)