Trust, but verify: benefits and pitfalls of least-squares refitting in high dimensions
Johannes Lederer
(Submitted on 1 Jun 2013)
Least-squares refitting is widely used in high-dimensional regression to reduce the prediction bias of ℓ1-penalized estimators such as the Lasso and the Square-Root Lasso. We present theoretical and numerical results that provide new insights into the benefits and pitfalls of least-squares refitting. In particular, we consider both prediction and estimation, and we pay close attention to the effects of correlations in the design matrices of linear regression models, since these correlations, although often neglected, are crucial in linear regression, especially in high-dimensional settings. First, we demonstrate that the benefit of least-squares refitting depends strongly on the setting and the task under consideration: least-squares refitting can be beneficial even in settings with highly correlated design matrices but is not advisable in all settings, and it can be beneficial for estimation but typically performs better for prediction. Finally, we introduce a criterion that indicates whether least-squares refitting is advisable for a given setting and task, and we conduct a thorough simulation study involving the Lasso to demonstrate the usefulness of this criterion.
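To make the procedure concrete, the following is a minimal sketch of least-squares refitting on simulated data, assuming scikit-learn's Lasso; the sample sizes, the noise level, and the regularization parameter alpha are illustrative choices, not values from the paper. The Lasso selects a support, and an ordinary least-squares fit restricted to the selected columns then removes the shrinkage bias on the retained coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative simulated high-dimensional linear model (p > n).
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0  # sparse true coefficient vector
y = X @ beta + 0.5 * rng.standard_normal(n)

# Step 1: Lasso estimate (shrinks coefficients toward zero);
# alpha is an illustrative choice.
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Step 2: least-squares refitting on the selected columns only.
beta_refit = np.zeros(p)
if support.size:
    coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    beta_refit[support] = coef
```

Because the refit is the least-squares solution over the columns the Lasso selected, its in-sample residual norm is never larger than that of the Lasso estimate itself; whether it also improves out-of-sample prediction or estimation is exactly the question the paper's criterion addresses.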
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Cite as: arXiv:1306.0113 [stat.ME] (or arXiv:1306.0113v1 [stat.ME] for this version)