Pratik Patil is a postdoctoral researcher in Statistics at the University of California, Berkeley. He obtained his PhD in Statistics and Machine Learning from Carnegie Mellon University. His research spans topics at the intersection of statistical machine learning, optimization, and information theory. Much of his recent work focuses on the statistical analysis of machine learning methods such as bagging, sketching, cross-validation, and model tuning in the overparameterized regime, drawing on tools from statistical physics and random matrix theory. More details can be found at: https://pratikpatil.io/.
Talk: Facets of regularization in overparameterized machine learning
Abstract: Modern machine learning often operates in an overparameterized regime in which the number of parameters far exceeds the number of observations. In this regime, models can exhibit surprising generalization behaviors: (1) Models can interpolate the training data with zero training error yet still generalize well (benign overfitting); moreover, in some cases, even when explicit regularization is added and tuned, the optimal choice turns out to be no regularization at all (obligatory overfitting). (2) The generalization error can vary non-monotonically with the model size or the sample size (double/multiple descent). These behaviors challenge classical notions of overfitting and the role of explicit regularization.
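As a rough illustration of double descent (not drawn from the talk or the papers; the dimensions, signal, and noise level below are illustrative assumptions), the following sketch computes the test error of minimum-norm least squares as the number of fitted features p varies with the sample size n fixed. The error typically spikes near the interpolation threshold p ≈ n and then decreases again as p grows.

```python
import numpy as np

rng = np.random.default_rng(0)

n, n_test, D = 100, 2000, 400            # train size, test size, total number of features
sigma = 0.5                               # noise level (illustrative)
beta = rng.normal(size=D)
beta /= np.linalg.norm(beta)              # unit-norm true signal spread over all D features

X, X_test = rng.normal(size=(n, D)), rng.normal(size=(n_test, D))
y = X @ beta + sigma * rng.normal(size=n)
y_test = X_test @ beta + sigma * rng.normal(size=n_test)

for p in [20, 50, 80, 90, 95, 100, 105, 110, 125, 150, 200, 300, 400]:
    # Minimum-norm ("ridgeless") least squares fit using only the first p features.
    beta_hat = np.linalg.pinv(X[:, :p]) @ y
    test_mse = np.mean((y_test - X_test[:, :p] @ beta_hat) ** 2)
    print(f"p = {p:3d}  test MSE = {test_mse:.3f}")
```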
In this talk, I will present theoretical and methodological results related to these behaviors, primarily focusing on the concrete case of ridge regularization. First, I will identify conditions under which the optimal ridge penalty is zero (or even negative) and show that standard techniques such as leave-one-out and generalized cross-validation, when analytically continued, remain uniformly consistent for the generalization error and thus yield the optimal penalty, whether positive, negative, or zero. Second, I will introduce a general framework, based on subsampling and ensembling, for mitigating double/multiple descent in the sample size, and show its intriguing connection to ridge regularization. As an implication of this connection, I will show that the generalization error of optimally tuned ridge regression is monotonic in the sample size (under mild data assumptions), which mitigates double/multiple descent. Key to both parts is the role of implicit regularization, either self-induced by the overparameterized data or externally induced by subsampling and ensembling. Finally, I will briefly mention some extensions and variants beyond ridge regularization.
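For concreteness, here is a minimal sketch of the standard generalized cross-validation (GCV) criterion for ridge regression over a grid of positive penalties. The data-generating setup and penalty grid are illustrative assumptions, and this is the textbook GCV formula, not the analytically continued version from the talk, which also handles zero and negative penalties.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, sigma = 100, 300, 0.5               # overparameterized setup: p > n (illustrative)
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + sigma * rng.normal(size=n)

def gcv(X, y, lam):
    """Standard GCV criterion for ridge regression with penalty lam > 0."""
    n = X.shape[0]
    # Ridge hat matrix S_lam = X (X'X + n*lam*I)^{-1} X', computed via the n x n Gram form.
    G = X @ X.T / n
    S = G @ np.linalg.inv(G + lam * np.eye(n))
    resid = y - S @ y
    return np.mean(resid**2) / (1.0 - np.trace(S) / n) ** 2

lams = np.logspace(-3, 1, 20)
scores = [gcv(X, y, lam) for lam in lams]
best = lams[int(np.argmin(scores))]
print(f"GCV-selected ridge penalty: {best:.4f}")
```

Note that in the overparameterized case (p > n), the GCV denominator 1 - tr(S_lam)/n vanishes as lam → 0, so the standard formula breaks down at the ridgeless limit, which is one motivation for the analytically continued versions discussed in the talk.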
The talk will feature joint work with the following collaborators (in alphabetical order by surname): Pierre Bellec, Jin-Hong Du, Takuya Koriyama, Arun Kumar Kuchibhotla, Alessandro Rinaldo, Kai Tan, Ryan Tibshirani, Yuting Wei. The corresponding papers (in the order they appear in the talk) are: optimal ridge landscape (https://pratikpatil.io/papers/ridge-ood.pdf), ridge cross-validation (https://pratikpatil.io/papers/functionals-combined.pdf), risk monotonization (https://pratikpatil.io/papers/risk-monotonization.pdf), ridge equivalences (https://pratikpatil.io/papers/generalized-equivalences.pdf), and extensions and variants (https://pratikpatil.io/papers/cgcv.pdf, https://pratikpatil.io/papers/subagging-asymptotics.pdf).