Ramya Korlakai Vinayak is the Dugald C. Jackson Assistant Professor in the Dept. of ECE and affiliated faculty in the Dept. of Computer Science and the Dept. of Statistics at UW-Madison. Her research interests span machine learning, statistical inference, and crowdsourcing, with a focus on preference learning and alignment under heterogeneity, reliable and efficient dataset creation, and human-in-the-loop systems. Her work aims to address the theoretical and practical challenges that arise when learning from heterogeneous societal data. Prior to joining UW-Madison, Ramya was a postdoctoral researcher in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. She received her Ph.D. in Electrical Engineering from Caltech, her Master's from Caltech, and her Bachelor's from IIT Madras. She is a recipient of the Schlumberger Foundation Faculty for the Future fellowship (2013-15), an invited participant at the Rising Stars in EECS workshop in 2019, and a recipient of the NSF CAREER Award in 2023.
Talk: Towards Pluralistic Alignment: Foundations for Learning from Diverse Human Preferences
Abstract: Large pre-trained models trained on internet-scale data are often not ready for safe deployment out of the box. They are heavily fine-tuned and aligned using large quantities of human preference data, usually elicited via pairwise comparisons. When aligning an AI/ML model to human preferences or values, it is important to ask whose preferences and values we are aligning it to. Current approaches to alignment are severely limited by their inherent uniformity assumption. While there is rich literature on learning preferences from human judgments using comparison queries, existing models typically either learn an average preference over the population, because the amount of data available per individual is limited, or learn a single individual's preference using a large number of queries. Furthermore, the metric, i.e., the way humans judge similarity and dissimilarity, is assumed to be known, which does not hold in practice. We aim to overcome these limitations by building mathematical foundations for learning from diverse human preferences.
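To make the uniformity assumption concrete, here is a minimal Python sketch of the standard setup the abstract critiques: a single Bradley-Terry-style linear reward fit to pairwise comparisons pooled across all annotators, as if everyone shared one preference. This is an illustrative toy model, not code from the referenced papers; the feature dimension, data, and function names are invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bt_negative_log_likelihood(w, winners, losers):
    """Negative log-likelihood of pooled pairwise comparisons under a single
    shared linear reward r(x) = w @ x (Bradley-Terry / logistic model).
    winners, losers: (n_comparisons, d) arrays of item features."""
    margins = (winners - losers) @ w          # reward gap for each comparison
    return -np.sum(np.log(sigmoid(margins)))  # P(winner preferred) = sigmoid(gap)

# Toy data: comparisons from many annotators are pooled into one model,
# i.e., the "uniformity assumption" the talk argues against.
rng = np.random.default_rng(0)
d, n = 5, 200
w_true = rng.normal(size=d)
items_a = rng.normal(size=(n, d))
items_b = rng.normal(size=(n, d))
a_wins = rng.random(n) < sigmoid((items_a - items_b) @ w_true)
winners = np.where(a_wins[:, None], items_a, items_b)
losers = np.where(a_wins[:, None], items_b, items_a)
print(bt_negative_log_likelihood(w_true, winners, losers))
```

A single reward vector `w` can only represent one population-level ranking, which is exactly the limitation that motivates pluralistic alignment.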
In this talk, I will present PAL, a personalizable reward modeling framework for pluralistic alignment that captures diversity in preferences while also capturing commonalities that can be learned by pooling data across individuals. I will also discuss recent theoretical results on per-user sample complexity for generalization and on fundamental limitations when pairwise comparisons are limited.
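The abstract does not spell out PAL's architecture, so the sketch below only illustrates the general idea in Python: a small set of shared prototype rewards, learnable from pooled data, captures commonalities, while each user is described by a lightweight weight vector over those prototypes, capturing diversity. All class and variable names here are hypothetical and the model is a stand-in, not the actual PAL implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class PersonalizedPairwiseReward:
    """Toy personalizable reward model: K shared linear prototype rewards
    (commonalities learned from pooled data) combined per user through
    mixture weights (per-user diversity). Illustrative only."""

    def __init__(self, n_users, n_prototypes, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.prototypes = rng.normal(size=(n_prototypes, dim))  # shared across users
        self.user_logits = np.zeros((n_users, n_prototypes))    # per-user parameters

    def reward(self, user_id, x):
        weights = softmax(self.user_logits[user_id])   # (K,) mixture over prototypes
        return weights @ (self.prototypes @ x)          # scalar personalized reward

    def prob_prefers(self, user_id, x_win, x_lose):
        gap = self.reward(user_id, x_win) - self.reward(user_id, x_lose)
        return 1.0 / (1.0 + np.exp(-gap))               # logistic link on reward gap

model = PersonalizedPairwiseReward(n_users=10, n_prototypes=3, dim=5)
x1, x2 = np.ones(5), np.zeros(5)
print(model.prob_prefers(user_id=4, x_win=x1, x_lose=x2))
```

Because only the small per-user weight vector is individual-specific, such a parameterization suggests why per-user sample complexity can be much lower than learning each user's reward from scratch, which is the kind of question the theoretical results in the talk address.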
Based on work with Daiwei Chen, Yi Chen, Aniket Rege, Zhi Wang, Geelon So, Greg Canal, Blake Mason, Gokcan Tatli, and Rob Nowak.
References:
- PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences (preprint, 2024)
- One-for-All: Simultaneous Metric and Preference Learning (NeurIPS 2022)
- Metric Learning via Limited Pairwise Comparisons (UAI 2024)
- Learning Populations of Preferences via Pairwise Comparisons (AISTATS 2024)