Time: 4:30-5:30 p.m.
Date: Wednesday, April 8, 2026
Speaker: Koulik Khamaru
Title: Bandit A/B testing via Stability: A Tale of Two Algorithms


[Speaker photo: a man in a red jacket along a tree-lined road.]


Abstract: Modern decision-making increasingly relies on adaptive experimentation, particularly in settings such as A/B testing, multi-armed bandits, and reinforcement learning. While these methods enable more efficient learning and allocation of resources, they fundamentally challenge traditional statistical inference. Classical i.i.d.-based tools often break down under adaptive data collection, resulting in biased estimators and misleading confidence intervals.
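As a toy illustration of this breakdown (not drawn from the talk itself), the sketch below simulates a two-armed bandit in which both arms have true mean 0 and data are collected by a UCB-style rule. The replication-averaged sample mean of an arm comes out negative, showing the bias that adaptive allocation introduces into the classical estimator. All names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def ucb_arm_sample_mean(horizon=100, reps=1000, seed=0):
    """Average, over many replications, the sample mean of arm 0
    when a 2-armed UCB rule collects the data. Both arms are N(0, 1),
    so an unbiased estimator would average to roughly 0."""
    rng = np.random.default_rng(seed)
    means = []
    for _ in range(reps):
        counts = np.ones(2)                    # one initial pull per arm
        sums = rng.normal(0.0, 1.0, size=2)
        for t in range(2, horizon):
            # Optimistic index: empirical mean plus exploration bonus.
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            a = int(np.argmax(ucb))
            sums[a] += rng.normal(0.0, 1.0)
            counts[a] += 1
        means.append(sums[0] / counts[0])      # sample mean of arm 0
    return float(np.mean(means))

bias = ucb_arm_sample_mean()
print(bias)  # negative: adaptive allocation under-samples unlucky arms
```

The intuition: an arm whose early draws happen to be low is pulled less often afterward, so low observations are never "averaged out", dragging the sample mean below the true mean even though each individual reward is unbiased.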

This talk offers an overview of statistical inference in these adaptive environments through the concept of stability, originally formulated by Lai and Wei (1982). A key advantage of stability is that it allows us to recover classical inferential guarantees—such as asymptotic normality and valid confidence intervals—even when the data arise from highly adaptive algorithms.

Next, we discuss the stability properties of two widely used algorithms: the Upper Confidence Bound (UCB) algorithm and Thompson Sampling. We argue that while UCB is stable, Thompson Sampling is not. Finally, we propose a modification of Thompson Sampling that regains stability while maintaining near-optimal regret. Key illustrations include quantitative central limit theorems for the empirical mean in stochastic bandits and for least squares estimators in contextual bandits. We also present a new proof technique for analyzing regret and establishing stability, which we believe has broader applicability and may be of independent interest.

The talk is based on a series of joint works with Cunhui Zhang, Qiyang Han, Budhaditya Halder, and Subhayan Pan.


Bio: Koulik Khamaru is an Assistant Professor in the Department of Statistics at Rutgers University. His research lies at the intersection of statistics and machine learning, with interests in Gaussian mixture models, convex and nonconvex optimization, and reinforcement learning. His recent work focuses on statistical inference for data collected through sequential reinforcement learning algorithms.

Before joining Rutgers, he earned his PhD in Statistics from the University of California, Berkeley, under the guidance of Prof. Martin Wainwright and Prof. Michael Jordan.