TIGP (SNHCC) – Bandit Learning: Optimality, Scalability, and Reneging
- Lecturer: Dr. Ping-Chun Hsieh (Department of Computer Science, National Chiao Tung University)
- Host: TIGP SNHCC Program
- Time: 2019-12-25 (Wed.) 14:00 ~ 16:00
- Location: Auditorium 106 at IIS new Building
Abstract
Bandit learning is a classic framework that captures the exploration-exploitation dilemma. Despite the variety of existing bandit algorithms, the trade-off between regret performance and computational complexity remains unsatisfactory. In this talk, we will present a new family of bandit algorithms formulated in a general way based on the biased maximum likelihood estimation (BMLE) method. We prove that the BMLE algorithm achieves a logarithmic finite-time regret bound and hence attains order-optimality. Through extensive simulations, we demonstrate that the proposed algorithm achieves regret performance comparable to the best of several state-of-the-art baseline methods while having a significant computational advantage over the other best-performing methods. Lastly, we will discuss how bandit learning can be extended to capture reneging risk and heteroscedasticity.