TIGP (SNHCC) – Bandit Learning: Optimality, Scalability, and Reneging
- Speaker: Dr. Ping-Chun Hsieh (謝秉均), Department of Computer Science, National Chiao Tung University
Host: TIGP SNHCC Program - Time: 2019-12-25 (Wed.) 14:00 ~ 16:00
- Venue: Auditorium 106, New Building of the Institute of Information Science
Abstract
Bandit learning is a classic framework that captures the exploration-exploitation dilemma. Despite the wide variety of existing bandit algorithms, there remains an unsatisfactory trade-off between regret performance and computational complexity. In this talk, we will present a new family of bandit algorithms that are formulated in a general way based on the biased maximum likelihood estimation (BMLE) method. We prove that the BMLE algorithm achieves a logarithmic finite-time regret bound and hence attains order optimality. Through extensive simulations, we demonstrate that the proposed algorithm achieves regret comparable to the best of several state-of-the-art baselines while holding a significant computational advantage over the other best-performing methods. Lastly, we will discuss how bandit learning can be extended to capture reneging risk and heteroscedasticity.
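To make the BMLE idea concrete, the sketch below shows one simple way to instantiate it for Bernoulli rewards: the log-likelihood of each arm is biased toward larger means by a term alpha*p before being maximized, which yields a closed-form index. The function names, the linear bias term, and the log(t) bias schedule are illustrative assumptions for this sketch, not the exact algorithm presented in the talk.

```python
import math
import random

def biased_mle_index(successes, failures, alpha):
    """Maximizer of the biased Bernoulli log-likelihood
    S*log(p) + F*log(1-p) + alpha*p over p in [0, 1].
    (The linear bias alpha*p is an illustrative choice.)"""
    n = successes + failures
    if n == 0:
        return 1.0  # unplayed arms get the maximal index
    if alpha < 1e-8:
        return successes / n  # bias vanishes: plain MLE (empirical mean)
    # Setting the derivative to zero gives the quadratic
    # alpha*p^2 - (alpha - n)*p - S = 0; take its root in [0, 1].
    disc = (alpha - n) ** 2 + 4 * alpha * successes
    return ((alpha - n) + math.sqrt(disc)) / (2 * alpha)

def run_bandit(true_means, horizon, seed=0):
    """Each round, play the arm with the highest biased-MLE index."""
    rng = random.Random(seed)
    k = len(true_means)
    succ, fail = [0] * k, [0] * k
    total_reward = 0
    for t in range(1, horizon + 1):
        alpha = math.log(t + 1)  # slowly growing bias drives exploration
        arm = max(range(k),
                  key=lambda i: biased_mle_index(succ[i], fail[i], alpha))
        reward = 1 if rng.random() < true_means[arm] else 0
        succ[arm] += reward
        fail[arm] += 1 - reward
        total_reward += reward
    return total_reward
```

Because the index is a closed-form expression per arm, each round costs only O(k) arithmetic operations, which illustrates the computational appeal of index policies built from biased likelihoods.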