TIGP (SNHCC) – Bandit Learning: Optimality, Scalability, and Reneging
- Speaker: Dr. Ping-Chun Hsieh (謝秉均), Department of Computer Science, National Chiao Tung University
Host: TIGP SNHCC Program - Time: 2019-12-25 (Wed.) 14:00 ~ 16:00
- Venue: Auditorium 106, New Building of the Institute of Information Science
Abstract
Bandit learning is a classic framework that captures the exploration-exploitation dilemma. Despite the wide variety of existing bandit algorithms, there remains an unsatisfactory trade-off between regret performance and computational complexity. In this talk, we will present a new family of bandit algorithms that are formulated in a general way based on the biased maximum likelihood estimation (BMLE) method. We prove that the BMLE algorithm achieves a logarithmic finite-time regret bound and hence attains order optimality. Through extensive simulations, we demonstrate that the proposed algorithm achieves regret comparable to the best of several state-of-the-art baselines while holding a significant computational advantage over the other best-performing methods. Lastly, we will discuss how bandit learning can be extended to capture reneging risk and heteroscedasticity.
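To make the BMLE idea concrete, the sketch below shows one simple way to instantiate it for Bernoulli rewards: the log-likelihood of each arm is biased toward larger means by a term alpha*p before being maximized, which yields a closed-form index. The function names, the linear bias term, and the log(t) bias schedule are illustrative assumptions for this sketch, not the exact algorithm presented in the talk.

```python
import math
import random

def biased_mle_index(successes, failures, alpha):
    """Maximizer of the biased Bernoulli log-likelihood
    S*log(p) + F*log(1-p) + alpha*p over p in [0, 1].
    (The linear bias alpha*p is an illustrative choice.)"""
    n = successes + failures
    if n == 0:
        return 1.0  # unplayed arms get the maximal index
    if alpha < 1e-8:
        return successes / n  # bias vanishes: plain MLE (empirical mean)
    # Setting the derivative to zero gives the quadratic
    # alpha*p^2 - (alpha - n)*p - S = 0; take its root in [0, 1].
    disc = (alpha - n) ** 2 + 4 * alpha * successes
    return ((alpha - n) + math.sqrt(disc)) / (2 * alpha)

def run_bandit(true_means, horizon, seed=0):
    """Each round, play the arm with the highest biased-MLE index."""
    rng = random.Random(seed)
    k = len(true_means)
    succ, fail = [0] * k, [0] * k
    total_reward = 0
    for t in range(1, horizon + 1):
        alpha = math.log(t + 1)  # slowly growing bias drives exploration
        arm = max(range(k),
                  key=lambda i: biased_mle_index(succ[i], fail[i], alpha))
        reward = 1 if rng.random() < true_means[arm] else 0
        succ[arm] += reward
        fail[arm] += 1 - reward
        total_reward += reward
    return total_reward
```

Because the index is a closed-form expression per arm, each round costs only O(k) arithmetic operations, which illustrates the computational appeal of index policies built from biased likelihoods.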