Institute of Information Science, Academia Sinica

Academic Seminar

TIGP (SNHCC) – Bandit Learning: Optimality, Scalability, and Reneging

  • Speaker: Dr. 謝秉均 (Department of Computer Science, National Chiao Tung University)
    Host: TIGP SNHCC Program
  • Time: 2019-12-25 (Wed.) 14:00 ~ 16:00
  • Venue: Auditorium 106, New IIS Building
Abstract

Bandit learning is a classic framework that captures the exploration-exploitation dilemma. Despite the variety of existing bandit algorithms, the trade-off between regret performance and computational complexity remains unsatisfactory. In this talk, we will present a new family of bandit algorithms formulated in a general way based on the biased maximum likelihood estimation (BMLE) method. We prove that the BMLE algorithm achieves a logarithmic finite-time regret bound and hence attains order-optimality. Through extensive simulations, we demonstrate that the proposed algorithm achieves regret comparable to the best of several state-of-the-art baseline methods while offering a significant computational advantage over the other best-performing methods. Lastly, we will discuss how bandit learning can be extended to capture reneging risk and heteroscedasticity.