Academic Lecture

Rethinking Policy Improvement in Reinforcement Learning

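Speaker: Prof. Ping-Chun Hsieh (謝秉均), Department of Computer Science, National Yang Ming Chiao Tung University
Host: TIGP (SNHCC)
Time: 2022-03-21 (Mon.) 14:00–16:00
Venue: Auditorium 106, New Building, Institute of Information Science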

Abstract

Policy improvement is a central component of any reinforcement learning (RL) algorithm, and the most widely used approach is to leverage the policy gradient (PG) theorem to iteratively improve the learned policy. Despite its success, PG can suffer from inefficient training in various settings. In this talk, I will go beyond PG and introduce two new policy improvement frameworks:
(i) First, I will introduce the action-constrained RL problem and discuss the critical “zero-gradient issue” resulting from PG. Then, I will present Frank-Wolfe policy optimization, which is a decoupling framework that completely resolves the challenging zero-gradient issue.
(ii) Next, I will present hinge policy optimization (HPO), which rethinks policy updates as solving a large-margin classification problem with the hinge loss. The HPO framework opens up a whole new family of RL algorithms that includes PPO with a clipped surrogate objective (PPO-clip) as a special case. Moreover, we formally prove that HPO attains a globally optimal policy. To our knowledge, this is the first global convergence guarantee for the PPO-clip algorithm.
Finally, experimental results will also be presented to corroborate the effectiveness of the two frameworks.
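The two issues in the abstract can be made concrete with a small toy sketch. It is not the speaker's method or code, and every name in it (`project`, `q_value`, `grad_through_projection`, `ppo_clip_term`, `hinge_form`, the box [-1, 1], the quadratic critic) is a hypothetical choice for illustration. Part 1 assumes the action constraint is enforced by clipping the raw policy output into a box, which is one simple way the zero-gradient issue can arise: outside the box the clip has zero derivative, so the policy receives no signal even when the critic prefers a different action. Part 2 checks numerically that the per-sample PPO-clip surrogate equals A + ε|A| minus an |A|-weighted hinge loss with label sign(A), score ratio - 1, and margin ε; since A + ε|A| does not depend on the policy, maximizing the surrogate is the same as minimizing that hinge loss.

```python
import math

# ---- Part 1: zero gradient through a clipping projection (toy setup) ----
# Assumption: feasible actions form the box [LOW, HIGH], enforced by clipping.
LOW, HIGH = -1.0, 1.0

def project(a: float) -> float:
    """Clip a raw action into the feasible box [LOW, HIGH]."""
    return min(max(a, LOW), HIGH)

def q_value(a: float) -> float:
    """Hypothetical critic whose best action is 0.8 (inside the box)."""
    return -(a - 0.8) ** 2

def grad_through_projection(raw: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of d Q(project(raw)) / d raw."""
    return (q_value(project(raw + h)) - q_value(project(raw - h))) / (2 * h)

# ---- Part 2: PPO-clip surrogate written as a weighted hinge loss ----
def ppo_clip_term(ratio: float, adv: float, eps: float) -> float:
    """Per-sample PPO-clip surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)

def hinge_form(ratio: float, adv: float, eps: float) -> float:
    """The same quantity as A + eps*|A| - |A| * max(0, eps - y*(ratio - 1)).

    y = sign(A) acts as the class label and (ratio - 1) as the classifier
    score; when A == 0 the weight |A| is zero, so the label choice is irrelevant.
    """
    y = 1.0 if adv >= 0 else -1.0
    hinge = max(0.0, eps - y * (ratio - 1.0))
    return adv + eps * abs(adv) - abs(adv) * hinge

if __name__ == "__main__":
    # Inside the box the critic's slope reaches the raw action (about 1.6).
    print(grad_through_projection(0.0))
    # Outside the box the clip blocks it: the estimate is exactly 0.0,
    # although moving the clipped action from 1.0 toward 0.8 would raise Q.
    print(grad_through_projection(3.0))
    for r in (0.5, 0.9, 1.1, 1.5):
        for a in (-2.0, 2.0):
            print(r, a, ppo_clip_term(r, a, 0.2), hinge_form(r, a, 0.2))
```

Clipping is used here only because its zero derivative is easy to see; other projection layers or constraint sets may behave differently, and the talk's setting may differ.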

BIO

Ping-Chun Hsieh is currently an assistant professor in the Department of Computer Science at National Yang Ming Chiao Tung University (NYCU). He received his B.S. and M.S. in Electrical Engineering from National Taiwan University in 2011 and 2013, respectively, and his Ph.D. in Electrical and Computer Engineering from Texas A&M University (TAMU) in 2018. His research interests include reinforcement learning, multi-armed bandits, and wireless networks. His research received Best Paper Awards at ACM MobiHoc 2020 and ACM MobiHoc 2017. He is a recipient of the Junior Faculty Award (黃培城青年講座) from NYCU in 2020, the Young Scholar Fellowship (愛因斯坦計畫) from the Ministry of Science and Technology in 2019, the Outstanding PhD Student Award from the ECE Department at TAMU in 2016, and the Government Scholarship to Study Abroad from the Ministry of Education, Taiwan.

Institute of Information Science, Academia Sinica