Institute of Information Science, Academia Sinica

Events

Academic Talks

[IIS/CITI] Frontier Technology Lecture Series (Session 1): Term Revealing: A New Quantization Approach for Deep Learning

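  • Academician H. T. Kung (孔祥重), Professor (Computer Science and Electrical Engineering, Harvard University, USA)
    Hosts: Kai-Min Chung (鐘楷閔), De-Nian Yang (楊得年), Li Su (蘇黎)
  • 2020-09-01 (Tue.) 10:00 – 11:30
  • In person: Auditorium 106, New IIS Building; online: see the abstract for the link
Abstract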

Virtual Meeting Link: https://asmeet.webex.com/asmeet/j.php?MTID=m33d611563571e837de9931c0df7ed52d

ID: 170 978 2671

Password: GPfq8RBSk23

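*This lecture series is open primarily to members of IIS and CITI at Academia Sinica.

**The Institute determines attendee eligibility and, to ensure the quality of the talk, may revoke attendance privileges (in person or online) when necessary.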

---------------------------------------------------------------------------------------------------------------------

Quantization is a widely used technique for efficient deep learning computation. However, aggressive post-training quantization is often not possible. E.g., for ImageNet, quantization using fewer than 8 bits would substantially degrade classification accuracy. A problem with conventional quantization is that it indiscriminately truncates lower-order bits of numbers, while these bits are precisely those critical bits that differentiate features.

We propose a new quantization method, "term revealing," to alleviate the problem. The technique substantially reduces the bits required for filter weights and activation data by dynamically adapting bit usage so that only those bits that are most critical to classification accuracy are kept. Term revealing can be applied to models with or without quantization-aware training.
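The abstract does not spell out the algorithm, so the following is only a rough sketch of the idea of adapting bit usage, not the method presented in the talk. It assumes a group of non-negative integer values that share a fixed budget of power-of-two terms, and it keeps the largest terms across the whole group. The function names `truncate_low_bits` and `keep_top_terms`, the group of four values, and the budget of six are all hypothetical choices made for illustration.

```python
def truncate_low_bits(values, drop_bits):
    """Conventional truncation: zero the lowest `drop_bits` bits of every value."""
    return [(v >> drop_bits) << drop_bits for v in values]


def keep_top_terms(values, budget):
    """Keep only the `budget` largest power-of-two terms across the group.

    Assumption for illustration: values are non-negative integers, and ties
    between equal exponents are broken by position in the group.
    """
    # Each set bit of each value is one (exponent, index) term.
    terms = [
        (e, i)
        for i, v in enumerate(values)
        for e in range(v.bit_length())
        if (v >> e) & 1
    ]
    # Largest exponent first; earlier values win ties.
    terms.sort(key=lambda t: (-t[0], t[1]))
    kept = [0] * len(values)
    for e, i in terms[:budget]:
        kept[i] += 1 << e
    return kept


group = [81, 3, 5, 2]
print(truncate_low_bits(group, 4))  # [80, 0, 0, 0]
print(keep_top_terms(group, 6))     # [81, 2, 4, 2]
```

In this toy group, truncating four low-order bits wipes out the three small values entirely, while the shared term budget preserves their leading terms.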

To enhance efficiency, we further propose "HESE encoding" (Hybrid Encoding for Signed Expressions) for signed-digit representations involving both positive and negative terms, in contrast to standard unsigned binary representations using only positive terms. E.g., we represent 55 as 2^6 - 2^3 - 2^0 rather than 2^5 + 2^4 + 2^2 + 2^1 + 2^0. We describe an efficient one-pass scheme for forming minimum-length signed-digit representations.
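The arithmetic in the example for 55 can be checked with a short sketch. Note that this is not HESE itself, whose details are not given in the abstract; it uses the classic non-adjacent form (NAF), a well-known signed-digit representation with the minimum number of nonzero terms, computed in one pass from the low-order end. The function names are hypothetical.

```python
def signed_digit_terms(n):
    """Return the non-adjacent form of n >= 0 as (sign, exponent) pairs.

    This is the standard NAF construction, used here as a stand-in for
    the encoding described in the talk.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    terms = []
    exponent = 0
    while n > 0:
        if n & 1:
            # +1 when n % 4 == 1, -1 when n % 4 == 3, so the next digit is 0.
            sign = 2 - (n & 3)
            terms.append((sign, exponent))
            n -= sign
        n >>= 1
        exponent += 1
    return terms


def binary_terms(n):
    """Return the ordinary unsigned binary expansion as (sign, exponent) pairs."""
    return [(1, e) for e in range(n.bit_length()) if (n >> e) & 1]


print(signed_digit_terms(55))  # [(-1, 0), (-1, 3), (1, 6)]
print(len(binary_terms(55)))   # 5
```

For 55 the signed form needs three terms (2^6 - 2^3 - 2^0) where unsigned binary needs five, which matches the example in the abstract.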

This talk focuses on post-training term revealing, which requires no model re-training. We evaluate the approach on an MLP for MNIST, CNNs for ImageNet, and an LSTM for Wikitext-2. We show significant reductions in inference computations (between 3x and 10x) compared to conventional quantization for the same model accuracy. (Our recent work in term-quantization-aware training shows similar gains for YOLOv5.) A description of term revealing with source code is accessible from a forthcoming SC20 paper jointly written with Brad McDanel and Sai Zhang of Harvard.

 

BIO

H. T. Kung is William H. Gates Professor of Computer Science and Electrical Engineering at Harvard University. He has pursued a variety of research interests in his career, ranging from computer science theory, parallel computing, VLSI design, database algorithms, computer systems, wireless communications, and networking, to machine learning. His academic honors include Member of the National Academy of Engineering (USA), a Guggenheim Fellowship, and the ACM SIGOPS 2015 Hall of Fame Award (with John Robinson). He currently serves as volunteer President of Taiwan AI Academy (台灣人工智慧學校), a non-profit organization with multiple campuses in Taiwan that has trained thousands of AI professionals for industry.