Academic Talk

A Dependency Parser for Tweets

Speaker: Mr. Lingpeng Kong (Language Technologies Institute, Carnegie Mellon University)
Host: 古倫維
Time: 2015-08-20 (Thu.) 10:00 ~ 11:00
Venue: Auditorium 106, New Building, Institute of Information Science

Abstract

In contrast to the edited, standardized language of traditional publications such as news reports, social media text closely represents language as it is used by people in their everyday lives. These informal texts, which account for ever larger proportions of written content, are of considerable interest to researchers. Here, we describe a new dependency parser for English tweets, TweeboParser. This work builds on several contributions: new syntactic annotations for a corpus of tweets (TweeBank), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions.

BIO

Lingpeng Kong is a Ph.D. student in the School of Computer Science, Carnegie Mellon University, co-advised by Prof. Noah Smith and Prof. Chris Dyer. His main research interests are in designing algorithms to tackle the core problems in natural language processing (NLP). His work draws on methods from machine learning, optimization, and combinatorial algorithms, with applications to syntactic parsing, machine translation, and social media. Prior to CMU, he worked at the IBM China Systems and Technology Lab.

Institute of Information Science, Academia Sinica
