Institute of Information Science, Academia Sinica




AIIA Lab of IIS, Academia Sinica, led by Chun-Nan Hsu, achieved top scores in the second BioCreative Text Mining Challenge


    The research team from the AIIA Lab, Institute of Information Science, Academia Sinica, Taiwan, led by Chun-Nan Hsu, together with I-fang Chung's Lab at the Institute of Bioinformatics, National Yang-Ming University, achieved the second and third highest scores with the two methods they submitted to the second BioCreative Challenge Evaluation, held in Madrid, Spain in 2007. Twenty-one participants submitted methods to this Challenge. The top score was achieved by a team from the IBM T.J. Watson Research Center in the USA. However, the organizer reported that the differences among the top three scores were not statistically significant, so all three can be regarded as top scores in this Challenge. Moreover, after the samples were reweighted to correct for sample selection bias, the first method submitted by Hsu and Chung received the top score among all participants' submissions.

    The Challenge evaluates the performance of state-of-the-art computer programs on the task of extracting gene and gene product mentions from a large corpus of molecular biology literature. Such programs can help molecular biologists search for literature related to particular genes. They also allow researchers to extract a large number of reports on particular molecular biology events (e.g., protein-protein interactions, reaction pathways, etc.) from the literature without performing resource-demanding and time-consuming experiments. A great deal of effort has therefore been devoted to this research around the world. Extracting gene mentions is particularly difficult because authors rarely use standardized gene names, and because gene names naturally co-occur with other types of names that have similar morphology and even similar context. The Academia Sinica-Yang Ming University team applied machine learning algorithms to train conditional random fields and support vector machines on a corpus of 15,000 sample sentences to achieve their top scores. The team has been studying efficient training algorithms for conditional random fields and has already achieved promising results, which will be published in the near future.
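
    For readers unfamiliar with the task, gene mention extraction is commonly framed as sequence labeling: each token in a sentence receives a tag, and consecutive tagged tokens form a mention. The sketch below is illustrative only and is not the team's system. The B/I/O tagging scheme, the function names, the feature set, and the example sentence are all assumptions chosen for illustration. It shows the kind of morphology and context features a conditional random field or support vector machine could be given, and how per-token tags are turned back into mentions.

```python
import re


def word_shape(token):
    """Collapse a token into a coarse shape, e.g. 'BRCA1' -> 'A0', 'p53' -> 'a0'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    # Collapse runs of the same shape character into one.
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens, i):
    """Hypothetical feature set for token i: morphology plus neighboring words."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


def bio_to_spans(tokens, tags):
    """Convert per-token B/I/O tags into a list of mention strings."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Made-up example sentence with hand-assigned tags (not from the Challenge corpus).
tokens = ["The", "tumor", "suppressor", "p53", "binds", "MDM2", "."]
tags = ["O", "B", "I", "I", "O", "B", "O"]
mentions = bio_to_spans(tokens, tags)
features = token_features(tokens, 3)
```

    In a trained system the tags would be predicted by the model from such features rather than assigned by hand; the coarse word shape is one simple way to capture the morphology that gene names share with other kinds of names.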

    This research is supported by the National Research Program for Genomic Medicine (NRPGM), National Science Council (NSC), under the grant for the Advanced Bioinformatics Core (ABC) facility. ABC consists of four teams from National Yang-Ming University and Academia Sinica. ABC welcomes collaboration proposals from biology labs to extend the impact of their research achievements.

    Other participants included teams from the University of Pennsylvania, which was the defending top scorer, the National Center for Biotechnology Information (NCBI), Cambridge University, and other renowned research institutes from the Netherlands, Spain, Germany, Korea, China, etc.
Pls. visit