Research Of Protein-Protein Interaction Extraction Based On Semi-supervised And Active Learning

Posted on: 2009-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: B J Cui
GTID: 2120360272470659
Subject: Computer system architecture
Abstract/Summary:
As the volume of biomedical literature grows rapidly, biomedical researchers face an ever larger body of information. On one hand, this places a heavy burden on researchers and makes it difficult to find the needed information quickly; on the other hand, research and other specific tasks usually require many manually labeled samples, which is costly given the large amount of data. To improve efficiency, an automated tool for locating the needed information quickly is urgently needed, and researchers also hope that as little labeled data as possible will meet the actual needs of research. Research on protein-protein interaction (PPI) extraction from biomedical literature using semi-supervised learning and active learning arose against this background. Automatic PPI extraction from biomedical literature also has high practical value: it can help build protein relation networks, predict protein function, and design new drugs.

The paper first introduces background knowledge on PPI extraction and surveys general research in the area. It then presents semi-supervised learning methods, including self-training and co-training, together with active learning methods. Some of these methods are applied to PPI extraction, approaching the task in two different ways so as to reduce the labeling burden as much as possible. First, self-training and co-training are each applied to explore how a large amount of unlabeled data can be used to achieve good PPI extraction performance. Second, active learning is used to select the most informative unlabeled samples, showing how to reduce human labeling effort while maintaining PPI extraction performance. Finally, the two approaches are combined to find a balanced approach in which users can achieve good performance with much less labeled data. The paper tests all the methods on different corpora and gives a detailed discussion and conclusions.
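The two ideas named in the abstract, self-training and active learning by uncertainty sampling, can be illustrated with a small sketch. This is not the thesis's implementation: the nearest-centroid classifier, the softmax-over-distance confidence score, the 0.8 threshold, the toy 2-D points, and all function names are assumptions chosen only for illustration. A real PPI extractor would work on features derived from sentences and would likely use a stronger classifier. The sketch assumes Python 3.8 or later (for `math.dist`).

```python
import math

def train_centroids(points, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_proba(centroids, p):
    """Map each class to a pseudo-probability (softmax over negative distance)."""
    scores = {y: math.exp(-math.dist(p, c)) for y, c in centroids.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Self-training: repeatedly move confidently predicted points
    from the unlabeled pool into the training set, then retrain."""
    points = [p for p, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_centroids(points, labels)
        confident, rest = [], []
        for p in pool:
            proba = predict_proba(model, p)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                confident.append((p, best))
            else:
                rest.append(p)
        if not confident:
            break  # nothing left that the model is sure about
        for p, y in confident:
            points.append(p)
            labels.append(y)
        pool = rest
    return train_centroids(points, labels), pool

def most_uncertain(model, pool, k=1):
    """Active learning (uncertainty sampling): return the k pool points
    whose highest class probability is lowest, i.e. the ones to ask a human about."""
    return sorted(pool, key=lambda p: max(predict_proba(model, p).values()))[:k]

# Toy data (hypothetical): two labeled seeds and three unlabeled points.
seeds = [((0.0, 0.0), "neg"), ((4.0, 4.0), "pos")]
unlabeled = [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0)]

model, remaining = self_train(seeds, unlabeled)
query = most_uncertain(model, remaining, k=1)
print("left unlabeled after self-training:", remaining)
print("point to send to a human annotator:", query)
```

In this toy run, the two points near the seeds are absorbed by self-training, while the ambiguous midpoint stays in the pool and is the one selected for manual labeling. That mirrors the combination described in the abstract: unlabeled data are used where the model is confident, and human effort is spent only where it is not.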
Keywords/Search Tags: PPI, Semi-supervised Learning, Active Learning