
Research On Building Wikipedia Semantic Knowledge Base And Its Application In Text Classification

Posted on: 2011-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: X K Su
GTID: 2178360305968259
Subject: Computer application technology
Abstract/Summary:
With the continuous development of computer technology and the rapid spread of the Internet, more and more people use the Internet to obtain information. In the information age, two questions have become important research subjects: how to extract rich semantic knowledge from massive amounts of text, and how to use that knowledge to provide reliable support for existing natural language processing tasks. Research has found that current sources of semantic knowledge fall into two categories. The first is manually constructed semantic knowledge (e.g. HowNet). The second is large-scale real-world text, including the massive text on the Internet, various offline text collections (such as corpora of all sizes), and various encyclopedic knowledge repositories (such as Wikipedia). Research has also shown that manually constructed semantic knowledge struggles to keep up with the growing demands of network information processing. This thesis therefore proposes an automatic method for constructing a knowledge base from a Wikipedia corpus of a certain scale. The main work includes:
1. In the area of formal knowledge representation, a method is proposed that uses semantic labels as references and semantic fingerprints to describe semantics. The method assumes that every concept (term) is supported by some background information, and it puts forward a probability formula to quantify the contribution of each semantic fingerprint to its semantic label. It adopts the explicit knowledge-representation strategy used by manually built knowledge bases and adds probability information. This yields a more accurate description of semantics and can be easily integrated into existing text computation models.
2. Based on the proposed representation with semantic labels and semantic fingerprints, a Wikipedia corpus of a certain scale is processed through preprocessing, semantic label selection, related concept extraction, and contribution value determination. The rich link relationships between Wikipedia pages are mined, and a Wikipedia semantic knowledge base is built.
3. To verify the validity of the knowledge base, and building on prior research on Chinese text classification, a method is proposed that uses the semantic knowledge base to expand the terms of a text in order to improve classification accuracy. Comparative experiments between a traditional classification method and the proposed method show that the knowledge base constructed in this thesis improves the accuracy of text classification, which demonstrates its effectiveness.
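The abstract does not give the probability formula or the expansion procedure in detail. As a rough illustration only, here is a minimal sketch under stated assumptions. A semantic label's fingerprint is taken to be the set of related concepts it links to, each weighted by the relative frequency P(concept | label) computed from link counts. Text expansion then adds each term's top-weighted fingerprint concepts to the document's term vector. The functions `build_fingerprints` and `expand_terms` and the parameters `top_k` and `min_weight` are hypothetical. The relative-frequency estimate is a stand-in, not the formula proposed in the thesis.

```python
from collections import Counter, defaultdict


def build_fingerprints(link_pairs):
    """Build a semantic fingerprint for each semantic label.

    link_pairs: iterable of (label, concept) pairs, e.g. one pair per
    hyperlink from a Wikipedia page (label) to a related page (concept).
    Each concept's contribution is estimated as the relative frequency
    P(concept | label). This estimate is an illustrative assumption,
    not the probability formula proposed in the thesis.
    """
    counts = defaultdict(Counter)
    for label, concept in link_pairs:
        counts[label][concept] += 1
    fingerprints = {}
    for label, concept_counts in counts.items():
        total = sum(concept_counts.values())
        fingerprints[label] = {c: n / total for c, n in concept_counts.items()}
    return fingerprints


def expand_terms(tokens, fingerprints, top_k=2, min_weight=0.2):
    """Expand a tokenized document with fingerprint concepts.

    Original terms keep their raw frequency. For each term that matches
    a semantic label, up to top_k of its highest-weighted concepts with
    weight >= min_weight are added, scaled by the term's frequency.
    top_k and min_weight are hypothetical tuning parameters.
    """
    vector = Counter(tokens)
    # Iterate over a snapshot so that newly added concepts are not expanded again.
    for term, tf in list(vector.items()):
        ranked = sorted(fingerprints.get(term, {}).items(), key=lambda kv: -kv[1])
        for concept, weight in ranked[:top_k]:
            if weight >= min_weight:
                vector[concept] += tf * weight
    return vector


if __name__ == "__main__":
    links = [("jaguar", "Felidae")] * 3 + [("jaguar", "Big cat")] * 2 + [("jaguar", "Car")]
    fingerprints = build_fingerprints(links)
    print(expand_terms(["jaguar", "habitat", "jaguar"], fingerprints))
```

Suppose `jaguar` has 3, 2, and 1 links to `Felidae`, `Big cat`, and `Car`. Expanding a document in which `jaguar` occurs twice adds `Felidae` with weight 1.0 and `Big cat` with weight of about 0.67. `Car` (about 0.17) is dropped because it falls below `min_weight`. The expanded vector could then be fed to any conventional classifier. The abstract does not say which classifier the thesis used.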
Keywords/Search Tags: Wikipedia, semantic knowledge base, text classification, information expansion