Research On Building Wikipedia Semantic Knowledge Base And Its Application In Text Classification

Posted on:2011-06-16

Degree:Master

Type:Thesis

Country:China

Candidate:X K Su

GTID:2178360305968259

Subject:Computer application technology

Abstract/Summary:

With the continuing development of computer technology and the rapid spread of the Internet, more and more people use the Internet to obtain information. In the information age, how to extract rich semantic knowledge from massive amounts of text, and how to use that knowledge to provide reliable support for existing natural language processing, has become an important research subject.

Current sources of semantic knowledge fall into two categories. One is manually constructed semantic knowledge (e.g., HowNet). The other is large-scale real text, including the massive text on the Internet, various offline text collections (such as corpora of all sizes), and various encyclopedic knowledge bases (such as Wikipedia). Manually constructed semantic knowledge has struggled to meet the growing demands of network information processing. This thesis therefore proposes an automatic method for constructing a knowledge base from a Wikipedia corpus of a certain scale. The main work includes:

1. For the formal representation of knowledge, a representation method is proposed that uses semantic labels as references and semantic fingerprints to describe meaning. The method assumes that every concept (term) is supported by some background information, and it puts forward a probability formula to quantify the contribution of each semantic label to the semantic fingerprint. It adopts the explicit representation of semantic knowledge used by manually built knowledge bases and adds probability information, so as to obtain a more accurate description of semantics that can easily be integrated into existing text computation models.

2. Based on the proposed representation with semantic labels and semantic fingerprints, a Wikipedia semantic knowledge base is built from a Wikipedia corpus of a certain scale. The construction mines the rich link relationships between Wikipedia pages through preprocessing, semantic label selection, related concept extraction, and determination of contribution values.

3. To verify the validity of the knowledge base, and building on prior research on Chinese text classification, a method is proposed that uses the semantic knowledge base to expand the terms of a text and thereby improve the accuracy of text classification.

Comparative experiments between a traditional classification method and the proposed method show that the semantic knowledge base constructed in this thesis improves the accuracy of text classification, which demonstrates the effectiveness of the knowledge base.
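
The following is a minimal Python sketch of the general idea, not the thesis's actual method. The abstract does not give the probability formula, so the sketch substitutes a plain relative frequency of outgoing links as the contribution value. The function names `build_fingerprints` and `expand_terms`, the parameters `top_k` and `min_weight`, and the toy link data are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def build_fingerprints(page_links, top_k=3):
    """Map each concept to its top_k linked concepts with normalized weights.

    page_links maps a page title to the list of titles it links to
    (repeats allowed). The weight is a relative link frequency, an
    assumption standing in for the thesis's probability formula.
    """
    fingerprints = {}
    for concept, links in page_links.items():
        counts = Counter(links)
        total = sum(counts.values())
        if total == 0:
            fingerprints[concept] = {}
            continue
        fingerprints[concept] = {
            label: n / total for label, n in counts.most_common(top_k)
        }
    return fingerprints


def expand_terms(term_freqs, fingerprints, min_weight=0.2):
    """Add related concepts to a document's term-frequency vector.

    Each related concept receives the source term's frequency scaled by
    its fingerprint weight; weights below min_weight are ignored.
    """
    expanded = defaultdict(float)
    for term, tf in term_freqs.items():
        expanded[term] += tf
        for label, weight in fingerprints.get(term, {}).items():
            if weight >= min_weight:
                expanded[label] += tf * weight
    return dict(expanded)


# Toy link data, invented for illustration only.
toy_links = {
    "Python": ["Programming language", "Programming language",
               "Guido van Rossum", "Software"],
    "Java": ["Programming language", "Software", "Software", "Oracle"],
}
fingerprints = build_fingerprints(toy_links)
doc = {"Python": 2, "tutorial": 1}
expanded_doc = expand_terms(doc, fingerprints)
```

In this sketch the expanded vector keeps the original terms and gains weighted entries for the related concepts, so two documents that share no surface terms can still overlap through a common label such as "Programming language". The expanded vector could then be passed to any standard classifier.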

Keywords/Search Tags:

Wikipedia, semantic knowledge base, text classification, information expansion

Related items

1	Mining Semantic Knowledge From Chinese Wikipedia
2	Automatic Construction Method For Domain Concepts Based On Wikipedia Semantic Knowledge Base
3	Text Classification Based On Wikipedia Knowledge
4	Research And Implementation Of The Knowledge Search System Based On Wikipedia
5	A Collaborative Method On Association Semantic Knowledge Base Construction
6	Automatic Classification Of Various Types Of Documents Based On Wikipedia
7	Short Text Classification Method Combining Statistical Information And Conceptual Information Of Knowledge Base
8	A Semantic-Wiki Knowledge Base System Based On Knowledge Elements
9	Discovering Entity Relationship And Semantic Annotations Based On Wikipedia Encyclopedia Knowledge Resources
10	Using Knowledge From Wikipedia To Improve Document Classification