Knowledge Extraction And Reuse In Wikipedia

Posted on: 2010-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: H J Zhang
Full Text: PDF
GTID: 2178360275470239
Subject: Computer application technology
Abstract/Summary:
With the emergence of Web 2.0, collaborative authoring systems embrace the power of collective intelligence and have been widely adopted for knowledge management. The wiki is a popular example of such systems. One of the best-known wikis is Wikipedia, the largest free online encyclopedia, authored by a broad community of volunteers. Wikipedia also qualifies as a potential semantic data source because of its broad knowledge coverage, its well-defined information structure, and its dynamic evolution as world knowledge changes. Semantic wikis aim to enhance wikis with Semantic Web technologies by adding explicit semantics to wiki entities.

While the freedom of collaborative editing contributes to the success of Wikipedia, it also creates problems. In particular, it results in a large number of missing and noisy annotations, which lower the quality of the content and impede terminology convergence. Currently, low-quality annotations have to be addressed by a small group of experts, which becomes a bottleneck. These experts are also the most active contributors, responsible for most of the edits, so the burden on them is heavy. Semantic wikis face a similar problem: a lack of annotated semantics and of semantic annotators. Specifically, to edit high-quality articles that have meaningful relationships with the rest of the collection, casual users are required to know a great deal about the collection and to understand the underlying semantic technologies. They need to know:

1) When is it necessary to provide a hyperlink to a target entity of a related topic for reference, and how can the target be located?
2) Which categories properly characterize an article?
3) Which infobox can be used to model the properties of an article?
4) When editing Semantic Wikipedia, is there any implicit relationship between entities? If so, how should it be annotated?

In this thesis, we try to help users answer these questions via knowledge extraction and knowledge reuse. Knowledge extraction is the preliminary step: knowledge reuse is performed on the basis of the extracted background knowledge. We are inspired by collaborative filtering research, which uses the ratings of other like-minded users to calculate recommendations for the active user. Similarly, we reuse collective knowledge through annotation suggestion for Wikipedia authoring.

To accomplish this goal, we first extract meaningful knowledge from the data currently annotated in Wikipedia as our background knowledge. This knowledge consists of structured and semi-structured semantic features of Wikipedia entities, including the entity thesaurus, entity types, and semantic relationships between entities. We then propose a unified annotation suggestion framework that exploits the extracted knowledge, and we apply this knowledge reuse solution to Wikipedia authoring.

We present a prototype system named EachWiki that provides the following annotation suggestion services for users: link suggestion, category suggestion, infobox suggestion, and relation suggestion. In this way, the collective intelligence is leveraged. These suggestion services can not only help users create high-quality Wikipedia knowledge, but also help build Semantic Wikipedia. Finally, experimental evaluations of each suggestion module demonstrate the effectiveness, efficiency, and usability of our approaches.
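
As a rough illustration of the collaborative-filtering idea behind annotation suggestion, the sketch below ranks candidate categories for a draft article by letting its most similar, already annotated articles vote. It is not the algorithm used in EachWiki, which the abstract does not specify. It assumes a simple nearest-neighbour scheme with Jaccard similarity over outgoing links, and the function names, parameters, and toy corpus are all hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_categories(draft_links, corpus, k=2, top_n=3):
    """Rank categories for a draft by similarity-weighted neighbour votes.

    corpus maps an article title to a pair:
    (set of outgoing links, set of categories). Hypothetical structure.
    """
    # Score every annotated article against the draft's outgoing links.
    scored = [(jaccard(draft_links, links), title)
              for title, (links, _cats) in corpus.items()]
    # Keep the k most similar articles as "like-minded" neighbours.
    neighbours = sorted(scored, reverse=True)[:k]

    votes = defaultdict(float)
    for similarity, title in neighbours:
        if similarity == 0.0:
            continue  # unrelated articles do not vote
        for category in corpus[title][1]:
            votes[category] += similarity

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _score in ranked[:top_n]]


# Toy, hand-made background knowledge (not real Wikipedia data).
corpus = {
    "Shanghai": ({"China", "Yangtze", "Port"},
                 {"Cities in China", "Port cities"}),
    "Beijing": ({"China", "Capital", "Forbidden City"},
                {"Cities in China", "Capitals in Asia"}),
    "Python": ({"Programming", "Guido van Rossum"},
               {"Programming languages"}),
}
draft_links = {"China", "Port", "Huangpu"}
suggestions = suggest_categories(draft_links, corpus)
```

In this toy setting the draft shares the most links with "Shanghai", so that article's categories receive the largest weight. A link, infobox, or relation suggester could follow the same pattern with a different feature set and a different vote target.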
Keywords/Search Tags: Wikipedia, Knowledge Extraction, Knowledge Reuse, Annotation Suggestion, Relation Suggestion