Research On Crucial Technologies Of Web Person Name Entity Disambiguation

Posted on: 2013-02-25; Degree: Master; Type: Thesis
Country: China; Candidate: Y X Tan; Full Text: PDF
GTID: 2248330395980522; Subject: Computer application technology
Abstract/Summary:
Web pages, an important platform for information dissemination, contain a large quantity of information related to person name entities. Mining this information has become a significant channel for revealing the membership of social organizations and social relations. Research on Web person name entity disambiguation is of great significance for the integration of multi-form social networks and for Web information extraction. In this paper, we first analyze the shortcomings of single-feature disambiguation methods: they are insensitive to the true identity of Web person name entities, make poor use of the available features, and depend strongly on empirical values. We then design a framework for Web person name entity disambiguation that synthesizes a variety of disambiguation technologies, and study in depth the extraction of disambiguation features and disambiguation clustering. The major achievements are as follows:

1. Construction of a multi-feature based framework for Web person name entity disambiguation. We design a step-by-step framework that integrates the advantages of a variety of methods using different features. The framework consists of three parts: extraction of disambiguation features, distance calculation, and Web page clustering.

2. An algorithm for associating static attributes of Web person name entities based on Markov Logic Networks. Web pages are divided into two categories according to their visual structure. For each category, the VIPS algorithm determines blocks of closely related content, which are then used to associate static attributes. Web page classification and static attribute association are unified in Markov Logic Networks for global reasoning. The effectiveness of the method is validated by comparison with the same method without Web page classification.

3. An algorithm for disambiguating Web person name entities based on weighted features and clustering ensembles. Static attributes, topics, and Web-based social network features are extracted from Web pages. For each feature, a mature single-level clustering disambiguation algorithm is run with different empirical threshold values to obtain different clusterings of the Web page set. These clusterings are first filtered based on the theory of clustering ensembles, and the weights of the three features are then calculated according to the concept of entropy in information theory. To reduce time complexity, we reduce the dimensionality of the weighted clustering ensemble matrix before the later ensemble step, which yields the final clustering results and makes comprehensive use of all features. Experiments on the WePS-2 data set show that the proposed method disambiguates better than single-feature clustering.
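The abstract does not give the formulas behind the weighted clustering ensemble, so the following is only a minimal sketch of the general idea, under stated assumptions. It builds one co-association matrix per feature from clusterings obtained at different thresholds, weights each feature by how consistent its clusterings are (low average binary entropy), and links pages whose weighted co-association exceeds a cutoff. The function names, the particular entropy measure, and the 0.5 cutoff are illustrative choices rather than the thesis's definitions. The filtering and dimensionality reduction steps are omitted.

```python
import math
from itertools import combinations


def co_association(clusterings, n):
    """Fraction of clusterings in which each pair of pages shares a cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(clusterings) for c in row] for row in counts]


def mean_binary_entropy(matrix):
    """Average binary entropy (in bits) over all distinct page pairs."""
    pairs = list(combinations(range(len(matrix)), 2))
    total = 0.0
    for i, j in pairs:
        p = matrix[i][j]
        if 0.0 < p < 1.0:  # entropy is zero when p is 0 or 1
            total += -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return total / len(pairs) if pairs else 0.0


def entropy_weights(matrices):
    """Assumed weighting: features with more consistent clusterings weigh more."""
    scores = [1.0 - mean_binary_entropy(m) for m in matrices]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(matrices)] * len(matrices)
    return [s / total for s in scores]


def consensus_clusters(matrices, weights, cutoff=0.5):
    """Link pages whose weighted co-association reaches the (assumed) cutoff."""
    n = len(matrices[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        combined = sum(w * m[i][j] for w, m in zip(weights, matrices))
        if combined >= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy input: four pages, two features, two threshold settings per feature.
attribute_runs = [[0, 0, 1, 1], [0, 0, 1, 1]]  # clusterings agree
topic_runs = [[0, 0, 0, 1], [0, 1, 1, 1]]      # clusterings disagree
matrices = [co_association(runs, 4) for runs in (attribute_runs, topic_runs)]
weights = entropy_weights(matrices)
clusters = consensus_clusters(matrices, weights)
```

In this toy input the attribute feature produces the same clustering at both thresholds, so it receives the larger weight and dominates the final grouping of the four pages.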
Keywords/Search Tags: Web person name entity, Disambiguation, Attribute association, Markov Logic Networks, Weighted feature, Clustering ensemble