
Research And Application Of The Identification Method Of Book Authors

Posted on: 2019-09-21    Degree: Master    Type: Thesis
Country: China    Candidate: M Y Li    Full Text: PDF
GTID: 2428330548452629    Subject: Computer Science and Technology
Abstract/Summary:
China has a large population, so many people share the same name and name ambiguity is common. This ambiguity complicates work on search engines, knowledge bases, machine learning, natural language processing and related areas, and extracting information about the specific person a user is interested in from massive amounts of data has become an active research topic. Many Chinese book authors share a name. This seriously degrades the quality of book retrieval when an author's name is used as the keyword, and users spend a large amount of time screening the results. Based on authors' information and basic book data, this thesis focuses on author identification. It aims to improve search accuracy for names shared by several people and for name variants, to quickly locate an author's own information and works, and to make it easier for researchers to track an author's research output. An analysis of author profiles shows that they differ in how they are written and that their attribute descriptions are often incomplete, which leads to a high vacancy rate in the feature matrix built from the extracted author attributes. In view of these characteristics, this thesis improves the calculation of the attribute weights and the feature-matrix weights. For the attribute weights, an attribute mutual exclusion amplification method is proposed to improve identification accuracy when attributes are mutually exclusive. For the feature matrix, a vacancy reduction method is proposed to improve identification accuracy when the vacancy rate of the feature matrix is high. Experiments show that the attribute mutual exclusion amplification method is advantageous when mutually exclusive attributes make up a proportion of 0.16 to 0.77 of all attributes. When the vacancy rate of the feature matrix is 0.35, the author identification rate is optimal and is nearly 5 percentage points higher than without feature matrix vacancy reduction. It is concluded that both attribute mutual exclusion amplification and feature matrix vacancy reduction are effective in improving the accuracy of author identification. Finally, the general B_Cubed index is used as the evaluation criterion for choosing the similarity threshold. At a similarity threshold of 0.47, the recall, precision and F-value of author identification are the best.
Keywords/Search Tags: Chinese author identification, name disambiguation, mutual exclusion amplification, gap reduction, hierarchical clustering
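
The abstract evaluates author identification with the B_Cubed index but does not show how it is computed. The following is a minimal sketch of the standard B-Cubed precision, recall and F-value for a clustering of book records, not the thesis's own code. The function name `b_cubed`, the record ids and the author labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def b_cubed(predicted, gold):
    """Return B-Cubed (precision, recall, f_value).

    predicted, gold: dicts mapping each record id to a cluster label.
    Both must label exactly the same, non-empty set of records.
    """
    if not predicted or predicted.keys() != gold.keys():
        raise ValueError("predicted and gold must label the same non-empty set of records")

    # Group record ids by cluster label in each labeling.
    pred_clusters = defaultdict(set)
    gold_clusters = defaultdict(set)
    for record, label in predicted.items():
        pred_clusters[label].add(record)
    for record, label in gold.items():
        gold_clusters[label].add(record)

    precision_sum = 0.0
    recall_sum = 0.0
    for record in predicted:
        pred_members = pred_clusters[predicted[record]]
        gold_members = gold_clusters[gold[record]]
        overlap = len(pred_members & gold_members)
        # Per-record precision: share of the predicted cluster that truly
        # belongs to the same author as this record.
        precision_sum += overlap / len(pred_members)
        # Per-record recall: share of this author's records that ended up
        # in the same predicted cluster as this record.
        recall_sum += overlap / len(gold_members)

    n = len(predicted)
    precision = precision_sum / n
    recall = recall_sum / n
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical example: five book records by two authors who share a name.
gold = {"b1": "author_A", "b2": "author_A", "b3": "author_A",
        "b4": "author_B", "b5": "author_B"}
predicted = {"b1": 0, "b2": 0, "b3": 1, "b4": 1, "b5": 1}
print(b_cubed(predicted, gold))
```

In this example one record of the first author is wrongly grouped with the second author, and precision, recall and F-value all come out at 11/15, about 0.733. Under this reading, a similarity threshold such as the reported 0.47 would be chosen by repeating the clustering at several thresholds and keeping the one with the best B-Cubed scores.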