Keyword Extraction In Literary Corpora Based On Spatial Distribution And Information Entropy

Posted on: 2011-02-25  Degree: Master  Type: Thesis
Country: China  Candidate: Q H Sun  Full Text: PDF
GTID: 2120330332961050  Subject: Applied Mathematics
Abstract/Summary:
Keyword extraction is a central problem in information retrieval. To perform well, traditional keyword extraction methods require a reference corpus (i.e., a dictionary). Drawing on ideas from another discipline, M. Ortuno et al. proposed in 2002 a model that needs no dictionary; it is based on the spatial distribution of words in a text and is inspired by the level statistics of quantum disordered systems described by random matrix theory. In the same year, Montemurro et al. proposed a model that uses the information entropy of words as the main ranking index; this model also performs well when no dictionary is available. In this thesis, we analyze both models, the spatial distribution model and the information entropy model, implement them on a computer, and improve each of them. We then propose a new model that combines the two improved models and extracts keywords well without a dictionary. To assess our model, we use the concepts of recall and precision from information retrieval. Numerical experiments show that, with a reasonable choice of parameters, our model achieves high recall and precision.
Keywords/Search Tags:Keyword, Feature Selection, Spatial Distribution, Monte-Carlo Simulation, Information Entropy
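
As a rough illustration of two ideas named in the abstract, the following minimal sketch scores words by the spread of the gaps between their successive occurrences (the basic measure behind the spatial distribution model: standard deviation of the gaps divided by their mean) and then computes precision and recall against a reference keyword list. It is not the thesis's improved or combined model. The function names `spacing_sigma` and `precision_recall` and the `min_count` threshold are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def spacing_sigma(tokens, min_count=3):
    """Score each word by std(gaps) / mean(gaps) between its occurrences.

    Words that cluster in the text get a high score; evenly spread
    words get a score near zero. min_count is an assumed cutoff that
    skips words too rare to give a meaningful gap distribution.
    """
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    scores = {}
    for word, pos in positions.items():
        if len(pos) < min_count:
            continue
        # Distances between consecutive occurrences of the word.
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores


def precision_recall(extracted, reference):
    """Precision and recall of extracted keywords against a reference set."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

In use, one would rank the words by score, take the top few as candidate keywords, and pass them to `precision_recall` together with a manually prepared keyword list.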