| With the rapid development of information technology, information is being produced at an unprecedented rate. As a result, handling huge amounts of information effectively has become an important problem. In response to this demand, automatic summarization, information retrieval, text categorization, text clustering, and related fields have attracted increasing attention from scholars. Keywords reflect the topics of an article and serve as a brief summary of it, so these fields rely on keyword detection as a guide. As a basic problem underlying these fields, keyword detection has become particularly important. Traditional keyword detection techniques are mainly based on word frequency, and most of them require a constantly updated database as prior information. In today's information age, however, keeping such a mass of data up to date is almost impossible. A reliable keyword detection system that does not depend on prior information is therefore needed. To detect keywords, this paper divides the article using a multiscale method and considers the distribution features of each word at each granularity to calculate the degree of the word's topical relevance, and thereby detects the keywords of a text effectively. The contents of this research are as follows: First, this paper analyzes the distribution characteristics of keywords. Different words in an article have different distributions; words unrelated to the topic are usually distributed more randomly and are mostly close to uniformly distributed.
Because certain keywords are closely tied to particular parts of the article, they tend to concentrate at certain positions, which produces fluctuations in word density. Drawing on the literature, this paper defines the concept of word density fluctuation, which provides a theoretical basis for calculating the degree of relevance between a word and the topic. Second, this paper presents a keyword detection algorithm based on multiscale division. Experimental analysis shows that a word's distribution has different characteristics at different scales. To calculate the relevance between a word and the topic more accurately, and so improve keyword detection accuracy, this paper calculates word density fluctuations at different scales and derives the relevance between the word and the topic with pattern recognition methods. In an experiment on "The Origin of Species", the algorithm achieved a top-13 accuracy of 100%. Third, building on the proposed algorithm, this paper analyzes the regions in which a word fluctuates significantly in order to further improve performance. Since words reflect particular themes of the article, the consistency between a word's distribution and the topic distribution can reflect the relevance between the word and the topic. Because word density fluctuations occur at specific places in the text, recording the positions where a word fluctuates most significantly links the word more closely to the article. From this perspective, the algorithm further explores the relationship between word distribution and word-topic relevance and adjusts the relevance score accordingly. With this improvement, keyword detection on "The Origin of Species" performed significantly better, achieving a top-19 accuracy of 100%. |
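
The idea of scoring words by their density fluctuation at several scales can be illustrated with a minimal sketch. The abstract does not give the exact definition of word density fluctuation or the pattern recognition step, so the code below makes its own assumptions: fluctuation is taken to be the variance-to-mean ratio of a word's per-segment counts, the scales are arbitrary segment counts, and the per-scale values are simply averaged. All function names and parameters are hypothetical and are not the paper's implementation.

```python
from collections import Counter


def density_fluctuation(tokens, word, n_segments):
    """Variance-to-mean ratio of per-segment counts of `word`.

    Assumed stand-in for the paper's word density fluctuation: a word
    spread evenly over the text scores near 0, a clustered word scores high.
    """
    counts = [0] * n_segments
    total = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == word:
            # Integer arithmetic maps position i to its segment index.
            counts[i * n_segments // total] += 1
    mean = sum(counts) / n_segments
    if mean == 0:
        return 0.0
    variance = sum((c - mean) ** 2 for c in counts) / n_segments
    return variance / mean


def multiscale_scores(tokens, scales=(4, 8, 16), min_count=3):
    """Average the fluctuation over several division scales (assumed combination rule)."""
    freq = Counter(tokens)
    scores = {}
    for word, n in freq.items():
        if n < min_count:
            continue  # too few occurrences to estimate a distribution
        per_scale = [density_fluctuation(tokens, word, s) for s in scales]
        scores[word] = sum(per_scale) / len(per_scale)
    return scores


def top_keywords(tokens, k=5, **kwargs):
    """Return the k words with the highest multiscale fluctuation score."""
    scores = multiscale_scores(tokens, **kwargs)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `top_keywords(text.lower().split(), k=10)` on a long document would rank words that cluster in particular passages above words spread evenly through the text. A plain average over scales is used here only for simplicity; the paper combines the scales with pattern recognition methods instead.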