Font Size: a A A

TextRank Algorithm Optimization Based On Markov Model

Posted on:2019-11-11Degree:MasterType:Thesis
Country:ChinaCandidate:L Q LiuFull Text:PDF
GTID:2428330566995990Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As an important algorithm for task of massive hypertext ranking,PageRank treats all the nodes in the internet as a graph model,ranks the nodes by computing their own importance,iteratively.TextRank applies PageRank to the task of keyword extraction and auto abstracting of specific text.It treats a specific text as a graph model,uses the iterating process to do the calculation,and obtains an improved performance for processing the text.TextRank is theoretically simple,unsupervised,suit for both the single and multiple text tasks,and widely applied to every subfield of NLP.Currently the main work focus on combining TextRank with other algorithm to solve the specific task,but lacks of research for the TextRank algorithm itself and its difference with the PageRank algorithm.Both the characteristic of NLP task and the corresponded TextRank are obviously different with the PageRank task.PageRank algorithm and the parameters are not compatible to be used in NLP task without any adjustment.After deeply researching on PageRank,TextRank and the difference of their theory,mathematical model,process of iteration and corresponding tasks,a new algorithm,which named TextRank based on the conditional probability model,and the optimization,are proposed.Then introduces the Markov model theory into the understanding for the iteration and the graph model,finds a defect of this conditional probability model,and make up them.Meanwhile,according to this algorithm,a potential distinct method can give a further performance improvement and application field extension to TextRank.Experiment demonstrates that TextRank base on the conditional probability model is more accurate than the others based on the traditional model.In the experiment measured by Precise-Recall graph,TextRank base on the conditional probability model gave a better performance than all the other five TextRank based on the traditional graph model.Constrained by re-identified threshold,the iteration of the algorithm does not increase significantly.And based on which,introduce the process of qualitative analysis into the iteration,for the case of NLP task which require specific output.Experimental proof that for the case of task which requires the sequence of the word only,application of qualitative analysis can significantly decrease the iteration.A conditional probability model is combined with the nodes graph,a more valid text modeling method is proposed,and then the accuracy of text keyword ranking is improved,while the error rate decreased.Furthermore,introducing the qualitative analysis process significantly improve the performance of TextRank for some specific task.
Keywords/Search Tags:TextRank, PageRank, keyword ranking, Markov model, conditional probability, sequence stability, quantitative analysis, qualitative analysis
PDF Full Text Request
Related items