Research on User Intention Recognition Based on fastText and Keyword Extraction for Question Answering Systems

Posted on: 2019-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: L L Dai
GTID: 2428330545966444
Subject: Information Security and Electronic Commerce
Abstract/Summary:
Text classification is complex and time-consuming. To address this, this thesis introduces fastText, Facebook's open-source model for sentence classification and word feature learning, into Chinese text classification and verifies its effectiveness there. The experimental results show that, compared with current mainstream text classification methods, a classifier based on the fastText model greatly reduces classification time while maintaining classification quality. A question answering system must be highly responsive, and shortening the user's waiting time improves the user experience. Building on this experiment, the thesis applies fastText to user intention recognition in a question answering system. The experimental results show that the accuracy, recall, and F1 value of the fastText classifier are clearly higher than those of the convolutional neural network method, and that the fastText experiment takes only 1.15% of the time required by the convolutional neural network. The thesis further explores the minimum dimension value for an equivalent simple classifier and the parameter optimization rules that improve classification accuracy, and constructs a fastText Chinese text classification model with a simple structure and optimal parameters.

Chat questions have indistinct keywords, cover a wide range of topics, are colloquial, and are short. This thesis optimizes the retrieval of chat questions through keyword extraction. Based on the information entropy formula, the average information entropy H(t) of each word in the initial candidate keyword set is calculated. The reciprocal of the average information entropy is used as the initial weight of each vertex, which improves the TextRank algorithm for extracting question keywords. The weight of each candidate keyword is then computed iteratively according to the improved formula, the words are sorted by weight, and the top-ranked words are taken as the result of the method. The experimental results show that the improved TextRank algorithm extracts keywords better than the original algorithm.

To further improve the accuracy and recall of keyword extraction, the thesis introduces ideas from set theory into the experiment. Taking the strengths and weaknesses of each method into account, it combines the results of the TF-IDF method and the TextRank algorithm (their union) to increase the number of correct keywords in the extraction results. To further improve accuracy, it then intersects this union with the results of the improved TextRank algorithm to filter out the incorrect keywords produced by the two methods. The results show that both the accuracy and the F1 value improve markedly. The thesis sets a threshold N on the number of keywords used in the intersection and determines the best value of N through repeated experiments.

In summary, this thesis applies the linear classification model fastText to user intention recognition in a question answering system. The experiments show that the model is very fast without losing classification accuracy, and that it shortens classification time considerably compared with the mainstream methods currently used for classification. For keyword extraction, the thesis combines an improved algorithm, the fusion of multiple methods, and set operations. The final experimental results show that the proposed keyword extraction scheme produces relatively stable results and performs well overall, which demonstrates the feasibility of the scheme.
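
The set-based fusion step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names (`top_n`, `fuse_keywords`, `precision_recall_f1`), the sample scores, and the reading of the threshold N as a top-N cutoff applied to each ranked list are all assumptions, since the abstract does not define them. The scorers themselves (TF-IDF, TextRank, improved TextRank) are assumed to have already produced a word-to-weight mapping. The metric helper computes precision, which appears to be what the abstract calls accuracy when it is reported alongside recall and F1.

```python
def top_n(scores, n):
    """Return the n highest-weighted words of a {word: weight} mapping as a set."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {word for word, _ in ordered[:n]}


def fuse_keywords(tfidf, textrank, improved_textrank, n):
    """Union TF-IDF and TextRank keywords, then intersect with improved TextRank.

    Assumption: n is a top-n cutoff applied to every ranked list before the
    set operations; the abstract only says that a threshold N is tuned.
    """
    union = top_n(tfidf, n) | top_n(textrank, n)   # widen the candidate pool
    return union & top_n(improved_textrank, n)     # filter unlikely keywords


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for a predicted keyword set."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical weights for one chat question; not data from the thesis.
    tfidf = {"weather": 0.9, "today": 0.5, "like": 0.4, "what": 0.1}
    textrank = {"weather": 0.8, "beijing": 0.6, "what": 0.3, "today": 0.2}
    improved = {"weather": 0.9, "beijing": 0.7, "today": 0.5, "like": 0.1}

    keywords = fuse_keywords(tfidf, textrank, improved, n=2)
    print(sorted(keywords))
    print(precision_recall_f1(keywords, {"weather", "beijing", "today"}))
```

Under this reading, the union raises recall by pooling candidates from two methods, and the intersection raises precision by keeping only the candidates that the improved TextRank ranking also supports. Sweeping `n` over a small range and comparing F1 is one plausible way to reproduce the threshold search the abstract mentions.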
Keywords/Search Tags: question and answer system, text classification, intention recognition, keyword extraction