Research And Implementation Of Text Similarity Algorithm Based On Semantic Fusion

Posted on: 2022-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: R Liu
GTID: 2518306539981249
Subject: Computer technology
Abstract/Summary:
Text similarity calculation is a key technology in text mining and is widely used in text classification, machine translation, search engines, plagiarism detection, automatic question answering, and other fields. The most widely used text similarity algorithm at present is based on the vector space model, but it assigns unreasonable word weights and ignores the semantics of words. This thesis improves the text similarity algorithm, starting from the traditional vector space model, and applies the result to text classification. The main work is as follows:
(1) To address the problem that traditional text similarity algorithms use word frequency as the only weight and therefore cannot represent text features effectively, a vector space model algorithm based on multi-feature fusion weights (Multi-feature Fusion Weights-Vector Space Model, MFW-VSM) is designed. By introducing information gain and intra-class dispersion, the algorithm improves TF-IDF, which does not consider how feature words are distributed across classes, and the improved weights are used in the cosine similarity of the vector space model.
(2) To address the problem that traditional text similarity algorithms ignore the semantics of words, a text similarity algorithm based on semantic fusion (MFW-VSM-HowNet) is proposed. The algorithm first optimizes the HowNet-based word similarity calculation method proposed by Liu Qun and Li Sujian and extends it from the word level to the paragraph level. It then applies the improved feature weight calculation described above, and finally combines the result with the MFW-VSM algorithm by weighting.
(3) The optimal ratio between the MFW-VSM algorithm and the optimized HowNet algorithm is determined.
(4) Through experiments, the proposed algorithm is compared with classic text similarity algorithms and semantic similarity algorithms, and its effect is demonstrated through text classification. The experiments show that the improved algorithm achieves the best text classification results among the compared algorithms.
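The abstract gives no formulas, so the following is only a minimal sketch of the two general ideas it names: cosine similarity over TF-IDF weighted vectors, and a weighted combination of a vector-space score with a semantic score. It uses plain smoothed TF-IDF, not the thesis's information-gain and intra-class-dispersion weights, and the names `tfidf_vectors`, `cosine`, `fuse`, and the mixing parameter `alpha` are assumptions introduced here for illustration. The semantic score is taken as a given number, since the HowNet-based calculation is not described in enough detail to reproduce.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Plain smoothed TF-IDF, used as a stand-in baseline; it is not the
    MFW-VSM weighting from the thesis.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(vsm_sim, semantic_sim, alpha):
    """Linear mix of two similarity scores; alpha is a hypothetical ratio."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * vsm_sim + (1.0 - alpha) * semantic_sim


if __name__ == "__main__":
    docs = [
        ["text", "similarity", "vector", "space", "model"],
        ["text", "classification", "vector", "space", "model"],
        ["machine", "translation", "search", "engine"],
    ]
    vecs = tfidf_vectors(docs)
    vsm_score = cosine(vecs[0], vecs[1])
    # 0.7 is a placeholder semantic score, not a value from the thesis.
    print(fuse(vsm_score, 0.7, alpha=0.5))
```

In this sketch, searching for the optimal ratio in item (3) would amount to trying several values of `alpha` and keeping the one that gives the best classification result on held-out data; the value the thesis actually selects is not stated in the abstract.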
Keywords/Search Tags: Text similarity, Vector Space Model, TF-IDF, HowNet