Research Of Text Representation Method Based On Co-occurrence Analysis

Posted on:2022-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:T Yan

Full Text:PDF

GTID:2507306509969779

Subject:Statistics

Abstract/Summary:

Representing text reasonably and effectively as vectors in a vector space affects the results of text clustering and retrieval. Among traditional text representation models, the vector space model (VSM) represents text simply and intuitively and can be applied in many fields. However, VSM treats text feature words as mutually independent and does not analyze the relationships between them. The co-occurrence latent semantic vector space model (CLSVSM), an improvement on VSM, mines the latent co-occurrence semantic information between text feature words and improves text clustering performance. In text mining, keywords serve as text feature words; they are the key to text retrieval and represent the topics of a text. The frequency of feature words in a text, extracted and normalized, represents the word frequency information of that text. Building on CLSVSM, this thesis studies the relationship between the latent co-occurrence semantic information and the word frequency information of feature words. It first introduces the word frequency information of feature words into CLSVSM, then defines the co-occurrence strength of CLSVSM with the introduced word frequency as the weight, and finally constructs the feature-weighted CLSVSM. In addition, word vectors constructed by neural network methods contain the semantic information of the text. Using the new model as a bridge, a text representation method combining word2vec and the vector space model is proposed.

Experimental results show that, compared with VSM and CLSVSM, the entropy of word-frequency CLSVSM is reduced by 12% and 2% respectively on Chinese data, and its F value is increased by 14% and 8% respectively on English data. The clustering performance of word-frequency CLSVSM is more stable than that of the other models. On Chinese data, feature-weighted CLSVSM improves the F value by nearly 2.4% over CLSVSM and also clusters better than word-frequency CLSVSM; on English data, its clustering performance is similar to that of the other models. Feature-weighted CLSVSM makes the representation of text information more comprehensive and improves text clustering performance. The text representation method that combines the vector space model with word2vec achieves better clustering results than the other models.
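The abstract does not state the CLSVSM formulas, so the following Python sketch only illustrates the general idea of enriching a VSM vector with co-occurrence information weighted by normalized word frequency. Everything in it is an assumption made for illustration and not the thesis's definition: the Jaccard-style co-occurrence strength, the rule of taking the maximum weighted strength for words absent from a document, and the function names `term_frequency_matrix`, `cooccurrence_strength`, and `enriched_vectors`.

```python
import numpy as np

def term_frequency_matrix(docs, vocab):
    """Plain VSM: rows are documents, columns are feature words, entries are counts."""
    index = {word: j for j, word in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for token in doc.split():
            if token in index:
                tf[i, index[token]] += 1
    return tf

def cooccurrence_strength(tf):
    """Assumed measure: Jaccard overlap of the document sets of two feature words."""
    present = (tf > 0).astype(float)
    both = present.T @ present                 # number of docs containing both words
    df = np.diag(both)                         # document frequency of each word
    either = df[:, None] + df[None, :] - both  # number of docs containing either word
    return np.divide(both, either, out=np.zeros_like(both), where=either > 0)

def enriched_vectors(tf, strength):
    """Assumed rule: keep the normalized frequency of words that occur in a document;
    give each absent word the largest frequency-weighted co-occurrence strength
    it has with any word that does occur."""
    totals = tf.sum(axis=1, keepdims=True)
    norm_tf = np.divide(tf, totals, out=np.zeros_like(tf), where=totals > 0)
    # weighted[i, k, j] = norm_tf[i, k] * strength[k, j]
    weighted = norm_tf[:, :, None] * strength[None, :, :]
    latent = weighted.max(axis=1)
    return np.where(tf > 0, norm_tf, latent)

if __name__ == "__main__":
    docs = ["cat dog", "cat dog fish", "fish bird"]   # toy corpus, hypothetical
    vocab = ["cat", "dog", "fish", "bird"]
    tf = term_frequency_matrix(docs, vocab)
    print(enriched_vectors(tf, cooccurrence_strength(tf)))
```

In this toy setup, a document that contains only "cat" and "dog" still receives a small nonzero weight for "fish", because "fish" co-occurs with both words elsewhere in the corpus. A plain VSM vector would leave that entry at zero.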

Keywords/Search Tags:

CLSVSM, Feature weighting, Word frequency, word2vec, Text clustering

Related items

1	Penalized Matrix Decomposition And Its Application In Text Topic Clustering
2	Research On The Course Recommendation Based On Word2Vec And TF-IDF
3	Research And Application Of Text Clustering Based On Topic Model
4	Feature Weighting Method For Binary Classification In Machine Learning
5	A Study On The Recruitment Market Of Data Analysis Based On Text Mining
6	The Method Of Selecting Local Feature Words And Its Application In Text Classification
7	Improved SSD Algorithm Based On Feature Pyramid And Clustering
8	Design And Implementation Of Student Automatic Grouping System Based On Feature Clustering
9	Attention Pattern Mining And Application Of Multivariate Student Behaviors Based On Time-Frequency Analysis
10	Variance Analysis In Teaching Evaluation Themes Of Students With Different Majors Based On LDA Model