| Representing text information reasonably and effectively with space vectors affects the results of text clustering and retrieval. Among traditional text representation models, the vector space model (VSM) represents text simply and intuitively and can be applied in many fields. However, VSM treats any two feature words as independent and unrelated, and does not further analyze the relations between them. The co-occurrence latent semantic vector space model (CLSVSM), an improvement on VSM, mines the co-occurrence latent semantic information between text feature words and improves the performance of text clustering. In text mining, keywords, as text feature words, are the key to text retrieval and represent text topics. The frequency of feature words in a text, extracted and normalized, represents the word frequency information of that text. Based on CLSVSM, this paper studies the relationship between the co-occurrence latent semantic information and the word frequency information of feature words. It first introduces the word frequency information of feature words into CLSVSM, then uses the introduced word frequency as the weight for the CLSVSM co-occurrence strength, and finally constructs the feature-weighted CLSVSM. In addition, word vectors constructed by neural network methods contain the semantic information of the text. Taking the new model as a bridge, a text representation method combining word2vec and the vector space model is proposed. Experimental results show that, compared with VSM and CLSVSM, the entropy of the word-frequency CLSVSM is reduced by 12% and 2% respectively on Chinese data, and its F value is increased by 14% and 8% respectively on English data. The clustering effect of the word-frequency CLSVSM is more stable than that of the other models. On Chinese data, the feature-weighted CLSVSM improves the F value by nearly 2.4% compared with CLSVSM and also improves the clustering effect compared with the word-frequency CLSVSM; on English data, its clustering effect is similar to that of the other models. The feature-weighted CLSVSM makes the expression of text information more comprehensive and improves the performance of text clustering. The text representation method based on the vector space model and the word2vec model achieves a better clustering effect than the other models. |