| Research on text representation algorithms is of great significance. There are two main families of text representation algorithms: statistical language models and probabilistic neural language models. Both have disadvantages. Statistical language models are simple, but the dimension of the vector space is too large, so subsequent computation is too expensive. Probabilistic neural language models produce feature representations of good quality with a high compression rate, but the models themselves are too complex: they have too many parameters, and their own computational complexity is too high. This thesis mainly discusses text representation models based on word vectors, whose quality depends on the quality of the word vectors. A word's distributed representation is superior to its one-hot representation, so the quality of a text representation model can be improved by replacing one-hot word representations with distributed ones. First, an improvement to the bag-of-words model is introduced. The bag-of-words model can be expressed as representing the text vector by the sum of the one-hot representations of its words. The model is improved in two steps: first, each word's one-hot representation is replaced by its distributed representation; second, building on the first step, the text vector is represented by the sum of Chinese character vectors. The effects are twofold: first, the data dimension and the complexity of subsequent calculations are effectively reduced; second, the word segmentation step can be skipped, which reduces the complexity of the model. Next, an improvement to the Crepe model is presented. For Chinese text, the Crepe model converts the text into pinyin, represents each English letter with a one-hot vector, and feeds the resulting text matrix into a CNN. The model is improved by using a distributed representation of each word in place of the one-hot representation of each English letter. This reduces the input data dimension and dramatically shortens the training time of the model. Finally, a system for predicting the sentiment tendency of reviews is built on the improved bag-of-words model. |
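
The bag-of-words improvement described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the example sentence, the hand-made segmentation, the vector size `DIM`, and the randomly generated vectors (standing in for trained word and character vectors) are all assumptions made for illustration.

```python
import random


def one_hot_sum(tokens, vocab):
    """Classic bag of words: sum of one-hot vectors, one dimension per vocabulary entry."""
    index = {unit: i for i, unit in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for token in tokens:
        if token in index:
            vec[index[token]] += 1.0
    return vec


def embedding_sum(tokens, embeddings, dim):
    """Improved variant: sum of dense (distributed) vectors of fixed size dim."""
    vec = [0.0] * dim
    for token in tokens:
        dense = embeddings.get(token)
        if dense is None:
            continue  # skip units that have no vector
        for i, value in enumerate(dense):
            vec[i] += value
    return vec


def random_embeddings(units, dim, seed=0):
    """Hypothetical stand-in for trained vectors; real ones would come from a trained model."""
    rng = random.Random(seed)
    return {unit: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for unit in units}


DIM = 3  # assumed embedding size; fixed regardless of vocabulary size
text = "我喜欢这部电影"
words = ["我", "喜欢", "这部", "电影"]  # hand-segmented for illustration only
chars = list(text)  # character-level units need no word segmentation

word_vocab = sorted(set(words))
bow = one_hot_sum(words, word_vocab)
word_vec = embedding_sum(words, random_embeddings(word_vocab, DIM), DIM)
char_vec = embedding_sum(chars, random_embeddings(sorted(set(chars)), DIM), DIM)

# The one-hot sum grows with the vocabulary; both dense sums stay at DIM.
print(len(bow), len(word_vec), len(char_vec))
```

In this toy case the vocabulary has only four entries, so the difference in dimension is small. With a realistic vocabulary the one-hot sum would have one dimension per word, while the dense sums would keep the chosen embedding size, and the character-level variant would not require a segmenter at all.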