| Deep learning models, proposed by Geoffrey Hinton, extract information effectively and represent features accurately, and have therefore revolutionized many areas, such as speech recognition, digital image processing, and natural language processing. However, their training is time-consuming and difficult, which causes considerable trouble for users. Meanwhile, for Tibetan, traditional machine learning algorithms cannot analyze sentence structure well, which reduces their accuracy, so they cannot be used effectively in Tibetan public opinion monitoring. To solve these problems, the main work of this thesis is as follows: 1) To improve the performance of the Semi-Supervised Recursive Auto Encoder (Semi-Supervised RAE) on text sentiment analysis when the training and testing data sets are massive, two parallel computing methods based on the MapReduce framework are proposed, one for each data set. For the massive training data set: first, the data set is divided into data chunks, and the chunk error of each chunk is calculated in the mappers; second, these chunk errors are sent to the cluster buffer; third, the optimization objective function is calculated in the reducer from the collected chunk errors, and its parameter set is updated through L-BFGS (Limited-memory BFGS); fourth, these steps are iterated until the objective function converges, yielding the optimal parameter set. For the massive testing data set: first, the cluster is initialized with the optimal parameter set obtained in the training step; second, the vector of each sentence is calculated in the mappers and sent to the cluster buffer; third, the sentiment labels are calculated in the reducer by the classifier using the sentence vectors. 
Finally, several experiments are designed to evaluate the parallel algorithms, which perform well in four aspects: precision, speedup, scaleup and sizeup. 2) Traditional machine learning algorithms tend to ignore Tibetan sentence structure, word order and similar information, which decreases the accuracy of Tibetan sentiment analysis. To overcome this problem, the Semi-Supervised Recursive Auto Encoder model is introduced in this thesis. Through a large number of experiments, the best vector dimension and reconstruction error coefficient are identified to improve the performance of our approach. Meanwhile, the relationships among training set size, training time and accuracy are analyzed experimentally; if the goal is a good prediction result with less time consumed, the training set should be reduced appropriately. Compared with traditional Tibetan sentiment analysis algorithms, our approach achieves better accuracy and efficiency. |
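
The four-step parallel training procedure in item 1) can be illustrated with a minimal single-process sketch. It is not the thesis implementation and rests on several assumptions: a toy squared-error objective for a line `y = w*x + b` stands in for the Semi-Supervised RAE error, plain gradient descent stands in for L-BFGS, each mapper is assumed to return the chunk gradient alongside the chunk error, and the map phase runs sequentially rather than on a MapReduce cluster. All function names (`split_into_chunks`, `map_chunk`, `reduce_chunks`, `train`) are hypothetical.

```python
from functools import reduce


def split_into_chunks(data, n_chunks):
    """Step 1a: divide the training set into roughly equal data chunks."""
    size = (len(data) + n_chunks - 1) // n_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_chunk(theta, chunk):
    """Step 1b (mapper): compute the chunk error and its gradient.

    Assumption: a toy squared error for y ~ w*x + b replaces the RAE error.
    """
    w, b = theta
    err, grad_w, grad_b = 0.0, 0.0, 0.0
    for x, y in chunk:
        residual = w * x + b - y
        err += 0.5 * residual * residual
        grad_w += residual * x
        grad_b += residual
    return err, (grad_w, grad_b)


def reduce_chunks(left, right):
    """Step 3 (reducer): sum the chunk errors and gradients collected so far."""
    return (left[0] + right[0],
            (left[1][0] + right[1][0], left[1][1] + right[1][1]))


def train(data, n_chunks=4, lr=0.05, tol=1e-10, max_iter=5000):
    """Step 4: iterate map, reduce and update until the objective converges."""
    theta = (0.0, 0.0)
    chunks = split_into_chunks(data, n_chunks)
    n = len(data)
    previous = float("inf")
    objective = previous
    for _ in range(max_iter):
        # Map phase: on a real cluster each chunk would go to its own mapper.
        partials = [map_chunk(theta, chunk) for chunk in chunks]
        # Reduce phase: combine the per-chunk results into one objective value.
        total_err, grad = reduce(reduce_chunks, partials)
        objective = total_err / n
        if abs(previous - objective) < tol:
            break
        # Assumption: a gradient descent step replaces the L-BFGS update.
        theta = (theta[0] - lr * grad[0] / n, theta[1] - lr * grad[1] / n)
        previous = objective
    return theta, objective
```

Calling `train([(float(x), 2.0 * x + 1.0) for x in range(-5, 6)])` on this noise-free toy data should return parameters close to `w = 2` and `b = 1`. Because the reducer only sums per-chunk quantities, the number of chunks changes how the work is split but not the objective being optimized, which is the property the chunked scheme relies on.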