| Similarity computation is a basic technology of information processing, underlying tasks such as data mining, machine translation, automatic question answering, and query retrieval. In the field of Tibetan information processing, few similarity calculation methods exist. Based on an analysis of the existing segmentation-fusion similarity calculation method for Chinese, this paper proposes a segmentation-fusion similarity calculation method for Tibetan. The method takes the paragraph as its unit and treats each paragraph as an approximately short text; by calculating the similarity between short texts, it obtains the similarity between the long texts, and thus the similarity value of the two Tibetan texts. The technical route and method adopted in this paper are as follows. Given two Tibetan texts, stop words are removed and the feature dimensionality is reduced for each, and the words of each paragraph are then filtered by the specified Tibetan parts of speech, yielding all qualifying paragraphs of the two texts. Next, the number of feature words and their TF values are calculated, and the TF values are normalized. The weights of the words in each paragraph are then calculated from the TF values and some related parameters. Finally, these weights are used to calculate the paragraph-to-paragraph similarity values and the similarity matrix of the two texts, from which, after a series of refinement steps, the similarity value of the two Tibetan texts is obtained. The Tibetan text similarity calculation is then extended to Tibetan sentence similarity: the similarity of Tibetan sentences is computed and fused into paragraph similarity, and the paragraph similarities are in turn fused into text similarity. 
In addition, this paper attempts to establish a more complex similarity model system that can find the similar sentences within Tibetan texts and accurately enumerate which Tibetan sentences are similar. The experimental results are evaluated by precision, recall, and the F1 value. Because the experimental corpus is closed, the results can only be taken as approximate values. In the experiment, 150 test texts were randomly selected from a well-classified corpus, and the F1 value reached 67.86%, lying between the precision and the recall, which were roughly equal. The experimental results show that the method is effective to a certain degree. |
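
The paragraph-level pipeline described in the abstract (normalized TF, paragraph-to-paragraph similarity matrix, fusion into one text-level value) can be illustrated with a minimal Python sketch. Several details are assumptions, since the abstract does not specify them: each paragraph is taken to be a list of already segmented, stop-word-filtered tokens; normalized TF alone stands in for the word weights (the "related parameters" are omitted); cosine similarity is used between paragraphs; and the fusion rule averages each paragraph's best match in the other text. The function names are hypothetical.

```python
import math
from collections import Counter


def normalized_tf(tokens):
    """Term frequencies of one paragraph, normalized to sum to 1."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity of two sparse weight vectors (dicts)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(paras_a, paras_b):
    """Entry [i][j] is the similarity of paragraph i of A and paragraph j of B."""
    vecs_a = [normalized_tf(p) for p in paras_a]
    vecs_b = [normalized_tf(p) for p in paras_b]
    return [[cosine(x, y) for y in vecs_b] for x in vecs_a]


def fuse(matrix):
    """Assumed fusion rule: mean of every paragraph's best match, both directions."""
    if not matrix or not matrix[0]:
        return 0.0
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


def text_similarity(paras_a, paras_b):
    """Text-level similarity obtained by fusing paragraph-level similarities."""
    return fuse(similarity_matrix(paras_a, paras_b))


# Placeholder tokens stand in for segmented Tibetan words.
text_a = [["w1", "w2", "w2"], ["w3", "w4"]]
text_b = [["w1", "w2"], ["w5", "w6"]]
print(text_similarity(text_a, text_b))
```

The same two-level scheme could be applied one level down, fusing sentence similarities into a paragraph similarity, which is the extension the abstract mentions.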