| A paraphrase expresses the meaning of a sentence in a different way. Paraphrasing is very common in natural language and reflects the flexibility, diversity and complexity of human language. Paraphrase research has three main strands. The first is paraphrase extraction, which extracts key words from the original text and recombines them while preserving the semantic meaning of the original sentence. The second is paraphrase recognition, which finds language units such as sentences or paragraphs with the same meaning in a corpus. The third is paraphrase generation, which takes a sentence as input and outputs a sentence with the same semantic meaning. Paraphrasing covers same-length paraphrases, such as phrase to phrase and sentence to sentence, as well as different-length paraphrases, such as word to phrase and phrase to sentence. This dissertation mainly studies paraphrase generation for Tibetan declarative sentences. First, Tibetan sentences are classified and the declarative sentences are extracted. Then the sentences are analyzed semantically and a Tibetan paraphrase sentence corpus is constructed. Finally, Tibetan paraphrase sentences are generated automatically through machine learning. This dissertation addresses the following five problems and their solutions: 1. Research on a Tibetan sentence classification method based on a recurrent convolutional neural network. Tibetan sentence classification has received little attention in Tibetan linguistics and natural language processing and is rarely covered in the relevant literature. The research object of this dissertation is the automatic generation of paraphrases of Tibetan declarative sentences. The difficulty is that the traditional sentence classification methods used for other languages are not suitable for Tibetan, because Tibetan does not have special punctuation marks to identify different 
types of sentence. In this dissertation, the context information and characteristic functions of Tibetan sentences are used as the basis for recognizing and classifying sentences. After a full analysis of the characteristic information of the different Tibetan sentence types, a recurrent convolutional neural network is used to identify and classify Tibetan sentences. The experimental results show that the average accuracy of identifying and classifying Tibetan sentences is 85.61%, the recall is 86.54%, and the F value is 85.59%. 2. Research on a semantic segmentation method for Tibetan sentences based on a dilated convolution network. At present, research on Tibetan sentence meaning focuses only on syntactic analysis. As a result, there is no dedicated method for studying Tibetan sentence meaning understanding, and a large gap remains between such research for Tibetan and for other languages. For Tibetan paraphrase generation, the essential difficulty is that a paraphrase of a Tibetan declarative sentence can be generated only after the meaning of the original sentence is understood. The difficulty of this problem is that semantic segmentation of sentences in other languages usually takes the word as the segmentation unit. In Tibetan, however, using the word as the segmentation unit produces considerable lexical ambiguity and unstable decoding of semantic sequences, because the granularity is too fine. This dissertation proposes a new method for segmenting semantic units, based on an analysis of Tibetan language features and the rules of language coding combination. The semantic unit is larger than a word and smaller than a sentence, and it integrates grammar, semantics and context. Tibetan sentences are then segmented semantically using a dilated convolutional neural network. The experimental results show that the accuracy of the model is 92.39%. 3. Research on the construction 
method of a paraphrase sentence corpus based on Tibetan word order and a semantic dictionary. The size and quality of data resources in machine learning directly affect the learning results. Generating paraphrases of Tibetan declarative sentences requires large-scale Tibetan data resources. The difficulty is that no large-scale, high-quality Tibetan data resource for machine learning is currently publicly available in China or abroad; moreover, a data set of paraphrase sentences is lacking. To address the lack of Tibetan paraphrase data, this dissertation proposes constructing a new Tibetan paraphrase corpus by means of Tibetan word order transformation and a Tibetan semantic dictionary. The experimental results show that, after manual evaluation, the accuracy of Tibetan paraphrases based on word order transformation is 97.31%, and the accuracy of paraphrases based on the Tibetan semantic dictionary is 93.33%. 4. Research on the generation of Tibetan paraphrase sentences based on the attention mechanism. In recent years, as paraphrase research results have been applied in machine translation, automatic question answering, information retrieval, information extraction, text generation, reading comprehension and other related research, more and more researchers and research institutions have begun to pay attention to paraphrase research. However, there is currently no relevant literature on generating paraphrases of Tibetan declarative sentences using the attention mechanism. This dissertation attempts to apply the attention mechanism to the automatic generation of paraphrases of Tibetan declarative sentences in order to expand the existing data resources of Tibetan paraphrase sentences. Based on the paraphrase data resources constructed above, this dissertation proposes an automatic generation method of 
Tibetan paraphrase sentences based on the attention mechanism. The experimental results show that the BLEU value of the generated Tibetan paraphrase sentences is 40.38%. 5. Research on the automatic definition generation of Tibetan neologisms based on the attention mechanism. With the progress of human society and the development of science and technology, new terms and vocabulary emerge constantly. Currently, the interpretation of new Tibetan terms cannot meet people’s needs. To solve this problem, this dissertation attempts to use machine learning to interpret new Tibetan terms automatically, and proposes an automatic definition generation method for Tibetan neologisms based on the attention mechanism. The experimental results show that the accuracy of dictionary original meaning generation is 87.17%, and the definition accuracy for new words is 80.32%. In this dissertation, various methods are used to construct a large-scale data resource of Tibetan paraphrase sentences, and the automatic generation of Tibetan paraphrase sentences based on machine learning is studied using these data resources. The study has obtained good preliminary results, and it is hoped that these results can serve as a reference for research on natural language understanding in Tibetan. |
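
The attention mechanism used in items 4 and 5 can be illustrated with a minimal sketch. The abstract does not say which attention variant the dissertation uses, so the scaled dot-product form below is an assumption, and the function names `softmax` and `attend` as well as the toy vectors are hypothetical. The sketch only shows the core idea: a decoder query is scored against every source position, the scores are normalized into weights, and the weights form a weighted average of the source values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Assumption: this is one common attention variant, not necessarily
    the one used in the dissertation.
    Returns (context, weights), where `weights` is a distribution over
    source positions and `context` is the weighted average of `values`.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Toy example: three source positions with 2-dimensional states (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
context, weights = attend([1.0, 1.0], keys, values)
```

In this toy input the third key is most similar to the query, so it receives the largest weight and the context vector is pulled toward the third value. In a sequence-to-sequence paraphrase model the keys and values would typically be encoder states of the source sentence and the query a decoder state.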