
Application-oriented Chinese Separable-Words Recognition

Posted on: 2020-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhao
GTID: 2415330578974937
Subject: Computer application technology
Abstract/Summary:
The automatic recognition of Separable-Words has an important influence on many fields, such as English-Chinese machine translation, information retrieval, and speech recognition. Existing research on the automatic recognition of Separable-Words mainly focuses on a small number of Separable-Word cases, while research on Separable-Words in large-scale corpora is still lacking. To study the automatic recognition of Separable-Words, this thesis uses Xinhua News text from 1991 to 2004 as the original corpus, which is both large in scale and broad in coverage. The main contents of this thesis are as follows.

Construction of the Separable-Words vocabulary and the candidate Separable-Words corpus. This thesis extracts an original candidate corpus from the original corpus and dynamically generates the Separable-Words vocabulary from that candidate corpus. Because Separable-Words have strong regularity, the vocabulary constructed in this way is more accurate, and it does not depend on an existing labeled Separable-Words vocabulary. With this vocabulary, the original candidate corpus is then filtered to obtain the candidate Separable-Words corpus used for automatic recognition.

Research on automatic recognition of Separable-Words based on rule matching. This thesis first identifies Separable-Words in the candidate Separable-Words corpus by rule matching. The experimental results show that rule matching achieves high precision on the corpus extracted in this thesis.

Research on automatic recognition of Separable-Words based on machine learning. Rule matching gives good results for sentences with strong regularity. For sentences that cannot be recognized by rule matching, this thesis uses machine learning to tackle the problem. First, feature templates are designed according to the characteristics of the sentences in the corpus. Second, the feature templates are used to extract features from positive and negative sentences. Finally, K-nearest neighbors (KNN) and support vector machines (SVM) are used to classify the sentences. The experimental results show that the machine learning method achieves satisfactory results on the Separable-Words recognition task.

Research on automatic recognition of Separable-Words based on neural networks. Machine learning methods can recognize sentences with weak regularity, but they have a major disadvantage: a large number of feature templates must be designed, and the choice of features affects the experimental results. A neural network based method avoids this problem. This thesis therefore designs a CNN+LSTM+Attention model to recognize Separable-Words automatically. The experimental results show that this model improves recognition performance compared with the machine learning method.

Construction of joint models for automatic recognition of Separable-Words. Using the methods proposed above, this thesis builds three joint models for Separable-Words recognition: (1) rule matching + KNN, (2) rule matching + SVM, and (3) rule matching + neural network. Finally, the recognition results of the three joint models are combined by voting. The experimental results show that the joint model recognizes Separable-Words well, which indicates that the joint model designed in this thesis is practical.
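The joint-model idea summarized above (apply rule matching first, fall back to a classifier when the rule cannot decide, then vote across three such models) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the pattern `GAP_RULE`, the function names, and the three stub classifiers are hypothetical and are not the rules or models used in the thesis. The rule relies only on the well-known fact that a separable word such as 洗澡 can be split by an aspect marker or a numeral plus measure word, as in 洗了一个澡.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Illustrative rule only (an assumption, not the thesis's rule set): the text
# between the two morphemes is an aspect marker and/or "numeral + measure word".
GAP_RULE = re.compile(r"(了|过|着)?([一两三几]?[个次回场顿])?")

Classifier = Callable[[str, str, str], bool]


def rule_match(sentence: str, head: str, tail: str) -> Optional[bool]:
    """Return True if the rule accepts the sentence, None if it cannot decide."""
    start = sentence.find(head)
    if start == -1:
        return None
    end = sentence.find(tail, start + len(head))
    if end == -1:
        return None
    gap = sentence[start + len(head):end]
    # An empty gap means the word is not separated, so the rule stays silent.
    if gap and GAP_RULE.fullmatch(gap):
        return True
    return None


def joint_model(fallback: Classifier) -> Classifier:
    """Combine rule matching with a fallback classifier for undecided cases."""
    def predict(sentence: str, head: str, tail: str) -> bool:
        decision = rule_match(sentence, head, tail)
        return decision if decision is not None else fallback(sentence, head, tail)
    return predict


def vote(models: list, sentence: str, head: str, tail: str) -> bool:
    """Majority vote over the joint models (an odd count avoids ties)."""
    votes = Counter(model(sentence, head, tail) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained KNN, SVM and neural-network classifiers.
def knn_stub(sentence: str, head: str, tail: str) -> bool:
    return False


def svm_stub(sentence: str, head: str, tail: str) -> bool:
    return True


def nn_stub(sentence: str, head: str, tail: str) -> bool:
    return False


models = [joint_model(knn_stub), joint_model(svm_stub), joint_model(nn_stub)]
```

For example, `vote(models, "他洗了一个澡", "洗", "澡")` is decided by the rule in all three joint models, whereas a sentence such as "我洗衣服的时候他在泡澡" falls through to the stub classifiers and is settled by their majority. In a real system the stubs would be replaced by classifiers trained on features extracted from the candidate corpus.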
Keywords/Search Tags: Separable Words, Automatic identification, Large-Scale Corpus, Machine Learning, Neural Network