Key Technologies of Internet Chat Robots in Chinese Teaching for Ethnic Minorities

Posted on: 2013-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Yang
GTID: 2215330374458636
Subject: Probability theory and mathematical statistics
Abstract/Summary:
As an everyday computing tool, the Internet chat robot has attracted considerable attention in recent years, and its technology has matured with the continued development of information technology. The robot "xiaoi", currently active on the Internet, can answer everyday queries such as the weather, chat with people, and be used for language learning, play, and leisure. Because Internet chat robots can both chat and learn, this thesis designs and studies a chat robot application system for Chinese teaching in ethnic minority areas, and carries out an in-depth study of the key technology behind chat robots: natural language understanding.

The study of chat robot technology starts from Chinese word segmentation. In written Chinese, words are not separated by an explicit marker such as a space, so word segmentation has become the bottleneck of natural language understanding, and how well it is solved determines the performance of the Internet chat robot. The main content, key technologies, and contributions of this thesis are as follows.

Firstly, this thesis adopts a statistical natural language processing approach and collects thirty texts from the teaching materials for ethnic minorities (books five/six) to build a small-scale Chinese corpus. The work includes word segmentation and punctuation, part-of-speech tagging, and statistical analysis of the corpus. Part-of-speech tags were assigned in strict accordance with 《Chinese dictionary》. This is long and laborious work that requires a great deal of manual effort, and it provides the data for all subsequent work.

Secondly, because Chinese text contains ambiguity and unlisted (out-of-vocabulary) words, the key problems in Chinese word segmentation are ambiguity resolution and unlisted word recognition.
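The ambiguity and unlisted-word problems described above can be made concrete with a small sketch. It enumerates every way of splitting an unsegmented sentence into words from a lexicon. The function name `segmentations`, the five-word lexicon, and the example sentences are illustrative assumptions and are not taken from the thesis corpus.

```python
def segmentations(sentence, lexicon):
    """Return every split of `sentence` into words that all appear in `lexicon`."""
    results = []

    def extend(start, words):
        # A complete segmentation has consumed the whole sentence.
        if start == len(sentence):
            results.append(list(words))
            return
        # Try every lexicon word that begins at position `start`.
        for end in range(start + 1, len(sentence) + 1):
            word = sentence[start:end]
            if word in lexicon:
                words.append(word)
                extend(end, words)
                words.pop()

    extend(0, [])
    return results


# Hypothetical toy lexicon (assumption, for illustration only).
LEXICON = {"研究", "研究生", "生命", "命", "起源"}

# Ambiguity: the same character string has more than one valid split.
ambiguous = segmentations("研究生命起源", LEXICON)
for words in ambiguous:
    print("/".join(words))

# Unlisted word: "生物" is not in the lexicon, so no split is found at all.
unlisted = segmentations("研究生物", LEXICON)
print(unlisted)
```

Both candidate splits of the first sentence contain three words, so counting words alone cannot choose between them, and the second sentence has no split at all because one of its words is missing from the lexicon.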
The shortest-path algorithm is fast and effective at handling ambiguity and unlisted words, but when several shortest paths exist it does not determine which one is best. This thesis therefore proposes an improved approach, the second shortest path, which inherits the properties of the shortest path. In experiments, the second shortest path handled ambiguity and unlisted words better. This is the first contribution of the thesis.

Thirdly, this thesis analyzes how to build the HMM structure and train its parameters for Chinese word segmentation, and uses network (trellis) diagrams to explain the Viterbi algorithm in detail. The diagrams make it easy to observe how an unsegmented sentence finds its best segmentation under the Viterbi approach. The theory of the Viterbi algorithm is mature, but the literature lacks descriptions of how it is applied to segmentation in practice, so this work can serve as a reference for researchers new to the field. This is the second contribution of the thesis.

Fourthly, the paths found by the second-shortest-path algorithm are combined with a well-trained HMM, and simulation tests with the Viterbi algorithm are used to obtain effective segmentation results. In experiments, combining the HMM with the second-shortest-path algorithm improved the efficiency of the HMM, and accuracy and recall also improved. This is the third contribution of the thesis.

Fifthly, all of the work in this thesis is implemented in programs. The approach stems from a paper previously published by the author, which presents a programming method that avoids complicated procedure design. In this method the logical structure of a program is independent of its storage structure, so the storage structure of an algorithm can be changed without changing its logic, for example by storing a complex graph structure in arrays and running a depth-first traversal on it.
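As an illustration of running a depth-first traversal over a graph held in arrays, here is a minimal sketch. The abstract does not specify the array layout the author used, so the two-array (offsets plus targets) layout, the function names `build_arrays` and `depth_first_order`, and the example edges are all assumptions.

```python
def build_arrays(num_nodes, edges):
    """Pack a directed edge list into two flat arrays.

    The neighbours of node v are targets[offsets[v]:offsets[v + 1]].
    """
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1           # count outgoing edges per node
    for node in range(num_nodes):
        offsets[node + 1] += offsets[node]  # prefix sums give slice bounds
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()        # next free slot for each source node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def depth_first_order(offsets, targets, start):
    """Iterative depth-first traversal that reads only the two arrays."""
    visited = [False] * (len(offsets) - 1)
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if visited[node]:
            continue
        visited[node] = True
        order.append(node)
        # Push neighbours in reverse so the first-listed one is visited first.
        for i in range(offsets[node + 1] - 1, offsets[node] - 1, -1):
            if not visited[targets[i]]:
                stack.append(targets[i])
    return order


# Hypothetical five-node graph (assumption, for illustration only).
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
offsets, targets = build_arrays(5, EDGES)
order = depth_first_order(offsets, targets, 0)
print(order)
```

The traversal logic only ever asks for "the neighbours of this node", so the same loop would work if the arrays were replaced by another storage structure that answers the same question.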
Previously, almost all of the literature relied on dedicated graph storage structures such as adjacency lists. With the new method, a range of programming tasks becomes much simpler. This thesis applies the method to the Viterbi algorithm: only the algorithm's input-output interface is changed, and the HMM parameters need not be changed, to find the appropriate part-of-speech sequence. The method can be applied to many classic algorithms. This is the fourth contribution of the thesis.
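A minimal sketch of Viterbi decoding with the HMM parameters held in plain arrays may help readers new to the algorithm. It finds the most probable part-of-speech sequence for a short sentence. The two-tag set, the three-word vocabulary, and every probability below are invented for illustration and are not the trained parameters from the thesis. The sketch assumes a non-empty observation sequence and multiplies raw probabilities, which is adequate only for short inputs.

```python
def viterbi(obs, start, trans, emit):
    """Return (best state sequence, its probability) for observation indices `obs`.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of state s emitting observation o
    """
    n_states = len(start)
    score = [[0.0] * n_states for _ in obs]  # best path probability ending in s at time t
    back = [[0] * n_states for _ in obs]     # predecessor state on that best path

    for s in range(n_states):
        score[0][s] = start[s] * emit[s][obs[0]]

    for t in range(1, len(obs)):
        for s in range(n_states):
            # Choose the predecessor that maximises the path probability into s.
            prev = max(range(n_states), key=lambda p: score[t - 1][p] * trans[p][s])
            back[t][s] = prev
            score[t][s] = score[t - 1][prev] * trans[prev][s] * emit[s][obs[t]]

    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last]


# Hypothetical toy model (assumption): tags n = noun, v = verb.
TAGS = ["n", "v"]
WORDS = ["学生", "学习", "汉语"]           # observation indices 0, 1, 2
START = [0.6, 0.4]
TRANS = [[0.3, 0.7],                      # n -> n, n -> v
         [0.8, 0.2]]                      # v -> n, v -> v
EMIT = [[0.5, 0.1, 0.4],                  # P(word | n)
        [0.1, 0.8, 0.1]]                  # P(word | v)

path, prob = viterbi([0, 1, 2], START, TRANS, EMIT)
print([TAGS[s] for s in path], prob)
```

With these toy numbers the decoded tag sequence is n, v, n. Because the parameters are passed in as arrays, a different model can be decoded by changing only the inputs, not the function body.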
Keywords/Search Tags: the Internet chat robot, natural language understanding, corpus, Hidden Markov Model (HMM), Viterbi