Research on Automatic Chinese Q&A System Based on Syntax Analysis and Machine Learning

Posted on: 2008-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: A Sun
Full Text: PDF
GTID: 2155360242493979
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Research on automatic question answering (QA) systems is driven by two needs: coping with the objective challenge of the information explosion, and meeting people's subjective demand for quick and accurate access to information. QA is gradually becoming a frontier of Natural Language Processing (NLP) and Natural Language Understanding (NLU).

This paper first analyzes the overall architecture of a QA system and gives a detailed overview of the tasks and solutions of its three main modules. It then proposes a Chinese question classification approach that combines analysis of Chinese question sentence patterns with Support Vector Machines (SVM). It also proposes a binary-classification approach to answer extraction based on the Maximum Entropy Model (MEM).

The overview part describes in depth the tasks and solutions of the three main modules of QA, and in particular analyzes and summarizes the solutions for question analysis and answer extraction, the two most important sub-modules. It points out that feature extraction based on syntactic analysis of the question sentence, combined with machine-learning classification, is becoming the technology trend in question analysis, and that syntactic analysis and machine learning are becoming the two most important components of answer extraction.

In the question analysis module, this paper proposes for the first time to determine the predicate by the principle of the shortest distance between a candidate predicate word and the question word, and to analyze the Chinese question sentence pattern using the distance information between the question word and the predicate word. Based on this sentence pattern analysis, we extract the question word, predicate word, subject word and object word as features for question classification.
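
The shortest-distance rule for choosing the predicate can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `pick_predicate`, the small `QUESTION_WORDS` set, and the simplified POS tags (`v` for verbs, `r` for pronouns, in the style of common Chinese tagsets) are assumptions made for illustration, and the input is assumed to be already segmented and POS-tagged.

```python
# Hypothetical question-word list; a real system would use a fuller inventory.
QUESTION_WORDS = {"谁", "哪里", "什么", "何时", "多少"}


def pick_predicate(tagged, question_words=QUESTION_WORDS, predicate_tags=("v",)):
    """Pick the candidate predicate closest to the question word.

    tagged: list of (word, pos) pairs from an upstream segmenter and tagger.
    Returns (question_index, predicate_index, signed_distance), or None if
    the sentence has no question word or no candidate predicate.
    """
    # Position of the first question word in the sentence.
    q_idx = next(
        (i for i, (word, _) in enumerate(tagged) if word in question_words), None
    )
    if q_idx is None:
        return None

    # Every word with a predicate-like tag is a candidate predicate.
    candidates = [i for i, (_, pos) in enumerate(tagged) if pos in predicate_tags]
    if not candidates:
        return None

    # Shortest distance wins; on a tie, min() keeps the earlier candidate.
    p_idx = min(candidates, key=lambda i: abs(i - q_idx))
    return q_idx, p_idx, p_idx - q_idx


# "发明电话的人是谁" (Who is the person that invented the telephone?)
tagged = [("发明", "v"), ("电话", "n"), ("的", "u"), ("人", "n"), ("是", "v"), ("谁", "r")]
print(pick_predicate(tagged))  # (5, 4, -1): "是" is chosen over "发明"
```

The signed distance in the result (negative when the predicate precedes the question word) is the kind of positional information a sentence pattern analysis could branch on. How the thesis maps distances to specific patterns is not stated in this abstract.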
We then conduct a question classification experiment based on SVM, which achieves an accuracy of 95.87%.

In the answer extraction module, this paper proposes for the first time a binary-classification approach to answer extraction. We first extract the sequence of words and their part-of-speech (POS) tags, the keywords, the question word, and the subject, predicate and object from the question sentence as the question feature set, based on analysis of its sentence pattern. We then extract the sequence of words, the sequence of their POS tags, and the POS of the correct answer word from the answer candidate sentence as the sentence feature set, based on a shallow parse of that sentence. Combining the question feature set and the sentence feature set gives the combined feature set. Finally, we apply MEM to the combined feature set to train answer classifiers. The good experimental performance confirms the feasibility of this approach.
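
As a rough illustration of answer extraction as binary classification, the sketch below trains a two-class maximum entropy model (equivalent to logistic regression) on indicator features that join a question-side feature with a sentence-side feature. Everything specific here is an assumption and not taken from the thesis: the feature names, the four toy training examples, the POS tags `nr` (person name) and `ns` (place name), and the plain stochastic gradient ascent trainer. The feature set is much smaller than the one the abstract describes.

```python
import math
from collections import defaultdict


def combined_features(question, candidate):
    """Join question-side and sentence-side features into indicator names."""
    q_feat = f"q_word={question['q_word']}"
    c_feat = f"cand_pos={candidate['pos']}"
    # The conjunction lets the model tie a question word to an answer type.
    return [q_feat, c_feat, f"{q_feat}&{c_feat}"]


def predict(weights, feats):
    """P(candidate is a correct answer | features) under the binary MaxEnt model."""
    score = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))


def train_maxent(examples, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            error = label - predict(weights, feats)
            for f in feats:
                weights[f] += lr * error
    return weights


# Toy training data (invented for illustration): label 1 means the candidate
# word is a correct answer for that question word, 0 means it is not.
who, where = {"q_word": "谁"}, {"q_word": "哪里"}
person, place = {"pos": "nr"}, {"pos": "ns"}
examples = [
    (combined_features(who, person), 1),
    (combined_features(who, place), 0),
    (combined_features(where, place), 1),
    (combined_features(where, person), 0),
]

weights = train_maxent(examples)
p_good = predict(weights, combined_features(who, person))
p_bad = predict(weights, combined_features(who, place))
print(p_good > 0.5, p_bad < 0.5)  # True True
```

Without the conjunction feature this toy data would not be linearly separable, which is one reason for combining the question and sentence feature sets before training, not scoring each side separately.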
Keywords/Search Tags: Automatic Chinese Question Answering, Question Classification, Answer Extraction, Analysis of Sentence Pattern, Machine Learning, Support Vector Machines, Maximum Entropy Model