Research on Statistics-Based Chinese Question Classification

Posted on: 2013-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 2245330374465633
Subject: Computer application technology
Abstract/Summary:
A question answering system provides an interface through which users ask questions in natural language, and it returns the answer directly rather than a list of web pages. Compared with a traditional search engine, a question answering system better expresses users' needs and adapts to their habits, and it returns answers more accurately and efficiently, overcoming shortcomings of traditional search engines. It is now an active research area that attracts a growing number of researchers. Question classification is an important part of a question answering system: it provides the selection strategy for answer extraction, so classification accuracy directly influences the performance of the whole system. This thesis studies feature selection and feature space reduction in Chinese question classification, as well as a question property kernel function. The main results are as follows:

(1) To address the data sparseness and high-dimensional feature space that arise when a bag-of-words representation is used as features in question classification, the thesis proposes a method that combines word similarity and manifold learning. Specifically, the method extracts keywords with high document frequency as classification features and uses word semantic similarity to adjust the feature weights. The supervised locally linear embedding (SLLE) algorithm then finds a low-dimensional embedding of the original high-dimensional feature space, and a classifier is trained on it with the support vector machine (SVM) algorithm. Experiments on more than 7,000 questions from a Chinese question answering system for the tourism domain compared different methods of feature extraction, feature weight adjustment, and dimension reduction.
The results show that the proposed method effectively improves classification accuracy.

(2) The support vector machine (SVM) is widely used for classification, but with a standard kernel function it ignores the structure of the question. To address this problem, the thesis proposes a question property kernel function that combines dependency relations and part of speech (POS). The kernel function first extracts the terms, their POS tags, the dependency relations of the "HED" word, and the dependency relations of the question words; it then computes the kernel value from the dependency relations of the terms, their POS tags, and the dependency path that two terms share. Finally, the support vectors are obtained with the SMO algorithm. Experiments on the Chinese question answering system for the tourism domain compared different kernel functions. The results show that the proposed kernel function uses the structure of the question effectively and improves classification accuracy.

(3) Using the algorithms proposed in the thesis, we designed and implemented two question classification systems: one that combines word similarity and manifold learning, and one based on the question property kernel function.
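
As a rough illustration of results (1) and (2) above, the following Python sketch shows one way the keyword features and a structure-aware kernel could look. It is not the thesis's implementation: the abstract does not give the similarity measure, the weighting rule, or the kernel formula, so the toy similarity table, the "best similarity" weighting rule, the `(term, pos, dep)` token layout, and the kernel weights are all assumptions made for this example. The SLLE reduction and the SVM training step are omitted.

```python
from collections import Counter

# Hypothetical toy similarity table. The thesis uses a word semantic
# similarity measure that the abstract does not specify.
TOY_SIMILARITY = {
    frozenset(["ticket", "fare"]): 0.8,
    frozenset(["hotel", "inn"]): 0.9,
}


def word_similarity(a, b):
    """Return 1.0 for identical words, a table value for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset([a, b]), 0.0)


def select_keywords(questions, min_df):
    """Keep words whose document frequency is at least min_df."""
    df = Counter(word for q in questions for word in set(q))
    return sorted(w for w, n in df.items() if n >= min_df)


def feature_vector(question, keywords):
    """Weight each keyword by its best similarity to any question word.

    This weighting rule is an assumption made for illustration only.
    """
    return [
        max((word_similarity(k, w) for w in question), default=0.0)
        for k in keywords
    ]


def property_kernel(q1, q2, weights=(1.0, 0.5, 0.5)):
    """Weighted count of shared terms, POS tags, and dependency relations.

    Each question is a list of (term, pos, dep) triples. The weights are
    hypothetical. Each component is a set-intersection count, so the
    weighted sum is a valid (positive semidefinite) kernel.
    """
    score = 0.0
    for i, weight in enumerate(weights):
        shared = {tok[i] for tok in q1} & {tok[i] for tok in q2}
        score += weight * len(shared)
    return score


# Small demonstration on made-up, already segmented questions.
questions = [
    ["how", "much", "ticket"],
    ["where", "hotel"],
    ["how", "much", "fare"],
    ["hotel", "price"],
]
keywords = select_keywords(questions, min_df=2)
print(keywords)
print(feature_vector(["which", "inn"], keywords))

q_a = [("where", "r", "ATT"), ("hotel", "n", "HED")]
q_b = [("which", "r", "ATT"), ("hotel", "n", "HED")]
print(property_kernel(q_a, q_b))
```

In this sketch the question "which inn" still activates the `hotel` feature through the similarity table even though the word itself never occurs, which is the kind of sparseness relief that similarity-adjusted weights are meant to provide.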
Keywords/Search Tags: Question Answering System, Chinese Question Classification, Manifold Learning, Dependency Relationship, Property Kernel Function