A Framework Of Chinese And English Question Target Identification And Classification

Posted on: 2020-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: W X Xie
Full Text: PDF
GTID: 2415330590480619
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of open-domain question answering systems such as Baidu Zhidao, Yahoo! and Quora, millions of questions and posted answers have accumulated over time and accelerated research on question answering. How to use these accumulated question-answer pairs to serve web users effectively is therefore the most important task for question answering. Question answering is one of the fastest-growing and most challenging tasks in natural language processing; it aims to build systems that can automatically answer users' free-form questions. For a question answering system, question target classification is a vital step: it assigns labels to questions according to the expected answer type and places semantic constraints on candidate answers. The accuracy of an answer provided by a question answering system depends heavily on the expected answer type of the classified question.

Much research on question understanding and classification has been conducted, especially on English question target classification, for which a large number of published datasets exist. Because of the complexity of its syntactic structure, the inconsistency of its grammar rules and the characteristics of the language itself, analyzing and understanding the intent of Chinese questions is more difficult than for English questions. In addition, most existing question understanding and classification methods are language-specific and cannot be applied to multi-language question answering systems. This thesis therefore systematically investigates existing question target classification approaches, analyzes the linguistic, syntactic and semantic structure of Chinese and English questions, and proposes an automatic Bilingual Question Target Identification and Classification Framework, named Bi-QTFrame. The framework is intended to be applied in question answering systems, websites or human-computer dialogue systems to detect the user intent of a posted question, to expand its syntactic and semantic features, and to assist the answer extraction process. Bi-QTFrame consists of three main modules: question analysis, question target feature expansion and question target classification.

To tackle the difficulties of existing research, this work makes the following main contributions:

1) Based on the characteristics of, and differences between, the dependency relations of Chinese and English questions, a question target identification and classification approach is proposed.

2) Based on the characteristics of both Chinese and English questions, a concise yet effective question target feature set is proposed. It consists of two kinds of features: concrete features and abstract features. A concrete feature preserves the semantic information and characteristics of the question itself, for instance the character features of Chinese questions, the N-gram features of English questions, and question target words. An abstract feature is an abstract representation of what the questions within a category have in common, i.e. the hyper-semantic concept of the extracted question target and the way the question is asked. The higher the abstraction level of the features, the less guessing is involved and the more the classification process benefits. Because posted questions are usually short, lack context and suffer from the feature localization issue, hyper-concepts are proposed to expand the semantic and hypernym information of the extracted question target words, so as to place semantic constraints on candidate answers. For questions that do not contain a question target but are similar in the way they are asked, a concise syntactic feature is proposed to better preserve the question format characteristic of each question type.

3) The proposed Bi-QTFrame is applied to both Chinese and English question target classification tasks. The experimental datasets are publicly available annotated question datasets provided by the First Evaluation of Chinese Human-Computer Dialogue Technology, the University of Illinois at Urbana-Champaign (UIUC) and the Text Retrieval Conference (TREC). The results show that the proposed method achieves an F1 score of 94.8% on Chinese question classification and an accuracy of 87.2% on English question classification. It outperforms the state-of-the-art baseline methods, demonstrating its effectiveness in bilingual question target classification.

Bi-QTFrame is expected to provide consistent and meaningful guidance for research on multi-language question target classification.
Keywords/Search Tags:Question Target Classification, Question Target Words, Question Target Feature, Concrete Feature, Abstract Feature
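
As a rough illustration of the concrete and abstract features described in the abstract, the following minimal Python sketch builds character features for a Chinese question, word N-grams for an English question, and a hyper-concept for a question target word. It is not the thesis's implementation: the function names, the `HYPER_CONCEPTS` table, the bigram default and the whitespace tokenization are all assumptions made for the example, and the thesis does not state which lexical resource supplies its hypernyms.

```python
from typing import Dict, List

# Hypothetical toy hypernym table (an assumption for illustration only);
# a real system would query a lexical resource for hypernyms.
HYPER_CONCEPTS: Dict[str, str] = {
    "city": "location",
    "river": "location",
    "president": "person",
    "城市": "location",
}


def char_features(question: str) -> List[str]:
    """Concrete feature for Chinese: each non-whitespace character."""
    return [ch for ch in question if not ch.isspace()]


def word_ngrams(question: str, n: int = 2) -> List[str]:
    """Concrete feature for English: lowercase word N-grams.

    Tokens are split on whitespace, so punctuation stays attached.
    """
    tokens = question.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hyper_concept(target_word: str) -> str:
    """Abstract feature: map a question target word to a broader concept."""
    return HYPER_CONCEPTS.get(target_word.lower(), "unknown")


def build_features(question: str, target_word: str, language: str) -> Dict[str, object]:
    """Combine concrete and abstract features for one question."""
    if language == "zh":
        concrete = char_features(question)
    else:
        concrete = word_ngrams(question)
    return {
        "concrete": concrete,
        "target": target_word,
        "abstract": hyper_concept(target_word),
    }
```

Calling `build_features("Which river is longest", "river", "en")` would pair the question's bigrams with the hyper-concept `location`, which is the kind of semantic constraint on candidate answers that the abstract attributes to hyper-concepts.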