
Research on Automatic Acquisition of Q&A Pairs

Posted on: 2009-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Meng
Full Text: PDF
GTID: 2208330332478193
Subject: Computer software and theory
Abstract/Summary:
The scale and quality of the FAQ collection are critical to a question answering system built on an FAQ base. Until now, FAQ bases have been built manually, which is time-consuming and labor-intensive. This thesis studies three steps in organizing FAQs: retrieving FAQs from HTML pages, filtering domain FAQs, and filtering related FAQs. The main contributions are as follows:

(1) A DOM-tree-based method for retrieving FAQs automatically from HTML pages. The method parses an HTML page into a DOM tree and takes the text nodes of the tree as FAQ candidates. It then derives classification features from the text nodes and the structural information of the DOM tree, and trains a classification model with an improved Bayes classification learning algorithm to retrieve FAQs from HTML pages automatically. The experimental results show that the FAQ retrieval method is effective.

(2) A domain FAQ filtering method that combines syntactic structure relationships with domain characteristics. We study methods for constructing a domain knowledge base and build one that reflects the characteristics of Yunnan tourism. Based on syntactic analysis, the sentence trunk and domain terms are chosen as classification features, and an improved Bayes classifier is used to filter domain FAQs. The experimental results show that the method is effective.

(3) A related FAQ filtering method, designed around the characteristics of FAQs, that is oriented to both word combinations and natural language sentences. By calculating the similarity between a word combination or sentence and an FAQ, we determine whether the two are related. The method calculates the semantic similarity between words based on HowNet, extracts syntactic dependency pairs from questions by syntactic analysis, and calculates the similarity between these dependency pairs. FAQ similarity is thus computed from lexical, syntactic, and semantic information combined. The experimental results show that the related FAQ filtering method is effective.

(4) Based on the above research, we designed and implemented an FAQ retrieval system for HTML pages, a Yunnan tourism FAQ filtering system, and a related FAQ filtering system oriented to word combinations and natural language sentences.
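The candidate-selection step in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses Python's standard `html.parser` rather than a full DOM library, and the names `TextNodeCollector`, `features`, `candidate_nodes`, and the particular features chosen are all hypothetical. The improved Bayes classifier that would consume these features is not shown.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the path.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
# Tags whose text content is not visible page text.
SKIP_TAGS = {"script", "style"}
# Assumed structural cue: questions are often set in headings or bold text.
EMPHASIS_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "b", "strong", "dt"}


class TextNodeCollector(HTMLParser):
    """Collects each text node with the tag path from the root down to it."""

    def __init__(self):
        super().__init__()
        self.path = []   # currently open tags, outermost first
        self.nodes = []  # (text, tag path) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.path:
            while self.path and self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not SKIP_TAGS.intersection(self.path):
            self.nodes.append((text, tuple(self.path)))


def features(text, path):
    """Illustrative features from the text node and its position in the tree."""
    return {
        "ends_with_question_mark": text.endswith(("?", "\uff1f")),
        "in_emphasis_or_heading": any(t in EMPHASIS_TAGS for t in path),
        "depth": len(path),
        "parent": path[-1] if path else None,
    }


def candidate_nodes(html):
    """Return every text node of the page as an FAQ candidate with features."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    return [(text, features(text, path)) for text, path in collector.nodes]
```

Each returned pair holds a candidate text node and a feature dictionary. In a pipeline like the one described above, such feature vectors would be labeled as question, answer, or other and passed to the classifier.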
Keywords/Search Tags:Question Answering System, Restricted Domain, FAQ retrieving, Domain FAQ filtering, Related FAQ filtering