| Current search engines rely mainly on keyword string matching: users can search only by entering keywords, and the engine often cannot return the required information accurately. Search engines based on natural-language question answering can make up for this shortcoming and have become an important development trend for the next generation of search engines. There is now a large body of research on open-domain question-answering (QA) systems, covering questions about people, time, location, historical events, and professional technology. Moreover, research on and application of restricted-domain QA systems that address such questions will accelerate the development of open-domain QA systems. This paper focuses on a solution for biography answer extraction and on the implementation of a biography QA prototype system. Its main content is as follows. First, the paper reviews the research status, related concepts, and current technology of biography QA systems, and analyzes their application requirements and text features. Second, a new biography answer extraction solution based on frequent subtree mining is proposed. The solution uses a grammar analysis tool to convert the sentences of the template corpus into syntax trees, and then uses the frequent subtree mining algorithm TreeMiner to mine frequent patterns from the template corpus and generate the template library. It then converts the sentences of the candidate answer set into syntax trees in the same way, and uses a pattern matching algorithm to compute the similarity between each candidate answer and the templates in order to identify the answer sentence.
To verify the effectiveness of the answer extraction solution based on frequent subtree mining, we conduct a comparative experiment against an answer extraction solution based on frequent sequence mining, a common technique for answer extraction. Subsequently, the paper designs and implements a biography QA prototype system. The design covers the data flow of the prototype system, its modular functional design, and the relevant data structures. The implementation includes using the web page analysis tool HtmlParser to extract biography sentences and related documents from web pages, using StanfordParser as the grammar analysis tool to generate syntax trees, and the detailed procedure of answer extraction based on the TreeMiner algorithm. We then present the interface of the biography QA prototype system and analyze its performance and application prospects. Finally, the paper concludes by summarizing the research and outlining future work. |
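
The template-mining and matching steps described above can be illustrated with a small sketch. It is not the paper's implementation and it is not TreeMiner: as a simplifying assumption it mines only the smallest possible subtrees (single parent-child label pairs), and the tree encoding, the function names `edges`, `mine_frequent_edges` and `similarity`, and the scoring rule (fraction of template patterns found in the candidate) are all hypothetical choices made for illustration.

```python
from collections import Counter

# Assumed encoding: a syntax tree is (label, [children]); a leaf has no children.

def edges(tree):
    """Return the set of (parent_label, child_label) pairs found in a tree."""
    label, children = tree
    found = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found

def mine_frequent_edges(corpus, min_support):
    """Keep the edges that occur in at least min_support trees of the corpus.

    This stands in for frequent subtree mining; TreeMiner itself mines
    larger embedded subtrees, which this sketch does not attempt.
    """
    counts = Counter()
    for tree in corpus:
        counts.update(edges(tree))  # a tree counts once per distinct edge
    return {edge for edge, count in counts.items() if count >= min_support}

def similarity(candidate, templates):
    """Fraction of template patterns that also appear in the candidate tree."""
    if not templates:
        return 0.0
    return len(edges(candidate) & templates) / len(templates)

# Toy template corpus of two parsed sentences (labels follow Penn Treebank style).
t1 = ("S", [("NP", [("NNP", [])]), ("VP", [("VBD", []), ("PP", [])])])
t2 = ("S", [("NP", [("NNP", [])]), ("VP", [("VBD", []), ("NP", [])])])
templates = mine_frequent_edges([t1, t2], min_support=2)

# A candidate answer sentence, parsed the same way.
candidate = ("S", [("NP", [("NNP", [])]), ("VP", [("VBZ", [])])])
score = similarity(candidate, templates)
```

In a pipeline like the one described, a candidate whose score exceeds a chosen threshold would be kept as an answer sentence; the threshold and the scoring rule here are placeholders, since the abstract does not specify them.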