| As users increasingly phrase queries in their own ways, question analysis has become a focal point of question answering system research. However, current question understanding techniques cannot handle complex factoid questions well. This thesis addresses how to decompose complex factoid questions effectively, so that future question answering systems can analyze them better. Its main content is as follows. First, because decomposition of complex factoid questions is an emerging research topic with a short history, little previous work is available. To lay a foundation for the task, this thesis presents a set of annotation rules that is detailed, formal and easy to apply. Using these rules, we built a high-quality complex factoid question decomposition corpus. From an analysis of this corpus, we derive a question decomposition taxonomy comprising atomic-decomposable, parallel-decomposable and nested-decomposable questions. Second, we present a framework for complex factoid question decomposition and explain the role that decomposition category classification plays in it. After pointing out how complex factoid questions differ from text data in other formats, we apply a tree kernel based method to classify the decomposition categories. 
Experiments show that the tree kernel based method makes good use of the syntactic structure features of complex factoid questions and performs the classification task well. Finally, drawing on the idea of binary codes in computer science, this thesis introduces the concept of a complex factoid question decomposition tag, which encodes both the length and the content of the corresponding sub-question series. With the help of this tag, we introduce two methods for building sub-question series: one based on dependency parsing and one based on sequence labeling. The former incorporates a transition-based neural network dependency parser, while the latter uses a linear-chain conditional random field model. Detailed experimental results and comparisons with previous work show that both proposed methods are effective on the complex factoid question decomposition task. |