| With the development of the Internet, the volume of information has grown rapidly, and more and more of it reaches people in electronic form. How to quickly extract what one is really interested in from this vast amount of information is precisely the research subject of Information Extraction (IE). IE refers to extracting specific information from unstructured data and organizing it into structured data that people can search and use. In essence, it extracts the events, entities, and relations that a user is interested in and produces structured output, so that information can be searched, extracted, and understood automatically. IE combines many basic natural language processing techniques and is useful in many application areas. Research into Chinese IE started relatively late, and the design of a comprehensive Chinese IE system is still in its infancy, so exploring techniques and methods applicable to Chinese IE has considerable practical significance and application value. The research object of this thesis is a series of related reports on the same event. Based on a thorough analysis of the features of these texts, we present an IE method based on sentence clustering. Our main work is as follows. First, we cluster sentences step by step using the ISODATA algorithm. In doing so, we present a sentence similarity measure whose core is the Vector Space Model and which makes full use of word, semantic, and word-string information. Second, we define and construct a two-layer template composed of several profiles and slots. To make it convenient to extract and describe specific information, profile vectors and pattern features are used to describe the profile and slot information. |
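
As a rough illustration of the Vector Space Model component of the sentence similarity measure, the sketch below computes the cosine similarity between two sentences represented as raw term-count vectors. It is a minimal sketch under stated assumptions, not the thesis's actual measure: the function name `cosine_similarity` is hypothetical, sentences are assumed to be already segmented into tokens (Chinese text would need a word segmenter first), and the semantic and word-string information mentioned in the abstract is not modeled here.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term counts.

    Hypothetical helper for illustration only; the thesis combines this
    kind of score with semantic and word-string information.
    """
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms shared by both sentences contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty sentence is treated as similar to nothing.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    s1 = ["earthquake", "hits", "city", "overnight"]
    s2 = ["earthquake", "strikes", "city", "at", "night"]
    print(cosine_similarity(s1, s2))
```

A score like this could serve as the pairwise similarity that a clustering step such as ISODATA consumes; how the different kinds of information are weighted and combined is left to the thesis body.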