| In today's age of information explosion, event information extraction based on event frames can better satisfy the need to obtain valid information from the Internet. By analyzing a news corpus, we predefine event frames for three kinds of breaking news and handle each flank of a news event with a customized method. By applying POS tagging to news articles, querying a location database, and defining rules derived from corpus study, we can effectively extract flank information of a news event such as time, location, and results. The complexity and dynamic nature of news events cause a problem: a static frame structure restricts the content that can be extracted. To solve this problem in information extraction systems, we propose a new technique called event new flank detection, which automatically detects flanks not defined in the frame. To take full advantage of POS, word order, and the relations between words in sentences, we use a word pair feature model to extract features and a paragraph-oriented LSA clustering algorithm to implement new flank detection. Test results of the prototype system on corpora of three kinds of breaking events show that the methods in this thesis are feasible. The extraction of breaking news elements achieves high precision and recall. The results of event new flank detection reveal both the unique characteristics of single events and several points common to events of the same kind that are not included in the event frame. The experimental results support the application prospects of this research. |
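
To make the word pair feature idea more concrete, the following is a minimal sketch of how ordered word pairs might be built from a POS-tagged sentence so that POS, word order, and word co-occurrence are all kept in the feature. The abstract does not specify the model's details, so everything here is an assumption for illustration: the function name `word_pair_features`, the `max_gap` window, the noun/verb filter, and the Penn Treebank-style tags are all hypothetical. The resulting counts could serve as the paragraph features passed to an LSA clustering step, which is not shown.

```python
from collections import Counter
from itertools import combinations

# Assumption: only nouns and verbs are kept (Penn Treebank-style tag prefixes).
CONTENT_TAG_PREFIXES = ("NN", "VB")


def word_pair_features(tagged_sentence, max_gap=3):
    """Count ordered (word/POS, word/POS) pairs of content words.

    tagged_sentence: list of (word, pos_tag) tuples in sentence order.
    max_gap: hypothetical limit on the distance between the two token positions.
    """
    # Keep each content word with its position so word order is preserved.
    content = [
        (i, f"{word.lower()}/{tag}")
        for i, (word, tag) in enumerate(tagged_sentence)
        if tag.startswith(CONTENT_TAG_PREFIXES)
    ]
    pairs = Counter()
    # combinations() yields pairs in sentence order, so the first item precedes the second.
    for (i, first), (j, second) in combinations(content, 2):
        if j - i <= max_gap:
            pairs[(first, second)] += 1
    return pairs


# Illustrative input only; not taken from the thesis corpus.
example = [
    ("Earthquake", "NN"),
    ("hits", "VBZ"),
    ("the", "DT"),
    ("coastal", "JJ"),
    ("city", "NN"),
]
features = word_pair_features(example)
```

For the example sentence, the sketch keeps the pairs `earthquake/NN` then `hits/VBZ` and `hits/VBZ` then `city/NN`, and drops `earthquake/NN` then `city/NN` because the two words are more than `max_gap` positions apart.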