Research Of Chinese Text Information Extraction In The Information Platform For Development Strategy Study Of Equipments

Posted on: 2004-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhuang
GTID: 2132360152957103
Subject: Management Science and Engineering
Abstract/Summary:
Studying the development strategy of equipment requires timely knowledge of the state of the equipment across research, design, manufacture, use, maintenance, and renewal, as well as of its features. It also requires discovering the trends and characteristics of equipment development abroad. How to extract structured equipment information from large volumes of text quickly, and how to discover the knowledge hidden in it, has therefore become an urgent problem in the development strategy study of equipment.

Information extraction, a technology that automatically extracts specified structured information from plain text, is becoming increasingly important. It makes use of the theory and techniques of parsing, but its objective is only to extract the pertinent information. It therefore does not need to parse the text fully, which avoids the difficulties of full-text understanding and makes it a technology that can be put into practice now. To address the problem of extracting equipment information from semi-structured text, this thesis studies techniques for named entity discovery and text structuring, guided by the needs of the development strategy study of equipment.

The main work of this thesis is as follows:
1. It studies statistics-based named entity discovery methods and improves the N-gram statistics method and the self-increasing method. The main contribution here is to analyze the features of the self-increasing patterns and to use statistical and part-of-speech (POS) information to filter the primary patterns, which improves the precision of named entity discovery.
2. It studies the rule-based text structuring method. The experiments cover the establishment of the rule base, the classification of the rules, and the extraction method for the bi-gram relation, and set up the framework for text structuring.
3. It designs and implements an information extraction subsystem using the above two techniques, in which the rules can be extended.
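
As a rough illustration of the N-gram statistics idea named in item 1, the following minimal Python sketch counts character n-grams over unsegmented Chinese text and keeps frequent ones as named entity candidates. It is not the thesis's improved method: the function names, the length range, the `min_freq` threshold, and the substring-absorption filter are all assumptions made for this example, and the self-increasing method and POS-based filtering described above are not reproduced.

```python
from collections import Counter


def ngram_counts(text, n_min=2, n_max=4):
    """Count character n-grams of length n_min..n_max in one text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def candidate_entities(texts, n_min=2, n_max=4, min_freq=2):
    """Return frequent n-grams as candidate entity strings (hypothetical filter).

    An n-gram is dropped when a longer frequent n-gram contains it and
    occurs equally often, since the shorter string then never appears
    on its own.
    """
    counts = Counter()
    for text in texts:
        counts.update(ngram_counts(text, n_min, n_max))

    # Keep only n-grams that reach the assumed frequency threshold.
    frequent = {g: c for g, c in counts.items() if c >= min_freq}

    kept = {}
    for gram, count in frequent.items():
        absorbed = any(
            len(other) > len(gram) and gram in other and frequent[other] == count
            for other in frequent
        )
        if not absorbed:
            kept[gram] = count
    return kept


# Usage: two short, made-up text fragments sharing one term.
print(candidate_entities(["歼击机试飞", "新型歼击机"]))
```

In this toy input only the shared three-character string survives, because its two-character substrings never occur without it. A practical version would at least split the text at punctuation before counting so that candidates do not span sentence boundaries.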
Keywords/Search Tags: Named Entity Discovery, N-gram Statistics, Self-increasing Statistics, Text Structuring