In the age of the global information economy, enterprises are more powerful and their information is far richer than ever before. Enterprise data comprises structured and unstructured data: structured data can be stored and accessed efficiently in a relational database, while unstructured data is inherently unsuitable for storage and retrieval in a relational database. Merrill Lynch has estimated that more than eighty percent of potentially valuable commercial information resides in unstructured data, which means that the most valuable corporate information is not being effectively analyzed and utilized.

This problem is particularly evident in the securities and stock field. Information technology in this field is developing rapidly and its support systems are steadily maturing. Yet for the massive amounts of unstructured electronic data generated by enterprise business, such as business reports, the latest industry developments, and policy information, there is still no effective method of analysis and utilization. Most of this data exists in the form of electronic documents, and it is an important basis for corporate decision-making. How to obtain the necessary information quickly and accurately from vast amounts of electronic information has therefore become a major issue in enterprise information management.

Traditional keyword-based full-text retrieval provides only literal matching on the query keywords and has limited understanding at the semantic level. The retrieval results therefore contain too much irrelevant information, which leads to relatively low retrieval precision and recall.
Moreover, traditional full-text retrieval cannot meet the demands of knowledge reasoning and associated data mining.

To address these problems, this paper combines ontology technology with full-text retrieval technology and proposes a stock information semantic retrieval method based on domain ontology. The method builds inverted indexes for electronic stock information using the full-text search engine Lucene, introduces ontology technology into the traditional full-text search engine, and makes full use of the ontology's knowledge representation, knowledge sharing, and logical inference capabilities to arrive at an ontology-based semantic retrieval model.

For ontology construction, this paper uses the Protege tool to build a stock ontology (SO), crawls stock knowledge from the SINA finance website according to the SO, and stores that knowledge as OWL ontology files to build the Stock Ontology Library. For concept semantic similarity computation, this paper takes into account the semantic distance between ontology concepts, the semantic coincidence degree, the concept level, the density of the concept region, and the object properties of ontology concepts, and proposes a method for computing concept semantic similarity and semantic correlation based on domain ontology. For Chinese word segmentation, this paper uses Hidden Markov Model-based character position probabilities to overcome the weakness of IKAnalyzer in new word discovery. Experimental results show that this method performs well on new word discovery.

Based on the above semantic retrieval model, this paper designs and implements a stock information semantic retrieval system based on stock domain ontology, called SISRS-SO for short. SISRS-SO has a clear hierarchical design and module division.
To test the performance of the semantic retrieval model, this paper compares it experimentally with the traditional keyword-based full-text retrieval model. By introducing ontology technology, SISRS-SO supports semantic analysis and semantic reasoning during retrieval. Experimental results show that SISRS-SO performs much better than the traditional keyword-based full-text search system in terms of retrieval precision and recall.
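
To make the contrast between literal keyword matching and ontology-assisted retrieval concrete, the following is a minimal sketch in Python. It is not the model proposed in this paper and does not use Lucene or an OWL ontology: the toy documents, the `RELATED` concept map, and all function names are hypothetical and chosen only for illustration. It builds a small inverted index and then expands each query term with related concepts before matching.

```python
from collections import defaultdict

# Hypothetical toy "ontology": each concept maps to related concepts.
# A real system would derive these relations from an OWL ontology.
RELATED = {
    "stock": {"share", "equity"},
    "dividend": {"payout"},
}


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Literal matching: documents containing any query term."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return hits


def semantic_search(index, query, related=RELATED):
    """Expand each query term with related concepts, then match."""
    hits = set()
    for term in query.lower().split():
        for expanded in {term} | related.get(term, set()):
            hits |= index.get(expanded, set())
    return hits


# Hypothetical sample documents.
DOCS = {
    1: "stock prices rose sharply",
    2: "the company issued new equity",
    3: "policy update on bond markets",
}
INDEX = build_inverted_index(DOCS)
```

In this toy setting, a query for "stock" matches only document 1 under `keyword_search`, while `semantic_search` also returns document 2 because "equity" is listed as a related concept. A real system would weight such expansions, for example by a concept similarity score, rather than treating every related concept as an exact match.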