Research And Application Of Technology Of Acquiring Enterprises Information Based On Web

Posted on:2015-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:P Z Feng

Full Text:PDF

GTID:2298330467470292

Subject:Control Engineering

Abstract/Summary:

The Web contains rich information resources and has become the main channel for acquiring information. However, because of the sheer volume of information on the Web, it remains difficult for enterprises to find the information they need. Web-based acquisition of enterprise information has therefore become a research hotspot. Starting from an enterprise's products, this thesis uses the Web to find the manufacturers of those products and then locates their homepages. Enterprise homepages contain a great deal of enterprise information, such as product introductions, enterprise honors, and development goals. By retrieving these homepages, enterprise information can be acquired comprehensively and in a timely manner.

The main work of this thesis is as follows:

First, based on the naming characteristics of enterprise names, this thesis proposes an enterprise name pattern extraction algorithm based on the longest common subsequence (LCS). An index is first built from known enterprise information, and the manufacturers corresponding to a given product name are retrieved. The longest common subsequence is then extracted with the LCS algorithm. Finally, the enterprise name pattern is extracted by matching the longest common subsequence against the corporate names. Experimental results show that this method effectively extracts enterprise name patterns, which serve as the expansion set for query expansion.

Second, this thesis adopts an information filtering algorithm based on Bayesian classification. After candidate pages are retrieved by search, a Bayesian classifier keeps the enterprise homepages and filters out non-enterprise homepages. For feature selection in the classifier, the thesis proposes a method, based on Web link blocks, for extracting anchor text from the navigation bar. Page blocks are identified according to the character spacing between links on the page, and anchor text is extracted from blocks in which the anchor texts average three to five words in length and number more than two. These anchor texts are used as features.
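
The filtering step can be illustrated with a minimal sketch of a multinomial naive Bayes classifier over navigation-bar anchor texts. This is not the thesis's implementation: the class name `NaiveBayes`, the labels, the Laplace smoothing, and the tiny training set are all assumptions made for illustration, and the anchor texts are assumed to have been extracted from link blocks already.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()   # number of training pages per label
        self.token_counts = {}        # label -> Counter of anchor texts
        self.vocab = set()            # all anchor texts seen in training

    def fit(self, samples):
        # samples: iterable of (list_of_anchor_texts, label) pairs
        for tokens, label in samples:
            self.doc_counts[label] += 1
            self.token_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training pages, each reduced to its navigation anchor texts.
training = [
    (["home", "about us", "products", "contact us"], "enterprise"),
    (["about us", "products", "news", "honors", "contact us"], "enterprise"),
    (["login", "forum", "blogs", "register"], "other"),
    (["news", "sports", "video", "login"], "other"),
]

clf = NaiveBayes()
clf.fit(training)
label_a = clf.predict(["products", "about us", "honors"])
label_b = clf.predict(["login", "forum"])
```

With this toy data, `label_a` is `"enterprise"` and `label_b` is `"other"`. Treating each whole anchor text as a single feature is a simplification chosen for the sketch; a real system would need a far larger labeled set.
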
Experiments were conducted on products in the mechanical, electrical power, architectural building materials, materials, and other categories. The experimental results show that this method achieves good results.
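
The LCS-based pattern extraction summarized in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names `lcs` and `name_pattern`, the whitespace tokenization, and the example company names are hypothetical, and folding pairwise LCS over several names yields a common subsequence that is not guaranteed to be the longest one shared by all of them.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of sequences a and b as a list."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def name_pattern(names):
    """Fold LCS over tokenized names; the shared tokens form a candidate pattern."""
    token_lists = [name.split() for name in names]
    return reduce(lcs, token_lists)


# Hypothetical manufacturer names retrieved for one product.
names = [
    "Shanghai Hongda Machinery Manufacturing Co Ltd",
    "Ningbo Xinyuan Machinery Manufacturing Co Ltd",
    "Wuxi Taihe Machinery Equipment Co Ltd",
]
pattern = name_pattern(names)
```

Here `pattern` is `["Machinery", "Co", "Ltd"]`, the tokens shared in order by all three names. A pattern of this kind could then be appended to a product query as expansion terms.
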

Keywords/Search Tags:

Enterprise Information, Pattern Extraction, Query Expansion, Information Filtering, Feature Selection

Related items

1	Research And Application Of Information Extraction Based On Query Expansion
2	Application Of Synonyms In Text Feature Extraction And Query Expansion
3	Information Extraction Technology Based On Semantic Expansion
4	Information Retrieval System Based On Document Query
5	Expansion Of Dynamic Query Based On Query Log
6	Research And Realization Of Text Retrieval Technology Based On Keywords Query Expansion
7	Research On Hybrid Query Expansion Technology For Information Search
8	Query Expansion Based On User Log Clustering
9	Research On Intelligent Query Expansion Technology Based On Users' Feedback
10	Research On Query Expansion Algorithm In Information Retrieval