The Web contains rich information resources and has become the main channel for acquiring information. However, because of the sheer volume of information on the Web, it remains difficult for enterprises to find the information they need. The technology of acquiring enterprise information from the Web has therefore become an active research topic. Starting from enterprises' products, this thesis finds the manufacturers of those products on the Web and then locates their homepages. Enterprise homepages contain a great deal of enterprise information, such as product introductions, enterprise honors, and development goals. By obtaining these homepages, enterprise information can be acquired comprehensively and in a timely manner.

The main work of this thesis is as follows:

Firstly, based on the naming characteristics of enterprise names, this thesis proposes an algorithm for extracting enterprise name patterns based on the longest common subsequence (LCS). We first build an index over known enterprise information and retrieve the manufacturers that correspond to a given product name. We then extract the longest common subsequence of the retrieved names using the LCS algorithm. Finally, the enterprise name pattern is extracted by matching the longest common subsequence against the enterprise names. Experimental results show that this method effectively extracts enterprise name patterns, which serve as the term set for query expansion.

Secondly, this thesis adopts an information filtering algorithm based on Bayesian classification. After candidate pages are retrieved by search, a Bayesian classifier keeps the enterprise homepages and filters out non-enterprise pages. For feature selection in the classifier, the thesis proposes a method, based on Web link blocks, for extracting anchor text from navigation bars. Page blocks are identified according to the character spacing between links on the page, and anchor text is extracted from blocks that contain more than two anchors with an average length of three to five words. These anchor texts are used as features.
Experiments were conducted on products from the machinery, electric power, building materials, materials, and other categories; the results show that this method achieves good results.
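
The LCS-based pattern extraction described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that names are compared token by token (split on whitespace), that the common subsequence is folded pairwise across all retrieved names, and that a pattern is written with `*` standing for the variable parts. The function names `lcs`, `common_tokens`, and `to_pattern`, and the sample company names, are hypothetical.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of token lists a and b (classic DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def common_tokens(names):
    """Fold the pairwise LCS over all names (an assumed simplification)."""
    return reduce(lcs, (name.split() for name in names))


def to_pattern(tokens, common):
    """Keep tokens that match the common subsequence; collapse the rest into '*'."""
    out, k = [], 0
    for tok in tokens:
        if k < len(common) and tok == common[k]:
            out.append(tok)
            k += 1
        elif not out or out[-1] != "*":
            out.append("*")
    return " ".join(out)


# Hypothetical manufacturer names retrieved for one product name.
names = [
    "Shanghai Hongda Machinery Co., Ltd.",
    "Beijing Xinyuan Machinery Co., Ltd.",
    "Guangzhou Taihe Machinery Manufacturing Co., Ltd.",
]
common = common_tokens(names)
patterns = {to_pattern(name.split(), common) for name in names}
```

Under these assumptions, the shared tokens form the fixed part of the name pattern and each `*` marks a slot that varies between enterprises; the resulting patterns could then be appended to a product query as expansion terms.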