Research And Implementation Of Internet Standard Resource Retrieval And Recommendation

Posted on: 2015-11-13  Degree: Master  Type: Thesis
Country: China  Candidate: H T Yu  Full Text: PDF
GTID: 2208330431476821  Subject: Computer application technology
Abstract/Summary:
Standards are unified regulations governing industrial and agricultural production and engineering construction, covering construction quality, inspection methods, technical requirements, and so on. All relevant departments must comply with them. Without standards, the production management of enterprises and public institutions lacks a reference and a theoretical basis, and public demand for standards is increasingly urgent. To better promote standardization work in Yunnan Province and to recommend valuable information to enterprises and the community from the vast body of standards resources, this thesis studies crawling and recommendation methods for standard resources. Its main contributions are as follows:

(1) This thesis discusses a method for precisely extracting entity data from the Deep Web and designs an entity extraction system that performs the extraction automatically. First, a web crawler is designed around the characteristics of the Deep Web and used to collect resources from the Internet. Second, the collected web resources are preprocessed, and non-standard pages are normalized. Finally, entity data are located and extracted accurately: based on the hierarchy and layout features of the DOM tree, XPath is combined with regular expressions to locate entity data, and the extracted entity attributes and attribute values are stored.

(2) We propose a method for constructing a standard bibliographic topic model based on semi-supervised graph clustering. We first analyze the attribute and structural features of standard bibliographic documents and extract structural information that reflects standard bibliographic topics. Drawing on the hierarchical structure of ICS (International Classification for Standards) and CCS (Chinese Classification for Standards), we define and extract association-relationship features among standard bibliographic documents. We then use the expectation-maximization method to compute the weights of the different association relationships, compute the correlations among standard bibliographic documents, and build an undirected graph model of those documents. Finally, using the labeled association-relationship features as supervision, we cluster the standard bibliographic documents with a semi-supervised graph clustering algorithm and obtain the topic words of the standard bibliography. This completes the standard bibliographic topic model, which lays the foundation for subsequent standard bibliographic recommendation.

(3) This thesis proposes a topic-model-based recommendation method for standard resources. First, a user attention model is constructed: the LDA topic model is used to analyze the standard resources a user follows and to generate topic words from those documents; the topic words are combined with the user's registration information to generate user attention labels automatically; and the many user attention labels are integrated into a user attention model that describes the industry information users care about. Second, the previously constructed standard bibliographic model and the user attention model are used to carry out correlation analysis. Finally, based on the correlations, a Top-N algorithm selects standard bibliographic records, and the highest-ranked standard resources are recommended to the user.

(4) This dissertation designs and implements a prototype crawling and recommendation system for standard resources, which facilitates further research on standard resource recommendation methods.
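The final step of contribution (1), locating entity data with XPath and cleaning values with regular expressions, can be illustrated with a minimal sketch. This is not the thesis's extraction system: the page layout, the `detail` table id, the `name`/`value` cell classes, the ICS pattern, and the placeholder standard number are all assumptions made up for the example, and the page is assumed to be already normalized into well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, already-normalized detail page for one standard (made-up values).
PAGE = """
<html><body>
  <table id="detail">
    <tr><td class="name">Standard No.</td><td class="value"> GB/T 0000-2000 </td></tr>
    <tr><td class="name">ICS</td><td class="value">ICS: 35.240</td></tr>
    <tr><td class="name">Title</td><td class="value">Example title</td></tr>
  </table>
</body></html>
"""

# Hypothetical per-attribute patterns that strip labels and noise from a cell.
VALUE_PATTERNS = {
    "ICS": re.compile(r"(\d{2}(?:\.\d{3})?(?:\.\d{2})?)"),
}


def extract_entity(page):
    """Return {attribute name: attribute value} for one detail page."""
    root = ET.fromstring(page.strip())
    entity = {}
    # XPath step: rely on the table's position and layout in the DOM tree.
    for row in root.findall(".//table[@id='detail']/tr"):
        name_cell = row.find("td[@class='name']")
        value_cell = row.find("td[@class='value']")
        if name_cell is None or value_cell is None:
            continue
        name = (name_cell.text or "").strip()
        value = (value_cell.text or "").strip()
        # RegExp step: keep only the part of the cell that is the value.
        pattern = VALUE_PATTERNS.get(name)
        if pattern is not None:
            match = pattern.search(value)
            if match is None:
                continue
            value = match.group(1)
        entity[name] = value
    return entity


entity = extract_entity(PAGE)
```

XPath narrows the search to the cells that hold attributes, and the regular expression then removes label text that shares a cell with the value. The standard library's `ElementTree` supports only a subset of XPath, which is enough for this layout.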
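The Top-N step of contribution (3) can be sketched in the same spirit. The abstract does not say how correlation between the user attention model and the standard bibliographic topic model is computed, so cosine similarity over weighted topic words is an assumption here, as are the function names, the identifiers `STD-A` to `STD-C`, and all weights.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two {topic word: weight} mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_top_n(user_model, standard_topics, n):
    """Return up to n standard ids, most similar to the user model first."""
    scored = (
        (cosine(user_model, topics), std_id)
        for std_id, topics in standard_topics.items()
    )
    # Drop standards that share no topic word with the user model.
    return [std_id for score, std_id in heapq.nlargest(n, scored) if score > 0]


# Hypothetical user attention labels and per-standard topic words.
user_model = {"welding": 0.6, "steel": 0.3, "inspection": 0.1}
standard_topics = {
    "STD-A": {"welding": 0.7, "steel": 0.3},
    "STD-B": {"tea": 0.8, "processing": 0.2},
    "STD-C": {"inspection": 0.5, "steel": 0.5},
}

top = recommend_top_n(user_model, standard_topics, 2)
```

In this made-up data, `STD-A` shares the user's two heaviest topic words and ranks first, `STD-C` overlaps less and ranks second, and `STD-B` shares none and is never recommended.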
Keywords/Search Tags: Standard Resources, Deep Web, Accurate extraction, Semi-supervised graph clustering, Similarity calculation