
Intelligent Search Engine Based On National Standard Of UCL

Posted on: 2021-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
Full Text: PDF
GTID: 2518306476953299
Subject: Computer technology
Abstract/Summary:
With the rapid spread of the Internet and the explosive growth of digital information, massive amounts of fragmented content are emerging. Retrieving useful information from heterogeneous data is a major challenge for search engines. A traditional search engine retrieves information from the Internet through keyword matching and returns related links to the user. As a result, it cannot accurately understand the user's search intention, its results are too uniform, and the semantic information they carry is limited, so the user often has to search several times. To overcome these shortcomings, knowledge-based search engines have attracted widespread attention in industry. The core of knowledge-based search engine technology is the construction of a knowledge graph. Research on knowledge graphs has made progress, but knowledge is still mainly expressed by organizing structured data as relatively simple triples, which do not carry enough semantic information. Uniform Content Label (UCL) can effectively aggregate the disordered, heterogeneous content on the Internet, so that users can quickly and conveniently obtain rich semantic information. It is therefore of great forward-looking significance to study how to make full use of the advantages of UCL to enrich the semantic vector coding of Internet information and to build a content-centric intelligent search engine.

For this reason, this thesis builds a UCL Knowledge Graph (UCLKG) and studies related intelligent search technologies. An entity disambiguation algorithm based on the similarity of semantic environment and a relational reasoning algorithm based on representation learning and UCL semantic perception are proposed to realize the UCL Knowledge Graph. A dynamic topic mining algorithm, Dynamic Latent Dirichlet Allocation for Search Environment (DLDA_SE), and a query generation algorithm based on Semantic Dependency Parsing (SDP) are proposed to improve the search engine's ability to recognize user search intentions and to analyze knowledge semantics. The main research work in this thesis is as follows:

(1) To address the problem of semantically associating heterogeneous data on the Internet, and in line with the requirements of intelligent search engines, a method for constructing the UCL knowledge graph based on semantic fusion is proposed. First, the method analyzes the offline corpora of Wikidata and Baidu Encyclopedia and uses an information extraction tool to extract entity information, completing the construction of the basic knowledge base. Then it calculates the semantic weight of the content entities in UCL and uses the entity disambiguation algorithm to merge UCL with the basic entity base. Finally, a relational reasoning algorithm based on representation learning and UCL semantic perception is proposed to automatically update the UCL knowledge graph built with the entity disambiguation algorithm.

(2) To address the problem that traditional search engines cannot effectively identify users' intentions and lack the ability to parse content semantically, the UCL-based intelligent search engine handles user search information from two aspects. The first is personalized search centered on user interest. To fully exploit the semantic content of users' historical behavior on the Internet, a dynamic topic mining algorithm named DLDA_SE is proposed to identify the user's search intention. The search results are ranked according to the user's intention and the topic relevance of the UCL file. The second is a knowledge-centered content semantic analysis service. A query generation algorithm based on SDP is proposed, which translates the natural language queries entered by users into database query statements and retrieves the knowledge directly.

(3) A prototype of the intelligent search engine is implemented, and the related algorithms are verified by experiments. The experimental results show that, compared with the traditional entity disambiguation algorithm, the entity disambiguation algorithm based on semantic environment similarity disambiguates more effectively. Compared with the traditional relational reasoning algorithm, the relational reasoning algorithm based on representation learning and UCL semantic perception is better at distinguishing "one to many" and "many to many" relations. Compared with the traditional LDA algorithm, the DLDA_SE algorithm is more suitable for dynamic topic mining on online corpora, and the query generation algorithm based on SDP achieves better conversion on the four basic question types defined in this thesis.
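The idea of disambiguating an entity by the similarity of its semantic environment can be illustrated with a minimal sketch. This is not the algorithm of the thesis, whose details the abstract does not give. It assumes a plain bag-of-words representation and cosine similarity, and the function names, entity identifiers and toy data are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context_terms, candidates):
    """Pick the candidate entity whose description best matches the context.

    `candidates` maps a hypothetical entity id to its description terms.
    Returns the best id together with all similarity scores.
    """
    context = Counter(context_terms)
    scores = {
        entity_id: cosine(context, Counter(terms))
        for entity_id, terms in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data, invented for illustration only.
candidates = {
    "apple_company": ["technology", "iphone", "company", "computer"],
    "apple_fruit": ["fruit", "tree", "sweet", "orchard"],
}
best, scores = disambiguate(
    ["new", "iphone", "released", "by", "company"], candidates
)
print(best, scores)
```

In a real system the term counts would presumably be replaced by weighted or learned semantic vectors, such as the semantic weights of UCL content entities mentioned above; the selection step would stay the same.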
Keywords/Search Tags: search engine, UCL, knowledge graph, topic mining, semantic analysis