Research On Text Retrieval Methods For Cross-border Ethnic Culture

Posted on: 2021-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: B Wang
GTID: 2515306095490574
Subject: Computer application technology
Abstract/Summary:
With the arrival of the Internet big data era, more and more information is moving from written storage to network storage. Digital technology now touches every aspect of people's work and life, and retrieval systems are being applied to information retrieval in a growing range of scenarios. In the field of cross-border ethnic culture, people's increasing demand to understand the differences between cross-border ethnic cultures makes the fast and effective retrieval of cross-border ethnic cultural texts the most important task at present. By studying text retrieval in this field, this paper aims to make it faster and more convenient for people to learn about cross-border ethnic culture. The main work of this paper is as follows:

(1) Construction of a cross-border ethnic cultural knowledge graph
To make the results of cross-border ethnic cultural text retrieval more accurate, this paper constructs a knowledge graph of cross-border ethnic culture to assist retrieval. Based on investigation and research, the local ethnic groups selected for study are the Dai and Yi, and the corresponding cross-border ethnic groups are the Tai, Shan, Lao and Luoluo. The classification system and data model of the knowledge graph are defined first. Following this classification system, knowledge triples are extracted from existing knowledge graphs and from the infoboxes of encyclopedia sites, yielding 863 relevant triples in total. These triples are then imported into a graph database to complete the construction of the cross-border ethnic cultural knowledge graph.

(2) A cross-border ethnic cultural text classification method incorporating entity vectors
To address the complex semantic context and uneven feature quality of cross-border ethnic cultural texts, a text classification method that integrates entity vectors is proposed. First, the TransE model is used to vectorize the knowledge triples in the cross-border ethnic cultural knowledge graph, producing entity vectors, relation vectors and entity label vectors. BERT is then used to obtain a vector representation of each word in the training text. Using the position information of the entities in the text, the corresponding entity semantic vectors are combined with BERT's word-vector representations of those entities. A BiGRU model is then used to train the neural network, yielding the trained cross-border ethnic cultural text classification model. The trained model classifies the crawled data, and the classified data serve as the data to be retrieved.

(3) A cross-border ethnic cultural text retrieval method based on entity semantic extension
To address the sparse semantics of the Query statements entered by users, this paper proposes a cross-border ethnic cultural text retrieval method based on entity semantic extension. First, the Query text entered by the user is preprocessed, the resulting words are mapped to the cross-border ethnic cultural knowledge graph, and the triples containing the matched entities are returned together with the label information of the entities in those triples. These knowledge triples and entity labels are then vectorized with the TransH model, and the related relation vectors and entity label vectors are fused with the corresponding entity vectors to obtain extended entity semantic vectors. The extended entity semantic vectors are fused into the vectors of the Query entities, which semantically extends the entities in the Query. Next, convolutional neural networks are used to extract n-gram text features from the Query text and from the Document text to be retrieved, and similarities between the Query and Document feature vectors are computed to obtain the corresponding similarity vectors. Gaussian kernel functions map these similarity vectors to new feature vectors in a semantic space, and a pointwise learning-to-rank method is used to compute the relevance between the Query and the Document. Once training is complete, the trained model is used to retrieve the Document texts that correspond to a Query.

(4) Design and implementation of a prototype cross-border ethnic cultural text retrieval system
The Django framework is used to build a cross-border ethnic cultural text retrieval system, which is divided into four functional modules: an entity query module, a relation query module, a text classification module and a text retrieval module. The entity query module returns the entities directly related to an entity entered by the user. The relation query module returns the relationship between two entities entered by the user. The text classification module classifies text entered by the user and determines its category label. The text retrieval module takes a Query entered by the user and retrieves the corresponding documents from the classified cross-border ethnic cultural text dataset by invoking the trained cross-border ethnic cultural text retrieval model.
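
The Gaussian kernel step of the retrieval method in (3) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not state the kernel means, the kernel width, the pooling rule or the scoring function, so the values of `mus` and `sigma`, the log-sum pooling, the sigmoid scorer, the fixed `weights` and the toy 2-d vectors below are all assumptions chosen for illustration. The toy vectors stand in for the n-gram features that the convolutional networks would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def kernel_pooling(query_vecs, doc_vecs, mus, sigma=0.1):
    """Turn Query-Document similarities into one feature per Gaussian kernel.

    Assumed pooling rule: for each kernel, sum the kernel responses over the
    document vectors, take the log, then sum over the query vectors.
    """
    features = []
    for mu in mus:
        total = 0.0
        for q in query_vecs:
            # Soft count of document vectors whose similarity to q is near mu.
            soft_count = sum(
                math.exp(-((cosine(q, d) - mu) ** 2) / (2 * sigma ** 2))
                for d in doc_vecs
            )
            # Clamp before the log so an empty kernel does not give log(0).
            total += math.log(max(soft_count, 1e-10))
        features.append(total)
    return features


def pointwise_score(features, weights, bias=0.0):
    """Pointwise relevance score: sigmoid of a linear combination of features."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)


# Toy vectors (illustrative values only, not data from the thesis).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_relevant = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_unrelated = [[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]]

mus = [1.0, 0.5, 0.0, -0.5, -1.0]      # assumed kernel means
weights = [1.0, 0.5, 0.0, -0.5, -1.0]  # hypothetical; a trained model would learn these

feats_rel = kernel_pooling(query_vecs, doc_relevant, mus)
feats_unrel = kernel_pooling(query_vecs, doc_unrelated, mus)
score_rel = pointwise_score(feats_rel, weights)
score_unrel = pointwise_score(feats_unrel, weights)
print(score_rel > score_unrel)
```

In a pointwise setup each Query-Document pair is scored independently, as `pointwise_score` does here, and candidate documents are then ranked by that score. With these hand-picked weights, kernels centred on high similarity raise the score and kernels centred on low similarity lower it, so the document whose vectors align with the query ranks first.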
Keywords/Search Tags: Cross-border ethnic culture, Cross-border ethnic cultural knowledge graph, Text classification, Text retrieval