Research And Implementation Of Short Text Topic Extraction Based On Document-Word Co-Occurrence Graph

Posted on:2023-06-22

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Wei

GTID:2558307061453834

Subject:Computer Science and Technology

Abstract/Summary:

Libraries, museums, art galleries, and other cultural venues in China hold a vast and diverse body of public cultural resources, and this abundance makes it difficult for users to choose among them. Constructing a public cultural knowledge graph to organize and manage these resources effectively supports their co-construction and sharing. Extracting resource topics from the textual descriptions of public cultural resources completes the knowledge graph, enriches the semantic features of the resources, and makes it easier to describe users’ interests and preferences. This in turn supports personalized recommendation of public cultural resources and promotes the dissemination of public culture. However, the descriptions of public cultural resources are limited and the texts are short. Because the data are sparse, traditional topic extraction methods cannot effectively capture the semantics of words and texts in short texts, and the extracted topics are of low accuracy and quality. To achieve high-quality topic extraction from short texts of public cultural resources, this master’s thesis studies short text topic extraction in depth, proposes a short text topic extraction method based on a document-word co-occurrence graph, and applies it to a public cultural knowledge graph system, providing support for the construction of a high-quality public cultural knowledge graph. The main work of the thesis includes:

(1) To address the poor generalization ability and insufficient semantic representation of existing short text topic models, a document-word co-occurrence graph construction method based on the BERT model is proposed, which represents the relationship between texts and topic words intuitively in the form of a graph. The BERT word vector model is introduced to strengthen the semantic association of co-occurring topic pairs, so as to obtain the overall topic distribution of the text corpus and the topic range to which each text belongs, and to improve the quality of the topic words.

(2) To address the difficulty of topic inference in existing short text topic models, a topic inference method based on maximizing mutual information over the document-word co-occurrence graph is proposed. The document-word co-occurrence graph is trained with a graph embedding model based on maximizing node-graph mutual information, so that words under the same topic have similar representations while words under different topics have clearly different representations. Finally, using the feature matrix of the document-word co-occurrence graph obtained from training, the topic words of each text are inferred from the topic domain to which the text belongs.

(3) A short text topic extraction prototype system is designed and implemented. The thesis introduces the system’s functional architecture, module design, and database design in detail, explains the implementation method and processing flow of the core components, and shows the system in actual operation, verifying the feasibility and validity of the theoretical research.
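To make the idea of a document-word co-occurrence graph concrete, the following is a minimal sketch and not the thesis's implementation. It assumes already-tokenized short texts, weights document-word edges with TF-IDF, and weights word-word edges with raw co-occurrence counts; these weighting choices, the function name `build_doc_word_graph`, and the toy corpus are all assumptions made for illustration. The BERT-based semantic enhancement and the mutual-information graph embedding described in the abstract are not reproduced here.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def build_doc_word_graph(docs):
    """Return (doc_word, word_word) edge-weight dicts for tokenized short texts.

    Hypothetical helper: TF-IDF and co-occurrence counts are stand-ins
    for whatever edge weights the thesis actually uses.
    """
    n_docs = len(docs)
    # Document frequency of each word, used for the IDF part of the weight.
    doc_freq = Counter(word for tokens in docs for word in set(tokens))

    doc_word = {}                  # (doc_index, word) -> TF-IDF weight
    word_word = defaultdict(int)   # (word_a, word_b), sorted -> count

    for doc_index, tokens in enumerate(docs):
        term_freq = Counter(tokens)
        for word, count in term_freq.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[word]) + 1.0
            doc_word[(doc_index, word)] = tf * idf
        # Each short text is treated as a single co-occurrence window.
        for word_a, word_b in combinations(sorted(term_freq), 2):
            word_word[(word_a, word_b)] += 1

    return doc_word, dict(word_word)


# Toy corpus (invented for illustration only).
docs = [
    ["museum", "bronze", "exhibition"],
    ["library", "ancient", "book", "exhibition"],
    ["museum", "ancient", "bronze"],
]
doc_word, word_word = build_doc_word_graph(docs)
```

In this sketch, documents and words are the two node types, so a word that appears in several short texts links them together; this is one way a graph can compensate for the sparsity of any single short text.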

Keywords/Search Tags:

Public culture, Knowledge graph, Short text, Topic extraction, Document-word co-occurrence graph

Related items

1	Knowledge Extraction From Document-level Formatted Text
2	Research On Short Text Emotion Assessment Method Based On Knowledge Graph
3	Short Text Classification Based On The Model Of Knowledge Graph And Word Combination
4	Short Text Topic Modeling Research Based On The Semantic Extension Of Knowledge Graph
5	Automatic Knowledge Graph Generation For Cultural Relics From Web Text
6	The Construction Of Knowledge Graph In Education Field Based On Natural Language Processing
7	Design And Implementation Of Question Answering System Based On Document Archive Knowledge Graph
8	Research And Application Of Short Text Semantic Analysis Based On Domain Knowledge Graph
9	Optimization Of Topic Model For Self-aggregating Short Texts
10	Research On Topic Detection Method Of Complex Short Text Based On Topic Model