| China’s public cultural resources, held in venues such as libraries, museums, and art galleries, are vast and diverse, which makes it difficult for users to choose among them. Building a public cultural knowledge graph that organizes and manages these resources effectively supports their co-construction and sharing. Extracting resource topics from the textual descriptions of public cultural resources completes the knowledge graph, enriches the semantic features of the resources, and makes it easier to describe users’ interests and preferences; this in turn supports personalized recommendation of public cultural resources and promotes the dissemination of public culture. However, the descriptions of public cultural resources are limited and the texts are short. Because the data are sparse, traditional topic extraction methods cannot effectively capture the semantics of words and texts in short texts, and the extracted topics are low in accuracy and quality. To achieve high-quality topic extraction from short texts of public cultural resources, this master’s thesis studies short text topic extraction in depth, proposes a short text topic extraction method based on a document-word co-occurrence graph, and applies it to a public cultural knowledge graph system, thereby supporting the construction of a high-quality public cultural knowledge graph. The main work of the thesis is as follows: (1) To address the poor generalization ability and insufficient semantic representation of existing short text topic models, a document-word co-occurrence graph construction method based on the BERT model is proposed, which represents the relationship between texts and topic words intuitively in the form of a graph. The BERT word vector model is also introduced to strengthen the semantic association of co-occurring topic pairs, so as to obtain the overall topic distribution of the text corpus and the topic range to which each text belongs, and to improve the quality of the topic words. (2) To address the difficulty of topic inference in existing short text topic models, a topic inference method based on mutual information maximization over the document-word co-occurrence graph is proposed. The document-word co-occurrence graph is trained with a graph embedding model that maximizes node-graph mutual information, so that words under the same topic have similar representations while words under different topics have clearly different representations. Finally, using the feature matrix of the document-word co-occurrence graph obtained from training, the topic words of each text are inferred from the topic domain to which the text belongs. (3) A short text topic extraction prototype system is designed and implemented. The thesis describes the system’s functional architecture, module design, and database design in detail, explains the implementation and processing flow of the core components, and demonstrates the system in operation, verifying the feasibility and validity of the theoretical research. |
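
To make the idea of a document-word co-occurrence graph concrete, the following is a minimal sketch, not the thesis's actual construction. It assumes the documents are already tokenized, records document-word edges weighted by term count, and weights word-word edges by the number of documents in which both words appear. The BERT-based semantic weighting and the mutual-information graph embedding described in the abstract are omitted, and the function name `build_doc_word_graph` and the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_doc_word_graph(docs):
    """Build edge weights for a document-word co-occurrence graph.

    docs: list of token lists (assumed already segmented and filtered).
    Returns two dicts:
      doc_word[(doc_id, word)] -> how often the word occurs in that document
      word_word[(w1, w2)]      -> number of documents containing both words,
                                  with w1 < w2 so each pair is stored once
    """
    doc_word = defaultdict(int)
    word_word = defaultdict(int)
    for doc_id, tokens in enumerate(docs):
        # Document-word edges: one edge per distinct word, weighted by count.
        for tok in tokens:
            doc_word[(doc_id, tok)] += 1
        # Word-word edges: every unordered pair of distinct words in the text.
        for a, b in combinations(sorted(set(tokens)), 2):
            word_word[(a, b)] += 1
    return dict(doc_word), dict(word_word)


# Hypothetical short descriptions of cultural resources, already tokenized.
docs = [
    ["museum", "bronze", "exhibition"],
    ["library", "ancient", "book", "exhibition"],
    ["museum", "ancient", "bronze"],
]
doc_word, word_word = build_doc_word_graph(docs)
```

In a fuller pipeline, the raw co-occurrence counts on word-word edges could be combined with embedding similarity before the graph is passed to a graph embedding model; that step is left out here because the abstract does not specify how the weights are combined.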