
Research On Text Classification For Proposals And Construction Of Domain Knowledge Graph

Posted on: 2024-03-21; Degree: Master; Type: Thesis
Country: China; Candidate: H X Zhao
GTID: 2568306944968449; Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of knowledge graphs across industries, domain knowledge graphs show clear advantages over general-purpose graphs. However, research on a communication domain knowledge graph for 3GPP proposals is not yet mature, and the semantic information in the large volume of unstructured proposals has not been fully extracted, which makes information extraction and proposal retrieval inconvenient for researchers. This thesis constructs a communication domain knowledge graph for 3GPP proposals and studies the key technologies behind it. The main work covers domain named entity recognition, fine-grained text classification, and domain knowledge graph construction.

First, an Incremental Knowledge-Guided Multi-model Collaborative Training model (IKG-MCT) is proposed. It adds a knowledge-guided unlabeled sample selection module to the original collaborative training process and uses the domain information of the proposals to set the priority of unlabeled samples, realizing incremental, knowledge-guided iterative training. The thesis constructs five COMM datasets from the proposal corpus and conducts comparative experiments. The IKG-MCT model shows better named entity recognition performance on COMM datasets with different labeling proportions. In particular, on the COMM-10% dataset it improves precision, recall, and F1 by 2.37%, 3.58%, and 2.97% respectively over the original Tri-training algorithm. This confirms that IKG-MCT makes better use of unlabeled data for named entity recognition in the communication domain when the proportion of labeled data is small. By using domain knowledge to drive incremental iterative training, the model reduces the risk of error propagation in semi-supervised learning and offers a more effective approach to entity recognition in professional fields.

Second, a Keyphrase-Enhanced Graph Convolutional Network (KPE-GCN) for imbalanced text classification is proposed. It performs graph convolution on a two-layer heterogeneous graph that fuses words and keywords, so that proposals learn general global features and domain features at the same time. KPE-GCN also adds a keyphrase-based data augmentation module, which uses keyphrases as category centers to construct pseudo-samples and thereby addresses class imbalance. KPE-GCN raises both Micro-F1 and Macro-F1 to more than 99% for the first time, showing high-precision classification performance on the proposal dataset. It provides a unified domain feature extraction scheme that can be applied to text classification in other professional fields, such as standards, proposals, and patents.

Finally, the communication domain knowledge graph for 3GPP proposals is designed and constructed. It models the 3GPP proposals in the communication field and supports multi-dimensional data retrieval and reasoning. Based on the characteristics of 3GPP proposals, the ontology of the graph is designed to carry communication domain information, covering seven types of entities, including proposal, conference, organization, category, and term, and 12 corresponding relationships between entities. Building on the named entity recognition and text classification work, an explicit knowledge extraction method based on template matching is designed around the characteristics of proposal documents, and Neo4j is used to store and visualize the graph. The thesis demonstrates multi-dimensional data retrieval and multi-hop relationship reasoning on the communication domain knowledge graph, which gives researchers more accurate information retrieval and knowledge reasoning services and shows the associations among 3GPP proposals and the evolution of key technologies more clearly and intuitively.
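
As a rough illustration of the knowledge-guided unlabeled sample selection that IKG-MCT adds to Tri-training, the Python sketch below keeps only sentences on which two peer taggers agree and then ranks them by how much domain vocabulary they contain. The abstract does not give the actual priority function, so the lexicon `DOMAIN_TERMS`, the scoring rule in `domain_score`, and the function `select_pseudo_labeled` are all hypothetical names and assumptions, not the thesis's implementation.

```python
# Hypothetical domain lexicon; the abstract does not list the real terms.
DOMAIN_TERMS = {"harq", "pdcch", "beamforming", "numerology"}


def domain_score(tokens, lexicon=DOMAIN_TERMS):
    """Assumed priority: fraction of tokens found in the domain lexicon."""
    if not tokens:
        return 0.0
    return sum(t.lower() in lexicon for t in tokens) / len(tokens)


def select_pseudo_labeled(unlabeled, peer_a, peer_b, budget):
    """Pick up to `budget` pseudo-labeled samples for the third model.

    unlabeled: list of token lists (one per sentence)
    peer_a, peer_b: callables mapping a token list to a list of tags
    """
    candidates = []
    for tokens in unlabeled:
        tags_a, tags_b = peer_a(tokens), peer_b(tokens)
        # Tri-training agreement check: both peers must produce the same tags.
        if tags_a == tags_b:
            candidates.append((domain_score(tokens), tokens, tags_a))
    # Knowledge-guided step (assumed form): domain-rich sentences come first.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(tokens, tags) for _, tokens, tags in candidates[:budget]]
```

In a full training loop, this selection would run once per model per round, with `budget` growing across rounds to give the incremental behaviour the abstract describes; that schedule is likewise an assumption.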
Keywords/Search Tags: Knowledge Graph, Named Entity Recognition, Text Classification, Graph Convolutional Network, Multi-model Collaborative Training