Pre-trained language models (PLMs) have unified the learning paradigms of various text tasks and greatly improved the results of downstream tasks. However, semantic understanding of the factual and commonsense knowledge contained in massive internet data remains a significant challenge for these models. The main goal of knowledge-enhanced representation learning, especially for PLMs, is to provide external knowledge as semantic enhancement for important components of unstructured text, such as entities, that PLMs cannot understand on their own, and thereby to equip large-scale PLMs with better knowledge understanding in support of various intelligent text tasks. Existing mainstream knowledge-enhanced pre-trained language models (KEPLMs) focus mainly on improving performance in open domains with rich language resources. However, text tasks in real-world scenarios are constrained by factors such as limited data and computational resources, closed domains, and low-resource languages, making it difficult to adapt existing algorithms directly.

KEPLMs mainly use large-scale open-domain unstructured text and structured knowledge graphs (KGs) for self-supervised pre-training, with the aim of improving downstream task performance. Existing research usually assumes abundant resources and concentrates on designing model structures that deeply fuse contextual semantics with knowledge semantics, ignoring many constraints of practical user scenarios. This leads to several problems that need to be solved. (1) Most previous work considers only structured KG data to enhance pre-training text semantics, yet external knowledge comes in many forms and the quality of the semantic information it provides is uneven, leading to low utilization of open-domain knowledge data. (2) Mainstream KEPLMs are usually studied on open-domain KGs, whose data distribution differs substantially from that of closed-domain KGs; existing work therefore does not model the characteristics of closed-domain knowledge data, which leads to unsatisfactory transfer performance. (3) Previous research is usually conducted in Chinese and English, languages with rich pre-training knowledge data; low-resource languages lack such data, so existing algorithms are difficult to adapt for knowledge-semantic enhancement in these languages, and mainstream models perform poorly on them. (4) The above KEPLMs usually control the learning and updating of all knowledge parameters through self-supervised tasks, without a clear picture of how the knowledge data actually enhances the PLM. This results in a large computational cost and low efficiency of knowledge parameter learning, which greatly hinders practical application in this direction.

(1) For the first problem above, we propose a heterogeneous graph neural attention network together with a decoupled knowledge learning algorithm. Since knowledge in practical text tasks can come from sources with different structures, we design a heterogeneous graph data structure that unifies the various types of knowledge within a heterogeneous graph attention network for knowledge pre-training. To further improve the utilization of existing knowledge data, our analysis shows that pre-trained models have greater difficulty understanding long-tail knowledge, and we therefore design a decoupled knowledge learning algorithm that spans the pre-training, fine-tuning, and inference stages. (2) For the second problem above, we propose a pre-training
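To make the heterogeneous attention idea concrete, the sketch below shows one possible layer in which nodes of different types (the type names, the single attention head, and the per-type linear projections are illustrative assumptions, not the exact architecture used here) are projected into a shared space and aggregated by edge-level attention:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeteroGraphAttentionLayer(nn.Module):
    """Minimal sketch of one heterogeneous graph attention layer.

    Each node type (e.g. "entity", "relation", "mention") has its own
    projection, and attention scores are computed per edge so that
    knowledge from different structural sources is aggregated into a
    single node representation. Simplifications: one head, no
    edge-type embeddings.
    """

    def __init__(self, node_types, in_dim, out_dim):
        super().__init__()
        self.proj = nn.ModuleDict({t: nn.Linear(in_dim, out_dim) for t in node_types})
        self.attn = nn.Linear(2 * out_dim, 1)

    def forward(self, feats, types, edge_index):
        # feats: (N, in_dim) node features; types: list of N type names
        # edge_index: (2, E) long tensor of (source, target) node indices
        h = torch.stack([self.proj[types[i]](feats[i]) for i in range(len(types))])
        src, dst = edge_index
        scores = F.leaky_relu(self.attn(torch.cat([h[src], h[dst]], dim=-1))).squeeze(-1)
        # Softmax over each target node's incoming edges.
        alpha = torch.zeros_like(scores)
        for node in dst.unique():
            mask = dst == node
            alpha[mask] = F.softmax(scores[mask], dim=0)
        # Weighted aggregation of source representations into target nodes.
        out = torch.zeros_like(h)
        out.index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
        return F.elu(out)
```

In this sketch, unifying knowledge sources reduces to assigning each one a node type and a projection, so that the attention mechanism itself stays type-agnostic.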
algorithm enhanced by domain-specific implicit graph-structure contrastive learning. This work explores the differing characteristics of closed-domain and open-domain graph data in order to build a unified framework for learning KEPLMs in a specific domain. Closed-domain graph structures are globally sparse but locally dense. We therefore use hyperbolic-space embedding learning to obtain hierarchical structural representations of closed-domain graphs, compensating for the globally sparse semantics. Exploiting the density of local structures, we then construct higher-quality positive and negative contrastive samples of different difficulty levels to sharpen the distinctions between entity-level semantics and further compensate for the lack of global semantics. (3) For the third problem above, we propose a cross-lingual pre-training algorithm enhanced by unsupervised language clustering. This work explores forming language clusters based on the similarity between languages. We use an unsupervised Domino-Chain-Learning method to learn mutual knowledge enhancement within each language cluster, and then inject the resulting cross-lingual knowledge representations into contextual representation pre-training through pseudo-embedding knowledge injection. In this way, rich pre-training knowledge data can enhance low-resource language tasks via cross-lingual learning. (4) For the fourth problem above, we propose a pre-training algorithm enhanced by a semi-parameterized knowledge memory bank. This work explores using a knowledge memory bank to accelerate the learning of the model's knowledge parameters. With semi-parameterized knowledge learning, the model only needs to dynamically update the portion of feed-forward network parameters that store knowledge, achieving faster training and better KEPLM performance.
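As an illustration of the semi-parameterized idea, the sketch below freezes a standard feed-forward block and appends a small set of trainable key-value knowledge slots; the class name, the slot parameterization, and the `num_slots` hyper-parameter are hypothetical choices for this example rather than the exact memory-bank design of the proposed method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiParamKnowledgeFFN(nn.Module):
    """Sketch of a feed-forward block with a small trainable knowledge memory.

    The base FFN weights are frozen; only the extra key/value slots receive
    gradients, so knowledge updates touch a small fraction of the parameters.
    """

    def __init__(self, hidden_dim, ffn_dim, num_slots):
        super().__init__()
        self.base_in = nn.Linear(hidden_dim, ffn_dim)    # frozen "keys"
        self.base_out = nn.Linear(ffn_dim, hidden_dim)   # frozen "values"
        for p in (*self.base_in.parameters(), *self.base_out.parameters()):
            p.requires_grad = False
        # Trainable knowledge slots appended to the frozen key-value memory.
        self.mem_keys = nn.Parameter(torch.randn(num_slots, hidden_dim) * 0.02)
        self.mem_values = nn.Parameter(torch.randn(num_slots, hidden_dim) * 0.02)

    def forward(self, x):
        # x: (batch, seq, hidden_dim)
        frozen = self.base_out(F.gelu(self.base_in(x)))
        # Knowledge slots act as extra key-value memory entries.
        slot_scores = F.gelu(x @ self.mem_keys.t())   # (batch, seq, num_slots)
        knowledge = slot_scores @ self.mem_values     # (batch, seq, hidden_dim)
        return frozen + knowledge
```

Under this sketch, only mem_keys and mem_values would be handed to the optimizer during knowledge pre-training, so the update cost scales with the number of knowledge slots rather than with the full feed-forward width.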