
Research On Natural Language Generation Techniques In The Large Language Model Era Of Deep Learning

Posted on: 2024-04-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Liao
Full Text: PDF
GTID: 1528307079950699
Subject: Computer Science and Technology
Abstract/Summary:
Recently, OpenAI's conversational general artificial intelligence (AI) tool ChatGPT has shown remarkable capabilities in language understanding, text generation, and knowledge reasoning. It understands user intentions well, conducts effective multi-turn conversations, and generates substantial, well-organized, and logical answers, greatly exceeding people's expectations of current AI. Only two months after its launch, the number of active users reached 100 million, making it the fastest-growing consumer application in history. Besides being sought after by a large number of users, ChatGPT has attracted extensive attention from governments, businesses, and academia, setting off a new round of the AI arms race among large Internet companies and capturing the imagination of people from all walks of life. ChatGPT is not only a huge commercial success; more importantly, it shows a viable path to solving the core problem of cognitive intelligence, namely natural language processing (NLP). It is considered a solid step towards general AI that will disrupt many fields and replace many people's jobs.

ChatGPT is one of the most representative large language models (LLMs) of the period in which NLP technology entered the LLM era of deep learning. LLMs bring new opportunities and new issues that follow from their characteristics. In terms of the two major components of machine learning, model and data, LLMs have the following main characteristics. From the perspective of the model, an LLM learns large amounts of knowledge and exhibits "emergent" capabilities as its number of parameters grows, and it provides a friendlier human-machine interface by using prompting to formally unify natural language understanding (NLU) and natural language generation (NLG) tasks into NLG tasks. From the perspective of data, the model is pre-trained on large-scale unlabeled data to acquire world knowledge, and fine-tuned on a small amount of high-quality labeled data to adapt rapidly to a task-specific domain.

Based on the above characteristics, this thesis focuses on models and data in the LLM era of deep learning and starts from several typical NLG tasks to explore suitable model structures and data methods that improve the performance and practicality of the system. The main content and contributions are summarized as follows.

(1) A general-purpose NLG model with a sparsely activated approach is proposed to alleviate the negative transfer problem in multi-task learning. When performing a task, the model sparsely activates a subset of its parameters according to a set of pre-defined skills required by the task. This sparse architecture enables efficient multi-task learning by using prior knowledge and avoiding mutual interference between unrelated tasks. Experimental results on multi-task text generation show that the proposed method effectively eases negative transfer in multi-task learning and achieves better performance than traditional dense models on multiple tasks.

(2) A modular neural network architecture is introduced to modify the integrated model so that it can adapt quickly to new tasks. The modified model can dynamically expand with new modules to support new tasks. When adapting to a new task, this method only needs to train the newly added modules on the new task data without retraining the whole model, which significantly reduces the time and cost of practical deployment. In addition, the added modules alleviate the capacity bottleneck caused by the fixed parameter size of the model. Experimental results on multilingual machine translation show that the proposed approach enables the model to add new languages quickly while maintaining translation performance between the originally supported languages.

(3) A cross-modal conversion data augmentation method is proposed to address the lack of training data for a task. This method obtains a large amount of synthetic data by applying cross-modal conversion to data from tasks similar to the target task, which effectively alleviates the overfitting caused by the lack of labeled training data. In addition, synthetic and labeled data are used in a two-stage training strategy, which minimizes the impact of noisy synthetic data. Experimental results on the speech recognition post-processing task show that the proposed method greatly improves the model's ability to produce highly readable text.

(4) A human-machine collaborative data construction method is proposed to address the difficulty of manually annotating data for new tasks. This method is a semi-automatic pipeline combining automatic generation and human judgment. It can be used to construct data for new NLG tasks that are hard to annotate manually, and it produces a large amount of task-labeled data that meets given quality criteria with little human effort. A large-scale dataset for the text polish task is constructed using the proposed method, which will facilitate further study of this novel and practical task.
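As a rough illustration of the skill-based sparse activation described in contribution (1), the following minimal Python sketch selects only the parameter blocks whose skills a task requires. It is a toy under stated assumptions, not the thesis's implementation: the skill names, the task-to-skill table, and the function `active_parameters` are all hypothetical, and plain lists of floats stand in for real model weights.

```python
# Hypothetical task-to-skill table; these names are illustrative and do not
# come from the thesis.
TASK_SKILLS = {
    "summarization": {"understand", "compress", "generate"},
    "translation": {"understand", "transfer", "generate"},
}

# One parameter block per skill. Lists of floats stand in for real weights.
SKILL_PARAMS = {
    "understand": [0.1, 0.2],
    "compress": [0.3],
    "transfer": [0.4],
    "generate": [0.5, 0.6],
    "dialogue": [0.7],
}


def active_parameters(task):
    """Return only the parameter blocks for the skills the task requires."""
    skills = TASK_SKILLS[task]
    return {name: block for name, block in SKILL_PARAMS.items() if name in skills}


if __name__ == "__main__":
    for task in TASK_SKILLS:
        # Blocks for skills a task does not need stay inactive for that task.
        print(task, sorted(active_parameters(task)))
```

In this toy setting, the two tasks share the `understand` and `generate` blocks, while `compress` and `transfer` are each touched by only one task, which is the intuition behind avoiding interference between unrelated tasks.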
Keywords/Search Tags: Natural Language Processing, Natural Language Generation, Large Language Model, Modular Deep Learning, Data Augmentation