
Cross-lingual And Cross-domain Transfer Learning For Text Classification

Posted on: 2024-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: X J Liang
Full Text: PDF
GTID: 2568307067472664
Subject: Computer technology
Abstract/Summary:
As an important research branch of artificial intelligence, natural language processing (NLP) is widely used in scenarios such as text classification, text sequence labeling, automatic summarization, machine translation, and dialogue systems. With continuously growing data, evolving algorithms, and increasing computing power, large language models represented by ChatGPT even show a trend toward unifying all NLP applications. However, on low-resource or imbalanced languages (such as Arabic and Vietnamese) and domains (such as beauty, medical, and electronics), the performance of many current NLP techniques, large language models included, is merely passable. Cross-lingual transfer learning, cross-domain transfer learning, and their combination, cross-lingual cross-domain transfer learning, therefore remain among the most challenging topics in NLP.

This thesis focuses on the most central and fundamental NLP application, text classification. Starting from cross-lingual transfer, cross-domain transfer, and their combination, it explores improvement and optimization strategies aimed at further raising the performance of low-resource languages and domains in practical text classification tasks. The main research contents are as follows.

For cross-lingual transfer learning, considering the limitations of the isomorphism assumption, we introduce "preference features" that benefit the downstream text classification task into existing cross-lingual embedding learning schemes, i.e., preference-feature mining and weighting methods, and propose weighted cross-lingual word embedding training methods under both supervised and unsupervised learning conditions, further improving the performance of language models on actual text classification tasks.

For cross-domain transfer learning, this thesis improves on current pivot-feature-based domain adaptation methods. At the input layer, we propose autoencoder-based domain-adaptive fine-tuning. For pivot selection, we design AttentionRank, a pivot selection method built on the attention mechanism and the text graph mining algorithm TextRank, and propose an iterative optimization procedure for it. Extensive experiments show that these improvements outperform the baseline schemes on cross-domain product sentiment classification across multiple domain pairs, and the attention scores provide more intuitive explanations.

Building on the cross-lingual and cross-domain transfer theory of the previous two chapters, we propose a jointly trained cross-lingual cross-domain transfer method that coordinates the components of the model through reconstruction, preservation, and domain alignment objectives, further enhancing transfer performance across different languages and domains. In addition, we propose a multi-source training mode that combines corpora from multiple resource-rich languages and domains. Extensive experiments on the combined cross-lingual cross-domain product review sentiment classification scenario show that our method outperforms almost all baseline schemes. Illustrative sketches of these mechanisms follow.
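As an illustration of the weighted cross-lingual embedding idea, here is a minimal sketch of preference-weighted orthogonal Procrustes alignment. The abstract does not give the thesis's exact formulation, so the Procrustes solver, the weighting scheme, and the toy preference scores below are assumptions for illustration only.

```python
import numpy as np

def weighted_procrustes(X, Y, w):
    """Learn an orthogonal map W (source -> target) from paired word
    vectors X, Y (n x d), weighting each translation pair by its
    preference score w. Solves min_W ||diag(sqrt(w)) (X @ W - Y)||_F
    over orthogonal W via the closed-form SVD solution."""
    sw = np.sqrt(np.asarray(w, dtype=float))[:, None]
    U, _, Vt = np.linalg.svd((sw * X).T @ (sw * Y))
    return U @ Vt

# Toy usage: 5 translation pairs of 4-dim embeddings; higher weights mark
# pairs assumed to carry classification-relevant "preference features".
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
w = np.array([2.0, 1.0, 1.0, 0.5, 2.0])  # hypothetical preference weights
W = weighted_procrustes(X, Y, w)
assert np.allclose(W @ W.T, np.eye(4))   # the learned map stays orthogonal
```

Up-weighting task-relevant pairs biases the learned mapping toward directions that matter for classification, which is one plausible reading of the preference-feature weighting described above.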
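For the autoencoder-based input-layer fine-tuning, the general shape of the idea can be sketched as a denoising autoencoder trained on unlabeled embeddings pooled from both domains, whose encoder then initializes the classifier's input layer. The dimensions, noise rate, and training loop below are hypothetical details, not the thesis's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Denoising-autoencoder sketch for input-layer domain adaptation: corrupt
# unlabeled embeddings from source and target domains, reconstruct the clean
# input, and reuse the encoder to fine-tune the classifier's input layer.
enc, dec = nn.Linear(300, 128), nn.Linear(128, 300)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x = torch.randn(64, 300)  # stand-in for pooled source + target embeddings
for _ in range(100):
    noisy = F.dropout(x, p=0.3)          # masking noise on the input
    recon = dec(torch.relu(enc(noisy)))  # encode, then decode
    loss = F.mse_loss(recon, x)          # reconstruct the clean embeddings
    opt.zero_grad(); loss.backward(); opt.step()
```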
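AttentionRank can be pictured as mixing TextRank centrality on a word co-occurrence graph with model attention weights when scoring candidate pivots. The sketch below is a single-pass reading (the thesis further wraps selection in an iterative optimization); the mixing coefficient `alpha` and the attention scores are assumed.

```python
import networkx as nx

def attention_rank(docs, attn, alpha=0.5, window=2, top_k=5):
    """Score candidate pivot words by mixing TextRank centrality on a
    co-occurrence graph with (assumed) model attention weights."""
    g = nx.Graph()
    for doc in docs:
        for i, u in enumerate(doc):
            for v in doc[i + 1 : i + 1 + window]:  # co-occurrence window
                if u != v:
                    g.add_edge(u, v)
    tr = nx.pagerank(g)  # TextRank == PageRank over the word graph
    score = {w: alpha * tr[w] + (1 - alpha) * attn.get(w, 0.0) for w in g}
    return sorted(score, key=score.get, reverse=True)[:top_k]

docs = [["great", "battery", "life"], ["poor", "battery", "quality"]]
attn = {"battery": 0.9, "great": 0.6, "poor": 0.5}  # hypothetical attention
print(attention_rank(docs, attn, top_k=3))
```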
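Finally, the three joint objectives, reconstruction, preservation, and domain alignment, can be written as one combined loss. The abstract does not specify how each term is instantiated, so the interpretations below (plain reconstruction, preservation against a frozen copy of the initial encoder, and an alignment of domain means) and the loss weights are assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointObjectiveSketch(nn.Module):
    """Minimal sketch of a reconstruction + preservation + domain-alignment
    objective for joint cross-lingual cross-domain transfer (illustrative
    architecture, not the thesis's exact model)."""

    def __init__(self, dim=16, hidden=8):
        super().__init__()
        self.enc = nn.Linear(dim, hidden)
        self.dec = nn.Linear(hidden, dim)
        # Frozen copy of the initial encoder; the preservation term keeps
        # the adapted space close to the pretrained one (an assumption).
        self.enc0 = copy.deepcopy(self.enc)
        for p in self.enc0.parameters():
            p.requires_grad_(False)

    def forward(self, x_src, x_tgt, w=(1.0, 0.1, 0.1)):
        z_s, z_t = self.enc(x_src), self.enc(x_tgt)
        recon = F.mse_loss(self.dec(z_s), x_src) + F.mse_loss(self.dec(z_t), x_tgt)
        preserve = F.mse_loss(z_s, self.enc0(x_src))
        align = (z_s.mean(0) - z_t.mean(0)).pow(2).sum()  # match domain means
        return w[0] * recon + w[1] * preserve + w[2] * align

loss = JointObjectiveSketch()(torch.randn(32, 16), torch.randn(32, 16))
loss.backward()
```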
To verify the effectiveness and usability of all the methods above, we use datasets from real text classification scenarios and compare against a variety of baseline methods across different languages and domains. Extensive experimental results indicate that the methods proposed in this thesis can indeed further improve the performance of cross-lingual and cross-domain text classification applications. In addition, we provide a considerable number of ablation experiments and visualization analyses to offer more detail and interpretability. We hope this thesis brings new ideas and references for cross-lingual transfer learning, cross-domain transfer learning, and cross-lingual cross-domain transfer learning, and makes a modest contribution toward general-purpose natural language processing and, eventually, general artificial intelligence.
Keywords/Search Tags: Cross-lingual word embedding, Cross-domain transfer, Text classification, Autoencoder, Attention mechanism