Research On Text Classification And Short Text Clustering Technology Based On Contrastive Learning

Posted on:2024-05-19

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Zhang

Full Text:PDF

GTID:2568307094459434

Subject:Computer technology

Abstract/Summary:

With the arrival of the big data era, text classification and clustering techniques have become increasingly important. The internet carries a large amount of textual information, such as Weibo comments, QQ chat messages, and movie reviews. Accurately classifying or clustering this text helps people manage and use it more effectively. This thesis therefore studies a text classification method based on contrastive learning and adversarial training, as well as a short text clustering method based on contrastive learning. The specific research work is as follows:

(1) Although current text classification models achieve high accuracy, they still suffer from poor consistency under data augmentation and an inability to learn noise-invariant representations. As a result, the models resist perturbations poorly, generalize to a limited extent, and produce inconsistent prediction distributions for similar samples. These limitations severely affect the accuracy of text classification tasks. To address these issues, we propose a Text Classification model that combines Contrastive Learning and Adversarial Training (TCCA). The TCCA model first applies data augmentation to the original data to generate two samples, then adds an adversarial perturbation to each of them to form a positive sample pair. The positive sample pairs are fed into a BERT model to extract text features, and a BiLSTM with an attention layer extracts deeper semantic information. By fusing contrastive learning and adversarial training, we construct a new loss function to optimize the model. During the prediction phase, the model's predictions are adjusted using the empirical distribution of the training set to improve classification accuracy. Compared with BERT, the TCCA model improves accuracy by 0.67%, 2.14%, and 1.77% on three datasets, respectively.

(2) Because short text data carries little information and its categories overlap heavily, most short text clustering methods struggle to separate the data effectively. In addition, Transformer language representation models can suffer from representation degradation, in which word vectors have high cosine similarity to one another. This impairs the semantic representation of text and in turn hurts short text clustering performance. To address these issues, this thesis proposes a Short Text Clustering model based on contrastive learning (STCL). The STCL model uses contrastive learning to separate data from different categories effectively and, to some extent, alleviates the representation degradation problem, thereby improving short text clustering performance. Experimental results show that on most datasets the accuracy (Acc) and normalized mutual information (NMI) of STCL improve significantly, reaching 91.7% and 75.2%, respectively, on the AG News dataset. The ablation experiments also confirm the effectiveness of contrastive learning in short text clustering tasks.
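
The abstract does not give the exact loss used by TCCA or STCL. As a rough illustration of what a contrastive objective over positive sample pairs can look like, the sketch below implements the NT-Xent loss commonly used in contrastive learning. The function names, the temperature value, and the toy two-dimensional embeddings are assumptions made for illustration and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss for a batch where views_a[i] and views_b[i] form a positive pair.

    Assumed setup (not from the thesis): every other embedding in the
    batch is treated as a negative for a given anchor.
    """
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same text
        # Scaled similarities to every embedding except the anchor itself
        logits = [cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(2 * n) if j != i]
        pos_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        # Cross-entropy with the positive view as the target
        total += math.log(sum(math.exp(x) for x in logits)) - pos_logit
    return total / (2 * n)


# Toy embeddings: two texts, two augmented views each (hypothetical values)
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b_matched = [[0.9, 0.1], [0.1, 0.9]]   # each view close to its partner
view_b_swapped = [[0.1, 0.9], [0.9, 0.1]]   # each view close to the wrong text

loss_matched = nt_xent_loss(view_a, view_b_matched)
loss_swapped = nt_xent_loss(view_a, view_b_swapped)
```

In this toy batch the loss is lower when each view lies close to its own partner than when the partners are swapped. This is the behavior a contrastive objective rewards: positive pairs are pulled together and other samples are pushed apart. It also works against the uniformly high cosine similarity described in (2).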

Keywords/Search Tags:

Natural Language Processing, Text classification, Short text clustering, Contrastive learning, Adversarial training

Related items

1	The Research And Implementation Of Text Classification Based On Adversarial Training
2	Telecom Complaint Text Classification Based On Adversarial Training And Contrastive Learning
3	Adversarial Attack And Defense Methods For Text Classification
4	Intelligent Device Text Classification Method Based On Natural Language Processing
5	Short Text-based Adversarial Example Attack
6	Text Adversarial Examples Based On Word-Level Perturbation
7	Research On Text Clustering Based On Self-Supervised Contrastive Learning
8	Research On Text Representation Model And Application In Text Classification And Natural Language Inference
9	Research And Implementation Of Short Text Classification Model Based On Course Knowledge Points
10	Research On Topic Detection And Classification In Short Text