
Research on Incremental Clustering Method of News Text Based on Contrastive Learning

Posted on: 2024-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: R H Li
Full Text: PDF
GTID: 2568307112952499
Subject: Computer application technology
Abstract/Summary:
With the advancement of Internet technology and the rise of self-media, more and more news outlets publish on the Internet, and users have become accustomed to receiving news through online media platforms. Compared with traditional media, online news reports are more numerous, more time-sensitive, shorter, more trivial in topic, and more homogeneous. This makes it difficult for users to find interesting and valuable information, and it makes news analysis and research more time-consuming. Incremental clustering addresses this situation by grouping continuously generated news in real time: reports from different media on the same event are assigned to the same cluster, so that users can quickly find reports on the events that interest them and subsequent work can analyze an event from multiple perspectives.

Incremental clustering has been studied extensively in China and abroad, but the following problems remain:
(1) News reports are semantically rich text, so their features must be obtained through deep representation learning. Current unsupervised representation learning methods, however, produce general features that are independent of category, which does not match the category-level semantic features that clustering requires.
(2) Most current incremental clustering methods rely on a fixed threshold to decide whether a new cluster has emerged. A fixed threshold limits the size of the cluster space and degrades the clustering of samples near the edge of that space.
(3) The real-world entities named in a news text strongly constrain its content, yet current deep clustering methods extract only semantic features and ignore entity features.
(4) Current incremental clustering methods have seen little practical application.

This thesis focuses on these problems and makes the following contributions:
(1) A news text clustering method based on dynamically adjusted contrastive learning. The method uses a contrastive learning framework as its representation learning model. To reconcile the inconsistent objectives of representation learning and clustering, it dynamically adjusts the training loss weights: the contrastive head receives more weight at the beginning of training in order to learn sample feature representations, and the weight gradually shifts to the clustering head during training in order to capture cluster-level semantic features. In addition, a negative-instance filtering method addresses the problem that contrastive learning treats samples from the same cluster as negative instances; it selects negative instances using pseudo-labels generated from high-confidence clustering results. The effectiveness of the method is validated through experiments on 7 news datasets.
(2) A news text incremental clustering method based on momentum contrastive learning with iCVI (incremental Cluster Validity Index). To avoid a fixed threshold when deciding whether a new cluster has emerged, the method uses a dynamic decision procedure based on iCVI: the iCVI monitors the clustering state in real time, and when determining new clusters the method selects the option that produces the smallest change in iCVI. In addition, to address the feature drift caused by changing news topics during incremental clustering, the method learns feature representations online through momentum contrastive learning, updating the encoder as new text arrives. The effectiveness of the method is validated through experiments on 7 news datasets.
(3) A news text incremental clustering method that integrates entity features. Based on the characteristics of news reports, the method extracts named entities from news texts and embeds them as entity vectors, which are then fused with the text features for subsequent calculations. Experimental results show that this improves the clustering results to some extent.
(4) A prototype system for incremental clustering of news text, built on the research above. The system partitions uploaded news texts into clusters in real time, provides visualization interfaces to meet the reading needs of different application scenarios, and offers download functions for subsequent analysis.
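The two training mechanisms described in contributions (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract states only that the loss weight shifts gradually from the contrastive head to the clustering head and that the encoder is updated through momentum contrastive learning. The cosine schedule, the function names `loss_weights`, `combined_loss`, and `momentum_update`, and the default momentum `m=0.999` are all assumptions made for illustration; the exponential-moving-average form is the standard momentum-contrast update, and the thesis may use a variant.

```python
import math


def loss_weights(step: int, total_steps: int) -> tuple[float, float]:
    """Return (contrastive_weight, clustering_weight) for a training step.

    Assumed schedule: a cosine ramp that starts at (1, 0) and ends at (0, 1).
    The source only says the weight moves gradually from the contrastive head
    to the clustering head; the exact curve here is hypothetical.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    w_cluster = 0.5 * (1.0 - math.cos(math.pi * progress))
    return 1.0 - w_cluster, w_cluster


def combined_loss(contrastive_loss: float, clustering_loss: float,
                  step: int, total_steps: int) -> float:
    """Weighted sum of the two head losses under the schedule above."""
    w_con, w_clu = loss_weights(step, total_steps)
    return w_con * contrastive_loss + w_clu * clustering_loss


def momentum_update(key_params: list[float], query_params: list[float],
                    m: float = 0.999) -> list[float]:
    """Exponential moving average used in momentum contrast:
    key <- m * key + (1 - m) * query.

    Parameters are plain floats here to keep the sketch dependency-free;
    a real encoder would apply the same rule to each weight tensor.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        w_con, w_clu = loss_weights(step, 100)
        print(f"step {step:3d}: contrastive={w_con:.3f} clustering={w_clu:.3f}")
```

A smooth ramp like this avoids an abrupt switch of objectives, but a linear or step schedule would fit the abstract's description equally well.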
Keywords/Search Tags: news text, incremental clustering, contrastive learning, deep clustering, incremental cluster validity index