
Research And Implementation Of The Algorithms For Comment Topic Classification

Posted on: 2024-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
GTID: 2568306944463244
Subject: Software engineering
Abstract/Summary:
With the rapid development of internet technology, posting on comment boards and complaint platforms has become a common way for users to report problems to organizations and institutions. The primary function of a comment topic analysis system is to assign each comment to a category in a known tagging hierarchy, so that staff can summarize and organize the messages. Thanks to advances in deep learning and natural language processing, deep text classifiers are widely used for this task. In such a system, however, the volume of comments grows and the data distribution shifts as users keep commenting, so the comment topic classifier must be updated regularly with a large amount of newly annotated data. Because the daily volume of comments is enormous, randomly selecting samples from the unlabeled pool for labeling is inefficient and labor-intensive. This thesis therefore not only designs and implements a highly accurate comment topic classification model, but also studies and implements an active self-training framework with uncertainty-aware clouded logits to address the annotation bottleneck. The main work of this thesis is as follows.
(1) Based on the characteristics of comment data, this thesis proposes a metadata-based pre-training method for text embeddings. The method maps metadata and words into the same representation space and optimizes the resulting embeddings. Building on this, the thesis proposes a transformer-based comment classification model to improve the accuracy of comment topic classification.
(2) To address the annotation bottleneck during model updating, this thesis proposes an active self-training framework with uncertainty-aware clouded logits. The framework combines the active learning and self-training paradigms to mitigate the annotation bottleneck. The uncertainty-aware clouded logits boost the performance of the active teacher model by leveraging the phenomenon of softmax saturation to facilitate the learning of clear class boundaries. Extensive experiments and visualization analysis on four datasets and a comment dataset demonstrate the effectiveness of the proposed framework compared with state-of-the-art methods, showing that it can improve model accuracy while addressing the annotation bottleneck.
(3) This thesis designs and implements a prototype system for comment topic classification. The system incorporates the two proposed algorithms and provides comment feedback, comment annotation, and comment processing functions to users, annotators, and comment administrators, respectively.
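As a rough illustration of how active learning and self-training can share one unlabeled pool, the sketch below sends the most uncertain samples to human annotators and pseudo-labels the samples the model is already confident about. It is a generic entropy-based scheme, not the thesis's uncertainty-aware clouded logits, whose details the abstract does not give. The function name `split_unlabeled_pool`, the `annotation_budget` parameter, and the 0.9 confidence threshold are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a probability vector (higher means more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_unlabeled_pool(pool_logits, annotation_budget, confidence_threshold=0.9):
    """Hypothetical helper: split an unlabeled pool for one update round.

    Returns (to_annotate, pseudo_labeled):
      to_annotate    -- indices of the highest-entropy samples, for human labeling
      pseudo_labeled -- (index, predicted_class) pairs for the remaining samples
                        whose top probability reaches the threshold
    """
    probs = [softmax(z) for z in pool_logits]

    # Active learning step: rank by predictive entropy, most uncertain first.
    ranked = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    to_annotate = ranked[:annotation_budget]
    chosen = set(to_annotate)

    # Self-training step: keep confident predictions as pseudo-labels.
    pseudo_labeled = []
    for i, p in enumerate(probs):
        if i in chosen:
            continue
        top = max(range(len(p)), key=p.__getitem__)
        if p[top] >= confidence_threshold:
            pseudo_labeled.append((i, top))
    return to_annotate, pseudo_labeled


# Toy logits for four comments over three topic classes (illustrative values only).
pool_logits = [
    [4.0, 0.1, 0.2],  # confident in class 0
    [0.5, 0.6, 0.4],  # nearly uniform, very uncertain
    [0.1, 5.0, 0.3],  # confident in class 1
    [1.0, 1.2, 0.2],  # uncertain between classes 0 and 1
]
to_annotate, pseudo_labeled = split_unlabeled_pool(pool_logits, annotation_budget=2)
print(to_annotate)
print(pseudo_labeled)
```

In a full update round, the annotated samples and the pseudo-labeled samples would both be added to the training set before the classifier is retrained. Samples that are neither uncertain enough to annotate nor confident enough to pseudo-label stay in the pool for the next round.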
Keywords/Search Tags:text classification, uncertainty, active learning, self-training, metadata