
Design And Implementation Of User Comment Classification System Based On Active Learning

Posted on: 2023-03-23    Degree: Master    Type: Thesis
Country: China    Candidate: R T Du
GTID: 2568307061451274    Subject: Software engineering
Abstract/Summary:
With the rapid spread of the Internet and the growing number of network users, user comment text data has grown explosively. Text classification based on natural language processing can automatically categorize user comments, helping product providers and public opinion analysis departments quickly understand users' interests, needs, and opinions, and it has become a research hotspot. Text classification for user comments currently faces two challenges: (1) most user comments are subjective descriptions that carry emotional information useful for classification, yet most existing methods ignore this information; (2) to reduce annotation cost, active learning has been introduced into text classification so that more valuable labeled data can be obtained at lower cost, further improving the algorithm's performance. However, most user comment data has an imbalanced data distribution, and most existing active learning methods ignore this property, which considerably degrades classification performance. To address these challenges, this thesis proposes a text classification method that takes the emotion of user comments into account and an active learning method that takes the data distribution of the training set into account. On this basis, a user comment classification system based on active learning is designed and implemented. The main work of this thesis is as follows:

(1) Existing text classification methods for user comments ignore the emotional information contained in the text. To address this, a user comment classification method based on multi-task learning with emotional information is proposed. First, a simple emotion extraction method preprocesses the user comment data to obtain the emotion category of each comment. Multi-task learning then jointly exploits the emotional and semantic information of the data to classify the user comments. Experimental results show that this method achieves good classification performance on user comment data and confirm that the emotional information in user comments does help classification.

(2) Existing active learning methods ignore the distribution balance of the training set when sampling data. To address this, an active learning method based on uncertainty and balance contribution is proposed. The data sampled by this method not only has high uncertainty but also helps build a training set with a balanced distribution. First, the method samples data from the unlabeled data pool using uncertainty-based margin sampling; it then uses the text semantic similarity model SimCSE to compute each sampled item's contribution to building a balanced training set; finally, it selects the items with high balance contribution for manual annotation. Experimental results show that the data sampled by this method helps build a balanced training set and effectively improves the performance of the classification model.

(3) Based on the above work, a user comment classification system based on active learning is designed and implemented. The system consists of three modules: data processing, text classification, and active learning. The data processing module supplies data to the text classification and active learning modules, which implement the methods of (1) and (2), respectively.
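The abstract does not say which "simple emotion extraction method" is used. A common simple option is a polarity lexicon, and the sketch below assumes that approach purely for illustration: the word lists `POSITIVE` and `NEGATIVE` and the function `emotion_label` are hypothetical, and comments are assumed to be tokenized already (Chinese comments would first need word segmentation). It shows how each comment could receive an emotion label for an auxiliary task; it is not the thesis's extractor.

```python
from typing import Iterable, Set

# Hypothetical toy lexicons; a real system would use a curated sentiment lexicon.
POSITIVE: Set[str] = {"good", "great", "love", "fast"}
NEGATIVE: Set[str] = {"bad", "slow", "broken", "hate"}


def emotion_label(tokens: Iterable[str]) -> str:
    """Label a tokenized comment by (positive hits - negative hits)."""
    score = 0
    for token in tokens:
        word = token.lower()
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(emotion_label("I love this phone".split()))               # positive
print(emotion_label("Great screen but slow delivery".split()))  # neutral
```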
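Multi-task learning of this kind is often set up as a shared text encoder with two output heads, one predicting the comment category and one predicting the emotion label, trained on a weighted sum of the two losses. The sketch below shows only that loss combination in plain Python. The per-head softmax cross-entropy, the function names, and the auxiliary weight `emo_weight` are assumptions for illustration; the abstract does not give the thesis's architecture or loss.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(cls_logits: Sequence[float], cls_target: int,
                   emo_logits: Sequence[float], emo_target: int,
                   emo_weight: float = 0.5) -> float:
    """Main-task (category) loss plus a weighted auxiliary (emotion) loss."""
    return (cross_entropy(cls_logits, cls_target)
            + emo_weight * cross_entropy(emo_logits, emo_target))


# Category head over 3 classes; emotion head over 3 labels (e.g. negative/neutral/positive).
loss = multitask_loss([2.0, 0.5, -1.0], 0, [0.1, 0.2, 1.5], 2, emo_weight=0.5)
print(loss)
```

Setting `emo_weight` to 0 recovers single-task training, which is one way to check whether the emotion signal actually helps.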
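Margin sampling, named in (2), scores each unlabeled item by the gap between the model's two highest predicted class probabilities. A small gap means the model is torn between two classes, so those items are queried first. This is a minimal sketch, assuming the classifier's per-class probabilities for the pool are already available as lists; the function names are hypothetical.

```python
from typing import List, Sequence


def margin_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Top-1 minus top-2 predicted probability for each unlabeled sample."""
    scores = []
    for p in probs:
        if len(p) < 2:
            raise ValueError("margin sampling needs at least two classes")
        first, second = sorted(p, reverse=True)[:2]
        scores.append(first - second)
    return scores


def margin_sample(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k samples with the smallest margin, i.e. the most uncertain."""
    scores = margin_scores(probs)
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


pool_probs = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.38, 0.22],  # very uncertain
    [0.55, 0.40, 0.05],
    [0.60, 0.30, 0.10],
]
print(margin_sample(pool_probs, 2))  # [1, 2]
```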
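The abstract does not give a formula for "balance contribution". One plausible reading, sketched below purely as an assumption, works in three steps. First, texts are embedded with a sentence encoder such as SimCSE; here the embeddings are taken as precomputed vectors and SimCSE itself is not called. Second, each margin-sampled candidate is softly assigned to a class by cosine similarity to the labeled class centroids. Third, the candidate is scored by how much adding it would move the labeled class counts toward a uniform distribution. All function names, the softmax temperature, and the L1 imbalance measure are hypothetical choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def class_centroids(embs: Sequence[Vector], labels: Sequence[int],
                    num_classes: int) -> List[List[float]]:
    """Mean embedding of labeled items per class (zero vector for an empty class)."""
    dim = len(embs[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for emb, y in zip(embs, labels):
        counts[y] += 1
        for j, v in enumerate(emb):
            sums[y][j] += v
    return [[v / counts[c] for v in sums[c]] if counts[c] else [0.0] * dim
            for c in range(num_classes)]


def soft_assign(emb: Vector, centroids: Sequence[Vector],
                temperature: float = 0.1) -> List[float]:
    """Softmax over cosine similarities to the class centroids."""
    sims = [cosine(emb, c) / temperature for c in centroids]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]


def imbalance(counts: Sequence[float]) -> float:
    """L1 distance between class proportions and the uniform distribution."""
    total = sum(counts)
    k = len(counts)
    return sum(abs(c / total - 1.0 / k) for c in counts)


def balance_contribution(emb: Vector, centroids: Sequence[Vector],
                         counts: Sequence[float], temperature: float = 0.1) -> float:
    """Reduction in imbalance if the candidate is added with its soft class counts."""
    q = soft_assign(emb, centroids, temperature)
    after = [c + qc for c, qc in zip(counts, q)]
    return imbalance(counts) - imbalance(after)


def select_balanced(candidate_embs: Sequence[Vector], labeled_embs: Sequence[Vector],
                    labeled_labels: Sequence[int], num_classes: int, m: int) -> List[int]:
    """Rank uncertainty-sampled candidates by balance contribution; keep the top m."""
    centroids = class_centroids(labeled_embs, labeled_labels, num_classes)
    counts = [0.0] * num_classes
    for y in labeled_labels:
        counts[y] += 1.0
    scores = [balance_contribution(e, centroids, counts) for e in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:m]


# Toy 2-D "embeddings": class 0 has 3 labeled items, class 1 only 1.
labeled = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
labels = [0, 0, 0, 1]
candidates = [[1.0, 0.05], [0.05, 1.0]]  # resembles class 0, resembles class 1
print(select_balanced(candidates, labeled, labels, num_classes=2, m=1))  # [1]
```

In a full loop, the candidates would be the items kept by the margin-sampling step, and the selected indices would go to human annotators. Favoring candidates that resemble the minority class is what pushes the labeled set toward balance.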
Keywords/Search Tags: Active Learning, Text Classification, Neural Network, Balanced Data Set, Text Similarity