| With the advent of the Internet information age, people's lives are filled with all kinds of information, of which text is the most common carrier. Text on the Internet varies greatly in quality and is mixed with a large amount of junk information. Manual filtering of this text can no longer keep pace with demand, so audit systems based on text classification technology have emerged. However, traditional text classification models need a large amount of training data, and annotating that data incurs high labor costs and is prone to errors. It is therefore of practical value to implement a comment content audit system that starts from a small amount of annotated data and automatically selects the most valuable training data. Drawing on a literature review and a survey of existing audit systems, this paper designs and implements a comment content audit system based on text classification. First, the overall architecture and functions of the system are designed and described in detail, together with the table structure of the system database. Then, the implementation process of the audit system is designed around its two audit modes, machine audit and human audit. Finally, to address data transmission, the system is designed to expose an open interface. Comment content has a high feature dimension and a sparse distribution of feature words; to address the resulting problems, a rough set scoring method is used to select features from the text training data when training the text classification model. To address the high cost of manual annotation and unrepresentative training data, this paper adopts an active learning method to select training data and improves the initial training data selection phase of active learning. In the iterative training data update phase, choosing an appropriate classifier model allows more representative text data to be obtained for manual annotation and added to the training data. Experiments on comment content data from the Airbnb platform show that, with the proposed scheme, the classification model achieves higher accuracy with less training data. The comment content audit system has also been fully developed, and it has potential for adoption by organizations engaged in public opinion analysis. |