| With the rapid development of network technology, blogs, forums and e-mail have driven explosive growth in text messages, which now account for the largest share of network content and bring increasing convenience to people's work, study and daily life. However, this growth has also filled the network with harmful text, which negatively affects society, families and even young people. Content security has therefore become an essential part of information security. Text categorization is one of the basic problems in information organization, management, recognition and filtering, and the needs of Internet content security pose new challenges for it. Motivated by the requirements of Internet content security, this paper first surveys the framework of text categorization and studies feature selection and feature weighting. Second, it compares several text classifiers and selects the support vector machine (SVM) classifier for its excellent performance. Then, building on SVM theory, it studies in depth how to build an SVM text classification system. This paper designs a multi-level text categorization system based on the SVM classifier. The training system performs feature selection with information gain and feature weighting with TF-IDF, and trains an SVM with a linear kernel on the resulting vectors to generate the classification model. The classification system first extracts suspicious texts from the text set using keywords and then classifies them at multiple levels. 
Multi-level classification refines a sensitive text step by step into finer categories, for example identifying that a text belongs to the Buddhism subcategory of religion, after which the system can organize texts by category. Experiments show that multi-level text categorization based on the SVM classifier achieves high speed and accuracy. Given a sufficiently large sample database, the system can set the number of classification levels according to the user's needs. This work can support multi-level text categorization and provides a technical foundation for developing intelligent equipment that manages Internet browsing behavior. |
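
The pipeline described in the abstract (information-gain feature selection, TF-IDF weighting, keyword pre-filtering, then multi-level classification) can be illustrated with a minimal sketch. This is not the paper's implementation: all function names are hypothetical, documents are assumed to be pre-tokenized lists of words, information gain is computed on binary term presence, and the trained linear-kernel SVM models are represented only by placeholder callables (`top_clf`, `sub_clfs`) rather than actual training code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting docs on presence/absence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def tfidf_vector(doc, features, docs):
    """TF-IDF weights of `doc` over the selected features.

    Assumes `doc` is non-empty. Uses relative term frequency and
    idf = log(N / df), one common TF-IDF variant among several.
    """
    counts = Counter(doc)
    vector = []
    for term in features:
        df = sum(1 for d in docs if term in d)
        idf = math.log(len(docs) / df) if df else 0.0
        vector.append(counts[term] / len(doc) * idf)
    return vector


def classify_multilevel(tokens, keywords, top_clf, sub_clfs):
    """Keyword pre-filter followed by two-level classification.

    `top_clf` and the values of `sub_clfs` are placeholders for trained
    classifiers (linear SVMs in the paper); here they are any callables
    that map a token list to a label. Returns None for texts that contain
    no suspicious keyword, else a (top_level, sub_level) pair.
    """
    if not keywords & set(tokens):
        return None  # not suspicious: skip classification entirely
    top = top_clf(tokens)
    sub = sub_clfs[top](tokens) if top in sub_clfs else None
    return (top, sub)
```

In a full system the vectors from `tfidf_vector` would be passed to an SVM trainer, and each placeholder callable would wrap the resulting model; a third or deeper level could be added by nesting further classifier tables in the same way.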