| Multi-class text classification is a basic task in natural language processing, and improvements in text classification algorithms carry over to many other tasks in the field. Many text classification algorithms of differing character have been proposed in academia. In practice, however, because each has its own strengths and weaknesses, several algorithms are often fused in some form so that their strengths complement one another and the specific requirements of engineering applications are met. In text classification tasks, manually constructed category systems are often strongly influenced by the subjective judgments of the service requesters and the data labelers, so the categories in a classification system are not equally separable for text classification algorithms. Among current single-model methods, linear classifiers train and classify extremely quickly, which suits large-scale text data, but their classification performance may not be ideal. Neural network classifiers, by contrast, are relatively expensive to train but classify well. If linear and non-linear classification methods are fused, with the linear classifier handling the categories that have high linear separability and the non-linear classifier handling the categories that have poor linear separability, better classification accuracy and training speed can be expected. Against this background, model fusion is a reasonable choice for improving the overall performance of a classifier. This thesis proposes a text classification method based on category reorganization and model fusion. The author's main tasks are as 
follows: (1) A class recombination algorithm was designed and implemented. When a linear classifier is used for classification, the algorithm merges several classes with high misclassification rates into a new class, while classes with low misclassification rates remain unchanged. Applying this idea to the original category system yields a new category system that is suited to linear classification. The linear classifier implemented in this thesis, trained and tested under the new category system, achieved better classification performance than the model trained under the original category system. (2) A model fusion algorithm was designed and implemented for the models in this thesis. Based on the category fusion strategy, the linear and non-linear models were fused. The linear classification algorithm is responsible for classification over the new category system generated by the class recombination algorithm, and the neural network classification method is responsible for classification within each reorganized category. For every new category obtained by recombination, a non-linear model is used to classify the texts assigned to it. Experimental results showed that this method achieves better classification performance than the linear classification model at a training cost close to that of the linear classification model, thereby improving the overall performance of the text classification model. (3) A classification system was designed and implemented. As a preliminary application of the above results, a classification system was designed and implemented in an actual engineering project. |
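
The two-stage idea described in (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the merge rule (a pairwise confusion rate above `threshold`), the function names `recombine_classes` and `fused_predict`, and the use of plain callables as stand-ins for the linear and neural models are all assumptions made for illustration.

```python
from collections import defaultdict


def recombine_classes(y_true, y_pred, threshold=0.2):
    """Map each original class to a coarse class (hypothetical merge rule).

    y_true / y_pred are string labels from a linear classifier evaluated on
    held-out data. Two classes are merged when the share of their samples
    confused between them exceeds `threshold`.
    """
    labels = sorted(set(y_true) | set(y_pred))
    counts = defaultdict(int)   # (true, predicted) -> number of samples
    support = defaultdict(int)  # true label -> number of samples
    for t, p in zip(y_true, y_pred):
        counts[(t, p)] += 1
        support[t] += 1

    parent = {c: c for c in labels}  # union-find forest over classes

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            total = support[a] + support[b]
            confused = counts[(a, b)] + counts[(b, a)]
            if total and confused / total > threshold:
                parent[find(b)] = find(a)  # merge the two groups

    groups = defaultdict(list)
    for c in labels:
        groups[find(c)].append(c)

    # Unmerged classes keep their name; merged groups get a joined name.
    mapping = {}
    for members in groups.values():
        name = members[0] if len(members) == 1 else "+".join(members)
        for c in members:
            mapping[c] = name
    return mapping


def fused_predict(text, coarse_clf, fine_clfs):
    """Classify with the linear stage, then refine inside a merged class.

    coarse_clf: callable text -> coarse label (stand-in for the linear model).
    fine_clfs: dict coarse label -> callable text -> original label
               (stand-ins for the per-group non-linear models).
    """
    coarse = coarse_clf(text)
    fine = fine_clfs.get(coarse)
    return fine(text) if fine is not None else coarse


# Toy held-out predictions of a linear classifier (made-up data).
y_true = ["sports"] * 4 + ["finance"] * 4 + ["economy"] * 4
y_pred = ["sports"] * 4 + ["finance", "economy"] * 2 + ["finance", "economy"] * 2
mapping = recombine_classes(y_true, y_pred)
```

With these toy labels, `finance` and `economy` are merged into one coarse class because half of their samples are confused with each other, while `sports` is left unchanged. In a fused system along these lines, the linear model would be retrained on the coarse labels in `mapping`, and only texts routed to the merged class would reach the more expensive non-linear model.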