Intelligent classification of short texts is the basis for managing massive volumes of short-text data, such as news comments and bullet-screen comments on videos. Sparse features, high dimensionality, and irregular structure are the main problems facing short-text classification research. This thesis builds on the maximum-relevance minimum-redundancy (MRMR) feature dimensionality reduction algorithm and combines it with the support vector machine (SVM) model, which can turn a problem that is nonlinear in a low-dimensional space into a linear one in a high-dimensional space. To improve classification accuracy, the core supporting techniques are optimized in two respects. The effectiveness of these optimizations is verified through experiments and through the design of a prototype system for bullet-screen comment classification. The specific research work and results of the thesis are as follows:

(1) An optimized MRMR algorithm based on a word-frequency adjustment factor and sequential floating forward selection. Computing relevance and redundancy is the key to the effectiveness of MRMR. First, mutual information is used to measure the relevance between each feature term and the category label, and a word-frequency adjustment factor is introduced. Maximum-relevance feature terms are then selected by jointly considering the mutual information value and word frequency, which reduces the influence of low-frequency words on feature selection. In addition, when searching for the target feature subset, sequential floating forward selection is used to find better-performing feature terms among the remaining features, avoiding the tendency of conventional methods to fall into local optima during subset search. The feature subsets produced by several dimensionality reduction algorithms are classified and compared to verify the effectiveness of the proposed optimized algorithm.

(2) An Adaboost-IFASVM model based on parameter optimization and weighted ensembling. The kernel parameter and the penalty factor are key to training a good SVM model, so the firefly algorithm (FA), a swarm-intelligence method, is introduced to optimize these two parameters. To give the algorithm strong exploration in the early stage and fast convergence in the late stage, the iteration number is introduced as an exponent in the step-size factor, so that the step size decreases from large to small over the run. To further improve the model's learning ability, the AdaBoost algorithm iteratively increases the weights of misclassified samples, and the weighted ensemble improves the overall expressive power of the classification model. Comparative experiments show that the proposed model outperforms the compared models in accuracy, recall, and F1 score.

(3) Design and implementation of a prototype system for classifying bullet-screen short texts. To verify the effectiveness of the proposed algorithms, a prototype bullet-screen short-text classification system based on the optimized ensemble algorithm is designed and implemented. The system provides data acquisition, text preprocessing, feature representation, feature selection, and short-text classification. Data acquisition is implemented as a Python web crawler, and Vue.js is used for the front-end display pages. Functional testing verified the effectiveness of both the system and the classification algorithm design.
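Step (1) scores relevance with mutual information scaled by a word-frequency factor inside the MRMR criterion. The abstract does not give the factor's formula, so the sketch below assumes a hypothetical document-frequency factor `log(1 + df) / log(1 + N)` and uses the common MRMR difference criterion (relevance minus mean redundancy with the terms already selected). Documents are 0/1 term-presence vectors; `freq_factor` and `mrmr_select` are illustrative names, not the thesis's code.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log(p(x,y) / (p(x) p(y))), with counts: c*n / (count_x * count_y)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def freq_factor(column: Sequence[int], n_docs: int) -> float:
    """HYPOTHETICAL word-frequency factor: damps terms that appear in few documents."""
    df = sum(column)  # document frequency of the term
    return math.log(1 + df) / math.log(1 + n_docs)


def mrmr_select(X: List[List[int]], y: List[int], k: int) -> List[int]:
    """Greedy MRMR (difference form) with frequency-weighted relevance."""
    n_docs, n_terms = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(n_terms)]
    relevance = [mutual_information(c, y) * freq_factor(c, n_docs) for c in cols]
    selected: List[int] = []
    remaining = set(range(n_terms))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(cols[j], cols[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
    y = [1, 1, 0, 0]
    print(mrmr_select(X, y, 2))  # [0, 2]
```

Term 1 is exactly as relevant as term 0 but carries the same information, so the redundancy penalty skips it in favor of term 2.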
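The sequential floating forward selection used in (1) alternates a forward inclusion with conditional backward exclusions, which lets the search revise earlier choices instead of being locked into them. The sketch below is the standard SFFS loop over an arbitrary subset-scoring function; in the thesis that score would come from classifying with the candidate subset, which the abstract does not specify, so `toy_score` is a hypothetical stand-in with a synergy between features 1 and 2.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Score = Callable[[FrozenSet[int]], float]


def sffs(features: Iterable[int], score: Score, k: int) -> Dict[int, List[int]]:
    """Sequential floating forward selection; returns the best subset found per size."""
    pool = set(features)
    current: FrozenSet[int] = frozenset()
    best: Dict[int, Tuple[float, FrozenSet[int]]] = {}

    def record(subset: FrozenSet[int]) -> bool:
        """Store subset if it beats the best subset of the same size seen so far."""
        s = score(subset)
        if s > best.get(len(subset), (float("-inf"), subset))[0]:
            best[len(subset)] = (s, subset)
            return True
        return False

    while len(current) < k and pool - current:
        # Forward step: add the feature that gives the highest score.
        added = max(sorted(pool - current), key=lambda f: score(current | {f}))
        current = current | {added}
        record(current)
        # Floating backward steps: drop the least useful feature (never the one
        # just added) while that beats the best subset known at the smaller size.
        while len(current) > 2:
            drop = max(sorted(current - {added}), key=lambda f: score(current - {f}))
            if not record(current - {drop}):
                break
            current = current - {drop}
    return {size: sorted(sub) for size, (_, sub) in best.items()}


WEIGHTS = {0: 1.0, 1: 0.6, 2: 0.6, 3: 0.1}


def toy_score(subset: FrozenSet[int]) -> float:
    # Hypothetical score: additive weights plus a bonus when 1 and 2 appear together.
    bonus = 0.5 if {1, 2} <= subset else 0.0
    return sum(WEIGHTS[f] for f in subset) + bonus


if __name__ == "__main__":
    print(sffs(range(4), toy_score, 3))  # {1: [0], 2: [1, 2], 3: [0, 1, 2]}
```

Plain forward selection would keep `[0, 1]` as its best pair; the backward step after reaching three features discovers that `[1, 2]` scores higher.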
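Step (2) tunes the SVM's kernel parameter and penalty factor with a firefly search whose step size shrinks as iterations progress. The abstract only says the iteration number enters as an exponent, so the sketch assumes a hypothetical geometric schedule `alpha_t = alpha0 * theta**t` with `0 < theta < 1`. `toy_objective` is a placeholder for the SVM's validation error as a function of the two parameters, and `beta0`, `absorption`, and the other defaults are illustrative values, not the thesis's settings.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def firefly_minimize(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_fireflies: int = 15,
    n_iter: int = 60,
    beta0: float = 1.0,
    absorption: float = 0.1,
    alpha0: float = 0.5,
    theta: float = 0.95,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Firefly algorithm with a HYPOTHETICAL step schedule alpha_t = alpha0 * theta**t."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(v: float, d: int) -> float:
        lo, hi = bounds[d]
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [f(x) for x in pop]
    for t in range(n_iter):
        alpha = alpha0 * theta ** t  # iteration count as exponent: large early, small late
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter (lower cost): move i toward j
                    r2 = sum((pop[i][d] - pop[j][d]) ** 2 for d in range(dim))
                    beta = beta0 * math.exp(-absorption * r2)  # attraction decays with distance
                    pop[i] = [
                        clip(pop[i][d] + beta * (pop[j][d] - pop[i][d])
                             + alpha * (rng.random() - 0.5) * (bounds[d][1] - bounds[d][0]), d)
                        for d in range(dim)
                    ]
                    cost[i] = f(pop[i])
    best = min(range(n_fireflies), key=cost.__getitem__)
    return pop[best], cost[best]


def toy_objective(x: List[float]) -> float:
    # Placeholder for an SVM validation error over two hyperparameters (hypothetical).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


if __name__ == "__main__":
    print(firefly_minimize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)]))
```

Because a firefly only moves toward a strictly brighter one, the best cost in the population never increases; the shrinking `alpha` mainly controls how far the random term scatters fireflies in late iterations.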
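The boosting stage of (2) reweights misclassified samples between rounds and combines the base classifiers by a weighted vote. Below is the standard discrete AdaBoost loop for labels in {-1, +1}. To keep the sketch self-contained, a weighted one-dimensional decision stump (`fit_stump`, a hypothetical helper) stands in for the FA-tuned SVM base learner that the thesis uses.

```python
import math
from typing import Callable, List, Tuple

Classifier = Callable[[float], int]


def fit_stump(xs: List[float], ys: List[int], w: List[float]) -> Classifier:
    """Weighted 1-D decision stump; a stand-in for the FA-tuned SVM base learner."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (sign if xi >= thr else -sign) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    _, best_thr, best_sign = best
    return lambda x: best_sign if x >= best_thr else -best_sign


def adaboost(xs: List[float], ys: List[int], rounds: int = 5) -> Classifier:
    """Discrete AdaBoost for labels in {-1, +1}."""
    n = len(xs)
    w = [1.0 / n] * n  # uniform initial sample weights
    ensemble: List[Tuple[float, Classifier]] = []
    for _ in range(rounds):
        h = fit_stump(xs, ys, w)
        err = sum(wi for xi, yi, wi in zip(xs, ys, w) if h(xi) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect learners
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this learner
        # Misclassified samples (yi * h(xi) = -1) are scaled up by e^alpha, the rest down.
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, h))

    def predict(x: float) -> int:
        # Weighted vote of all base learners.
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

    return predict


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [1, 1, -1, -1, 1, 1]  # no single stump separates these labels
    clf = adaboost(xs, ys, rounds=3)
    print([clf(x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

After one round the ensemble predicts +1 everywhere; by the third round the reweighted stumps together recover the middle interval of negative labels.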