As a carrier of information, text has played a pivotal role since ancient times. In recent years, with the development of network technology, data on the Internet has been growing explosively, and more and more short text data is being generated. How to divide short text data into categories and quickly retrieve the content under related topics has become a popular research problem. Unlike long text, short text suffers from feature sparsity, which makes short text classification more difficult. Traditional machine learning-based short text classification methods cannot effectively compensate for this deficiency; with the increasing use of deep learning in computer vision, researchers have attempted to classify short texts with this new tool. However, existing short text classification methods still achieve only limited gains in accuracy. Based on this, this paper proposes a short text classification method based on multi-feature fusion and an improved capsule network. First, feature extraction is performed using a convolutional neural network and an attention mechanism; then multi-feature fusion is performed using a concatenation attention mechanism based on weight filtering; finally, the resulting feature matrix is fed into a capsule network with a gate structure for classification. The main work of this paper is as follows:

(1) This paper proposes a Concatenation Attention mechanism based on Weight Filtering (CATT-WF) that takes the importance of all features into account. First, convolution operations are performed with filters of different heights to obtain feature vectors covering different semantic scales; these are concatenated with the feature matrices in the horizontal direction, and the feature matrices corresponding to each channel are then concatenated in the vertical direction, so that all features lie along one dimension for processing by the multi-head attention mechanism. The relative importance of all features is thereby obtained, yielding a more comprehensive semantic representation. Meanwhile, because the scale-based feature matrix contains much redundant and irrelevant data, k-means clustering is used to filter the attention weights and retain the short text features that are more important for classification (a sketch of this mechanism is given below).

(2) Although traditional capsule-network-based short text classification methods can extract spatial features such as position inside a short text, they also extract in depth some words that are unimportant and whose positions do not need to be modeled, which lowers classification accuracy. This paper therefore proposes a short text classification method based on an improved capsule network (Gate Capsule with Concatenation Attention mechanism based on Weight Filtering, GCapsule-CATT-WF), in which a gate structure is added to the primary capsule layer that extracts vector features: the conversion gate controls the conversion of scalar features into vector features and the incorporation of positional features into word features, while the carry gate controls how much of the existing feature state is carried to the next layer. Combined with the proposed multi-feature fusion approach, the fused short text features are passed through the primary capsule layer to extract feature positions, with the gate structure controlling the conversion and carrying of features; the resulting vector feature matrix is then passed between the upper and lower layers by dynamic routing, and the output vector matrix finally yields the predicted class of the short text (a sketch of the gate structure is given below).
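The abstract does not spell out the implementation of CATT-WF, so the following is only a minimal PyTorch sketch of one plausible reading: multi-scale Conv1d features are concatenated along the position axis, scored with nn.MultiheadAttention, and a two-cluster k-means over the averaged attention weights keeps only the higher-weight cluster. The class name CATTWF, the hyperparameters, and the rule of keeping the larger-centre cluster are illustrative assumptions, not the thesis's actual code.

    # Hedged sketch only: module name, hyperparameters and the k-means
    # filtering rule are assumptions made for illustration.
    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans


    class CATTWF(nn.Module):
        def __init__(self, emb_dim=128, n_filters=128, heights=(2, 3, 4), n_heads=4):
            super().__init__()
            # one 1-D convolution per filter height (different semantic scales)
            self.convs = nn.ModuleList(
                [nn.Conv1d(emb_dim, n_filters, kernel_size=h, padding=h // 2)
                 for h in heights]
            )
            self.attn = nn.MultiheadAttention(n_filters, n_heads, batch_first=True)

        def forward(self, x):                        # x: (batch, seq_len, emb_dim)
            x = x.transpose(1, 2)                    # (batch, emb_dim, seq_len)
            scales = [conv(x).transpose(1, 2) for conv in self.convs]
            feats = torch.cat(scales, dim=1)         # concatenate all scales
            out, weights = self.attn(feats, feats, feats)
            scores = weights.mean(dim=1)             # one averaged weight per feature
            mask = self._kmeans_mask(scores)         # keep the high-weight cluster
            return out * mask.unsqueeze(-1)          # filtered, fused feature matrix

        @staticmethod
        def _kmeans_mask(scores):
            # cluster the weights of each example into two groups and keep the
            # group with the larger centre (one reading of "weight filtering")
            masks = []
            for s in scores.detach().cpu().numpy():
                km = KMeans(n_clusters=2, n_init=10).fit(s.reshape(-1, 1))
                keep = km.labels_ == int(km.cluster_centers_.argmax())
                masks.append(torch.tensor(keep, dtype=torch.float32))
            return torch.stack(masks).to(scores.device)

For example, CATTWF()(torch.randn(8, 40, 128)) would fuse a batch of eight 40-token texts with 128-dimensional embeddings into a single weighted feature matrix.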
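Likewise, the gate structure of the primary capsule layer is only described at a high level, so the sketch below is a plausible reading rather than the thesis's implementation: the conversion gate follows the highway-network pattern, with the carry gate taken as its complement, and the layer sizes and names are assumptions.

    # Hedged sketch only: the coupling of the two gates (carry = 1 - conversion)
    # follows the highway-network pattern and is an assumption, as are the
    # layer sizes and names.
    import torch
    import torch.nn as nn


    def squash(v, dim=-1, eps=1e-8):
        # standard capsule squashing non-linearity
        norm2 = (v * v).sum(dim=dim, keepdim=True)
        return (norm2 / (1.0 + norm2)) * v / torch.sqrt(norm2 + eps)


    class GatedPrimaryCapsule(nn.Module):
        def __init__(self, in_dim=128, n_caps=16, cap_dim=8):
            super().__init__()
            self.cap_dim = cap_dim
            self.to_caps = nn.Linear(in_dim, n_caps * cap_dim)     # scalar -> vector features
            self.carry_proj = nn.Linear(in_dim, n_caps * cap_dim)  # state carried forward
            self.gate = nn.Linear(in_dim, n_caps * cap_dim)        # conversion gate T

        def forward(self, x):                         # x: (batch, n_feats, in_dim)
            converted = self.to_caps(x)               # converted vector features
            carried = self.carry_proj(x)              # existing state of the features
            t = torch.sigmoid(self.gate(x))           # conversion gate in [0, 1]
            mixed = t * converted + (1.0 - t) * carried   # carry gate taken as 1 - T
            caps = mixed.view(x.size(0), -1, self.cap_dim)
            return squash(caps)                       # capsule vectors for dynamic routing

The fused feature matrix produced by the multi-feature fusion step can be fed directly as x; the resulting capsule vectors would then be passed to the class capsules by dynamic routing, which this sketch leaves out.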
(3) To demonstrate the effectiveness of the multi-feature fusion method based on the weight-filtering concatenation attention mechanism and of the short text classification method based on the improved capsule network, experiments are conducted on four short text datasets: AG-News, MR, TREC and SST-2. The experimental setup includes comparison experiments and ablation experiments, as well as experiments on the parameters involved, together with detailed analysis of the results. The experimental results show that the proposed method achieves higher classification accuracy than most of the currently popular models.