Research On The Application Of Multi-label And Multi-granularity Feature Selection In Text Classification

Posted on: 2024-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: J R Yang
GTID: 2568307136952409
Subject: Applied Statistics
Abstract/Summary:
The rapid development of Internet technology continually produces large amounts of data. Among these data, text is widely used because it occupies few resources and downloads quickly. To obtain useful information from large text collections quickly and accurately, many researchers have turned to data mining and machine learning, which gave rise to text classification technology. A core question of current research is therefore how to improve the accuracy of text classification. Feature selection, which reduces the dimensionality of the data, is particularly important here. By eliminating irrelevant features, it constructs a highly discriminative feature subset that serves as the basis for training the classification model, which significantly reduces the model's computing time and improves its learning performance. Feature selection is thus a key step in text classification. Against this background, this thesis aims to improve the accuracy of text classification by studying and improving feature selection algorithms.

Most current feature selection methods select feature subsets for the global target and ignore the rich information carried by individual instances, so they cannot identify local subcategories. The few models that can learn subcategory features do not combine them with global features, which tends to make classification rigid and inflexible. In addition, most feature selection algorithms treat all labels as equally important during selection, ignoring the fact that labels differ in importance, which degrades classification performance.

To address these problems, this thesis proposes a multi-label and multi-granularity feature selection model. The model integrates information from the label space and the instance space, considering the structural relationships both between labels and between instances, and performs feature selection at both "coarse" and "fine" granularity. The model has two layers. The first layer selects coarse-grained features: because labels differ in importance, a fuzzy clustering algorithm assigns a weight to each label so that the coarse-grained, label-oriented features are selected more accurately. The second layer builds on the first and selects fine-grained, instance-oriented features according to the difference between the original label matrix and the K-nearest-neighbor matrix of the samples. Finally, the coarse-grained and fine-grained features are combined into the final feature subset used to train the classification model.

The experiments use four text data sets. First, the proposed multi-label and multi-granularity feature selection algorithm is compared with four classical multi-label feature selection algorithms in terms of classification performance; the results show that the proposed algorithm achieves better results in text classification, which supports its effectiveness. Second, an experiment on the influence of inconsistent samples determines the best value of the inconsistent-sample parameter. Third, an ablation experiment shows that, compared with models that consider only coarse-grained or only fine-grained features, the proposed algorithm further improves text classification. Finally, a parameter sensitivity test analyzes the influence of different parameter values on the classification results.

In summary, the proposed multi-label and multi-granularity feature selection algorithm compensates to some extent for the shortcomings of traditional feature selection algorithms, reduces model complexity, improves classification accuracy, and provides a useful reference for research on text classification.
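
The abstract gives no formulas, so the following Python sketch is only one possible reading of the two-layer procedure, not the algorithm from the thesis. It assumes a dense feature matrix `X` (samples by features) and a binary label matrix `Y` (samples by labels). Every name and design choice here is hypothetical: the function names, the use of absolute Pearson correlation as the feature score, the use of the peak fuzzy c-means membership as a label weight, and the parameters `k_coarse`, `k_fine`, and `k_neighbors`.

```python
import numpy as np

def fuzzy_label_weights(Y, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Fuzzy c-means over label columns (assumed weighting scheme).

    Each label's weight is its largest cluster membership, normalized to sum to 1.
    """
    rng = np.random.default_rng(seed)
    L = Y.T.astype(float)                                    # one row per label
    U = rng.dirichlet(np.ones(n_clusters), size=L.shape[0])  # memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ L) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(L[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    w = U.max(axis=1)
    return w / w.sum()

def abs_corr(X, T):
    """Absolute Pearson correlation between every feature and every target column."""
    Xc = X - X.mean(axis=0)
    Tc = T - T.mean(axis=0)
    denom = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Tc, axis=0)) + 1e-12
    return np.abs(Xc.T @ Tc) / denom                         # (n_features, n_targets)

def knn_label_matrix(X, Y, k):
    """Mean label vector of each sample's k nearest neighbors (excluding itself)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    return Y[idx].mean(axis=1)

def select_features(X, Y, k_coarse, k_fine, k_neighbors=3):
    # Layer 1 (coarse): label-weighted relevance of each feature.
    w = fuzzy_label_weights(Y)
    coarse_scores = abs_corr(X, Y) @ w
    coarse = np.argsort(-coarse_scores)[:k_coarse]
    # Layer 2 (fine): features that explain where the original labels
    # differ from the neighbor-based labels in the coarse feature space.
    residual = Y - knn_label_matrix(X[:, coarse], Y, k_neighbors)
    fine_scores = abs_corr(X, residual).sum(axis=1)
    fine_scores[coarse] = -np.inf                            # do not pick a feature twice
    fine = np.argsort(-fine_scores)[:k_fine]
    # Final subset: union of coarse- and fine-grained features.
    return np.sort(np.concatenate([coarse, fine]))

# Toy data, purely illustrative (not one of the thesis's data sets).
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(40, 3))
X = rng.normal(size=(40, 10))
X[:, 0] += 2.0 * Y[:, 0]
selected = select_features(X, Y, k_coarse=3, k_fine=2)
print(selected)
```

In this sketch the second layer searches only among features the first layer did not pick, so the combined subset always has `k_coarse + k_fine` distinct features. The pairwise distance computation is quadratic in the number of samples, so it suits only small illustrations.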
Keywords/Search Tags: multi-label feature selection, multi-granularity feature selection, text classification, machine learning