Research On The Application Of Multi-label And Multi-granularity Feature Selection In Text Classification

Posted on:2024-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:J R Yang

GTID:2568307136952409

Subject:Applied Statistics

Abstract/Summary:

The rapid development of Internet technology continuously generates large amounts of data. Among these data, text is widely used because it occupies few resources and downloads quickly. To obtain useful information from large text collections quickly and accurately, many researchers have turned to data mining and machine learning, which gave rise to text classification technology. A core question of current research is therefore how to improve the accuracy of text classification. Feature selection, which reduces the dimensionality of the data, is particularly important here. By eliminating irrelevant features, it constructs a feature subset with high discriminative power that serves as the basis for training the classification model, which significantly reduces the model's computing time and improves its learning performance. Feature selection has thus become a key step in text classification. Against this background, this thesis aims to improve the accuracy of text classification by studying and improving feature selection algorithms for text classification.

Most current feature selection methods select feature subsets with respect to global targets and ignore the rich information carried by individual instances, so they cannot identify local subcategories. The few feature selection models that can learn features for subcategories are not combined with global features, which tends to make classification rigid and inflexible. In addition, most feature selection algorithms treat every label as equally important during selection, ignoring the fact that labels differ in importance, which leads to poor classification results.

To address these problems, this thesis proposes a multi-label and multi-granularity feature selection model. The model integrates information from the label space and the instance space, considering the structural relationships between labels as well as those between instances, and selects features at both "coarse" and "fine" granularity. The model consists of two layers. The first layer selects coarse-grained features: because labels differ in importance, a fuzzy clustering algorithm assigns a different weight to each label, so that coarse-grained features for the labels are selected more accurately. The second layer builds on the first and selects fine-grained features for instances according to the difference between the original label matrix and the K-nearest-neighbor matrix of the samples. Finally, the coarse-grained and fine-grained features are combined into the final feature subset used to train the classification model.

Four text data sets are used in the experiments. First, the proposed multi-label and multi-granularity feature selection algorithm is compared with four classical multi-label feature selection algorithms in terms of classification performance; the results show that the proposed algorithm achieves better results in text classification, which demonstrates its superiority and effectiveness. Second, the influence of inconsistent samples is tested to determine the best parameter value for inconsistent samples. Third, an ablation experiment shows that, compared with models that consider only coarse-grained or only fine-grained features, the proposed algorithm further improves text classification performance. Finally, a parameter sensitivity test analyzes the influence of different parameter values on the classification results.

In summary, the proposed multi-label and multi-granularity feature selection algorithm makes up for the shortcomings of traditional feature selection algorithms to some extent, reduces model complexity, improves classification accuracy, and provides a reference for research in the field of text classification.
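
The two-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract gives no formulas, so every concrete choice below is an assumption made for illustration. In particular, the label weights are taken as an input rather than computed by fuzzy clustering, feature relevance is measured by absolute Pearson correlation, neighbours are found in the coarse-feature space, and the names `abs_corr`, `knn_label_matrix`, `two_layer_select` and the threshold `tau` are hypothetical.

```python
import numpy as np


def abs_corr(X, Y):
    """Absolute Pearson correlation of every feature column with every label column."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    denom = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Yc, axis=0))
    denom[denom == 0] = 1.0  # constant columns end up with correlation 0
    return np.abs(Xc.T @ Yc) / denom


def knn_label_matrix(X, Y, k):
    """Mean label vector of each instance's k nearest neighbours (Euclidean, self excluded)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    return Y[neighbours].mean(axis=1)


def two_layer_select(X, Y, label_weights, n_coarse, n_fine, k=3, tau=0.5):
    """Return (coarse, fine) feature indices; all scoring rules are assumptions."""
    label_weights = np.asarray(label_weights, dtype=float)

    # Layer 1 (coarse): label-weighted relevance of each feature over all instances.
    # The thesis derives the weights by fuzzy clustering; here they are simply given.
    coarse_scores = abs_corr(X, Y) @ label_weights
    coarse = np.argsort(-coarse_scores)[:n_coarse]

    # Layer 2 (fine): compare the original label matrix with the KNN label matrix
    # and keep the instances whose labels disagree with their neighbourhood.
    Y_knn = knn_label_matrix(X[:, coarse], Y, k)
    inconsistent = np.abs(Y - Y_knn).mean(axis=1) > tau

    remaining = np.setdiff1d(np.arange(X.shape[1]), coarse)
    if inconsistent.sum() < 2 or remaining.size == 0:
        return coarse, np.array([], dtype=int)

    # Score the not-yet-selected features on the inconsistent instances only.
    fine_scores = abs_corr(X[inconsistent][:, remaining], Y[inconsistent]).sum(axis=1)
    fine = remaining[np.argsort(-fine_scores)[:n_fine]]
    return coarse, fine


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))                      # synthetic features
    Y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0]).astype(float)  # two synthetic labels
    coarse, fine = two_layer_select(X, Y, [0.7, 0.3], n_coarse=2, n_fine=2)
    print("coarse:", coarse, "fine:", fine)
```

The union of the two index sets would then play the role of the final feature subset passed to a classifier. In this sketch `tau` stands in for the inconsistent-sample parameter mentioned in the abstract, whose actual definition is not given there.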

Keywords/Search Tags:

multi-label feature selection, multi-granularity feature selection, text classification, machine learning

Related items

1	Research On Feature Selection Methods Based On Multi-Label Learning Theories
2	Research On The Multi-label Feature Selection And Classification Methods With The Label Correlations
3	Research On Multi-label Feature Selection Algorithms And Their Applications
4	Research On Feature Selection And Multi-label Transformation Of Text Classification
5	Research On Algorithm Of Feature Selection With Fuzzy Discernibility Matrix For Multi-label Classification
6	The Research Of Multi-Label Learning Problem About Feature Selection And Classification
7	Research On Feature Selection Algorithm Based On Multi-label
8	Based On Decision Relevance Multi-label Classification And Feature Selection Algorithm
9	Research On Multi-label Feature Selection Based On Weighted Labels And Consistent Neighborhood
10	Feature Selection Algorithms Based On Multi-objective Optimization For Multi-label Classification