Application Of Improved CHI Feature Selection Method In Text Classification

Posted on:2022-05-14

Degree:Master

Type:Thesis

Country:China

Candidate:L J Cai

GTID:2518306524981589

Subject:Statistics

Abstract/Summary:

With the advent of the Internet age, information and data exist in various forms, such as images, videos, sounds, and texts. Compared with the other forms, text is more widely used because it uploads and downloads faster and consumes fewer network resources. Massive text databases store a great deal of important information, and automatic text classification technology emerged so that this information can be retrieved quickly and accurately. Text classification has been widely used in various industries in recent years. Improving the accuracy of classification results is very important, and it has been the main goal of researchers in this field in recent years.

Feature selection plays an important role in text classification: it eliminates irrelevant features, reduces dimensionality, and improves classification accuracy. It is the foundation of text classification research. The performance of the feature selection algorithm therefore directly affects how the feature space is formed in a text classification system, and in turn affects classification quality and accuracy. This paper studies the CHI feature selection algorithm and mainly does the following work.

First, a theoretical review of text classification is carried out. The paper analyzes the definition, theoretical basis, overall framework, and common algorithms of text classification; introduces several feature selection algorithms and analyzes their respective advantages and disadvantages; summarizes the ideas used to optimize feature selection algorithms; and systematically studies Chinese and English text classification.

Second, improvement ideas and improved algorithms are proposed. This paper takes the introduction of new parameters as the direction for improving the CHI algorithm, and accordingly proposes the Var-CV-CHI feature selection algorithm, which is based on variance and the coefficient of variation. This paper also analyzes the shortcomings of the TF-IDF algorithm in the feature weighting step, and accordingly proposes the TF-CV algorithm, which effectively improves text classification results.

Third, the algorithms are implemented. In terms of language, the experiments cover two text classification systems, Chinese and English. In terms of classifiers, the KNN and Bayes algorithms, which gave the best classification results, are selected. In terms of data, two distribution types are used: balanced data sets and unbalanced data sets.

Fourth, experiments and result analysis. This paper carries out 8 comparative experiments and analyzes the results of each. All of the experimental results are significantly improved compared with the results before the improvement.
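For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of the standard building blocks only: the conventional CHI (chi-square) score of a term for a category, and the variance and coefficient of variation of a term's frequency across categories. The abstract does not state how Var-CV-CHI or TF-CV combine these quantities, so no combination is shown. The function names, the argument names (a, b, c, d), and the sample counts are hypothetical and chosen for illustration.

```python
from statistics import mean, pstdev, pvariance


def chi_square(a, b, c, d):
    """Standard CHI score of a term for one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate contingency table: the score is undefined, return 0.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0 if the mean is 0)."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


# Hypothetical counts: a term that appears in every document of one
# category and in no document of the other.
score = chi_square(10, 0, 0, 10)

# Hypothetical per-category frequencies of one term.
freqs = [1, 3]
spread = pvariance(freqs)
cv = coefficient_of_variation(freqs)
print(score, spread, cv)
```

A term spread evenly across categories has a coefficient of variation of 0, while a term concentrated in one category has a larger value, which is presumably why such dispersion measures are candidates for the "new parameters" mentioned above.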

Keywords/Search Tags:

Text classification, Feature selection, CHI algorithm, Variance, Coefficient of Variation

Related items

1	Improved Feature Selection Algorithm And Its Application In Text Categorization
2	A Sparse Coding Algorithm For Fractal Image Compression Based On Coefficient Of Variation
3	Classification Of Web Browsing And Video Services Based On Novel Feature Selection Algorithm
4	Research And Design On Service Selection Approaches Based On QOS Attributes Filtering
5	Research Of Feature Selection And Weighting Algorithm In Text Classification System Based On SVM
6	Research On Text Classification Method Based On Improved Feature Selection Algorithm
7	Research And Improvement Of Feature Selection Algorithm In Chinese Text Classification
8	Research On Feature Selection Algorithm And Classification Algorithm In Chinese Text Categorization
9	Research On Improved Feature Selection And Classification Algorithm For Chinese Text
10	Research On Text Classification Based On Optimized Feature Selection Algorithm