Research On The Algorithm Of Feature Selection Based On Mutual Information For Text Categorization

Posted on: 2012-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: H T Fu
GTID: 2218330335485940
Subject: Computer application technology
Abstract/Summary:
With the large volume of text now available in electronic form, finding the information people need in electronic resources quickly and accurately has become a hot research topic in information processing. Text categorization technology has relieved this problem to some extent. This paper studies the basic technologies of text categorization, such as word segmentation, feature selection, text representation models, and text classification algorithms, with a focus on mutual information as a feature selection method. It finds that the traditional mutual information method does not take into account the frequency and distribution of features in the text set, which lowers text classification performance. To improve the classification performance of traditional mutual information, this paper introduces the dispersion and the average frequency of features within each class into the traditional mutual information measure. To verify that the improved method is effective and feasible, text classification experiments are run with an open-source system that provides word segmentation, stop-word removal, feature selection, and text categorization, comparing the improved method under the same conditions against traditional mutual information and other well-performing feature selection methods. The results suggest that the improved mutual information method does improve classification performance, confirming that the improvement to traditional mutual information is feasible and effective.
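The abstract's criticism of traditional mutual information can be made concrete with a short sketch. It assumes the common document-count estimate MI(t, c) ≈ log(A·N / ((A + C)(A + B))), where A is the number of documents in class c that contain term t, B the number outside c that contain t, C the number in c that do not contain t, and N the total number of documents, averaged over classes with weight P(c). The abstract does not give the thesis's own formulas, so this is not the author's implementation; the function name `mi_score`, the toy corpus, and the choice to skip classes with A = 0 (to avoid log 0) are illustrative assumptions.

```python
import math
from collections import Counter


def mi_score(docs, labels, term):
    """Class-averaged mutual information of `term` (document-count estimate).

    docs:   list of documents, each a list of tokens (already segmented).
    labels: class label for each document, same order as docs.
    Only whether a document contains the term is used; how often it
    occurs inside the document is ignored.
    """
    n = len(docs)                      # N: total documents
    class_sizes = Counter(labels)      # documents per class (A + C)
    # Number of documents in each class that contain the term (A per class).
    contains = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    with_term = sum(contains.values())  # A + B: documents containing the term
    score = 0.0
    for c, size in class_sizes.items():
        a = contains[c]
        if a == 0:
            continue  # assumption: skip classes where log(0) would arise
        mi = math.log(a * n / (size * with_term))  # log(A*N / ((A+C)(A+B)))
        score += (size / n) * mi                   # weight by P(c)
    return score


# Hypothetical toy corpus: "match" is far more frequent than "goal"
# inside the sports documents, but both occur in exactly the same documents.
docs = [
    ["goal", "match", "match", "match", "match", "match", "team"],
    ["goal", "match", "match", "match", "match", "match"],
    ["stock", "market", "price"],
    ["stock", "price", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

for term in ["goal", "match", "team"]:
    print(term, f"{mi_score(docs, labels, term):.4f}")
# goal 0.3466
# match 0.3466
# team 0.0000
```

Because only document presence enters A, B, and C, a term repeated five times per document scores exactly the same as one that appears once. This is the kind of frequency blindness the abstract says the improved method is meant to address.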
Keywords/Search Tags: feature selection, classification algorithm, mutual information