
Application Of Optimal Feature Selection Algorithm In Text Classification

Posted on: 2014-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
GTID: 2268330401964760
Subject: Software engineering
Abstract/Summary:
As network and database technology have matured, database systems have become increasingly common. Their applications range from text classification and search, to business decision analysis, to cutting-edge bioengineering. The large amounts of data stored in databases contain hidden information that plays a very important role in decision-making. Many development tools exist for analyzing and processing this hidden information, but much of it is still far from being fully exploited. Data mining is a newer type of data processing technology that re-analyzes stored data to extract information. It first collects all relevant data, and then applies various modeling approaches, such as sampling and analysis, to identify the key factors that meet the analysis goals. For this reason, research and development on data mining and its associated technologies and applications have attracted the industry's attention and made considerable progress. Filter feature selection algorithms play an important role in many disciplines, which makes it worthwhile to study more efficient ones.

Text classification automatically assigns a document of unknown type in a document set to one of a set of predefined subject categories according to certain rules. It draws on data classification, computer science, engineering, information science, management science, and other disciplines. To date, most machine learning methods, statistical methods, and data classification techniques have been applied to text categorization.

This thesis studies Bayesian networks, the Naive Bayes classifier, and Filter feature selection algorithms. On that basis, it discusses in detail an optimal feature selection algorithm based on minimum joint mutual information loss. Driven by the needs of text classification applications, it then researches and designs the application of this optimal feature selection algorithm to text classification.
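The classification task described above can be illustrated with a small multinomial Naive Bayes text classifier. The sketch below is a minimal example under stated assumptions, not the thesis's implementation. The whitespace tokenizer, the add-one (Laplace) smoothing constant `alpha`, and the function names `tokenize`, `train_naive_bayes`, and `classify` are all hypothetical choices made for illustration. The classifier picks the class c that maximizes log P(c) + Σ log P(w | c) over the words w of the document.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(w | c) for a multinomial Naive Bayes model.

    alpha is the additive (Laplace) smoothing constant; 1.0 is an assumed default.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> term-frequency counter
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        denom = total + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(text, model):
    log_prior, log_likelihood, vocab = model
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in tokens)
        for c in log_prior
    }
    # Return the class with the highest (unnormalized) log posterior.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [
        "stock market shares rise",
        "market trading profit shares",
        "team wins football match",
        "football player scores goal",
    ]
    labels = ["finance", "finance", "sport", "sport"]
    model = train_naive_bayes(docs, labels)
    print(classify("shares profit market", model))  # finance
    print(classify("goal match team", model))  # sport
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied. Words never seen in training are simply skipped in this sketch.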
Finally, experiments verify that the Filter algorithm can be effectively applied to text classification.

This thesis focuses on the following. First, it defines Bayesian networks and the Naive Bayes classifier and summarizes their characteristics, models, and related applications. Second, it explains the meaning and basic characteristics of Filter feature selection algorithms. Based on these characteristics, it divides existing Filter-based optimal feature selection algorithms into two categories, feature subset search methods and feature ranking methods, and analyzes each category in depth to identify each algorithm's characteristics, basic principles, and shortcomings. Next, after introducing the definition and applications of text classification, it turns to applying feature selection algorithms to text classification, and studies this in depth through an implementation of the algorithm and experiments. The results verify that the Filter algorithm can be effectively applied to text classification and improves its efficiency.
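The feature ranking category can be made concrete with a small filter that scores each term independently and keeps the best k. This is a minimal sketch chosen for illustration. It uses the standard mutual information between a term's presence T and the class label C, I(T; C) = Σ P(t, c) log(P(t, c) / (P(t) P(c))). It is not the thesis's minimum joint mutual information loss algorithm, whose details the abstract does not give. The function names `mutual_information` and `select_top_k`, the binary presence representation, and the cutoff `k` are hypothetical choices.

```python
import math
from collections import Counter


def mutual_information(docs_tokens, labels, term):
    """I(T; C) in nats between binary presence T of `term` and class label C."""
    n = len(labels)
    joint = Counter((term in toks, c) for toks, c in zip(docs_tokens, labels))
    n_t = Counter(term in toks for toks in docs_tokens)  # documents with/without term
    n_c = Counter(labels)  # documents per class
    mi = 0.0
    for (t, c), n_tc in joint.items():
        # P(t,c) * log(P(t,c) / (P(t) P(c))), written with raw counts;
        # zero-count cells contribute 0 and are never visited.
        mi += (n_tc / n) * math.log(n_tc * n / (n_t[t] * n_c[c]))
    return mi


def select_top_k(docs, labels, k):
    """Filter-style ranking: score every term independently and keep the k best."""
    docs_tokens = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*docs_tokens))
    scores = {w: mutual_information(docs_tokens, labels, w) for w in vocab}
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:k], scores


if __name__ == "__main__":
    docs = ["stock market shares", "market profit", "football match", "football goal"]
    labels = ["finance", "finance", "sport", "sport"]
    top, scores = select_top_k(docs, labels, 2)
    print(top)  # ['football', 'market']
```

The selected terms could then serve as the vocabulary for a classifier such as Naive Bayes. Because each term is scored on its own, this ranking ignores redundancy between terms. Joint, subset-based criteria, such as ones built on joint mutual information, are generally motivated by that limitation.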
Keywords/Search Tags: classification, feature selection, Bayes classifier, filter feature selection algorithm