
Research On Filtering Algorithms Of Text Information Based On SVM

Posted on: 2017-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: G X Zhang
GTID: 2308330482489357
Subject: Computer software and theory
Abstract/Summary:
Information resources today are abundant and have become a focus of competition across industries: whoever masters information resources can master the future. With the rapid development of the Internet and information technology, data storage technology keeps advancing, and the volume of text that computers can read keeps growing. A specific user, however, usually needs only a small fraction of this text. As the demand of specific users for particular text within huge information resources increases, they urgently need software that can process text at large scale, retrieving the desired information from vast resources with high accuracy. The central problem for a text information filtering algorithm is to express the user's needs accurately and then automatically extract the useful information from those resources. English is currently the most widely used language in international exchange, and for internationalizing enterprises the valuable information is usually found in English documents, so research on filtering English text has practical value and significance. Text information filtering is the process by which a computer automatically filters text; operating on the text's content, it is a core technology for automatic text classification and filtering.
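Content-based filtering of this kind needs each document as a numeric vector before any classifier can see it. The sketch below is a minimal illustration, not the thesis's pipeline: it assumes a deliberately simple tokenizer (lowercase runs of letters) and raw term counts, and stores each document as a sparse `{column: count}` map. Even on three short sentences, each document touches only a few of the vocabulary's dimensions, which is the high-dimensional, sparse structure that makes feature selection important for text filtering.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of ASCII letters (a deliberately simple, assumed tokenizer).
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(docs: list[str]) -> dict[str, int]:
    # Assign each distinct term a column index in order of first appearance.
    vocab: dict[str, int] = {}
    for doc in docs:
        for term in tokenize(doc):
            vocab.setdefault(term, len(vocab))
    return vocab

def to_sparse_vector(doc: str, vocab: dict[str, int]) -> dict[int, int]:
    # Keep only non-zero term counts, keyed by column index.
    counts = Counter(t for t in tokenize(doc) if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}

docs = [
    "Stock prices rose as markets rallied",
    "The team won the match in extra time",
    "Markets fell after the central bank decision",
]
vocab = build_vocabulary(docs)
vectors = [to_sparse_vector(d, vocab) for d in docs]
for v in vectors:
    # Prints 6, 7 and 7 non-zero entries out of 18 dimensions.
    print(f"{len(v)} non-zero of {len(vocab)} dimensions")
```

With a realistic English corpus the vocabulary grows to tens of thousands of columns while each document still has only a few hundred non-zero entries, so a dictionary (or a CSR matrix) is the natural storage choice.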
Text information filtering must cope with correlations among features and with feature vectors that are high-dimensional and highly sparse. Support vector machines (SVMs) suit these characteristics, so they have been applied to text information filtering and have great potential there. A key problem in the filtering process is how to reduce the high dimensionality of the feature vector space and obtain an efficient filtering algorithm. This thesis therefore studies feature extraction and selection, improvements to the filtering algorithm, and optimization of the algorithm's parameters, covering the following aspects:
1. After analyzing traditional feature extraction methods and the strengths and weaknesses of existing feature selection and of information gain feature selection, the thesis proposes a dimension-adaptive combined feature selection method. Experiments comparing traditional feature selection, information gain, and the dimension-adaptive combined method show that the latter significantly improves both the cross-validation accuracy on the training data and the validation accuracy.
2. SVM training suffers from class-imbalanced samples and from useless samples, and text filtering contains regions where the classes are hard to distinguish. The thesis improves the algorithm by fusing it with K-means: K-means is used to select a training subset suited to SVM, which optimizes the training samples. In addition, by determining the region around the cluster centers of the positive samples, data that is hard to distinguish during filtering can be clustered again before a decision is made, which improves filtering precision.
3. In practical applications of SVM, the penalty factor, the kernel function, and its parameters strongly influence classification accuracy. The thesis uses a variable-step iterative selection algorithm that maximizes cross-validation accuracy through two stages, a coarse selection followed by a fine selection, to determine the optimal parameters. Experiments show that the improved SVM algorithm significantly increases precision, recall, and the F-value in text information filtering.
4. Building on this research, the algorithm is implemented as a complete system in C# that calls a C component, and it is tested in actual operation. The experimental results show that the SVM-based text information filtering algorithm achieves good filtering performance.
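The abstract does not say which SVM solver the C component uses. As a self-contained illustration of training an SVM on sparse text vectors, the sketch below swaps in Pegasos, a stochastic sub-gradient method for the linear soft-margin SVM (hinge loss plus L2 regularization weighted by `lam`). It has no kernel and no bias term, so it covers only the linear case and is not the thesis's implementation. Vectors are `{feature index: weight}` dicts, and labels are +1 (relevant) or -1 (not relevant); `train_pegasos` and `predict` are hypothetical names.

```python
import random

def dot(w: dict[int, float], x: dict[int, float]) -> float:
    # Sparse dot product: only the features present in x contribute.
    return sum(w.get(i, 0.0) * v for i, v in x.items())

def train_pegasos(data: list[tuple[dict[int, float], int]],
                  lam: float = 0.01, iters: int = 1000, seed: int = 0) -> dict[int, float]:
    rng = random.Random(seed)
    w: dict[int, float] = {}
    for t in range(1, iters + 1):
        x, y = rng.choice(data)           # one random training example per step
        eta = 1.0 / (lam * t)             # Pegasos step size 1 / (lambda * t)
        margin = y * dot(w, x)            # margin under the current weights
        scale = 1.0 - eta * lam           # shrink from the regularization term
        for i in w:
            w[i] *= scale
        if margin < 1.0:                  # hinge loss active: move towards y * x
            for i, v in x.items():
                w[i] = w.get(i, 0.0) + eta * y * v
    return w

def predict(w: dict[int, float], x: dict[int, float]) -> int:
    return 1 if dot(w, x) >= 0 else -1

data = [
    ({0: 1.0}, 1),
    ({0: 1.0, 2: 0.5}, 1),
    ({1: 1.0}, -1),
    ({1: 1.0, 3: 0.5}, -1),
]
w = train_pegasos(data)
print([predict(w, x) for x, _ in data])
```

Rescaling every weight at each step costs time proportional to the number of stored weights; practical implementations usually store w as a scalar times a vector to avoid this.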
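For the information gain part of item 1, the standard definition scores a term t by how much knowing whether t occurs reduces the entropy of the class label: IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t). The sketch below computes this for binary term-presence features and keeps the k best terms. The abstract does not describe how the thesis combines information gain with other selectors or adapts the dimension, so `select_top_k` and its fixed `k` are hypothetical stand-ins for that step.

```python
import math
from collections import Counter

def entropy(labels: list[int]) -> float:
    # Shannon entropy in bits; 0 for an empty list.
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs: list[set[str]], labels: list[int], term: str) -> float:
    # IG(t) = H(C) - P(t) H(C | t present) - P(not t) H(C | t absent)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels)
            - len(with_t) / n * entropy(with_t)
            - len(without_t) / n * entropy(without_t))

def select_top_k(docs: list[set[str]], labels: list[int], k: int) -> list[str]:
    # Hypothetical fixed-k selection; ties are broken alphabetically.
    vocab = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

docs = [{"stock", "market"}, {"market", "bank"}, {"match", "team"}, {"team", "goal"}]
labels = [1, 1, 0, 0]   # 1 = relevant (finance), 0 = not relevant (sport)
print(select_top_k(docs, labels, 2))  # ['market', 'team']
```

Terms that occur in every document of one class and none of the other ("market", "team") get the maximum gain of 1 bit here, while terms seen in a single document score lower.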
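Item 2 uses K-means to choose an SVM training subset and counter class imbalance. The abstract does not give the exact selection rule, so the sketch below assumes one plausible variant: cluster the majority class into k groups and keep, for each cluster, the real sample nearest its center, which yields k representative samples (for example, k equal to the minority-class size). It uses plain Lloyd iterations with deterministic farthest-first seeding; `reduce_majority` is a hypothetical name, and the thesis's re-clustering of hard-to-distinguish samples is not shown.

```python
Point = tuple[float, ...]

def sq_dist(a, b) -> float:
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_farthest(points: list[Point], k: int) -> list[list[float]]:
    # Farthest-first seeding: deterministic and spreads the initial centers out.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers

def kmeans(points: list[Point], k: int, iters: int = 50) -> list[list[float]]:
    # Lloyd's algorithm: assign to nearest center, then move centers to cluster means.
    centers = init_farthest(points, k)
    for _ in range(iters):
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def reduce_majority(majority: list[Point], k: int) -> list[Point]:
    # Keep the real sample closest to each cluster center.
    chosen: list[Point] = []
    for c in kmeans(majority, k):
        nearest = min(majority, key=lambda p: sq_dist(p, c))
        if nearest not in chosen:
            chosen.append(nearest)
    return chosen

majority = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (20, 0), (20, 1), (21, 0), (21, 1)]
reduced = reduce_majority(majority, 3)
print(sorted(reduced))  # [(0, 0), (10, 10), (20, 0)]
```

Keeping actual samples rather than the centroids themselves means the reduced set still consists of real documents, which an SVM can use directly as candidate support vectors.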
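Item 3 selects the penalty factor C and the kernel parameter by a coarse search followed by a fine search that maximizes cross-validation accuracy. The sketch below assumes an RBF kernel parameter gamma and searches over log2 C and log2 gamma: the coarse stage uses the exponentially growing ranges suggested in the LIBSVM practical guide (step 2), and the fine stage uses a step of 0.25 within one coarse step of the best point. The real score would come from k-fold cross-validation; `synthetic_cv_accuracy` is a hypothetical stand-in so the example runs on its own.

```python
from itertools import product
from typing import Callable

# Maps (log2 C, log2 gamma) to a cross-validation accuracy.
Score = Callable[[float, float], float]

COARSE_LOG2_C = [float(e) for e in range(-5, 16, 2)]      # 2^-5, 2^-3, ..., 2^15
COARSE_LOG2_GAMMA = [float(e) for e in range(-15, 4, 2)]  # 2^-15, 2^-13, ..., 2^3

def local_grid(center: float, radius: float, step: float) -> list[float]:
    # Evenly spaced points in [center - radius, center + radius].
    n = round(radius / step)
    return [center + i * step for i in range(-n, n + 1)]

def best_point(score: Score, cs: list[float], gs: list[float]) -> tuple[float, float]:
    # Evaluate every (C, gamma) pair on the grid and keep the highest score.
    return max(product(cs, gs), key=lambda cg: score(*cg))

def coarse_to_fine(score: Score, coarse_step: float = 2.0,
                   fine_step: float = 0.25) -> tuple[float, float]:
    # Stage 1 (coarse selection): wide grid with a large step.
    c0, g0 = best_point(score, COARSE_LOG2_C, COARSE_LOG2_GAMMA)
    # Stage 2 (fine selection): small step around the stage-1 optimum.
    return best_point(score,
                      local_grid(c0, coarse_step, fine_step),
                      local_grid(g0, coarse_step, fine_step))

def synthetic_cv_accuracy(log2_c: float, log2_gamma: float) -> float:
    # Hypothetical smooth accuracy surface peaking at log2 C = 3.5, log2 gamma = -6.25.
    return 1.0 - 0.01 * ((log2_c - 3.5) ** 2 + (log2_gamma + 6.25) ** 2)

log2_c, log2_gamma = coarse_to_fine(synthetic_cv_accuracy)
print(f"C = 2^{log2_c}, gamma = 2^{log2_gamma}")
```

The coarse stage lands on (3, -7), and the fine stage recovers the peak at (3.5, -6.25) with 110 + 289 evaluations instead of the roughly 6,500 a uniform 0.25-step grid over the full range would need.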
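Item 3 evaluates the improved filter by precision and recall together with what is most likely the F-value. For a filter, precision is the share of passed documents that are actually relevant, and recall is the share of relevant documents that get passed. The sketch below computes both and uses the balanced F1 (their harmonic mean); that choice of F-measure is an assumption, since the abstract does not specify it.

```python
def precision_recall_f1(y_true: list[int], y_pred: list[int],
                        positive: int = 1) -> tuple[float, float, float]:
    # Count outcomes for the positive ("relevant") class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Here the filter passes four documents, three of them relevant (precision 3/4), and misses one of the four relevant documents (recall 3/4).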
Keywords/Search Tags: Information filtering, English text, feature extraction, support vector machine, machine learning