
Semi-supervised Agricultural Text Classification Based On Semantic Diffusion Kernel And Support Vector Machines

Posted on: 2019-10-03 | Degree: Master | Type: Thesis
Country: China | Candidate: W Li
GTID: 2393330590957431 | Subject: Engineering
Abstract/Summary:
With the steady improvement of information technology, the rapid growth of Internet-related industries, and strong state support for agriculture, information technology is being applied ever more widely and deeply in agriculture and rural areas. How to use machines to extract valuable knowledge from massive volumes of agricultural text automatically, quickly, and accurately has therefore become an important research topic. Automatic text categorization is a research hotspot in data mining and a key machine-learning technology for processing text. Building on traditional text classification models, this thesis proposes a semi-supervised method for classifying Chinese agricultural text based on a feature population semantic diffusion kernel and a support vector machine (SVM). Experimental results show that the proposed method achieves higher classification accuracy than the classical SVM method. To support the experimental work, the thesis also designs a Java EE software platform for semi-supervised Chinese text classification and describes its principles, functions, and advantages in detail. The main contributions are as follows.

(1) A semi-supervised method for classifying Chinese agricultural text based on a feature population semantic diffusion kernel and an SVM. The method consists of four steps:
(a) Data acquisition and preprocessing. The system's web crawler collects documents from the relevant sections of the Farmers' Daily, China Fishery Network, and Chinese Agriculture and Forestry to form the agricultural data set. The documents are then segmented with the Ansj Chinese word segmentation system, which is based on the ICTCLAS segmentation algorithm of the Chinese Academy of Sciences.
(b) Feature selection. Stop words are removed using a stop-word list, and the term frequency, inverse document frequency, TF-IDF weight, and chi-square statistic of each word are computed. Experiments show that classification accuracy improves as the number of feature terms increases and eventually approaches a limit. The thesis selects the 1,000 words with the highest chi-square statistics as features.
(c) Text vectorization. Documents are converted to vectors using the vector space model.
(d) Agricultural text classification. Using the generated vector files, documents are classified both with the proposed method and with a classical SVM-based agricultural text classification method, and the experimental results are compared and analyzed.

(2) A Java EE software platform for semi-supervised Chinese text classification based on the feature population semantic diffusion kernel and SVM, suitable for small websites. Built on Java EE and relational database technology, it provides an open, flexible, efficient, and robust experimental platform for SVM-based Chinese text classification.
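The feature-selection step keeps the 1,000 words with the highest chi-square statistic. A minimal Python sketch of that selection follows. It assumes documents are already segmented into token lists with stop words removed, computes the standard 2×2 contingency-table chi-square for each term and class, and scores a term by its maximum over classes. The max aggregation and the names `chi_square_scores` and `select_top_k` are illustrative assumptions, not details confirmed by the thesis.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over all classes.

    docs:   list of token lists (segmented, stop words removed)
    labels: class label of each document
    Aggregating with max over classes is an assumption; averaging is also common.
    """
    n = len(docs)
    class_sizes = Counter(labels)
    # df[t][cls]: number of documents of class cls that contain term t
    df = {}
    for tokens, cls in zip(docs, labels):
        for t in set(tokens):
            df.setdefault(t, Counter())[cls] += 1

    scores = {}
    for t, per_class in df.items():
        with_t = sum(per_class.values())
        best = 0.0
        for cls, size in class_sizes.items():
            a = per_class[cls]      # in cls, contains t
            b = with_t - a          # not in cls, contains t
            c = size - a            # in cls, lacks t
            d = (n - size) - b      # not in cls, lacks t
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # skip degenerate tables (e.g. t occurs in every document)
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[t] = best
    return scores


def select_top_k(docs, labels, k=1000):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For instance, with two "crop" documents that both contain "rice" and two "fishery" documents that both contain "fish", those two words perfectly separate the classes and outrank words that occur in only one document.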
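TF-IDF weighting and the vector space model used in the vectorization step can be sketched as follows. The thesis does not state which TF-IDF variant it uses, so this sketch assumes raw term counts for TF, idf(t) = log(N / df(t)), and L2-normalized vectors over a fixed vocabulary (for example, the selected chi-square features). The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Map each tokenized document to a TF-IDF vector over a fixed vocabulary.

    Assumed weighting: tf = raw count, idf = log(N / df); other variants
    (smoothed idf, log-scaled tf) are equally common.
    """
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens))
    idf = {t: (math.log(n / df[t]) if df[t] else 0.0) for t in vocabulary}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vec = [tf[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in vec))
        # L2-normalize so that documents of different lengths are comparable
        vectors.append([w / norm for w in vec] if norm else vec)
    return vectors
```

Because every non-empty vector has unit length, the cosine similarity of two documents in this space reduces to the dot product of their vectors, which is also the plain linear kernel an SVM would use.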
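The abstract does not define the feature population semantic diffusion kernel. As an assumption, the sketch below uses the exponential diffusion form often associated with semantic diffusion kernels: given a document-term matrix X, a term co-occurrence graph G = XᵀX, and a diffusion parameter λ, term similarities are S = exp(λG) and the document kernel is K = X S Xᵀ. Since S is symmetric with strictly positive eigenvalues, K is positive semidefinite and therefore a valid SVM kernel. Building G from labeled and unlabeled documents together is one plausible way unlabeled text could enter a semi-supervised setup; the helper names, the truncated Taylor series, and λ = 0.1 are hypothetical choices for illustration.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def expm_series(m, terms=30):
    """Approximate exp(m) with a truncated Taylor series.

    Accurate only when the entries of m are small; a real system would use
    a library routine such as scipy.linalg.expm instead.
    """
    size = len(m)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]  # holds m**k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


def semantic_diffusion_kernel(x_all, x_docs, lam=0.1):
    """Return K = X_docs exp(lam * G) X_docs^T, where G = X_all^T X_all.

    x_all:  document-term rows used to build the term graph
            (hypothetically, labeled plus unlabeled documents)
    x_docs: document-term rows of the documents to compare
    """
    g = matmul(transpose(x_all), x_all)  # term co-occurrence counts
    s = expm_series([[lam * v for v in row] for row in g])  # diffused term similarity
    return matmul(matmul(x_docs, s), transpose(x_docs))
```

As an illustration, suppose the terms are rice, paddy, and fish, and two labeled documents contain only "rice" and only "paddy" respectively. Their linear-kernel similarity is zero, but if an unlabeled document mentions both words, G links the two terms and the diffusion kernel assigns the labeled pair a positive similarity, while "rice" and "fish", never linked by any document, stay at zero. With λ = 0 the kernel reduces to the ordinary linear kernel X Xᵀ.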
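The classification step trains an SVM on the resulting kernel. The abstract does not say which SVM solver the thesis uses, so the sketch below swaps in kernelized Pegasos (Shalev-Shwartz et al.), a simple stochastic solver for the hinge-loss SVM without a bias term that accepts any precomputed Gram matrix, such as a semantic diffusion kernel. The function names and the parameters `lam`, `iterations`, and `seed` are hypothetical.

```python
import random


def pegasos_kernel_svm(gram, labels, lam=0.01, iterations=2000, seed=0):
    """Train a binary kernel SVM (hinge loss, no bias) with kernelized Pegasos.

    gram:   precomputed n x n kernel matrix over the training documents
    labels: +1 / -1 class labels
    Returns the integer dual counts alpha.
    """
    rng = random.Random(seed)
    n = len(labels)
    alpha = [0] * n
    for t in range(1, iterations + 1):
        i = rng.randrange(n)  # sample one training document
        s = sum(alpha[j] * labels[j] * gram[i][j] for j in range(n))
        if labels[i] * s / (lam * t) < 1:  # margin violated at step t
            alpha[i] += 1
    return alpha


def decision_value(kernel_row, alpha, labels, lam, iterations):
    """Score a document x; kernel_row[j] must hold k(x, x_j) for each training doc."""
    s = sum(a * y * k for a, y, k in zip(alpha, labels, kernel_row))
    return s / (lam * iterations)
```

A document is assigned to the positive class when its decision value is greater than zero. Multi-class agricultural categories would need a wrapper such as one-vs-rest, training one binary model per category. Where scikit-learn is available, `SVC(kernel="precomputed")` accepts the same kind of Gram matrix and is the more practical choice.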
Keywords/Search Tags: text classification, agricultural text, support vector machine, semantic diffusion kernel, software experiment platform