
Design And Implementation Of Government Policy Text Classification System

Posted on: 2019-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhu
GTID: 2416330590475440
Subject: Software engineering
Abstract/Summary:
In recent years, with the development of the Internet data age, government information has become increasingly public and transparent. How to improve the efficiency of government in managing public affairs has become a hot topic. The main work of government is to make a series of policies for the management of public affairs, so organizing and managing the government's policy information is the key to solving this problem. This thesis uses text mining technology to analyze government policy text data through text classification, providing an effective means of analyzing policy text and promoting more intelligent management of government information.

Most research on text categorization is based on standard corpora, and little has been done in specific fields. This thesis uses 18,096 policy texts collected from official government websites, with ten major categories extracted as the classification targets. After data cleaning and feature engineering, text models are constructed, different classification algorithms are compared and analyzed according to classification evaluation metrics, and the resulting model is called to design and implement the classification system. The main contents of this thesis are as follows:

(1) A web crawler is used to obtain the policy texts and category information from government websites, and a text corpus of policy notifications is constructed to provide the research foundation for policy text classification.
(2) Feature selection is performed on the words obtained by segmenting the policy texts, and a mixed feature selection method, MFS, is proposed to reduce the dimensionality of the original text corpus. Compared with the traditional feature selection methods TF-IDF, MI, and CHI, MFS yields better classification results.
(3) The topic model LDA is applied to text classification. As a means of dimensionality reduction, the LDA topic model can reduce the text space from tens of thousands of dimensions to tens of dimensions while still achieving good classification performance.
(4) Deep learning is applied to the text classification task. A text classification model is built by combining Word2vec with TextCNN, and its classification performance on a large-scale corpus and on the self-built corpus is compared and analyzed. The experiments show that overfitting occurs easily on a small-scale corpus.

Finally, based on this comparison and analysis, MFS is chosen as the feature selection method, TF-IDF is used for text modeling, and SVM is used as the classification algorithm to construct a policy notification text classification model. The classifier trained on the training set achieves an average accuracy of 92% across the classes of the test set. By calling the trained text classification model, a web-based text categorization system is implemented.
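The feature selection and text modeling steps described above can be illustrated with a minimal sketch. The abstract does not specify how MFS works, so this example does not attempt to reproduce it; it shows only the CHI baseline (taken here to mean the chi-square statistic) followed by TF-IDF weighting over the selected terms. The function names, the toy documents, and the smoothed IDF formula `log(N / df) + 1` are all assumptions made for illustration, not details from the thesis.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes.

    docs   -- list of token lists (already segmented text; hypothetical input)
    labels -- list of class labels, one per document
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        doc_freq = sum(1 for s in doc_sets if term in s)
        best = 0.0
        for cls, size in class_sizes.items():
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            b = doc_freq - a      # term present, other classes
            c = size - a          # term absent, this class
            d = n - a - b - c     # term absent, other classes
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            chi = n * (a * d - b * c) ** 2 / denom if denom else 0.0
            best = max(best, chi)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def tfidf_vectors(docs, vocab):
    """Build L2-normalized TF-IDF vectors restricted to the selected vocab."""
    n_docs = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [
            tf[t] * (math.log(n_docs / doc_freq[t]) + 1.0) if doc_freq[t] else 0.0
            for t in vocab
        ]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm else vec)
    return vectors


# Toy corpus (invented for illustration; not data from the thesis).
docs = [
    ["tax", "reduction", "enterprise"],
    ["tax", "subsidy", "enterprise"],
    ["school", "enrollment", "notice"],
    ["school", "teacher", "notice"],
]
labels = ["finance", "finance", "education", "education"]

selected = select_top_k(chi_square_scores(docs, labels), 4)
features = tfidf_vectors(docs, selected)
```

In a full pipeline, vectors like `features` would be passed to a classifier such as an SVM, which is omitted here to keep the sketch dependency-free.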
Keywords/Search Tags: text mining, policy notification, text classification, machine learning, deep learning