In recent years, office automation has greatly increased the volume of data that the tax system must process. This places higher demands on data classification and knowledge extraction, yet the tax system remains at the stage of simple data retrieval. As the core of office automation, the tax document system handles document processing and is the main channel for releasing tax policies. Since the system went into operation, it has accumulated a large volume of document data containing unknown but potentially useful tax information. Tax staff need to grasp and implement the newest tax policies through the official document processing system, so identifying tax policies quickly and effectively has become an important challenge for tax informatization. Currently, tax policies in the document processing system are selected and retrieved manually, which is time-consuming and unsystematic.

Based on the characteristics of tax policies, this paper studies text mining methods for documents and discusses tax policy document recognition using a Bayesian algorithm based on weighted attribute subsets, an attribute clustering algorithm, and a regular automaton model. The tax policy document recognition algorithm has the following characteristics. First, it applies the regular property automaton model to cluster and classify attribute subsets, thereby identifying key attribute subsets, normal attribute subsets, and interference attribute subsets. Second, it computes the weights of the attribute subsets with an improved TF algorithm and raises the precision of document text classification by applying the simple (naive) Bayes algorithm to the weighted attribute subsets.
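The abstract does not give the improved TF formula or the exact scoring rule, so the following Python is only a minimal sketch under stated assumptions: documents are token lists, each term belongs to at most one attribute subset, a subset's weight is a hypothetical TF-based score (the average corpus frequency of its terms, scaled so the largest weight is 1.0), and classification is multinomial naive Bayes in which each term's log-likelihood is multiplied by the weight of its subset. The names `subset_weights` and `WeightedSubsetNB`, and the toy corpus, are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def subset_weights(docs, subsets):
    """Hypothetical stand-in for the paper's 'improved TF' weighting:
    average corpus term frequency of a subset's terms, scaled to max 1.0."""
    tf = Counter(term for doc in docs for term in doc)
    raw = {name: sum(tf[t] for t in terms) / (len(terms) or 1)
           for name, terms in subsets.items()}
    top = max(raw.values()) or 1.0  # avoid dividing by zero
    return {name: value / top for name, value in raw.items()}


class WeightedSubsetNB:
    """Multinomial naive Bayes where each term's log-likelihood is scaled
    by the weight of the attribute subset containing that term."""

    def __init__(self, subsets, weights, default_weight=1.0, alpha=1.0):
        # Map every term to the weight of its (assumed unique) subset.
        self.term_weight = {t: weights[name]
                            for name, terms in subsets.items() for t in terms}
        self.default_weight = default_weight  # for terms in no subset
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for d in class_docs for t in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_likelihood(self, term, c):
        # Smoothed estimate of log P(term | class).
        return math.log((self.counts[c][term] + self.alpha) /
                        (self.totals[c] + self.alpha * len(self.vocab)))

    def predict(self, doc):
        def score(c):
            s = self.prior[c]
            for t in doc:
                if t in self.vocab:  # skip terms never seen in training
                    w = self.term_weight.get(t, self.default_weight)
                    s += w * self.log_likelihood(t, c)
            return s
        return max(self.classes, key=score)


# Toy usage with made-up data.
train_docs = [
    ["tax", "policy", "rate", "notice"],
    ["tax", "policy", "exemption"],
    ["meeting", "notice", "schedule"],
    ["meeting", "agenda", "notice"],
]
train_labels = ["policy", "policy", "other", "other"]
subsets = {
    "key": {"policy", "rate", "exemption"},
    "normal": {"tax", "meeting", "agenda", "schedule"},
    "interference": {"notice"},
}
weights = subset_weights(train_docs, subsets)
weights["interference"] = 0.0  # hypothetical choice: ignore interference terms
model = WeightedSubsetNB(subsets, weights).fit(train_docs, train_labels)
print(model.predict(["tax", "rate", "notice"]))  # prints policy
```

Setting the interference weight to zero is one possible way to stop such subsets from influencing the decision; the paper may handle them differently.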
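The abstract does not publish the rules of the regular automaton model. As a hedged illustration of only the classification step (not the clustering), the Python sketch below assigns terms to key, normal, and interference attribute subsets with regular expressions. Each pattern describes a regular language, one that a finite automaton can recognize, although Python's `re` module matches it with a backtracking engine rather than an explicit automaton. The patterns and the function name `classify_attributes` are hypothetical.

```python
import re

# Hypothetical patterns; the paper's actual automaton rules are not given.
KEY_PATTERNS = [re.compile(p) for p in (r".*(tax|levy|duty)",
                                        r"(exemption|rate|deduction)s?")]
INTERFERENCE_PATTERNS = [re.compile(p) for p in (r"(notice|meeting|agenda)",
                                                 r"\d+")]


def classify_attributes(terms):
    """Split terms into key, normal and interference attribute subsets.
    Key patterns are checked first, then interference patterns; a term
    matching neither is treated as a normal attribute."""
    subsets = {"key": set(), "normal": set(), "interference": set()}
    for term in terms:
        if any(p.fullmatch(term) for p in KEY_PATTERNS):
            subsets["key"].add(term)
        elif any(p.fullmatch(term) for p in INTERFERENCE_PATTERNS):
            subsets["interference"].add(term)
        else:
            subsets["normal"].add(term)
    return subsets


result = classify_attributes(
    ["incometax", "exemptions", "notice", "2010", "enterprise"])
print(result["key"])  # the set {'incometax', 'exemptions'} (order may vary)
```

Checking the key patterns before the interference patterns gives key attributes priority whenever a term matches both.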