| Before the Internet era, data acquisition was constrained by time and other factors: because networks were underdeveloped, data were difficult and slow to obtain and came from few sources, which made acquisition a real burden. Data processing, likewise, could handle mostly structured data and failed to deal with unstructured data. Today, with the rapid development of the Internet, the speed and breadth of information access have increased and are no longer limited by time and space. Unstructured data also account for an ever larger share, so traditional structured-data analysis can no longer meet people's analytical needs. The bottleneck has therefore shifted from acquiring data to integrating information. Faced with massive volumes of data, especially unstructured data such as text, audio, and images, extracting and integrating the useful information has become an important problem. At present, most information integration still depends on manual labor, so automating it remains a challenge. This study takes the acquisition of economic data as an example and builds the following steps: it collects data with a crawler built on Python's urllib; it screens the data with a screening formula derived from expert opinion; it applies descriptive statistics and correlation analysis to obtain an overall picture of the text data; it uses Natural Language Processing (NLP) techniques; it clusters the data with a vector space model; and it integrates the results into a system that automatically writes articles. Together these steps form an automated workflow for processing text data. The workflow supports people in decision-making, substantially improves efficiency, and makes daily life more convenient. |
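
The clustering step represents each text as a vector in a vector space model and groups similar vectors. The abstract does not say which term weighting, similarity measure, or clustering algorithm the study used. The sketch below is therefore a minimal illustration under stated assumptions, not the study's implementation:

- It assumes TF-IDF weights, cosine similarity, and a spherical k-means loop with deterministic farthest-point seeding.
- It uses only the standard library.
- The function names and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase word split; a real pipeline would use an NLP tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    # Scale a sparse vector (term -> weight) to unit length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Build unit-length sparse TF-IDF vectors, one per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency of each term
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps every weight positive.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(a, b):
    # For unit-length vectors the dot product equals cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(members):
    # Sum member vectors and renormalize (spherical k-means centroid).
    total = Counter()
    for v in members:
        for t, w in v.items():
            total[t] += w
    return normalize(dict(total))


def kmeans(vectors, k, iterations=20):
    """Cluster unit vectors into k groups; returns one label per vector (k <= len(vectors))."""
    # Deterministic seeding: start with document 0, then repeatedly add the
    # document least similar to its closest existing seed.
    seeds = [0]
    while len(seeds) < k:
        nxt = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds))
        seeds.append(nxt)
    centroids = [dict(vectors[s]) for s in seeds]
    labels = None
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels


if __name__ == "__main__":
    docs = [
        "gdp growth slowed as exports fell",
        "exports fell and gdp growth weakened",
        "central bank raised interest rates",
        "interest rates rise after central bank decision",
    ]
    print(kmeans(tfidf_vectors(docs), 2))  # [0, 0, 1, 1]
```

On these four toy sentences, the two about output and trade share a cluster, and the two about monetary policy share the other. The deterministic seeding is chosen here only so that the result is reproducible. Randomized k-means++ seeding is the more common choice in practice.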