| With the popularization and development of networks, the Internet has become an important tool in people's study, work, and daily life. In recent years, Internet content has grown very quickly, and its volume is now measured in terabytes. There are tens of millions of websites and billions of users, and these numbers are still growing. The rapid growth of the Web has changed people's lifestyles: more and more people use the Internet to publish and search for information. Because web information extraction can extract the content of pages, the Internet can be treated as a large database whose data can be reused in many different ways. Constructing vertical search engines is one such application. Extracting the Web text related to a given theme is the key technology affecting vertical search performance; in vertical search, web information extraction means extracting the information that shares the same theme. Web information extraction has therefore become a focus of current natural language processing research. This paper analyses an information extraction method based on artificial neural networks, identifies its defects, and proposes an improved method, which is explained and validated by experiment. The improved method has three parts: 1) We add a correlation calculation to the filtering rules. 2) We first merge text lines that do not follow a regular format, in order to improve the processing rate. 3) We train a BP neural network model and use its characteristics to obtain a more reasonable threshold. In addition, the paper designs an information extraction and classification model. Overall, this paper studies how to extract the content of Web documents and how to classify these documents based on the extraction results. |
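
The filtering steps listed in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: cosine similarity stands in for the correlation calculation, the short-line merge rule and `min_tokens` are assumptions, and the threshold is a fixed illustrative constant rather than one learned by the BP neural network. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a stand-in for real word segmentation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def merge_short_lines(lines, min_tokens=4):
    """Append lines with fewer than min_tokens tokens to the preceding line.

    Assumed merge rule: very short lines are treated as fragments of the
    previous text line, so fewer lines need to be scored later.
    """
    merged = []
    for line in lines:
        if merged and len(tokens(line)) < min_tokens:
            merged[-1] = merged[-1] + " " + line.strip()
        else:
            merged.append(line.strip())
    return merged


def correlation(text, theme_terms):
    """Cosine similarity between the term counts of text and the theme terms."""
    a, b = Counter(tokens(text)), Counter(theme_terms)
    dot = sum(a[t] * b[t] for t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def extract(lines, theme_terms, threshold=0.2):
    """Keep merged lines whose correlation with the theme meets the threshold.

    The threshold here is a hand-picked constant for illustration only.
    """
    return [
        line
        for line in merge_short_lines(lines)
        if correlation(line, theme_terms) >= threshold
    ]


if __name__ == "__main__":
    page_lines = [
        "Neural network training improves extraction accuracy",
        "of web pages",
        "Copyright 2010 all rights reserved home",
        "Login",
    ]
    theme = ["neural", "network", "extraction", "web"]
    for kept in extract(page_lines, theme):
        print(kept)
```

In this sketch, merging happens before scoring so that sentence fragments are judged together with the line they continue. In the method the abstract describes, the threshold would instead come from the trained BP neural network.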