| The amount of information on the Web grows every day, and many methods have been proposed to make use of it. The vertical search engine partly solves this problem: it is a professional or topic-oriented search engine that collects only professional or topic-related information and extracts specific information from the Web. This paper focuses on extracting topic-related information from Web pages. Because current implementations of Web information extraction are complex, this paper designs a topic-information extraction method for Web pages based on tag sequences. The method defines a strategy; using this strategy together with sample pages, we build a rule library and then use it to extract topic information from Web pages, which reduces both the complexity and the time of page processing. Applying the method to phone-parameter pages from several sites, we obtain perfect recall and precision on every site, which demonstrates that the method is feasible in practice. To address the problem that a wrapper cannot adapt to structural changes when new topic-information attributes appear on Web pages, the paper presents a credibility-based method for discovering such new attributes. By analyzing the characteristics of the attributes still to be extracted and of those already extracted, and by introducing credibility theory, the method quantifies, from a set of rules and evidence, the credibility that a candidate attribute should be extracted, and then decides whether the attribute should be discovered. Applying the method to phone-parameter attributes from several sites shows that it can accurately discover new topic-information attributes. Finally, a vertical search engine prototype is designed, with the design of its dedicated search spider module completed in detail.
The module integrates the Web topic-information extraction method and the new-attribute discovery method proposed in this paper to collect topic information from Web pages. |
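
The idea of building a rule library from samples and matching it against new pages can be illustrated with a minimal sketch. The abstract does not specify the strategy or the rule format, so everything here is an assumption made for illustration: a "tag sequence" is taken to be the path of open tags above a text node, a "rule" is one such path, and the names `TagSequenceParser`, `build_rule_library`, and `extract` are hypothetical. The sample pages are invented toy data, not results from the paper.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TagSequenceParser(HTMLParser):
    """Records (tag_sequence, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # collected (tag_sequence, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def text_nodes(html):
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.nodes


def build_rule_library(sample_html, labeled_values):
    """Assumed rule format: the tag sequences under which labeled values occur."""
    return {seq for seq, text in text_nodes(sample_html) if text in labeled_values}


def extract(html, rules):
    """Keep only text nodes whose tag sequence matches a rule."""
    return [text for seq, text in text_nodes(html) if seq in rules]


# Toy sample page with hand-labeled values (hypothetical data).
SAMPLE = ("<html><body><h1>Phone A</h1><table>"
          "<tr><td>Screen</td><td>5.0 inch</td></tr>"
          "</table><p>Ad text</p></body></html>")
NEW_PAGE = ("<html><body><h1>Phone B</h1><table>"
            "<tr><td>Weight</td><td>150 g</td></tr>"
            "<tr><td>Camera</td><td>12 MP</td></tr>"
            "</table><p>Buy now</p></body></html>")

rules = build_rule_library(SAMPLE, {"Screen", "5.0 inch"})
extracted = extract(NEW_PAGE, rules)
```

In this sketch the new page is never compared against the sample as a whole document; only the tag sequences of its text nodes are looked up in the rule set, which is one plausible reading of how a rule library could reduce per-page processing.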