| Web page object locating is a key technology for web information extraction. With this technology, valuable information in web pages can be located automatically and precisely, which makes it easy to extract data from web pages. Web page object locating is therefore fundamental to areas such as web data mining, vertical search, and search engines. In this paper, a web page object locating method based on multi-feature fusion is proposed. By fusing multiple locating methods, the fused method achieves better precision and stability than any single method. The locating method is divided into two phases: a feature extraction phase and a web object locating phase. In the feature extraction phase, a feature description language is first defined to express all kinds of web object features. The language is open and extensible, so new features can be added in the future. Then, a method for extracting the DOM tree path feature of a web object is implemented. On this basis, two further web page object locating methods are proposed: a compressed DOM tree based method and a reference point method. These three methods each extract a different feature of a web page. To verify the validity of the method, an experiment was conducted, and the results show that the multi-feature fusion method performs better than the others. |
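
To make the notion of a DOM tree path feature concrete, the following is a minimal sketch of how such a path might be computed for the element that contains a given piece of text. It is an illustration under stated assumptions, not the implementation described in the paper: the names `PathFinder`, `dom_path`, and `VOID_TAGS`, the XPath-like `tag[index]` notation, and the text-matching criterion are all hypothetical choices made here, and the HTML is assumed to be well formed.

```python
from html.parser import HTMLParser

# Assumption: a small set of void elements that never enclose content.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class PathFinder(HTMLParser):
    """Records the DOM tree path of the first element whose text contains a target string."""

    def __init__(self, target_text):
        super().__init__()
        self.target_text = target_text
        self.stack = []            # currently open elements as (tag, sibling index)
        self.child_counts = [{}]   # per-depth counters of tags seen so far
        self.found = None

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return                 # void elements are counted but never opened
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        if self.found is None and self.target_text in data:
            self.found = "/" + "/".join(f"{t}[{i}]" for t, i in self.stack)


def dom_path(html, target_text):
    """Return an XPath-like path to the first matching text node's parent, or None."""
    finder = PathFinder(target_text)
    finder.feed(html)
    return finder.found
```

A path obtained this way from a sample page could then be stored as one feature of a web object and matched against other pages built from the same template; how the paper encodes and fuses such features is not specified in the abstract.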