| Enabling users to quickly and effectively find the information they need among the large number of Web pages returned by a search engine is a crucial problem. Researchers have therefore proposed automatic Web page sorting (hereafter referred to as WAS) technology, which displays search results by category and thereby improves search efficiency. WAS generally comprises Web page purification, feature selection, vector representation, a training algorithm, and a classification algorithm. This paper studies WAS in depth, focusing on the purification and feature selection algorithms, and applies the results to an experimental WAS system that processes search results. The main content of the paper is as follows: 1. The general procedure of WAS is introduced, with a detailed analysis of vector representation and the classification algorithm. 2. A purification algorithm based on the partial semantics of Web pages is proposed. The new algorithm overcomes the shortcoming of dividing Web page content too finely, which leads to incomplete extraction of content-block features, and it adapts automatically to the extent of a page's content blocks. Experiments show that the algorithm is effective. 3. An improved feature selection algorithm, CD-DF, is proposed, which introduces the concept of "frequency difference". It effectively removes noise features from the feature-word space and improves the expressive power of the feature words. Experimental results indicate that CD-DF improves the classification performance of the system. 4. An experimental WAS system that processes raw search results is introduced. 
Empirical tests show that the experimental system makes search engines more efficient for users, and they also confirm the validity of the new Web page purification algorithm and the improved feature selection algorithm in practical application. |
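
The general WAS procedure named in the abstract (feature selection, vector representation, training, classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy pages, the plain document-frequency threshold, and the nearest-centroid classifier are generic stand-ins. It does not reproduce the paper's purification algorithm or its CD-DF feature selection, whose details the abstract does not give.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a crude stand-in for page purification)."""
    return re.findall(r"[a-z]+", text.lower())


def select_features(docs, min_df=2):
    """Plain document-frequency selection: keep terms found in at least min_df documents."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return sorted(term for term, n in df.items() if n >= min_df)


def vectorize(tokens, features):
    """Represent a document as a term-frequency vector over the selected features."""
    counts = Counter(tokens)
    return [counts[term] for term in features]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(vectors, labels):
    """Training step: average the vectors of each category into one centroid."""
    sums = {}
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def classify(vec, centroids):
    """Classification step: pick the category whose centroid is most similar."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy "search results" with known categories.
pages = [
    ("football match score goal team", "sports"),
    ("team wins the football cup final", "sports"),
    ("stock market shares price fall", "finance"),
    ("bank raises interest rate as market shares drop", "finance"),
]
docs = [tokenize(text) for text, _ in pages]
labels = [label for _, label in pages]

features = select_features(docs, min_df=2)
centroids = train_centroids([vectorize(d, features) for d in docs], labels)

query = vectorize(tokenize("football team loses the match"), features)
predicted = classify(query, centroids)
print(features, predicted)
```

In this sketch the document-frequency threshold discards terms that occur in only one page, which is the simplest way to shrink the feature space; a method such as CD-DF would replace `select_features` with a criterion that also uses category information.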