| With the rapid growth of Internet audio and video websites, this new type of media has gradually begun to replace traditional information media. Audio and video sites are public services on which any user can upload and share audio and video content, which has had a great influence and has also caused problems, so an audio and video analysis system that provides unified supervision of audio and video sites across the whole network has become an urgent need. Such a system must collect audio and video information in real time. During collection, it has to filter the pages of video websites and discard video advertisements unrelated to the content under analysis, thereby improving overall operating efficiency and system performance. It is therefore necessary to study techniques for filtering the relevant web information efficiently and accurately in a real-time computing environment. Existing Web page classification methods suffer from low efficiency, high bandwidth consumption, and poor real-time processing performance. Building on existing Web page classification technology, this thesis studies Web page classification based on link features and combines it with traditional content-based Web page classification to give a model better suited to real-time Web page classification. The main work of this thesis includes: (1) This thesis studies Web page classification based on link features and improves the word-weight-based feature extraction method by determining the weight of each word from its correlation factor. This method effectively reduces the feature dimension after URL segmentation and improves the processing speed of the classifier, so that Web pages can be classified in a real-time system. (2) The improved link-feature-based Web page classification technique is combined with traditional content-based Web page classification, and a multilevel Web page filtering model for real-time computing environments is presented. The model can dynamically adjust the traffic flow, balance the speed and accuracy of Web page classification, and keep the classification of web information stable and efficient in a real-time system. In summary, this thesis gives a new feature extraction method, improves link-feature-based Web page classification, combines it with traditional content-based Web page classification to present a multilevel Web page filtering model, and verifies the feasibility of the model by designing and implementing a prototype system. |
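
The two contributions described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not define the correlation factor, so the sketch substitutes a simple stand-in: the difference between a token's document frequency in advertisement URLs and in content URLs. The function names, the `margin` threshold, and the `fetch_and_classify` callback are all hypothetical.

```python
import re
from collections import Counter


def url_tokens(url):
    """URL segmentation: split a URL into lowercase alphabetic tokens."""
    return [t for t in re.split(r"[^a-z]+", url.lower()) if len(t) > 1]


def token_weights(urls, labels):
    """Weight each token by a stand-in 'correlation factor' (an assumption):
    P(token | ad URL) - P(token | content URL). Labels: 1 = ad, 0 = content."""
    n_ad = sum(1 for y in labels if y == 1)
    n_ok = len(labels) - n_ad
    ad, ok = Counter(), Counter()
    for url, y in zip(urls, labels):
        # Count each token once per URL (document frequency).
        (ad if y == 1 else ok).update(set(url_tokens(url)))
    return {
        t: ad[t] / max(n_ad, 1) - ok[t] / max(n_ok, 1)
        for t in set(ad) | set(ok)
    }


def select_features(weights, k):
    """Keep the k tokens with the largest absolute weight, which shrinks
    the feature dimension left over after URL segmentation."""
    ranked = sorted(weights, key=lambda t: (-abs(weights[t]), t))
    return {t: weights[t] for t in ranked[:k]}


def url_score(url, features):
    """Sum the weights of the selected tokens that occur in the URL."""
    return sum(features.get(t, 0.0) for t in set(url_tokens(url)))


def classify(url, features, fetch_and_classify, margin):
    """Two-level filter: decide from the URL alone when the score is
    confident, otherwise fall back to the content-based classifier."""
    s = url_score(url, features)
    if abs(s) >= margin:
        return ("ad" if s > 0 else "content"), "url"
    # Hypothetical callback that downloads the page and classifies its text.
    return fetch_and_classify(url), "content"


train_urls = [
    "http://ads.example.com/banner/123",
    "http://ads.example.com/promo/banner",
    "http://video.example.com/watch/abc",
    "http://video.example.com/watch/xyz",
]
train_labels = [1, 1, 0, 0]
features = select_features(token_weights(train_urls, train_labels), k=4)
```

In this sketch, raising `margin` routes more pages to the slower content-based level, and lowering it lets the URL level decide more often. That is one plausible way to trade accuracy for speed as the load changes, although the abstract does not say how the model adjusts its traffic flow.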