With the rapid growth of the World Wide Web, users can browse an abundance of information on the Internet, which has become the world's largest information reservoir and contains both valuable and harmful content. It has therefore become increasingly important to distinguish harmful information from useful information, and we pursue this goal through information filtering.

Internet information filtering is a process that combines technical and non-technical means, using both software and hardware, to screen Internet information from a dynamic data stream according to given rules and selection criteria. It makes searching network content more efficient for users and reduces the amount of useless information that reaches the browser.

First, this thesis designs and implements several techniques for filtering harmful Internet information: a URL filtering technique and a text content filtering technique. URL filtering relies on a blacklist and a whitelist of URLs, while text content filtering combines keyword matching with the Latent Semantic Indexing (LSI) algorithm.

The system therefore filters at two levels. At the first level, the URL filtering algorithm decides whether to filter an HTTP request by consulting the blacklist and the whitelist. At the second level, the text content filter applies two processes: keyword matching and Latent Semantic Indexing. This scheme achieves a high filtering success rate, and combining the techniques improves the accuracy of the system.

Finally, the thesis designs and builds an intelligent, PC-based network filtering system that combines URL detection, blacklist/whitelist filtering, and time control. The system can supplement a company's firewall systems, which usually operate on LANs and WANs, to protect the internal network from harmful information.
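The abstract names the two filtering levels but gives no data structures. It also does not say in what order keyword matching and LSI run inside level 2, or how an LSI score becomes a block/allow decision. The Python sketch that follows fills those gaps with assumptions, all of them hypothetical:

- The names `url_level`, `LSIModel` and `filter_request` are illustrative.
- Black and white lists are matched on host name or parent domain.
- Keyword matching runs before LSI.
- A page is blocked when its cosine similarity to any known-bad training page reaches an illustrative threshold of 0.5.
- The rank-k SVD that LSI needs is obtained by power iteration on the document Gram matrix, so the sketch stays dependency-free. This stands in for whatever linear-algebra routine the original system used.

```python
# Hypothetical sketch: names, the 0.5 threshold and the keyword-then-LSI
# order are illustrative assumptions, not the thesis's implementation.
import math
import re
from collections import Counter
from urllib.parse import urlparse


def _host_matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def url_level(url, whitelist, blacklist):
    """Level 1: decide an HTTP request from its URL alone."""
    host = (urlparse(url).hostname or "").lower()
    if _host_matches(host, whitelist):
        return "allow"          # trusted site: skip content analysis
    if _host_matches(host, blacklist):
        return "block"          # listed bad site: reject immediately
    return "check_content"      # unknown site: pass to level 2


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def _dot(a, b):
    """Dot product of two sparse term-count vectors (dict-like)."""
    return sum(c * b.get(t, 0) for t, c in a.items())


def _norm(x):
    return math.sqrt(sum(v * v for v in x))


class LSIModel:
    """LSI over a small corpus of known-bad pages (term counts, no weighting).

    The rank-k SVD of the term-document matrix A is derived from the top
    eigenpairs of the document Gram matrix A^T A via power iteration.
    """

    def __init__(self, bad_docs, k=2, iters=200):
        self.docs = [Counter(tokenize(d)) for d in bad_docs]
        n = len(self.docs)
        gram = [[_dot(self.docs[i], self.docs[j]) for j in range(n)]
                for i in range(n)]
        self.components = []                       # pairs (s_i^2, v_i)
        for _ in range(min(k, n)):
            v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start
            for _ in range(iters):
                w = [sum(row[j] * v[j] for j in range(n)) for row in gram]
                size = _norm(w)
                if size == 0.0:
                    break
                v = [x / size for x in w]
            lam = sum(v[i] * gram[i][j] * v[j]
                      for i in range(n) for j in range(n))
            if lam <= 1e-9:
                break                              # rank of A exhausted
            self.components.append((lam, v))
            # Deflate so the next pass converges to the next eigenpair.
            gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)]
                    for i in range(n)]

    def _fold_in(self, counts):
        """Map term counts q to S_k^-1 U_k^T q = S_k^-2 V_k^T (A^T q)."""
        atq = [_dot(d, counts) for d in self.docs]
        return [sum(vj * aj for vj, aj in zip(v, atq)) / lam
                for lam, v in self.components]

    def max_similarity(self, text):
        """Highest cosine between the folded-in text and any bad page."""
        q = self._fold_in(Counter(tokenize(text)))
        best = 0.0
        for j in range(len(self.docs)):
            d = [v[j] for _, v in self.components]  # row j of V_k
            denom = _norm(q) * _norm(d)
            if denom > 0:
                best = max(best, sum(a * b for a, b in zip(q, d)) / denom)
        return best


def filter_request(url, text, whitelist, blacklist, keywords, lsi,
                   threshold=0.5):
    """Level 1 (URL lists), then level 2 (keyword match, then LSI)."""
    decision = url_level(url, whitelist, blacklist)
    if decision != "check_content":
        return decision
    if set(keywords) & set(tokenize(text)):
        return "block"
    if lsi.max_similarity(text) >= threshold:
        return "block"
    return "allow"
```

How the pieces work:

- **Fold-in step.** Working with A^T A keeps the eigenproblem as small as the number of training pages. `_fold_in` applies the usual LSI query projection q̂ = S_k⁻¹U_kᵀq, under which each training page maps exactly to its row of V_k.
- **Why LSI adds something.** A page that contains none of the listed keywords can still be blocked if its terms co-occur with those of known-bad pages.
- **Limits of the sketch.** A real deployment would more plausibly use a library SVD routine, TF-IDF weighting, and a threshold tuned on labeled pages. The time-control part of the system is not modelled.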