
The Research On The Technology Of Web Classification In Search Engine

Posted on: 2012-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2248330362963472
Subject: Computer application technology
Abstract/Summary:
With the development of Internet technology, people have entered the information era. In this era, information means wealth, and obtaining accurate and valuable information quickly has become a key problem. Over time, a large number of information resources with different structures have appeared. Most of them exist in the form of Web texts, which contain a great deal of valuable information, so how to extract useful information from the mass of Web resources has become a question that needs to be solved. Web text classification technology has been developed on the basis of existing text classification theory and technology. It replaces the original manual classification and saves a great deal of manpower and material resources. It can also effectively improve retrieval speed for users and classify retrieval results accurately, and it has become a hot research topic in the field of information processing.

This paper introduces the research background and the current state of research at home and abroad, and expounds the related theory and technology of text classification. The approach to the problem is based on a summary of the relevant theoretical knowledge and an analysis of the structural features of Web pages. First, a robot collects Web pages from the Internet. Next, the text information is extracted from each Web page, preprocessed, and converted to text format. Finally, a classifier is constructed and the Web text is classified by a classification algorithm. This paper proposes a denoising method based on information blocks, combines text frequency and CHI to select feature items, classifies Web text with a multi-class decision SVM, and proposes a design idea for a classification search engine.

The theoretical methods proposed in this paper are verified by experiment. The results show that the extraction of information and the Web classification are more accurate.
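The feature selection step mentioned in the abstract, combining text frequency with CHI, might look roughly like the sketch below. The abstract does not say how the two measures are combined, so this is only one plausible reading: "text frequency" is taken to mean document frequency and is used as a minimum threshold, after which the surviving terms are ranked by their highest chi-square score over all classes. The function names `chi_square` and `select_terms` and the parameters `min_df` and `top_k` are hypothetical and do not come from the thesis.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_terms(docs, labels, min_df=2, top_k=3):
    """Keep terms with document frequency >= min_df, then rank by max CHI.

    docs is a list of token lists; labels gives one class name per document.
    This combination rule is an assumption, not the thesis's stated method.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()
    df_by_class = {label: Counter() for label in class_sizes}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df[term] += 1
            df_by_class[label][term] += 1

    scores = {}
    for term, term_df in df.items():
        if term_df < min_df:  # document-frequency filter
            continue
        best = 0.0
        for label, size in class_sizes.items():
            n11 = df_by_class[label][term]
            n10 = term_df - n11
            n01 = size - n11
            n00 = n_docs - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


docs = [
    ["goal", "match", "team"],
    ["goal", "team", "score"],
    ["stock", "market", "price"],
    ["stock", "price", "team"],
]
labels = ["sports", "sports", "finance", "finance"]
selected = select_terms(docs, labels, min_df=2, top_k=3)
```

In this toy corpus, "team" appears in both classes and therefore scores lower than terms confined to a single class, which is the behaviour CHI-based selection is meant to give. The selected terms would then serve as the vector dimensions fed to the SVM classifier.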
Keywords/Search Tags:Information Extraction, Characteristic Selection, Text Categorization, SVM