Applied Research On Web Text Mining In Web Log Data Preprocessing

Posted on: 2008-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: H P Wu
GTID: 2178360215951489
Subject: Computer application technology
Abstract/Summary:
With the rapid development and popularization of the Internet, the network has become an effective platform for people to exchange and manipulate information. Users place increasing expectations on network services, hoping to obtain richer and more suitable Web services. However, because of the Web's inherent openness, dynamism and heterogeneity, users usually find it difficult to retrieve useful information from the WWW accurately and efficiently. It has therefore become urgent to find ways to retrieve exact information, to discover the potential knowledge behind that information, and to provide personalized services. These problems are a major research topic in Web data mining.

Web mining can be divided into three parts: Web content mining, Web structure mining, and Web usage mining. Of these, Web usage mining is the most widely studied, and data preprocessing plays an important role in it. This dissertation focuses on data preprocessing technology in Web usage mining, analyzing and studying the key problems and techniques involved in preprocessing. The main work of this dissertation is as follows:

1. The basic framework, procedure and mining techniques of Web usage mining are summarized, and the process, key techniques and methods of Web log preprocessing are studied.
2. The theory of text mining and its analysis techniques are discussed systematically, and a process for Web data mining is proposed.
3. A text clustering algorithm is introduced into transaction identification. Based on an analysis of the shortcomings of traditional transaction identification, the identification is improved by taking Web content into account, and the text clustering algorithm is modified to meet the requirements of the improved transaction identification.
4. A model for Web log data preprocessing is proposed, and its rationality is validated through experiments.
Keywords/Search Tags: Web usage mining, Web log, data preprocessing, document clustering, transaction identification
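
For readers unfamiliar with the preprocessing stage the abstract refers to, the following is a minimal sketch of conventional Web log preprocessing: data cleaning followed by timeout-based session identification. It is not the model proposed in the thesis, which modifies transaction identification using text clustering and Web content. The Common Log Format input, the list of static file suffixes, the 30-minute timeout, the use of the host address as the user identifier, and all function names are assumptions chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format
#   host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
# Assumed list of embedded resources that do not represent page views.
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Assumed inactivity threshold that separates two sessions of one user.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Return (host, timestamp, url, status), or None if the line is malformed."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, m.group("url"), int(m.group("status"))


def clean(records):
    """Data cleaning: drop malformed lines, failed requests and static files."""
    kept = []
    for rec in records:
        if rec is None:
            continue
        _, _, url, status = rec
        path = url.split("?", 1)[0].lower()  # ignore the query string
        if 200 <= status < 300 and not path.endswith(STATIC_SUFFIXES):
            kept.append(rec)
    return kept


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Session identification: split each host's requests at long gaps."""
    sessions = []      # list of (host, [urls]) in order of first request
    last_seen = {}     # host -> (last timestamp, page list of open session)
    for host, ts, url, _ in sorted(records, key=lambda r: r[1]):
        previous = last_seen.get(host)
        if previous is None or ts - previous[0] > timeout:
            pages = []                     # start a new session
            sessions.append((host, pages))
        else:
            pages = previous[1]            # continue the open session
        pages.append(url)
        last_seen[host] = (ts, pages)
    return sessions
```

Calling `sessionize(clean([parse_line(l) for l in lines]))` on a list of log lines yields one `(host, pages)` pair per session. A fixed timeout is the kind of traditional heuristic whose shortcomings the thesis analyzes; a content-aware approach would additionally compare the text of the requested pages when deciding where a transaction ends.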