The Study Of Web Data Mining Based On Sequence Pattern

Posted on:2013-03-04

Degree:Master

Type:Thesis

Country:China

Candidate:M C Bo

Full Text:PDF

GTID:2248330377455658

Subject:Computer application technology

Abstract/Summary:

The purpose of data mining is to discover patterns and rules in large volumes of data. As a discipline, data mining was at first applied only to traditional databases. As the technology matured, more people recognized its potential value, research investment grew, and the field produced many notable results. In recent years, the rapid development of computers has driven the rise of the Internet, and the volume of Web data has grown accordingly. Data mining techniques have therefore been applied to Web data as well, but because Web data is large and heterogeneous, Web data mining faces many problems and difficulties.

According to the type of object being mined, Web data mining can be divided into Web content mining, Web structure mining, and Web log mining. This thesis applies sequence pattern mining, a technique from traditional databases, to the analysis of access information recorded in Web logs. Sequence pattern mining is a data mining method that mines data from the perspective of ordered sequences. Web access logs cannot be mined directly, however, because the log records are unordered and contain a large amount of data that is abnormal or irrelevant to mining. If such data is not removed or otherwise handled, the mining process is difficult to carry out and the results are unlikely to meet users' needs. Preprocessing is therefore essential: it puts the raw data in order and removes redundant records so that the data meets the requirements of the algorithm.

Building on existing sequence pattern algorithms, this thesis presents an improved algorithm for Web log mining. The algorithm uses a bitmap method and designs a structure for storing the data. With this structure, mining the Web log avoids generating candidate sequences and makes computing the support of a sequence efficient. The improved algorithm also adopts the prefix method of the PrefixSpan algorithm, using prefixes and modifying the new structure to reduce the range of sequences that must be scanned in the database. Experimental comparison shows that the improved algorithm performs significantly better. It should be noted, however, that the algorithm is highly efficient on small databases, while its efficiency on very large databases is not clear.
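To make the idea of bitmap-based support counting more concrete, the following is a minimal Python sketch. It is not the thesis's algorithm: the abstract does not describe the storage structure, so the sketch assumes a simple vertical bitmap layout (one integer per page per session) and a prefix-growth loop that still tests each one-page extension. The names `build_bitmaps`, `extend`, `support`, and `mine`, and the example sessions, are hypothetical.

```python
from collections import defaultdict


def build_bitmaps(sessions):
    """For each page, keep one integer per session; bit i is set when the
    page was visited at position i of that session."""
    bitmaps = defaultdict(lambda: [0] * len(sessions))
    for sid, pages in enumerate(sessions):
        for pos, page in enumerate(pages):
            bitmaps[page][sid] |= 1 << pos
    return dict(bitmaps)


def extend(prefix_bits, page_bits):
    """Append one page to a prefix: in each session keep only the page's
    positions that come strictly after the earliest end of the prefix."""
    out = []
    for p, b in zip(prefix_bits, page_bits):
        if p == 0:
            out.append(0)                 # prefix does not occur in this session
            continue
        first = p & -p                    # lowest set bit = earliest end position
        after = ~((first << 1) - 1)       # mask of all later positions
        out.append(b & after)
    return out


def support(bits):
    """Number of sessions in which the pattern occurs at least once."""
    return sum(1 for b in bits if b)


def mine(sessions, min_support):
    """Grow frequent page sequences from frequent prefixes."""
    bitmaps = build_bitmaps(sessions)
    frequent = {p: b for p, b in bitmaps.items() if support(b) >= min_support}
    results = {}

    def grow(pattern, bits):
        results[pattern] = support(bits)
        for page, page_bits in frequent.items():
            new_bits = extend(bits, page_bits)
            if support(new_bits) >= min_support:
                grow(pattern + (page,), new_bits)

    for page, bits in frequent.items():
        grow((page,), bits)
    return results


# Hypothetical preprocessed sessions: each is an ordered list of visited pages.
sessions = [
    ["home", "products", "cart"],
    ["home", "cart"],
    ["products", "home", "cart"],
    ["home", "products"],
]
patterns = mine(sessions, min_support=3)
```

With these example sessions and a minimum support of 3, the only frequent sequence longer than one page is ("home", "cart"). Support is obtained from bitwise operations on the stored integers, so the sessions are read only once, when the bitmaps are built.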

Keywords/Search Tags:

Web log mining, sequence pattern, bitmap, prefix projection

Related items

1	Research On Approximated Sequnential Patterns Mining Algorithm Based On Bitmap
2	Research Of Sequential Pattern Algorithm Over Data Streams Based On Prefix Sequence Tree
3	Based On Markov Chain Web Access Sequence Mining Algorithm
4	Study On Techniques Of MC-pattern Based Sequence Projection Clustering
5	Research On Graph Frequent Substructure Mining Method
6	The Log Pattern Cluster Mining Algorithm Based On Prefix Tree
7	Study On Sequence Patterns Mining And Its Application In Intrusion Detection
8	A Bitmap-based Approach For Fast Name Lookup In Named Data Networking
9	The Studying Of Sequence Pattern Mining Based On DF2Ls
10	Research On Parallel Sequential Patterns Mining Algorithm Based On Prefix Tree