| Advances in Internet technology have made the World Wide Web increasingly popular, and a great many websites are being built. Under the fierce competition of the Internet economy, only the sites that attract customers can survive. Because customer behavior is now recorded digitally, large amounts of data can be collected and used to study that behavior. One of the most important problems we face is how to extract valuable, understandable information from this seemingly meaningless and tedious data. Web data mining is a technique for solving this problem. This thesis focuses on web log mining and its process. It investigates data preprocessing, the methods used in it, and solutions to its problems, including user identification and path completion. The classic association rule algorithm, Apriori, is introduced. After a review of several improvements to Apriori, the IApriori algorithm is proposed, which reduces the size of the database and improves the join step. In theory, the time and space complexity of IApriori are lower than those of Apriori. To demonstrate the efficiency of IApriori and to put the investigated techniques into practice, the logs of the website for the 50th anniversary celebration of HEU are processed and analysed with both IApriori and Apriori. The experimental results show that IApriori is much better than Apriori in both time and space complexity. 
To make the comparison more general, the same logs are analysed with IApriori and Apriori under different minimum support (minsupp) values; the results show that IApriori is more efficient than Apriori across these values. Finally, the website logs are analysed with IApriori. Based on the results, shortcomings of the website are identified and improvements are proposed. |
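
For readers unfamiliar with the baseline, the classic Apriori algorithm mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration of textbook Apriori only, not the IApriori variant the thesis proposes, whose details the abstract does not give. The function name `apriori`, the `sessions` data, and the page paths are hypothetical, and minsupp is assumed to be a fraction of all sessions.

```python
from itertools import combinations


def apriori(transactions, minsupp):
    """Return {itemset: support} for every itemset with support >= minsupp.

    transactions: iterable of item collections (here, pages seen in one session).
    minsupp: minimum support as a fraction in (0, 1] (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    # Candidate 1-itemsets: every distinct item in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Scan the database once to count each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= minsupp}
        frequent.update(level)

        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all of its k-1 subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical sessions: each set holds the pages requested in one user session.
sessions = [
    {"/index", "/news", "/photos"},
    {"/index", "/news"},
    {"/index", "/photos"},
    {"/news", "/guestbook"},
]
for itemset, support in sorted(apriori(sessions, 0.5).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

The sketch rescans every transaction at each level, which is the cost that approaches such as shrinking the database and refining the join step aim to reduce.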