| With the rapid development of Internet technology, the Internet has become increasingly indispensable to people's work and daily lives, and this growing use has in turn driven its further development. As a result, however, huge amounts of data are generated and stored on websites. In recent years, researchers have begun to examine these data in the hope of extracting useful information. For instance, a website that is likely to receive a large number of visits can have its structure optimized according to its users' browsing habits, so that it attracts more users. To meet this need, web data warehousing and web data mining have been proposed, combining web technology with relational databases and data mining. In this paper, a method that combines a data warehouse with web log mining is proposed. The background of web data mining is introduced, and its ideas, theories, and methods are systematically described. The process and key techniques of web log mining are then analyzed, with particular emphasis on data preprocessing, and on this basis an improved user session identification method is developed. Next, using the users' session sequences obtained from data preprocessing, a logical data warehouse model suitable for general websites is proposed and a corresponding physical model is established. Finally, taking as input the users' session sequences derived from multidimensional analysis in the data warehouse, an improved Apriori algorithm is designed to mine and analyze users' browsing habits, so that the site structure can be improved to increase the click rate. In summary, the contributions of this paper are as follows: (1) a new user session identification algorithm is proposed to obtain more precise user session sequences; (2) a multidimensional web data warehouse model is established that integrates web log mining into the web data warehouse, enabling analysis from multiple angles; (3) the mining algorithm better reflects user habits because the measure of the data warehouse is set to user session sequences rather than click rate; (4) an improved Apriori algorithm is designed that reduces database access when the session sequences contain a large number of distinct single items, thereby improving efficiency. |
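To make the pipeline concrete (raw log records, then user sessions, then frequent page sets), the sketch below uses two textbook baselines rather than the paper's own methods. Session identification uses the common inactivity-timeout heuristic with an assumed 30-minute threshold, not the improved algorithm proposed here. Mining uses classic Apriori over sessions treated as unordered page sets, without the paper's optimization for reducing database access. The log tuple layout `(user_id, timestamp_seconds, page)`, the names `sessionize` and `apriori`, and the sample data are all hypothetical.

```python
from itertools import combinations

# Assumed inactivity threshold (30 minutes); a common heuristic, not the paper's rule.
TIMEOUT = 30 * 60


def sessionize(records, timeout=TIMEOUT):
    """Split each user's time-ordered clicks into sessions at gaps longer than timeout.

    records: iterable of hypothetical (user_id, timestamp_seconds, page) tuples.
    """
    sessions = []
    last_seen = {}  # user -> timestamp of that user's previous click
    current = {}    # user -> pages in that user's open session
    for user, ts, page in sorted(records, key=lambda r: (r[0], r[1])):
        if user in current and ts - last_seen[user] > timeout:
            sessions.append(current.pop(user))  # gap too long: close the session
        current.setdefault(user, []).append(page)
        last_seen[user] = ts
    sessions.extend(current.values())  # close every session still open
    return sessions


def apriori(transactions, min_support):
    """Classic Apriori: map each frequent itemset (frozenset) to its support count."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]  # join step
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One scan over all transactions per level to count candidate support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical sample log: u1 is idle for over an hour, so u1 yields two sessions.
logs = [
    ("u1", 0, "/home"), ("u1", 60, "/news"), ("u1", 120, "/sports"),
    ("u1", 4000, "/home"), ("u1", 4100, "/news"),
    ("u2", 10, "/home"), ("u2", 70, "/sports"),
]
sessions = sessionize(logs)
freq = apriori(sessions, min_support=2)
for itemset, support in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Here `/home` appears with both `/news` and `/sports` in two of the three sessions, so those pairs are frequent at a support of 2, while `{/news, /sports}` is not. Note that classic Apriori rescans every session once per itemset size. That repeated database access is the cost the paper's improved algorithm targets when there are many distinct single items.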