| With the rapid development of social productivity and the demand for a higher standard of living, motor vehicle ownership is rising sharply, and more cars lead to more accidents. This is a serious problem for drivers and insurance companies alike. Larger claim payouts and higher operating costs, together with shrinking profit margins, make it necessary for insurers to understand clearly who their high-quality customers are. Insurance products with low premiums and high coverage are always popular among high-quality customers, and in practice companies pay far more attention to customers who file no claims. However, applicants who have had accidents are of greater importance than those who have not. Rough classification of applicants by number of accidents alone is losing its usefulness, so a new, scientific method is urgently needed. This thesis classifies and analyzes the large volume of data on applicants who have filed claims. Using data mining to find the specific characteristics of, and the relationships between, different types of applicants is a fast and accurate way to support subsequent specialized technical work. The main work of this thesis is as follows: 1. Parallel K-Modes algorithm and customer clustering. To handle big data, the K-Modes algorithm is parallelized and optimized on the MapReduce framework and run on the Hadoop computing platform to improve its efficiency. At the same time, the cluster-center update method is improved so that the algorithm is better suited to the insurance domain. New cluster centers are computed by the data mining algorithm and refined iteratively in order to analyze each preset type of applicant in depth, leading to the conclusion that each type of applicant has its own characteristics. 2. 
Weighted parallel Apriori algorithm and attribute association analysis within clusters. Customer attribute analysis mines association rules within each cluster, based on the clustering results. The classical Apriori algorithm is optimized into a weighted algorithm with parallel processing. The author implements Apriori on the MapReduce framework and computes weighted association rules, screening out irrelevant rules and revealing the relationships between label features and other attributes, in the hope of finding potential rules among different features. 3. Clustering of claim customers and discovery of their characteristics. Claim-customer data accumulated over several years is extracted, counted, and analyzed by attribute class. The customers are then clustered through data mining to find their characteristics. Records are classified by counting the scores of their attributes, and new cluster centers are formed from the counted records. After iteration, the preset customer classes are obtained and analyzed along different dimensions, so that the label characteristics of each cluster are clarified. 4. Association analysis of attributes within the same class of claim customers. By analyzing the associated attributes of claim customers in the same class, the customer attributes of each class are mined in depth. While the association rules among attributes are counted, unwanted associations between attributes are filtered out. Potential business links can be found by adding weights that emphasize the links between label-characteristic attributes and other attributes. Once the relationships between customer attributes are revealed, the business attributes and natural attributes of customers and insured subjects are associated, so that the consumption tendencies of customers in each class can be found. 
This provides technical and theoretical support for later business and new product development. |
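
The clustering step in item 1 can be illustrated with a minimal single-machine sketch of standard K-Modes, arranged as a map step (assign each record to its nearest mode) and a reduce step (recompute each mode). This is not the thesis's improved center-update method, which the abstract does not specify. The function names, the simple-matching dissimilarity, and the sample attributes are assumptions chosen for illustration; on Hadoop the two steps would run as distributed map and reduce tasks.

```python
from collections import Counter, defaultdict


def mismatch(record, mode):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(record, mode))


def map_assign(record, modes):
    """Map step: emit (index of the nearest mode, record)."""
    nearest = min(range(len(modes)), key=lambda i: mismatch(record, modes[i]))
    return nearest, record


def reduce_update(records):
    """Reduce step: the new mode takes the most frequent value per attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(data, initial_modes, max_iter=20):
    """Iterate map and reduce until the modes stop changing."""
    modes = [tuple(m) for m in initial_modes]
    groups = defaultdict(list)
    for _ in range(max_iter):
        groups = defaultdict(list)
        for record in data:
            index, rec = map_assign(record, modes)
            groups[index].append(rec)
        # An empty cluster keeps its previous mode (an assumption of this sketch).
        new_modes = [
            reduce_update(groups[i]) if groups[i] else modes[i]
            for i in range(len(modes))
        ]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, groups


# Hypothetical categorical records: (gender, vehicle type, region).
sample = [
    ("male", "sedan", "urban"),
    ("male", "sedan", "rural"),
    ("male", "suv", "urban"),
    ("female", "truck", "rural"),
    ("female", "truck", "urban"),
    ("female", "truck", "rural"),
]
final_modes, clusters = k_modes(sample, [sample[0], sample[3]])
```

Each resulting mode can be read as the typical attribute profile of one customer class, which is the kind of per-cluster characteristic the abstract describes.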
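
The weighting idea in items 2 and 4 can be sketched as a single counting pass over the records of one cluster. The abstract does not give the thesis's weighting formula, so this sketch assumes one simple definition: the weighted support of an itemset is its ordinary support multiplied by the mean weight of its items, with unlisted items defaulting to weight 1.0. It is not the full level-wise Apriori procedure with candidate pruning, and all names and sample values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_itemsets(transaction, size):
    """Map step: emit every itemset of the given size found in one record."""
    return [frozenset(c) for c in combinations(sorted(set(transaction)), size)]


def reduce_counts(emitted):
    """Reduce step: sum the occurrences of each itemset."""
    return Counter(emitted)


def weighted_support(transactions, weights, size):
    """Support of each itemset scaled by the mean weight of its items."""
    emitted = [s for t in transactions for s in map_itemsets(t, size)]
    counts = reduce_counts(emitted)
    total = len(transactions)
    result = {}
    for itemset, count in counts.items():
        # Items without an explicit weight default to 1.0 (assumption).
        mean_weight = sum(weights.get(i, 1.0) for i in itemset) / len(itemset)
        result[itemset] = mean_weight * count / total
    return result


# Hypothetical records from one cluster; the label attribute gets extra weight.
records = [
    ["claim=yes", "age=young", "car=sports"],
    ["claim=yes", "age=young"],
    ["claim=no", "age=old"],
    ["claim=yes", "car=sports"],
]
label_weights = {"claim=yes": 2.0}
pair_support = weighted_support(records, label_weights, size=2)

# Keep only pairs whose weighted support reaches a chosen threshold.
frequent_pairs = {s: v for s, v in pair_support.items() if v >= 0.5}
```

Giving the label attribute a larger weight lifts itemsets that contain it above the threshold, while pairs of unweighted attributes with the same raw count are filtered out. This mirrors the abstract's aim of emphasizing links between label characteristics and other attributes.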