By the end of 2015, the nationwide networked highway toll collection system had been completed, and it has since stored large volumes of toll data. A key problem is how to apply domain knowledge to analyze the attributes of these data and turn them into useful information. Because the networked system is complicated, some drivers commit toll fraud for profit, which creates many management problems for the operation of the network. Against this big data background, this paper uses data mining tools to uncover hidden information and builds a highway toll fraud probability model. The model is intended to help management departments address toll fraud with scientific solutions, reduce the workload of toll fraud detection, and improve work efficiency.

Firstly, this paper systematically introduces the functional architecture of the highway network toll system and its data transmission model, and explains both typical and new means of toll fraud. It classifies the characteristic variables of toll fraud, which for the first time provides a knowledge basis for applying data mining techniques to this problem.

Secondly, a data warehouse is established on the Microsoft SQL Server 2008 platform, which performs data extraction, transformation, and loading (ETL). This process builds a toll fraud inspection data warehouse whose subject is the vehicle trip chain.

Finally, the characteristic variables in the toll fraud inspection data warehouse are analyzed with the SAS 9.3 data mining tool. The analysis shows that combining K-means clustering with discriminant analysis can quickly and effectively identify ETC cards used for toll fraud and improve the hit ratio. It also shows that a series of factors significantly influences the probability of toll fraud, including vehicle properties, travel links, overweight loading, and time abnormalities.
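The two-stage idea of clustering card-level features and then assigning new cards to a cluster can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's SAS 9.3 workflow. The two features per ETC card (share of trips with a time abnormality, share of overweight trips) and the toy values are hypothetical. The discriminant step is swapped for a simple minimum-distance rule that assigns a record to the nearest cluster centroid; the thesis may have used a different discriminant function, such as Fisher's linear discriminant. Centroids are seeded with the first k points only to keep the example deterministic.

```python
from typing import List, Sequence

Point = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int = 100) -> List[Point]:
    """Lloyd's K-means. Seeds centroids with the first k points (for determinism only)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                # New centroid = column-wise mean of the cluster members.
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids


def discriminate(record: Sequence[float], centroids: List[Point]) -> int:
    """Minimum-distance discriminant rule: index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda j: squared_distance(record, centroids[j]))


# Hypothetical card features: [share of time-abnormal trips, share of overweight trips].
cards = [[0.02, 0.01], [0.05, 0.00], [0.03, 0.02], [0.60, 0.40], [0.55, 0.50], [0.04, 0.03]]
centroids = kmeans(cards, k=2)
# Treat the cluster with the larger feature values as the "suspect" group.
suspect = max(range(len(centroids)), key=lambda j: sum(centroids[j]))
print(discriminate([0.50, 0.45], centroids) == suspect)  # True
```

In this sketch, clustering only separates the cards into groups. Labeling the high-abnormality cluster as suspect is an illustrative assumption that a real workflow would confirm against inspected cases before using the discriminant rule to screen new cards.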
This paper innovatively uses a logistic regression algorithm to establish a toll fraud behavior prediction model. Evaluation and validation show that the model not only classifies toll fraud groups but also provides dynamic prediction for toll records whose fraud status is unknown. The correct prediction ratio of the toll fraud probability model exceeds 95%.
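A logistic model expresses the fraud probability of a record with feature vector x as P(fraud = 1 | x) = 1 / (1 + exp(-(b + w·x))). A record is then classified as fraud when this probability reaches a threshold, here assumed to be 0.5. The sketch below fits such a model by batch gradient descent on the mean log-loss in plain Python. It is not the thesis's SAS 9.3 procedure, which fits by maximum likelihood with its own optimizer. The two binary features (overweight flag, time-abnormality flag), the toy labels, the learning rate, and the iteration count are all hypothetical. For brevity, the correct prediction ratio is computed on the toy training records themselves. A real evaluation, like the validation described above, would use held-out records.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int], learning_rate: float = 0.5,
                 iterations: int = 5000) -> Tuple[float, List[float]]:
    """Fit P(fraud = 1 | x) = sigmoid(b + w . x) by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= learning_rate * grad_b / n
        w = [wj - learning_rate * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def fraud_probability(x: Sequence[float], b: float, w: Sequence[float]) -> float:
    """Predicted probability that a record is toll fraud."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy records: [overweight flag, time-abnormality flag] -> fraud label.
X = [[0, 0], [0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0], [1, 1]]
y = [0, 0, 0, 0, 1, 1, 0, 1]
b, w = fit_logistic(X, y)
predictions = [int(fraud_probability(x, b, w) >= 0.5) for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(accuracy)  # 1.0
```

Because the model outputs a probability rather than only a class label, the same fitted coefficients can score new, unlabeled records as they arrive. This is one way to read the "dynamic prediction" function described above.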