With the high demand for safety assurance in China's railway system and the arrival of the big data era, there is an urgent need to learn comprehensively from past railway accidents and to convert the tacit knowledge contained in railway accident data into explicit knowledge. Analyzing a railway accident dataset is therefore of great significance for railway accident prevention. In this study, a method for building and mining a railway accident dataset was designed, and the patterns and key causes of accidents were studied to provide a reference for further improving railway safety in China. The main contents of the paper are as follows.

(1) Based on the characteristics of railway accidents in China, the Cognitive Reliability and Error Analysis-Railway Accidents (CREAM-RAs) classification model was designed to accurately classify and extract the human, technical and organizational factors in railway accidents. The reliability of the CREAM-RAs model was verified by calculating Kappa values for inter-rater and intra-rater reliability. Under the CREAM-RAs classification framework, 811 railway accident reports were decomposed and coded to construct a Multi-Attribute Railway Accidents Dataset (MARA-D).

(2) The Unified Distance Matrix (U-Matrix) of the Self-Organizing Map (SOM) algorithm was improved. Its parameters were optimized by taking into account the total number of neighbor units that actually exist around each unit, which solved the edge and boundary problems in the SOM graph. With the improved U-Matrix SOM graph, the number of clusters and the cluster centers can be obtained more intuitively.

(3) An integrated SOM and K-Means clustering method for railway accidents was designed. The weights of the SOM winning neurons were used as the input data for K-Means clustering, and the number of clusters and the cluster centers obtained from the SOM graph were used as the initial parameters of K-Means. This improved both the accuracy and the visualization of accident clustering. With the integrated method, MARA-D was divided into accident clusters with distinct features.

(4) Based on the differences in weight between accident levels, an improved association rule algorithm, Accident Level Apriori (AL_Apriori), was proposed. The accident clusters were analyzed with the AL_Apriori algorithm to mine strong association rule sets, and improvement measures for the key accidents in each cluster were given. These rule sets were compared with the strong association rule set mined from the whole MARA-D. The comparison showed that cluster-based association rule mining identified better and more useful strong association rules than mining without cluster analysis.

(5) A comprehensive weight determination method based on association rules and the Decision-Making Trial and Evaluation Laboratory (DEMATEL) method was designed. It jointly considers the initial weight, the influencing degree and the influenced degree of each accident cause, making the weight calculation more accurate. Finally, the traditional support, the weighted support and the proposed comprehensive weight determination method were each used to calculate and rank the weights of the accident causes in MARA-D, which verified the validity of the comprehensive weight determination method. The key causes of railway accidents in China were identified, and corresponding improvement measures were proposed.
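Step (1) checks the CREAM-RAs coding with Kappa values. The sketch below is an illustration only. It computes Cohen's kappa for two codings of the same reports. For inter-rater reliability the two codings come from two coders. For intra-rater reliability they come from one coder at two different times. The abstract does not say which Kappa variant was used, so the two-rater Cohen form, the function name `cohens_kappa` and the factor labels are assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical codes.

    Assumption: the two-rater Cohen form; the thesis's exact variant is unstated.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of reports coded identically in both passes.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both passes used one identical category throughout
    return (p_o - p_e) / (1 - p_e)
```

Consider two codings of six reports that agree on five, with marginal counts (3, 2, 1) and (2, 3, 1) over three factor categories. They give p_o = 5/6, p_e = 13/36 and kappa = 17/23, or about 0.74.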
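Step (2) states only that the U-Matrix parameters take into account how many neighbor units each unit actually has. One plausible reading is sketched below as an assumption, not as the thesis's exact formula. Each unit's distances are averaged over the neighbors that exist, instead of being divided by a fixed neighbor count. A fixed count makes edge and corner units look artificially close to their surroundings. The rectangular grid, the 4-neighborhood and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def u_matrix(weights, normalize_by_existing=True):
    """U-Matrix of a rows x cols grid of SOM weight vectors.

    Hypothetical reading of the improvement: divide by the number of neighbors
    that exist (2 at corners, 3 on edges, 4 inside) rather than a fixed 4.
    """
    rows, cols = len(weights), len(weights[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed 4-neighborhood
    umat = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dists = [
                euclidean(weights[i][j], weights[i + di][j + dj])
                for di, dj in offsets
                if 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            divisor = len(dists) if normalize_by_existing else len(offsets)
            umat[i][j] = sum(dists) / divisor
    return umat
```

Take a 3 x 3 grid whose adjacent weight vectors are all one unit apart. A fixed divisor of 4 gives 0.5 at the corners, 0.75 on the edges and 1.0 inside. Dividing by the existing-neighbor count gives 1.0 everywhere, so the edge bias disappears.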
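The integrated clustering in step (3) can be sketched in four stages:

1. Train a SOM on the encoded accident vectors.
2. Feed the weight vectors of the winning neurons (best-matching units) to K-Means.
3. Seed K-Means with cluster centers taken from the SOM graph.
4. Give each accident the cluster of its winning neuron.

This is a minimal pure-Python sketch under stated assumptions. Accidents are assumed to be numeric vectors already. The SOM is assumed to use a Gaussian neighborhood with linearly decaying learning rate and radius. The initial centers are passed in by hand, whereas the thesis reads them from the improved U-Matrix. All function names and hyperparameters are hypothetical.

```python
import math
import random


def dist2(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2
    # Initialize every unit with a randomly chosen training sample.
    weights = {(i, j): list(rng.choice(data)) for i in range(rows) for j in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighborhood radius
            bmu = min(weights, key=lambda k: dist2(weights[k], x))
            for (i, j), w in weights.items():
                h = math.exp(-((i - bmu[0]) ** 2 + (j - bmu[1]) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


def kmeans(points, init_centers, iters=100):
    """Plain K-Means (Lloyd's algorithm) started from the given centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[min(range(len(centers)), key=lambda c: dist2(p, centers[c]))].append(p)
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centers[c]
               for c, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def som_kmeans(data, rows, cols, init_centers):
    """SOM winners -> K-Means -> label each accident via its winning neuron."""
    weights = train_som(data, rows, cols)
    bmu_of = [min(weights, key=lambda k: dist2(weights[k], x)) for x in data]
    winners = sorted(set(bmu_of))
    centers = kmeans([weights[k] for k in winners], init_centers)
    unit_label = {k: min(range(len(centers)), key=lambda c: dist2(weights[k], centers[c]))
                  for k in winners}
    return [unit_label[k] for k in bmu_of], centers
```

K-Means starts from the supplied centers rather than random ones. Its output therefore depends only on the trained SOM, and the number of clusters is fixed by how many centers are passed in.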
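The abstract does not define how AL_Apriori weights accident levels. A minimal sketch, assuming each accident record carries a level with a fixed weight, replaces Apriori's transaction-count support with a level-weighted support. That support is the summed weight of the accidents containing an itemset, divided by the summed weight of all accidents. Every weight is non-negative, so this support cannot grow when items are added, and Apriori's subset pruning stays valid. The level names, weights and function names are hypothetical.

```python
from itertools import combinations

# HYPOTHETICAL level weights: the abstract does not give AL_Apriori's actual values.
LEVEL_WEIGHTS = {"serious": 3.0, "general": 1.0}


def weighted_support(itemset, transactions, total_weight):
    """Summed level weight of accidents containing itemset, over the total weight."""
    hit = sum(LEVEL_WEIGHTS[lvl] for lvl, causes in transactions if itemset <= causes)
    return hit / total_weight


def weighted_apriori(transactions, min_wsup):
    """transactions: list of (level, set_of_causes). Returns {frozenset: weighted support}."""
    total_weight = sum(LEVEL_WEIGHTS[lvl] for lvl, _ in transactions)
    all_causes = {c for _, causes in transactions for c in causes}
    candidates = {frozenset([c]) for c in all_causes}
    frequent, k = {}, 1
    while candidates:
        # Keep candidates whose level-weighted support reaches the threshold.
        survivors = {}
        for s in candidates:
            ws = weighted_support(s, transactions, total_weight)
            if ws >= min_wsup:
                survivors[s] = ws
        frequent.update(survivors)
        k += 1
        # Join surviving (k-1)-itemsets; prune any candidate with an infrequent subset.
        candidates = set()
        for a, b in combinations(list(survivors), 2):
            joined = a | b
            if len(joined) == k and all(
                frozenset(sub) in survivors for sub in combinations(joined, k - 1)
            ):
                candidates.add(joined)
    return frequent


def strong_rules(frequent, min_conf):
    """Rules lhs -> rhs with confidence = wsup(lhs | rhs) / wsup(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules
```

Suppose serious accidents weigh 3 and general accidents weigh 1. Take five records, two serious and three general. A cause pair found only in the two serious accidents has weighted support 6/9, about 0.67, although its unweighted support is only 2/5 = 0.4. A 0.5 threshold keeps it under the weighted measure and drops it under plain Apriori.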
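Step (5) combines an initial weight with two DEMATEL quantities. The influencing degree D is the row sums of the total-relation matrix, and the influenced degree R is its column sums. The DEMATEL part below follows the standard procedure. First, the direct-relation matrix is normalized by the largest row or column sum, a common choice. Then the total-relation matrix is computed as T = N(I - N)^-1. The thesis does not state how the three quantities are merged. `comprehensive_weights` is therefore a hypothetical rule: it scales each initial weight by the prominence D + R and renormalizes. The initial weights could, for example, be association-rule supports.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_inv(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dematel(direct):
    """Return (D, R): row and column sums of the total-relation matrix T."""
    n = len(direct)
    s = max(max(sum(row) for row in direct), max(sum(col) for col in zip(*direct)))
    norm = [[v / s for v in row] for row in direct]
    i_minus_n = [[(1.0 if i == j else 0.0) - norm[i][j] for j in range(n)] for i in range(n)]
    total = mat_mul(norm, mat_inv(i_minus_n))  # T = N (I - N)^-1
    d = [sum(row) for row in total]        # influencing degree
    r = [sum(col) for col in zip(*total)]  # influenced degree
    return d, r


def comprehensive_weights(initial, direct):
    """HYPOTHETICAL combination: initial weight scaled by prominence D + R, normalized."""
    d, r = dematel(direct)
    raw = [w * (di + ri) for w, di, ri in zip(initial, d, r)]
    total = sum(raw)
    return [x / total for x in raw]
```

For the two-cause direct-relation matrix [[0, 2], [1, 0]], N = [[0, 1], [0.5, 0]] and T = [[1, 2], [1, 1]]. This gives D = (3, 2), R = (2, 3) and D - R = (1, -1), so cause 1 is a net cause and cause 2 a net effect.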