| Data breaches happen frequently and are becoming more widespread in the digital age. Accordingly, it is critical to model data breaches to deepen our understanding of their size and frequency characteristics. However, spatial effects are rarely taken into account in existing data breach risk studies. Recent research has demonstrated that incorporating spatial information about data breaches into models significantly improves the fit of breach size and frequency. Inspired by this, this paper refines the spatial study of data breaches, which in turn enhances the ability of a business or individual entity to evaluate and defend against data breach risks. Existing research on spatial clustering of data breaches has used the Privacy Rights Clearinghouse (PRC) dataset as its basis. Specifically, it established state-based ARMA-GARCH models for the size of data breaches in each state of the United States, clustered the states based on their average latitude and longitude together with the relevant ARMA-GARCH parameters, and finally established ARMA-GARCH and autoregressive conditional duration (ACD) models for the breach sequences within each cluster to compare the fit before and after clustering. Building on this work, this paper makes the following contributions: (1) We optimized the clustering algorithm in terms of both similarity measures and integration techniques to explore a spatial clustering algorithm better suited to data breach scenarios. Considering that DBSCAN is very sensitive to its parameters and that the clustered data contain noise, we trained DBSCAN with various parameters as the base clusterers and combined them with Jeffreys divergence, which is robust to noise, to design a new integrated clustering algorithm called integrated DBSCAN based on Jeffreys divergence. The results demonstrate that the improved DBSCAN algorithm is feasible and effective. (2) Based on the clustering results, we further improved 
the prediction accuracy of data breach risk by using a copula structure to fit the dependence between data breach size and frequency within each cluster. With value-at-risk (VaR) as the metric of predictive performance, we compared the prediction of data breach risk within each cluster before and after considering the dependence structure, and the results show that prediction performance under the dependence structure is much better than under the independence structure. (3) We applied the theoretical approach explored in this paper to cybersecurity insurance pricing from the perspective of spatial clusters. The comparison reveals significant differences in insurance prices among clusters, which shows, at a practical level, that it is reasonable to include spatial information when assessing data breach risk. In summary, this paper proposes a novel approach to optimizing the spatial clustering algorithm in order to improve the fit of data breach risk models. Based on the clustering results, we considered the size-frequency dependence structure from the spatial clustering perspective to further improve the prediction accuracy of breach risk. Finally, the theoretical approach was applied to cybersecurity insurance pricing to provide feasible suggestions for differentiated pricing by insurance companies. |
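
For readers unfamiliar with the similarity measure named in contribution (1), the Jeffreys divergence between two discrete distributions P and Q is the symmetrised Kullback-Leibler divergence, J(P, Q) = KL(P||Q) + KL(Q||P) = Σ (p_i − q_i) ln(p_i / q_i). The following is a minimal sketch of that definition only. The function name `jeffreys_divergence` and the smoothing constant `eps` are illustrative assumptions, and the abstract does not specify how the divergence is combined with the DBSCAN base clusterers, so that step is not shown.

```python
import math


def jeffreys_divergence(p, q, eps=1e-12):
    """Jeffreys divergence J(P, Q) = KL(P||Q) + KL(Q||P) for discrete distributions.

    `eps` is an assumed smoothing constant (not from the paper) that keeps
    zero-probability entries from producing log(0).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Smooth, then renormalise so each vector sums to 1 again.
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Closed form of KL(P||Q) + KL(Q||P).
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

Unlike the plain Kullback-Leibler divergence, this quantity is symmetric in its arguments, so it can serve as a pairwise dissimilarity between two distributions without having to choose a reference distribution.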