Examine Manipulated Datasets With Topological Data Analysis

Posted on: 2020-08-19  Degree: Master  Type: Thesis
Country: China  Candidate: Y Guo  Full Text: PDF
GTID: 2370330623963640  Subject: Computer technology
Abstract/Summary:
In the era of big data, the value of data has increased dramatically, but data security issues have emerged with many new features. First, as the value of data increases, cyber attackers may be more inclined to manipulate the data records in a database rather than steal them for ransom. Former National Security Agency Director Keith Alexander pointed out that data manipulation is the latest technique in the "art of war in cyberspace". In addition, because the information density of the data is low, locating useful information becomes difficult, which gives network attackers more opportunities to hide manipulated records among normal ones. Moreover, with the development of deep learning, a new type of data manipulation has emerged recently: adversarial examples against deep learning algorithms. An adversarial example is a sample that causes a deep learning model to give a wrong output; it is obtained by applying a small distortion to a clean sample, and it can be generated by diverse manipulations.

These problems make the verification of data authenticity more urgent, yet the China Information and Communication Research Institute pointed out in the "White Paper of Big Data Security" that there is no strict means of identifying and detecting data authenticity. Because detection based directly on numerical values is difficult, this paper proposes to use the mutual relationships among data records to find essential properties of the records, properties that remain unchanged under small and diverse distortions. To find these essential properties of manipulated records, we propose to introduce Topological Data Analysis (TDA) into the detection of the data manipulations described above. TDA is a data processing technology based on topology, computer science, statistics and computational geometry. Unlike conventional data processing methods, which focus on the data values themselves, TDA pays more attention to the shape characteristics of the data, and these shape features often do not change when the data records are distorted.

In this paper, the simplicial complex is applied as a topological tool to judge whether data are genuine or manipulated. The simplicial complex is an approximation of the original data space and has the same topological characteristics as that space. In the actual processing, we fix the system parameters so that data of different classes have different simplicial complexes, and we classify new data via their simplicial complexes.

Classification methods for data of different scales are designed. For the banknote dataset in chapter 3, we first manually found topological features that are able to discriminate genuine datasets from manipulated ones; these features are stable under different manipulations. We then used them to infer the class of manipulated datasets. For the subset of the CIFAR10 dataset in chapter 4, this paper realized the automatic recognition of the complexes of large-scale images with a convolutional neural network. Finally, the model successfully defended against the adversarial examples involved in this experiment. Moreover, the model also showed its versatility compared with the adversarial training method in the face of different attacks.

These results confirmed that: 1) under certain parameters, the complexes of different types of data have different shapes; 2) overall manipulations of the data records cannot change their simplicial complex; 3) the detection results of TDA are not affected by the mode of manipulation. This demonstrated the effectiveness of the TDA-based detection method and its versatility for different manipulations.
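The idea of fixing a scale parameter, building a simplicial complex, and comparing shape features can be illustrated with a minimal sketch. This is not the thesis pipeline: it assumes a Vietoris-Rips construction as a stand-in for whatever complex the author builds, and the function name `rips_summary`, the scale `epsilon = 0.6`, and the toy point sets are all hypothetical choices made for illustration. It shows two things under those assumptions: point sets with different shapes yield different summaries at a fixed scale, and a small distortion of the points leaves the summary unchanged.

```python
from itertools import combinations
from math import cos, dist, pi, sin


def rips_summary(points, epsilon):
    """Summarise a Vietoris-Rips complex (vertices, edges, triangles only)
    built at one fixed scale. Illustrative stand-in, not the thesis method."""
    n = len(points)
    # Edges: every pair of points at distance <= epsilon.
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= epsilon}
    # Triangles: every triple whose three edges are all present.
    triangles = [t for t in combinations(range(n), 3)
                 if all(e in edges for e in combinations(t, 2))]

    # Union-find over the edges to count connected components.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    components = len({find(i) for i in range(n)})

    # Euler characteristic of the complex truncated at dimension 2.
    euler = n - len(edges) + len(triangles)
    return {"components": components, "euler_characteristic": euler}


# Hypothetical toy data: 12 points on a unit circle, and two tight clusters.
circle = [(cos(2 * pi * k / 12), sin(2 * pi * k / 12)) for k in range(12)]
clusters = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
# A small distortion of the circle: shift x by +/-0.01 alternately.
distorted = [(x + 0.01 * (-1) ** k, y) for k, (x, y) in enumerate(circle)]

epsilon = 0.6  # assumed fixed scale parameter
print(rips_summary(circle, epsilon))
print(rips_summary(clusters, epsilon))
print(rips_summary(distorted, epsilon))
```

At this scale the circle and the two clusters produce different summaries, while the circle and its slightly distorted copy produce the same one. A real detector would need richer features than these two numbers, and the choice of scale matters: the summaries only separate the classes when `epsilon` is chosen appropriately for the data.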
Keywords/Search Tags:Data Security, Topological Data Analysis, Mapper, Simplicial Complex