| Binary code similarity comparison is a foundational technique in software security, with applications in software plagiarism detection, malicious code detection, software patch analysis, and automated reverse analysis. However, current binary code similarity comparison techniques generally have problems in terms of efficiency, accuracy, and scale. Efficiency and accuracy are conflicting goals. In theory, the more complex and fine-grained the comparison technique, the higher its accuracy, but this makes the comparison process very time-consuming and unable to cope with large-scale code comparison. Conversely, a simpler and coarser comparison method reduces the time the comparison takes, but it cannot guarantee accuracy. In terms of scale, current binary comparison tools are mainly memory-based, and the features extracted from the code are not stored persistently, so feature extraction has to be repeated for every comparison. The basic research problem of code similarity comparison is to detect whether a component in one program is similar to a component in another program and to quantitatively measure the similarity between them. A component can be a single function, a set of functions, or an entire program. In this paper, we address these problems by studying comparison at the binary function level. The main work is as follows: 1. To address the problems of current binary code comparison techniques, this paper designs a binary code comparison algorithm. It proposes methods for selecting and extracting binary function features and designs a heuristic function comparison algorithm based on the extracted features. Experiments show that the algorithm achieves high accuracy. 2. This paper implements a parallel analysis framework for binary code. To further improve the efficiency of the
comparison algorithm, we fully exploit the parallelizable parts of the algorithm and implement a parallel binary code analysis tool using a distributed programming toolkit. Experiments show that the tool achieves high performance in binary code comparison. 3. A storage and retrieval mechanism for massive binary code features is implemented. In terms of scale, current binary comparison tools are mainly suited to one-to-one comparison. When only the binary files to be analyzed are available and no similar, previously analyzed binary files are known, such a method does not apply. In addition, as security analysts, we want to store analyzed binary data, such as a library with a known vulnerability, in a form that is convenient to compare against. To address this problem, a storage mechanism for binary function features is proposed: the graph database JanusGraph is used to store the binary function features, a data model is developed, and the binary code sample library needed for analysis is constructed. |
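
To make the idea of function-level, one-to-many comparison against stored features concrete, the following is a minimal sketch in Python. It is not the algorithm or feature set proposed in this paper: the `FunctionFeatures` record, its fields, the equal-weight scoring in `similarity`, and the `best_matches` helper are all hypothetical, and a plain dictionary stands in for a persistent feature store such as a graph database.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class FunctionFeatures:
    """Hypothetical per-function feature record (an assumption, not the paper's features)."""
    name: str
    basic_blocks: int          # number of basic blocks in the function
    calls: int                 # number of call instructions
    mnemonics: FrozenSet[str]  # set of instruction mnemonics that appear


def ratio(x: int, y: int) -> float:
    """Closeness of two non-negative counts, in [0, 1]."""
    if x == y:
        return 1.0
    return min(x, y) / max(x, y)


def similarity(a: FunctionFeatures, b: FunctionFeatures) -> float:
    """Score in [0, 1]: equal-weight mean of three simple measures (assumed weighting)."""
    union = a.mnemonics | b.mnemonics
    jaccard = len(a.mnemonics & b.mnemonics) / len(union) if union else 1.0
    return (jaccard
            + ratio(a.basic_blocks, b.basic_blocks)
            + ratio(a.calls, b.calls)) / 3


def best_matches(query: FunctionFeatures,
                 library: Dict[str, FunctionFeatures],
                 top_k: int = 3) -> List[Tuple[float, str]]:
    """Rank previously stored functions by similarity to the query function."""
    scored = [(similarity(query, f), f.name) for f in library.values()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]
```

Because the features of analyzed functions are kept in `library`, a new function can be compared against all of them without repeating feature extraction, which is the motivation for persistent storage described above.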