| As research on blockchain technology advances, blockchain-based cryptocurrencies are becoming widely known. Because cryptocurrency transactions are anonymous, tamper-resistant, and convenient, a variety of services built on them have emerged, including market investment, gambling services, investment fraud, and drug trafficking. To supervise the circulation of cryptocurrency effectively, this paper takes Bitcoin as an example and uses machine learning to identify the type of service in which a user's Bitcoin transactions take part, thereby revealing, to a certain extent, the user information hidden behind a Bitcoin address. The paper reviews existing Bitcoin address classification methods and summarizes their shortcomings in practice, including the complexity of data collection and the limited comprehensiveness and accuracy of the classification methods. To address these shortcomings, it proposes a new machine-learning-based Bitcoin address classification model that combines the advantages of traditional heuristic address clustering and machine-learning-based address classification. The model can evaluate existing heuristic address clustering results with a certain probability, and it adds feature selection to machine-learning-based address classification to improve the algorithm's performance. For data collection, we describe the block data structure in detail and redefine the feature set. To select the features in this set that are most effective for the Bitcoin address classification task, the paper proposes two feature selection methods: a wrapper feature selection method based on random forest and an improved random forest feature selection method. The former is simple and easy to implement, but its training cost is high. The latter improves on the former: it reduces training time, improves classification accuracy, and is widely applicable. The effectiveness and feasibility of the model are verified by a number of comparative experiments. The experimental results show that the improved random-forest-based feature selection method outperforms the random-forest-based wrapper feature selection method in terms of classification accuracy and the importance of the selected features. The improved method needs fewer features to classify an arbitrary address with higher accuracy, and thereby completes the evaluation of heuristic address clustering. The feature selection method is also verified on several public data sets, which demonstrates its feasibility. |
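
The abstract does not say which clustering heuristic the model evaluates. As an illustration only, the sketch below assumes the widely used common-input-ownership heuristic, which treats all input addresses of one transaction as controlled by the same entity. The function name `cluster_addresses`, the input format (one list of input addresses per transaction), and the toy addresses are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses with a union-find structure.

    Assumption: `transactions` is an iterable of lists, each holding the
    input addresses of one transaction. Addresses that appear together as
    inputs are merged into one cluster (common-input-ownership heuristic).
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own root, then walk to the root
        # while shortening the path (path halving).
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # make sure single-input transactions are registered
        for addr in inputs[1:]:
            union(first, addr)

    # Collect the members of each cluster under their root address.
    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Toy data: A and B co-spend, B and C co-spend, so A, B, C form one cluster.
    toy = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
    print(cluster_addresses(toy))
```

In a pipeline like the one the abstract describes, clusters produced this way could then be compared against the per-address labels predicted by a classifier; how the paper actually performs that evaluation is not stated here.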