| The popularity of big data sharing and big data transactions has brought convenience to work and life while posing new challenges for big data traceability technology. As a classical data traceability technology, digital watermarking can be used to authenticate, track, and control the copying of digital works by embedding specific watermarks in the carrier data. Mainstream digital watermarking research focuses on achieving a better balance between carrier distortion and watermark robustness, and this philosophy has gained acceptance in the field of traditional multimedia data traceability. However, in the era of big data, and of structured big data in particular, data are more often used to build data mining models that extract their potential value, and this requirement challenges traditional digital watermarking techniques. To address this problem, this thesis considers the traceability requirements of big data in both the application stage and the transmission stage, and proposes four traceability techniques to resolve the incompatibility between traditional watermarking techniques and current big data traceability requirements. For traceability in the application stage, the thesis proposes two digital watermarking algorithms based on decision tree classification, targeting the decision tree as the most common data mining model for structured big data and eliminating the impact of traditional watermarking algorithms on decision tree construction. In addition, for multimedia data, a misalignment-free digital watermarking algorithm is proposed to overcome the misalignment problem; it can recover the data with a reversible algorithm during data mining and ensures that the mining model remains unchanged. Finally, for collusion attacks during data transmission, a watermark generation algorithm that resists the average collusion attack is proposed, which effectively protects the security of the traceability information. The specific
research contents and contributions of this thesis are as follows: (1) Digital watermarking algorithm based on decision tree reconstruction. To eliminate the influence of watermark embedding on the decision tree mining model, a decision tree reconstruction method is designed in which label state statistics, label state transfer, and split value reconstruction make the watermarked data yield the same decision tree model as the original data. After the watermark is embedded in the data by a traditional watermarking algorithm, the proposed algorithm first counts the label states of the decision tree. Then, top-down label state transfer is performed for each layer of the watermarked decision tree, after which the structure of the watermarked decision tree is consistent with that of the original. Finally, the split values of the watermarked decision tree are reconstructed through constraint equations, so that the reconstructed decision tree is exactly the same as the original. Experimental simulation shows that the proposed algorithm not only supports effective traceability after data leakage but also keeps the decision tree model identical to the original. Moreover, the reconstruction process does not significantly increase the embedding distortion. (2) Digital watermarking algorithm based on decision tree shift correction. The decision tree reconstruction algorithm performs well when the data volume is small and the decision tree has few layers. However, when the data volume is large, the number of adjusted layers is difficult to control and the complexity of the reconstruction is relatively high. A user-defined decision tree shift correction algorithm is therefore proposed to handle large data volumes better and to meet the user’s needs for decision tree models with different numbers of layers. The algorithm first specifies the number of
layers to be adjusted in the decision tree model. The change in the decision tree during iteration is then controlled by a defined directional deviation and purity deviation. The iteration stops once the decision tree model with the specified number of layers is consistent with the original tree. In addition, decision tree shift correction introduces no additional distortion to the data, and the watermark information can be extracted without errors. Experimental results show that the proposed algorithm supports the construction of decision tree models with different numbers of layers, which reduces the time complexity of the algorithm. Moreover, the shift correction process does not need to record all label states, which reduces the space complexity. (3) Digital watermarking algorithm based on histogram shifting without misalignment. When the data is subjected to some non-malicious attacks, the watermark-embedded region and the unembedded region in the traditional histogram shifting algorithm overlap, causing a misalignment problem in watermark extraction. To address this problem, the proposed algorithm designs a new histogram shift rule that also embeds watermarks in the regions that the traditional histogram shift leaves unembedded, which resolves the misalignment problem of the traditional algorithm. The algorithm can also effectively recover the original data and ensures that the parameters remain unchanged when the data is used to build a mining model. The proposed algorithm first generates a histogram from the image pixel differences and then designs a corresponding shift rule for each region of the histogram, so that all regions embed watermark information without overlapping. The experimental results show that the proposed algorithm not only solves the misalignment problem of conventional histogram shifting but also achieves a higher embedding capacity. The proposed algorithm also extracts the watermark information effectively when the image suffers from
some common attacks. (4) Watermark generation algorithm for resisting average collusion attacks. Traditional digital watermarking techniques focus mainly on common attacks during big data transmission, such as modification and deletion attacks on structured data and geometric attacks on images, but give less consideration to collusion attacks carried out jointly by multiple users. Therefore, to resist the average collusion attack by multiple users, a watermark information generation algorithm with special properties is proposed. The thesis specifies the properties that the generated watermark information must have under a quantization-based watermarking algorithm, and demonstrates that combining watermark information with these properties with a quantization-based embedding algorithm resists the average collusion of multiple users. Moreover, the proposed algorithm can trace the colluding users deterministically from the colluded data. The proposed traceability code can also be combined with traditional watermark embedding methods such as QIM, which makes it broadly applicable. |
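
Item (4) names QIM as one embedding method that the traceability code can be paired with. The following Python is a minimal sketch of textbook scalar QIM and of what an average collusion by two users does to it. It is not the thesis's watermark generation algorithm: the function names, the step size `delta`, and the sample values are all illustrative assumptions.

```python
# Illustrative sketch only: textbook scalar QIM plus a two-user averaging attack.
# All names and values here are assumptions, not taken from the thesis.

def qim_embed(host, bits, delta):
    """Quantize each host sample onto the lattice selected by its bit.

    Bit 0 uses multiples of delta; bit 1 uses the same lattice shifted by delta/2.
    """
    marked = []
    for x, b in zip(host, bits):
        offset = b * delta / 2.0
        marked.append(delta * round((x - offset) / delta) + offset)
    return marked


def lattice_distance(y, bit, delta):
    """Distance from sample y to the nearest point of the lattice for `bit`."""
    offset = bit * delta / 2.0
    nearest = delta * round((y - offset) / delta) + offset
    return abs(y - nearest)


def qim_extract(signal, delta):
    """Decode each sample as the bit whose lattice is closest (ties decode as 0)."""
    return [
        0 if lattice_distance(y, 0, delta) <= lattice_distance(y, 1, delta) else 1
        for y in signal
    ]


def average_collusion(copies):
    """Average collusion: the colluders average their copies sample by sample."""
    return [sum(vals) / len(vals) for vals in zip(*copies)]


def ambiguous_positions(signal, delta, tol=1e-9):
    """Indices whose samples are equally far from both lattices."""
    return [
        i for i, y in enumerate(signal)
        if abs(lattice_distance(y, 0, delta) - lattice_distance(y, 1, delta)) < tol
    ]


host = [10.3, -3.7, 25.1, 0.4]   # hypothetical carrier samples
delta = 4.0                      # hypothetical quantization step
copy_a = qim_embed(host, [0, 1, 1, 0], delta)   # user A's codeword (made up)
copy_b = qim_embed(host, [0, 0, 1, 1], delta)   # user B's codeword (made up)
colluded = average_collusion([copy_a, copy_b])

print("decoded A:", qim_extract(copy_a, delta))
print("decoded B:", qim_extract(copy_b, delta))
print("ambiguous after averaging:", ambiguous_positions(colluded, delta))
```

In this toy setting, positions where the two colluders' bits agree survive averaging unchanged, while positions where they differ land midway between the two lattices and decode ambiguously. Plain QIM bits therefore do not identify the colluders by themselves, which is the gap a collusion-resistant watermark generation scheme is meant to close.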