Research On Data Cleaning And Repair Methods For Vessel Status Data

Posted on: 2019-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: H R Du
Full Text: PDF
GTID: 2392330596965414
Subject: Information and Communication Engineering
Abstract/Summary:
Vessel status data contain a large amount of valuable information, and mining these data can provide strong support for the development of smart water transportation. However, the data contain many problematic records, so cleaning and repairing them before data mining is an essential step. Many data cleaning and repair methods already exist, but applying them directly to vessel status data gives unsatisfactory results. To ensure accurate cleaning and repair, each type of problematic data needs a method developed specifically for it. This thesis therefore focuses on the two types of problems that most seriously affect data mining on vessel status data: duplicates and missing trajectory data. The main work of this thesis is as follows:

(1) An improved sliding window strategy is proposed and combined with a Top-K-based filtering strategy to address the low detection efficiency of the existing similar duplicate data detection algorithm, the Sorted-Neighborhood Method (SNM). The improved strategy makes the sliding window size dynamically variable: the window size changes during detection according to the detection situation, which significantly reduces unnecessary comparisons and reduces the number of missed matches. The Top-K-based filtering strategy ends the comparison of two records early when they do not meet the conditions, which greatly reduces detection time. Comparison experiments show that the improved algorithm in this thesis outperforms other algorithms in detection efficiency.

(2) In the SNM algorithm, the field weights are set too subjectively, which lowers the accuracy of the algorithm. To solve this problem, this thesis proposes an improved method for calculating field weights that combines a subjective hierarchical calculation method with an objective statistical method. This makes the field weights more reasonable, which in turn makes the similarity calculation more accurate. In addition, the improved sliding window strategy also helps improve detection accuracy. Comparison experiments show that the improved algorithm in this thesis outperforms other algorithms in terms of recall, precision, and other metrics.

(3) To address the low accuracy of existing algorithms for repairing vessels' missing trajectory data, this thesis first solves the pseudo-continuity problem caused by erroneous data in the trajectory data, and then proposes a repair algorithm for the missing trajectory data. The repair algorithm first builds a vector model of the vessel's motion that links the vessel's spatial position with time. It then constructs an interpolation function for the vessel's spatial position based on the idea of polynomial interpolation, and uses the vessel's existing parameters to optimize the parameters of this function so that the missing trajectory data can be repaired accurately. Comparison experiments show that the repair algorithm in this thesis outperforms other algorithms in repair accuracy.
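As a rough illustration of the duplicate detection idea in (1) and (2), the following is a minimal sketch of a sorted-neighborhood pass with weighted field similarity and a window that grows while matches keep appearing. It is not the algorithm of the thesis: the record fields (`mmsi`, `name`, `dest`), the weights, the threshold, and the window growth rule are all assumptions made for this example, and the Top-K filter is not reproduced.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """String similarity in [0, 1] based on difflib's matching ratio."""
    return SequenceMatcher(None, str(a), str(b)).ratio()


def record_similarity(r1, r2, weights):
    """Weighted average of the per-field similarities."""
    total = sum(weights.values())
    score = sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())
    return score / total


def snm_adaptive(records, key, weights, threshold=0.85, w_min=2, w_max=10):
    """Sorted-neighborhood duplicate detection with a growing window.

    A window of size w compares a record with the w - 1 records that
    follow it in sorted order. Assumed growth rule: when the record at
    the window edge is a match, extend the window by one, up to w_max.
    """
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        window = w_min
        offset = 1
        while offset < window and pos + offset < len(order):
            j = order[pos + offset]
            if record_similarity(records[i], records[j], weights) >= threshold:
                pairs.add((min(i, j), max(i, j)))
                # A match at the edge suggests more duplicates may follow.
                if offset == window - 1 and window < w_max:
                    window += 1
            offset += 1
    return pairs


# Hypothetical vessel records; all values are invented for illustration.
records = [
    {"mmsi": "413000001", "name": "CHANG JIANG 1", "dest": "WUHAN"},
    {"mmsi": "413000001", "name": "CHANG JIANG 1", "dest": "WUHAN"},
    {"mmsi": "413000001", "name": "CHANGJIANG 1", "dest": "WUHAN"},
    {"mmsi": "413999999", "name": "HAI YANG 8", "dest": "SHANGHAI"},
]
# Assumed field weights; the thesis derives its weights differently.
weights = {"mmsi": 0.5, "name": 0.3, "dest": 0.2}

pairs = snm_adaptive(records, lambda r: r["mmsi"] + r["name"], weights)
print(sorted(pairs))
```

In this toy data set, a fixed window of size 2 (`w_max=2`) only compares adjacent records and misses the pair formed by the first and third records, while the growing window finds it without comparing every record against every other.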
Keywords/Search Tags: vessel status data, data cleaning and repair, similar duplicate data, missing trajectory data