
Research On Integration Method Of Multi-source Heterogeneous Civil Aviation Passenger Service Data

Posted on: 2020-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: W Hu
GTID: 2392330596994473
Subject: Air transportation big data project
Abstract/Summary:
With the rapid growth of China's civil aviation industry, more and more passengers choose to travel by air, and airlines and travel websites generate large volumes of passenger service data every day. These data come from different sources and suffer from heterogeneous schemas and data redundancy, which seriously hinders their effective use. Data integration is the key approach to solving such problems. It is therefore important to study data integration methods that eliminate schema conflicts and data redundancy in multi-source heterogeneous civil aviation passenger service data and thereby improve data quality.

First, to address schema heterogeneity, a multiple schema matching method based on SimHash and mixed similarity is proposed. The method uses the PMI-SimHash algorithm to construct a signature for each attribute column, which represents the attribute's features while reducing feature dimensionality. It then calculates the mixed similarity of attributes on the basis of attribute clustering analysis and constructs an attribute mapping graph that shows the matching relationships between attributes.

Second, to address data redundancy, an unsupervised self-learning method for entity matching is proposed. The method partitions the multi-source data into blocks with a locality-sensitive hashing (LSH) algorithm, so that records with similar features fall into the same block and the number of candidate matching pairs is reduced. A training set is then selected with an unsupervised seed selection algorithm, and an RVM-based self-learning algorithm generates labeled entity data sets, which avoids the additional cost of labeling data manually.

Finally, experiments on real multi-source heterogeneous civil aviation passenger service data demonstrate the feasibility of the proposed methods, which provide an efficient and scalable solution to schema conflict and data redundancy in the integration of such data.
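To make the idea of attribute-column signatures concrete, the following is a minimal sketch of a plain SimHash signature and a Hamming-based similarity between two columns. It is not the thesis's PMI-SimHash: the abstract does not give the weighting details, so this sketch assumes simple token counts as weights, with an optional `weights` mapping standing in for where PMI-style weights could be plugged in. The names `token_hash`, `simhash`, `signature_similarity`, and the 64-bit signature width are illustrative assumptions.

```python
import hashlib
from collections import Counter

BITS = 64  # assumed signature width; the abstract does not specify one


def token_hash(token: str) -> int:
    """Map a token to a stable 64-bit integer (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(values, weights=None) -> int:
    """Build a SimHash signature for one attribute column.

    values  : iterable of cell values from the column
    weights : optional {token: weight} mapping (hypothetical hook for
              PMI-style weights); by default each token is weighted by
              its frequency in the column.
    """
    counts = Counter(tok for v in values for tok in str(v).lower().split())
    acc = [0.0] * BITS
    for tok, count in counts.items():
        w = count * (weights.get(tok, 1.0) if weights else 1.0)
        h = token_hash(tok)
        for i in range(BITS):
            # Each token votes +w on bits set in its hash and -w on the rest.
            acc[i] += w if (h >> i) & 1 else -w
    sig = 0
    for i, total in enumerate(acc):
        if total > 0:
            sig |= 1 << i
    return sig


def signature_similarity(a: int, b: int) -> float:
    """Similarity in [0, 1]: one minus the normalized Hamming distance."""
    return 1.0 - bin(a ^ b).count("1") / BITS


# Example: two columns from different (made-up) sources holding airport codes.
col_a = ["PEK SHA", "CAN CTU", "PEK CAN"]
col_b = ["pek sha", "can ctu", "sha ctu"]
print(signature_similarity(simhash(col_a), simhash(col_b)))
```

Because each column is reduced to a fixed-width integer, comparing two attributes costs one XOR and a bit count regardless of how many rows the columns contain, which is the dimensionality reduction the abstract refers to.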
Keywords/Search Tags:civil aviation passenger service data, multi-source heterogeneous, multiple schema matching, mixed similarity, entity matching, data integration method