Research On Missing Data Imputation For Civil Aviation Passenger Classification

Posted on: 2021-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: W Yuan
Full Text: PDF
GTID: 2392330611468918
Subject: Air transportation big data project
Abstract/Summary:
The civil aviation passenger service information system is one of the most important information management systems in civil aviation. It holds a large volume of passenger-related business data with considerable commercial mining value. In a real production environment, however, missing values inevitably arise while data is collected, transmitted, and stored in the system. These gaps degrade airlines' passenger classification and other data mining tasks, and can cause airlines significant economic losses. To improve data quality and airline revenue, it is therefore important to study methods for imputing missing passenger data. This thesis first analyzes why features go missing in the passenger data of civil aviation passenger information systems and how this affects downstream data mining tasks such as passenger classification, and it summarizes the solutions that domestic and foreign researchers have proposed for such problems. The work is set in the context of two classification services: civil aviation passenger churn prediction and civil aviation passenger value classification. First, a SMOTE algorithm based on partial distance is proposed to address class imbalance in the presence of missing data. Second, because labeled samples with missing values are scarce in the production environment and labeling is expensive, a new network model for missing data is proposed that combines multi-task deep learning with active learning. During training, the model actively selects minority-class samples with missing values for annotation, which produces high-quality training samples, improves the robustness of the model, and in turn improves both its imputation accuracy and its classification accuracy. Finally, to suit a production big data environment, the Spark computing engine is introduced and a Spark-based active learning imputation framework is proposed, so that the model can efficiently impute missing civil aviation passenger data at scale. Experiments show that the proposed methods achieve good results on both the imputation and classification tasks and have practical engineering value.
Keywords/Search Tags:civil aviation passenger data, missing values, SMOTE algorithm, multi-task deep learning, active learning, Spark
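
The abstract names a SMOTE algorithm based on partial distance but gives no details. The following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes the common definition of partial distance (Euclidean distance over the features observed in both samples, rescaled to the full dimension) and standard SMOTE interpolation between a minority sample and one of its nearest minority neighbours. The function names `partial_distance` and `pd_smote`, the parameter `k`, and the rule for features missing in one or both parents are all assumptions made for illustration.

```python
import numpy as np


def partial_distance(a, b):
    """Euclidean distance over features observed in both a and b,
    rescaled by (total features / shared features)."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    n_shared = shared.sum()
    if n_shared == 0:
        return np.inf  # no common observed feature: treat as infinitely far
    sq = np.sum((a[shared] - b[shared]) ** 2)
    return float(np.sqrt(sq * a.size / n_shared))


def pd_smote(minority, n_new, k=3, seed=0):
    """Generate up to n_new synthetic minority samples (hypothetical helper).

    minority: 2-D float array of minority-class rows, NaN marks a missing value.
    """
    rng = np.random.default_rng(seed)
    n = len(minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        base = minority[i]
        # Partial distance from the base sample to every other minority sample.
        dists = np.array([
            partial_distance(base, minority[j]) if j != i else np.inf
            for j in range(n)
        ])
        neighbours = np.argsort(dists)[:k]
        neighbours = neighbours[np.isfinite(dists[neighbours])]
        if neighbours.size == 0:
            continue  # base shares no observed feature with any other sample
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()
        new = base + gap * (nb - base)  # NaN wherever either parent is missing
        # Assumption: if only one parent has the feature, copy that value;
        # if neither has it, the synthetic value stays missing.
        new = np.where(np.isnan(new), np.where(np.isnan(base), nb, base), new)
        synthetic.append(new)
    return np.array(synthetic)


if __name__ == "__main__":
    X_min = np.array([
        [1.0, 2.0, np.nan],
        [1.5, np.nan, 3.0],
        [2.0, 2.5, 3.5],
        [np.nan, 3.0, 4.0],
    ])
    print(pd_smote(X_min, n_new=5))
```

In this sketch the synthetic samples can still contain missing values, so they would be passed on to a separate imputation step rather than used as complete records.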