| To check whether the fuel consumption data reported by aircraft operators are reasonable, third-party verification agencies generally rely on point estimation. As an independent third party, however, the verification agency does not have access to all of the operator's data, so it must build a fuel consumption prediction model from the operator's actual historical fuel consumption data. Point estimation takes the estimated fuel consumption for a given voyage as the threshold, yet reported values that fall within a certain interval above or below the estimate are also reasonable. It is therefore more reliable to use interval estimation and give a reasonable estimation range. At the same time, fuel consumption data are imbalanced, which lowers the quality of the prediction interval. To address these problems, this paper first proposes an interval estimation model based on XGBoost and then, building on this model, proposes two methods, one at the data level and one at the algorithm level, to improve the quality of the estimated interval for imbalanced data. First, an interval estimation model based on the XGBoost algorithm is proposed. XGBoost is used as the main machine learning method, with the quantile loss as its loss function for the interval estimation task. The quantile loss function is modified by smoothing its first derivative in a small region around the origin, which solves the problem that the unmodified quantile loss prevents the trees in XGBoost from splitting; the upper and lower bounds of the interval are obtained by training the interval estimation model. This method is then compared with other commonly used interval estimation methods to verify its effectiveness. Second, the quality of the estimated interval for imbalanced fuel consumption data is improved at the data level. Building on the XGBoost interval estimation model, an interval estimation model 
based on the SMOTE-XGBoost algorithm is proposed. New samples are synthesized with the SMOTE algorithm to increase the number of minority samples in the fuel consumption data set until the training set contains as many minority samples as majority samples, which balances the data set; the balanced training set is then used to train the XGBoost interval estimation model, and the upper and lower bounds of the estimated interval are obtained. The experimental results show that this method effectively improves the quality of the estimated interval. Finally, the quality of the estimated interval for imbalanced fuel consumption data is improved at the algorithm level. Building on the XGBoost interval estimation model, a cost-sensitive XGBoost interval estimation model is proposed. This model lets fuel consumption samples from different voyages contribute differently to the objective function, that is, carry different cost information. Because the traditional misclassification cost is not suitable for interval estimation, this paper designs a cost-sensitive function based on the misestimation cost and trains the model twice to obtain the estimated interval. Finally, the three proposed methods are compared, and the results show that addressing the imbalance problem at either the data level or the algorithm level improves the quality of the estimated interval, with the algorithm-level method yielding the higher-quality interval. |
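
The smoothing of the quantile loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact smoothing function, so the version below is an assumption: inside a band of half-width `delta` around zero residual, the first derivative is linearly interpolated between its two constant values, which gives a positive second derivative there. The names `smoothed_quantile_grad_hess`, `delta`, and `hess_floor` are hypothetical, and the small constant `hess_floor` used outside the band is likewise an illustrative choice rather than the paper's. The motivation is that the plain quantile (pinball) loss is piecewise linear, so its second derivative is zero almost everywhere, while gradient boosting in XGBoost relies on second-order information.

```python
def smoothed_quantile_grad_hess(y_true, y_pred, tau, delta=1.0, hess_floor=1e-6):
    """Per-sample first and second derivatives of a smoothed quantile loss.

    Derivatives are taken with respect to the prediction.
    tau is the target quantile (e.g. 0.05 for a lower bound, 0.95 for an upper bound).
    delta and hess_floor are illustrative assumptions, not values from the paper.
    """
    grads, hessians = [], []
    for y, p in zip(y_true, y_pred):
        u = y - p  # residual
        if u > delta:
            # Under-prediction outside the band: plain pinball gradient.
            g, h = -tau, hess_floor
        elif u < -delta:
            # Over-prediction outside the band: plain pinball gradient.
            g, h = 1.0 - tau, hess_floor
        else:
            # Inside the band: linear interpolation from (1 - tau) at u = -delta
            # to -tau at u = +delta, so the second derivative is 1 / (2 * delta).
            g = (1.0 - tau) - (u + delta) / (2.0 * delta)
            h = 1.0 / (2.0 * delta)
        grads.append(g)
        hessians.append(h)
    return grads, hessians


# Example: residuals of +5, 0 and -5 for the 0.9 quantile.
grads, hessians = smoothed_quantile_grad_hess(
    [10.0, 10.0, 10.0], [5.0, 10.0, 15.0], tau=0.9, delta=1.0
)
```

Under these assumptions, an interval would be obtained by fitting one model with a low `tau` for the lower bound and another with a high `tau` for the upper bound, passing the gradient and second-derivative values to the booster through its custom objective mechanism.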