| In recent years, with the rapid development of mobile Internet technology, collecting users' mobile locations on application service platforms has become particularly easy. To offer more user-friendly services, platforms analyze and learn from the collected spatial data; in particular, analyzing the data generated by taxis can support applications such as traffic planning, traffic monitoring, and location recommendation. However, taxi data contain rich, sensitive personal information about users. If a model trained on the original taxi data is released directly, or handed to third-party data managers or collectors for analysis and research without any protection, users' personal information may be leaked. Differential privacy, a rigorous privacy-protection model, emerged in response to this problem. It has been widely studied because it does not depend on the background knowledge an attacker may possess and provides strong protection, and regression analysis under differential privacy has accordingly attracted wide attention from researchers. Regression analysis is an important research topic in machine learning; it can predict user behavior, such as when and where users take taxis. Focusing on regression analysis under the differential privacy model, this dissertation takes taxi data as its research object and uses regression analysis to predict taxi fares and taxi demand while satisfying differential privacy.
For taxi fare prediction, this dissertation proposes three methods for computing model parameters: the Differentially Private All-attribute Algorithm (DPAA), the Differentially Private Distance-attribute Algorithm (DPDA), and the Differentially Private Single-distance Algorithm (DPSA). DPAA treats the longitude and latitude of the taxi's start and end positions as sensitive attributes and the remaining attributes as non-sensitive, adds noise to the coefficients of the polynomial formed by these attributes, and solves for the model parameters from the resulting noisy objective function. Under differential privacy, DPDA first converts the latitude and longitude of the start and end positions into a spherical distance using the spherical law of cosines and the haversine formula, and then solves for the model parameters using the spherical distance as the sensitive attribute together with the remaining non-sensitive attributes; DPSA instead uses the spherical distance alone, as the sensitive attribute, to solve for the model parameters. Experimental results show that these methods achieve higher prediction accuracy than comparable algorithms.
For taxi demand forecasting, this dissertation proposes TDDP (Taxi Demand prediction with Differential Privacy), a method based on smooth sensitivity and a synthetic training set under differential privacy. TDDP includes two methods for computing the parameters of a linear regression model: OP (Output-based Perturbation method) and OFP (Objective-Function-based Perturbation method). In each method, multiple datasets are first combined through a join operation into a training set that satisfies differential privacy. The taxi's starting position is then converted into the corresponding area code using the geopy geocoding library. Finally, the training set containing the area-code attribute is used to compute the model parameters. Experimental results show that these methods achieve higher prediction accuracy than comparable algorithms. |
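DPAA is described as adding noise to the coefficients of the polynomial formed by the attributes and then solving for the model parameters from the noisy objective. This matches the general idea of the functional mechanism for linear regression, which the Python sketch below illustrates under stated assumptions:

- The function name `functional_mechanism_linreg` and the `ridge` parameter are hypothetical.
- Features and targets are assumed to be scaled to [-1, 1].
- Noise is added to every coefficient rather than only to those involving the sensitive attributes, so this is not necessarily DPAA's exact treatment of sensitive versus non-sensitive attributes.
- The eigenvalue clipping that keeps the noisy objective solvable is one possible post-processing step, not one taken from the source.

```python
import numpy as np


def functional_mechanism_linreg(X, y, epsilon, ridge=1e-3, rng=None):
    """Differentially private linear regression by perturbing objective coefficients.

    Hypothetical helper: a functional-mechanism-style sketch, not DPAA itself.
    Assumes every feature and target value lies in [-1, 1] (values are clipped).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.clip(np.asarray(X, dtype=float), -1.0, 1.0)
    y = np.clip(np.asarray(y, dtype=float), -1.0, 1.0)
    d = X.shape[1]

    # Expand the least-squares objective into a polynomial in w:
    #   sum_i (y_i - x_i^T w)^2 = sum_i y_i^2 - 2 (sum_i y_i x_i)^T w + w^T (sum_i x_i x_i^T) w
    # The constant term does not affect the minimiser, so it is never released.
    lin = -2.0 * (X.T @ y)  # coefficients of the degree-1 terms
    quad = X.T @ X          # coefficients of the degree-2 terms

    # L1 sensitivity bound used by the functional mechanism for linear regression
    delta = 2.0 * (1 + 2 * d + d ** 2)
    scale = delta / epsilon

    # Laplace noise on every released coefficient
    lin_noisy = lin + rng.laplace(0.0, scale, size=lin.shape)
    quad_noisy = quad + rng.laplace(0.0, scale, size=quad.shape)

    # Post-processing (no extra privacy cost): symmetrise, then clip eigenvalues
    # at `ridge` so the noisy quadratic form is positive definite and has a minimiser.
    quad_noisy = (quad_noisy + quad_noisy.T) / 2.0
    vals, vecs = np.linalg.eigh(quad_noisy)
    quad_pd = (vecs * np.maximum(vals, ridge)) @ vecs.T

    # Minimise w^T Q w + b^T w: setting the gradient 2 Q w + b to zero gives w = -Q^{-1} b / 2
    return np.linalg.solve(quad_pd, -0.5 * lin_noisy)
```

With a very large `epsilon`, the noise becomes negligible and the estimate approaches ordinary least squares on the clipped data. Smaller `epsilon` values give stronger privacy at the cost of accuracy.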
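DPDA's preprocessing step converts a trip's start and end coordinates into a spherical (great-circle) distance using the spherical law of cosines and the haversine formula. The sketch below shows both formulas. The function names and the mean Earth radius of 6371 km are assumptions for illustration; the source does not state which radius or units were used, nor how the two formulas were combined.

```python
import math

# Assumed mean Earth radius in kilometres; the source does not specify one.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cosine_law_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points, via the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    c = math.sin(phi1) * math.sin(phi2) + math.cos(phi1) * math.cos(phi2) * math.cos(dlmb)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, c)))
```

Both formulas agree closely on city-scale trips. The haversine form is usually preferred for short distances because the arccosine in the law-of-cosines form loses precision when its argument is close to 1.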
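The two TDDP variants differ in where the noise enters. Output perturbation fits the model and then adds noise to the parameters; objective-function perturbation adds a random term to the objective before minimizing. The block below writes both in generic form, as a sketch rather than the dissertation's exact formulation. The noise symbols $\eta$ and $b$ are placeholders whose distributions the source does not specify. The block also gives the β-smooth sensitivity of Nissim, Raskhodnikova, and Smith, on which TDDP is said to be based, and one standard way of calibrating Laplace noise to it for a real-valued function. The exact calibration TDDP uses is not specified.

```latex
% Output perturbation (OP): fit first, then perturb the parameters
w^{\star} = \arg\min_{w} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top} w\bigr)^2,
\qquad
\hat{w}_{\mathrm{OP}} = w^{\star} + \eta

% Objective-function perturbation (OFP): perturb the objective, then fit
\hat{w}_{\mathrm{OFP}} = \arg\min_{w} \Bigl[\sum_{i=1}^{n} \bigl(y_i - x_i^{\top} w\bigr)^2 + b^{\top} w\Bigr]

% beta-smooth sensitivity of f at dataset D, where LS_f is the local sensitivity
% and d(D, D') is the number of records in which D and D' differ
S^{*}_{f,\beta}(D) = \max_{D'} \Bigl( LS_f(D')\, e^{-\beta\, d(D, D')} \Bigr)

% Laplace release for real-valued f, (\varepsilon, \delta)-DP with
% \alpha = \varepsilon / 2 and \beta = \varepsilon / (2 \ln(2/\delta))
\mathcal{A}(D) = f(D) + \frac{S^{*}_{f,\beta}(D)}{\alpha}\, Z,
\qquad Z \sim \mathrm{Lap}(1)
```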