| In recent years, the environmental problems caused by economic development have become more serious and have attracted growing attention. Previous research has traditionally studied the relationship between the environment and the economy through empirical analysis. However, conventional empirical analysis requires the model framework to be assumed beforehand, after which the parameters of the variables are estimated. On the one hand, this requires expert domain knowledge to specify the model form. On the other hand, the assumed model narrows the solution space and may miss other potential models. Environmental problems caused by economic development have been studied for many years, and in some studies the model is not continuous but has boundaries. In other words, the relationship between the environment and the economy takes different states depending on the independent variables. The traditional research method for discontinuity problems with a breakpoint is the Regression Discontinuity Design (RD). However, RD shares the shortcomings of traditional empirical analysis: it relies on extensive expert knowledge to identify the critical value, and the models on both sides of that value are then estimated using traditional regression. In contrast to traditional regression-based empirical analysis, data-driven symbolic regression does not require a model to be assumed in advance and can automatically discover the models hidden in the data-generating system. In symbolic regression, candidate computer programs are reproduced, crossed over and mutated to generate successive generations in search of better-fitting candidates, a process based on Darwin’s theory of evolution. 
However, the classic symbolic regression method cannot automatically model discontinuity problems because of the discontinuous points or edges, so this paper proposes a new symbolic regression method, the symbolic regression discontinuity (SRD) method, to solve the discontinuity problems between the environment and economic development. This article therefore uses the symbolic regression method and the SRD method to carry out a data-driven analysis of the relationship between environmental pollution and economic development. The main results are as follows: (1) We model the relationship between carbon dioxide emissions and economic development for 67 countries in a data-driven way. We find that the environmental Kuznets curve model obtained in previous empirical research can also be obtained by symbolic regression. Moreover, symbolic regression also discovers a new development pattern with high precision and larger coverage. (2) We propose the symbolic regression discontinuity (SRD) method to automatically model discontinuity problems. Tests on a set of benchmark functions show that the SRD method can effectively find the discontinuity locations and correctly estimate the models on both sides of the discontinuous points. (3) We study cross-sectional data for 67 countries in the 2010s using the SRD method. 
Based on the different patterns between economic development and CO2 emissions obtained from SRD, we divide these countries into three stages: the developing stage, the transition stage and the developed stage. In summary, both the symbolic regression method and the symbolic regression discontinuity method enable automatic, data-driven modeling: symbolic regression automatically builds models of continuous problems, while symbolic regression discontinuity effectively handles the discontinuous problems between environmental pollution and economic development. The two data-driven methods are effective supplements to the traditional empirical method and provide a new way to explore the relationship between economic development and environmental pollution. |
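
To make the idea of locating a discontinuity from data alone more concrete, the following is a minimal sketch and not the SRD method itself. It assumes a single breakpoint and uses ordinary least-squares lines on each side as a simple stand-in for the symbolic expressions that an evolutionary search would produce. The function names `fit_line` and `find_breakpoint`, the `min_points` parameter and the synthetic data are all hypothetical and chosen only for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, sum of squared residuals).

    Assumes the x values are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def find_breakpoint(xs, ys, min_points=5):
    """Try every admissible split of the sorted data and keep the one whose
    two separately fitted models give the smallest total squared error."""
    pairs = sorted(zip(xs, ys))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:i], ys[:i])
        right = fit_line(xs[i:], ys[i:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            # Place the critical value halfway between the neighbouring points.
            threshold = 0.5 * (xs[i - 1] + xs[i])
            best = (total, threshold, left[:2], right[:2])
    return best[1], best[2], best[3]


# Synthetic example (not data from the paper): two linear regimes with a jump at x = 4.
rng = random.Random(0)
xs = [10.0 * k / 199 for k in range(200)]
ys = [(1.0 + 2.0 * x if x < 4.0 else 20.0 - 0.5 * x) + rng.gauss(0.0, 0.1) for x in xs]

threshold, left_model, right_model = find_breakpoint(xs, ys)
print("estimated critical value:", threshold)
print("left (slope, intercept):", left_model)
print("right (slope, intercept):", right_model)
```

In this sketch the critical value is chosen by the data rather than supplied by an expert, which is the contrast with RD drawn above. Replacing `fit_line` with a symbolic regression search on each side would bring it closer in spirit to the approach described, at a much higher computational cost.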