The Application of Clustering and Principal Component Regression in Economic Indicator Data

Posted on: 2011-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Jiang
GTID: 2189360305454971
Subject: Software engineering
Abstract/Summary:
This thesis applies clustering and principal component regression analysis to economic indicator data. China's urban social and economic construction has changed enormously since the founding of the People's Republic in 1949. With rapid urbanization, the layout of urban development has become more rational and the economic structure has been further improved. The urban economy plays an important role in the national economy; urban construction has advanced rapidly, and urban quality of life and living conditions have greatly improved. The main data source for this thesis is Table 10-3, "Main Economic Indicators of Capital Cities and Cities with Independent Plans", in the 2009 China Statistical Yearbook. The table reports 23 economic indicators for the capital cities and cities with independent plans (36 cities in total). These figures show the development gaps between cities, mainly in medical care, public health, education, transportation and other areas. The thesis analyzes the economic indicator data with the SPSS statistical software, which offers complete data management and statistical analysis functions. SPSS is easy to use, requires no programming, is powerful, has a convenient data interface, and lets its function modules be combined flexibly. Its functions cover data input, editing, statistical analysis, reporting, graphics production and more, with 136 built-in functions in 11 categories.
SPSS provides both simple descriptive statistics and complex multivariate analysis methods, such as exploratory data analysis, descriptive statistics, contingency table analysis, bivariate correlation, rank correlation, partial correlation, analysis of variance, nonparametric tests, multiple regression, survival analysis, analysis of covariance, discriminant analysis, factor analysis, cluster analysis, nonlinear regression and logistic regression. The economic indicator data were processed in SPSS, and the research in this thesis covers two main aspects:

1. Cluster analysis. Cluster analysis is used to classify the economic indicator data of the 36 cities according to 22 attributes. Classifying the capital cities and cities with independent plans by their economic indicators reveals the gaps between the resulting groups of cities.

2. Principal component regression. Principal component regression, the focus of this thesis, combines principal component analysis with regression analysis. First, principal component analysis is applied to the explanatory attributes to reduce their dimension; then regression relationships are established between the target variable and the few resulting components.

The main purpose of principal component analysis is to explain most of the variation in the original data with fewer variables: it transforms a set of correlated variables into new variables that are uncorrelated with one another. Usually a number of new variables smaller than the number of original variables, yet explaining most of the variation, is retained; these are called principal components, and each one is a composite index of the original information. Principal component analysis is therefore essentially a dimension-reduction method. The main purpose of regression analysis is to establish a regression model.
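The cluster analysis in item 1 (K-means, per the keywords) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the thesis's SPSS run: it assumes the indicators are z-score standardized first so that indicators in different units weigh comparably, and it uses a deterministic farthest-first choice of initial centres, which is not how SPSS's K-Means Cluster procedure picks them. The function names and the six toy rows are hypothetical; in the thesis each row would be one of the 36 cities and each column one of the 22 attributes.

```python
import math

def standardize(rows):
    """Z-score every column so indicators in different units
    contribute comparably to Euclidean distances."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0
             for j in range(p)] for r in rows]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-first initialisation
    (an assumption of this sketch, not SPSS's rule)."""
    centres = [list(points[0])]
    while len(centres) < k:
        # next centre: the point farthest from every centre chosen so far
        far = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centres))
        centres.append(list(far))
    labels = None
    for _ in range(max_iter):
        # assignment step: each city joins its nearest centre
        new_labels = [min(range(k), key=lambda c: sq_dist(pt, centres[c]))
                      for pt in points]
        if new_labels == labels:
            break  # converged: no city changed cluster
        labels = new_labels
        # update step: move each centre to the mean of its members
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical toy rows (NOT yearbook figures): one row per city,
# one column per indicator.
cities = [[100, 50, 10], [105, 52, 11], [98, 49, 9],
          [300, 150, 40], [310, 155, 42], [295, 148, 39]]
labels, _ = kmeans(standardize(cities), k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Standardizing first is a design choice: without it, an indicator measured in large units (for example, gross output in yuan) would dominate the distance and effectively decide the clusters on its own.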
Regression analysis specifies the dependent and independent variables, determines the causal relationship between them, builds a regression model, estimates the model parameters from observed data, and then evaluates whether the model fits the measured data well; if it does, the model can be used to predict the dependent variable from the independent variables. This thesis applies principal component regression to the economic indicator data, studying the relationship between total urban population (Y) and a number of economic indicators.

First, regression analysis is used to detect collinearity. A regression model of total population on the 21 economic indicators was built, and the backward elimination method retained 10 economic indicators related to total population. Because this model revealed collinearity, principal component analysis was applied to the 10 indicators.

Second, the suitability of the data for extracting principal components was checked. The KMO value was 0.8 or above, and the line in the scree plot dropped in a "steep slope" shape, so the data were suitable for principal component analysis. Two principal components were extracted from the 10 economic indicators; together they reflect more than 80% of the information in the 10 indicators, with the cumulative contribution rate of the first two eigenvalues reaching 83.887%. From the original factor loadings, expressions for the two principal components (F1, F2) in terms of the 10 economic indicators were obtained. Principal component scores were computed by multiplying the eigenvectors by the standardized data, and a comprehensive principal component score was also obtained.

Finally, a regression model was established between total urban population and the 10 economic indicators.
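The extraction steps just described (KMO check, eigenvalues and cumulative contribution rate, loadings, scores, comprehensive score) can be sketched in plain Python as follows. Assumptions, stated loudly: the analysis works on the correlation matrix of the indicators; eigenvalues come from a Jacobi rotation routine written here only to keep the example self-contained; scores follow the abstract's description (eigenvectors times standardized data), which can differ by a scale factor from the standardized scores SPSS saves; and the comprehensive score weights each retained component by its share of the retained variance, a common convention the abstract does not spell out. All function names and the 12-row toy data are hypothetical.

```python
import math

def standardize(rows):
    """Z-score every column (sample standard deviation, n - 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns eigenvalues (largest first) and eigenvectors as columns."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for m in (a, v):  # rotate columns p, q of A and of V
                    for k in range(n):
                        x, y = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * x - s * y, s * x + c * y
                for k in range(n):  # rotate rows p, q of A
                    x, y = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * x - s * y, s * x + c * y
    order = sorted(range(n), key=lambda i: -a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for i in order] for k in range(n)]

def kmo(r):
    """Kaiser-Meyer-Olkin measure; partial correlations come from
    P = R^-1, built here as V diag(1/lambda) V'."""
    p = len(r)
    vals, vecs = jacobi_eigen(r)
    inv = [[sum(vecs[i][k] * vecs[j][k] / vals[k] for k in range(p))
            for j in range(p)] for i in range(p)]
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                r2 += r[i][j] ** 2
                a2 += inv[i][j] ** 2 / (inv[i][i] * inv[j][j])  # squared partial corr.
    return r2 / (r2 + a2)

def pca(rows, n_comp):
    z = standardize(rows)
    n, p = len(z), len(z[0])
    # correlation matrix R = Z'Z / (n - 1)
    r = [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)] for i in range(p)]
    vals, vecs = jacobi_eigen(r)
    total = sum(vals)  # equals p, the trace of R
    cum = [sum(vals[:j + 1]) / total for j in range(p)]  # cumulative contribution
    # loading a_ij = sqrt(lambda_j) * e_ij, so e_ij = a_ij / sqrt(lambda_j)
    loadings = [[math.sqrt(vals[j]) * vecs[i][j] for j in range(n_comp)] for i in range(p)]
    # component scores F_j = Z e_j (eigenvectors times standardized data)
    scores = [[sum(zr[i] * vecs[i][j] for i in range(p)) for j in range(n_comp)] for zr in z]
    # comprehensive score: weights = share of retained variance (assumed convention)
    w = [vals[j] / sum(vals[:n_comp]) for j in range(n_comp)]
    composite = [sum(wj * fj for wj, fj in zip(w, f)) for f in scores]
    return vals, cum, loadings, scores, composite, kmo(r)

# Hypothetical toy data: 12 rows, 4 indicators (three correlated, one not).
rows = [[i + 0.3 * ((7 * i) % 5 - 2),
         2 * i + 0.5 * ((3 * i) % 4 - 1.5),
         0.5 * i + 0.2 * ((5 * i) % 3 - 1),
         float((11 * i) % 6)] for i in range(12)]
vals, cum, loadings, scores, composite, kmo_value = pca(rows, n_comp=2)
print("eigenvalues:", [round(v, 3) for v in vals])
print("cumulative contribution, 2 components:", round(cum[1], 3))
print("KMO:", round(kmo_value, 3))
```

With the thesis's data, `rows` would hold the 10 indicators retained by backward elimination, and `cum[1]` would correspond to the reported cumulative contribution rate of the first two eigenvalues.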
First, SPSS regression was used to fit a model of the city's total population on the two principal components F1 and F2. Then, by substituting the expressions for the principal components, an overall regression model of the city's total population on the 10 economic indicators was obtained.

Through this research on economic indicator data, I gained a better understanding of clustering, principal component analysis and regression analysis, learned the underlying ideas of principal component regression, and mastered several SPSS operations.
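The two-stage regression in the final step (total population on F1 and F2, then back to the original indicators) can be sketched as below, assuming the retained eigenvectors are already available from the PCA step. Because component scores are centred and mutually uncorrelated, least squares on them reduces to one slope per component; substituting F_j = sum_i e_ij (x_i - mean_i) / sd_i then yields coefficients on the raw indicators. The helper name `pcr` and the five-row data set are hypothetical; the demo keeps one of two components only because the eigenvectors of a 2 x 2 correlation matrix are known in closed form.

```python
import math

def pcr(rows, y, eigvecs):
    """Principal component regression (hypothetical helper).

    rows:    n x p raw indicator values, one row per city
    y:       n responses (e.g. total urban population)
    eigvecs: p x k matrix whose columns are retained eigenvectors of the
             indicators' correlation matrix (e.g. from the PCA step)
    Returns (b0, b, c0, c): Y = b0 + sum_j b[j] * F_j on the components and
    the same fitted model as Y = c0 + sum_i c[i] * x_i on the raw indicators.
    """
    n, p, k = len(rows), len(rows[0]), len(eigvecs[0])
    means = [sum(r[i] for r in rows) / n for i in range(p)]
    sds = [math.sqrt(sum((r[i] - means[i]) ** 2 for r in rows) / (n - 1))
           for i in range(p)]
    z = [[(r[i] - means[i]) / sds[i] for i in range(p)] for r in rows]
    # component scores F_j = Z e_j
    f = [[sum(zr[i] * eigvecs[i][j] for i in range(p)) for j in range(k)] for zr in z]
    # scores are centred and mutually uncorrelated, so least squares decouples:
    # intercept = mean(y), slope_j = sum(F_j * y) / sum(F_j ** 2)
    b0 = sum(y) / n
    b = [sum(fr[j] * yi for fr, yi in zip(f, y)) / sum(fr[j] ** 2 for fr in f)
         for j in range(k)]
    # substitute F_j = sum_i e_ij * (x_i - mean_i) / sd_i back into the model
    c = [sum(b[j] * eigvecs[i][j] for j in range(k)) / sds[i] for i in range(p)]
    c0 = b0 - sum(c[i] * means[i] for i in range(p))
    return b0, b, c0, c

# Hypothetical toy data: two strongly correlated indicators and a response.
x = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
y = [3.0, 6.1, 8.8, 12.2, 14.9]
# For two positively correlated indicators the leading eigenvector of their
# correlation matrix is (1, 1) / sqrt(2); keep only that component.
e1 = [[1 / math.sqrt(2)], [1 / math.sqrt(2)]]
b0, b, c0, c = pcr(x, y, e1)
print("component model:", b0, b)
print("indicator model:", c0, c)
```

With the thesis's data, `rows` would hold the 10 retained indicators for each city, `y` the total population, and `eigvecs` the two retained eigenvectors. Dropping the low-variance components is what removes the collinearity: the back-transformed coefficients are stable combinations of a few well-determined slopes.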
Keywords/Search Tags: SPSS, Clustering Analysis, K-means Clustering Analysis, Principal Component Analysis, Regression Analysis