
Analysis And Research Based On Multivariate Statistics And Machine Learning

Posted on: 2020-11-16 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Guo
GTID: 2417330578462966 | Subject: Applied statistics
Abstract/Summary:
As the main way to select talent, examinations have long been a routine part of daily life. From school education to adult working life, test scores are generally treated as an important indicator of ability. This is especially true in middle school, where everything from major entrance examinations to routine classroom tests surrounds students and teachers with large volumes of data. For convenience, however, the use of these data is often limited to simple descriptive statistics. This wastes a data resource and fails to give teachers effective, scientific, comprehensive, and timely information for managing students. It is therefore necessary to apply a broader range of learning and data analysis techniques to student results, in order to uncover the key, useful information hidden in the raw data.

The thesis first introduces the theory of cluster analysis and factor analysis from multivariate statistical analysis. For cluster analysis it focuses on R-type clustering, that is, the clustering of variables. For factor analysis it focuses on parameter estimation by the principal component method and on orthogonal rotation by the maximum variance (varimax) criterion. It then introduces the basic principles and ideas of k-nearest neighbor classification and support vector machine classification in machine learning.

For the case study, the thesis selects several sets of comprehensive grades of high school students at a representative school. R-type cluster analysis groups the original nine variables into three categories. A factor comprehensive evaluation model is then used for factor analysis and extracts three effective factors: science thinking ability, language thinking ability, and liberal arts thinking ability. These three factors are completely consistent with the three categories obtained by cluster analysis, which indicates that the factor analysis results are meaningful. Students are then classified according to their scores on the three factors: a student whose highest score is on the science thinking ability factor is assigned to the first category, one whose highest score is on the language thinking ability factor to the second category, and one whose highest score is on the liberal arts thinking ability factor to the third category.

Finally, the k-nearest neighbor method and support vector machine classification are fitted to the samples labeled with these three categories. Judged by the minimum misclassification rate under ten-fold cross-validation, the support vector machine fitted with the ksvm function is the relatively optimal classifier. This classifier can be used to predict student performance, and it provides a scientific model for the analysis and prediction of student achievement.
Keywords/Search Tags:Factor analysis, K-nearest neighbor classification, Support vector machine classification, Ten-fold cross-validation
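
The labeling and model-selection steps described in the abstract can be illustrated with a small sketch. It is not the thesis's code: the data are randomly generated and hypothetical, the function names (`label_by_top_factor`, `knn_predict`, `cv_error`) are invented for illustration, and using the three factor scores as classifier features is an assumption, since the abstract does not say which features were used. Only the k-nearest neighbor side is shown; the support vector machine fitted with ksvm is not reproduced here.

```python
import random
from collections import Counter


def label_by_top_factor(scores):
    """Return the 0-based index of the largest factor score.

    0 = science, 1 = language, 2 = liberal arts (ordering assumed here).
    """
    return max(range(len(scores)), key=lambda j: scores[j])


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cv_error(xs, ys, k, folds=10, seed=0):
    """Misclassification rate of k-NN under `folds`-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for f in range(folds):
        test = idx[f::folds]  # every `folds`-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        errors += sum(knn_predict(tx, ty, xs[i], k) != ys[i] for i in test)
    return errors / len(xs)


# Hypothetical data: 150 students, three standardized factor scores each.
rng = random.Random(42)
factor_scores = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(150)]
labels = [label_by_top_factor(s) for s in factor_scores]

# Compare candidate values of k by their cross-validated error rate.
errors_by_k = {k: cv_error(factor_scores, labels, k) for k in (1, 3, 5, 7)}
best_k = min(errors_by_k, key=errors_by_k.get)
print(errors_by_k, best_k)
```

The same `cv_error` comparison extends to any other classifier: each candidate is scored on the same ten folds, and the one with the lowest misclassification rate is kept, which is the selection rule the abstract describes.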