
Models And Application On University Subjects And Students Data Analysis Using Multiple Data Mining Strategies Method

Posted on: 2022-02-11  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Yang
GTID: 2507306491985709  Subject: Master of Engineering Software Engineering
Abstract/Summary:
With the rapid progress of domestic college reforms, higher education is gradually moving towards popularization and high quality. Building an efficient information management method is an important part of this reform for colleges and universities. Informatization construction can help them collect, organize, and analyze the massive amounts of data generated on campus quickly and efficiently. For universities participating in the fifth round of the China Discipline Ranking, it can also help them analyze their current situation and evaluate future development effectively. Building on the results of the informatization construction of a domestic university, this paper establishes four models that analyze various aspects of school data and draw corresponding conclusions, based on actual data collected since 2014 and a variety of data processing and data mining methods. These models provide colleges and universities with more data support, helping them adjust their strategies and improve the level of construction in all aspects. The specific work is as follows.

(1) Prediction of the number of scientific research papers based on the grey model. This paper establishes a model that uses the publication history of papers to predict the publications of the next year, using the school-wide paper publication records from 2014 to 2018. First, the original data is summarized into six sets of statistical results through missing value filling and data integration. Then, common time series models are shown to be unsuitable for the data. Finally, the grey model is used as the algorithm to build a prediction model, and the test result is the prediction for 2019. The grey model suits this application well because the summarized data follows a time-series pattern and contains only a small amount of
data. The effectiveness of the model is then demonstrated by calculating the relative error of the sequence and showing 10 randomly selected results.

(2) Prediction of academic warning based on classification. This paper establishes a model that uses a student’s historical performance to predict whether the student will receive an academic warning, based on the basic data, student status data, and score data of the first three semesters after enrollment for all undergraduates of the 2017-2019 grades. First, the data is preprocessed through missing value filling, data integration, and data discretization. Then, the academic warning label is calculated and associated with the preprocessed data according to specific rules, and features describing scores and number of courses are extracted. Next, ten feature combinations are chosen by analyzing the correlation between features. Finally, the data mining tool RapidMiner is used to train a Gradient Boosting Decision Tree (GBDT), an Artificial Neural Network (ANN), and a Naïve Bayesian Classifier (NBC) on each feature combination. The results show that the number of courses with a score lower than 70 and some non-score features, such as the student’s specialty and culture mode, can greatly influence the succeeded courses. Moreover, compared with the GBDT and ANN models, the NBC model achieves higher accuracy and recall, which can exceed 90%.

(3) Analysis of the influencing factors of students’ grade performance based on multiple strategies. This paper establishes a model that analyzes the influencing factors of students’ grade performance, using the basic data, student status data, score data of the first two semesters after enrollment, score data for one year after enrollment, and credit card data of the 2018-2019 grades. First, missing value filling and data discretization are used to process the data. Then, features that reflect the level of students’ grade performance are extracted. Next, the analysis shows that the
consumption level of students, especially dining consumption, is closely related to students’ grade performance, using exploratory data analysis and data visualization methods. Finally, the LightGBM algorithm is used to analyze the relationship between a student’s birthplace and grade performance, and the decision tree algorithm is used to obtain the feature combinations, among all attributes and among the basic information attributes, that are closely related to grade performance. In addition, this paper analyzes the relationship between students’ grade performance in different courses at two colleges through data preprocessing, feature extraction, a correlation matrix, and the FP-Growth method, using the course-level performance data of the two colleges for the 2017-2019 grades.

(4) Classification of students based on clustering. This paper establishes a model that classifies students through clustering and analyzes the characteristics of students in different grade performance categories, based on the basic data, student status data, score data of the first two semesters after enrollment, score data for one year after enrollment, and credit card data of the 2018-2019 grades. First, data preprocessing and feature engineering methods are used to process the data. Then, the students are divided into 5 clusters according to all their characteristics using the k-means algorithm, and the performance of each cluster on the continuous-valued attributes is analyzed. Finally, the distribution of academic warning labels in each cluster is counted. Comparing the means of the continuous attributes across all clusters, especially the clusters with the highest and lowest rates of academic warnings, validates the results of the analysis of influencing factors and of consumption factors, and reveals the characteristics of the number of times that students at different grade performance levels swipe their cards to enter and exit.

The model established in this paper can predict the publication of school
papers and students’ academic warnings, and analyze students’ grade performance from multiple angles. It can help colleges and universities analyze data and make decisions effectively, and improve their level of informatization construction.
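
The grey-model prediction described in part (1) can be illustrated with a minimal sketch. The abstract does not say which grey model variant was used, so the standard GM(1,1) form is assumed here; the function names and the sample yearly counts are hypothetical and are not taken from the thesis.

```python
import math


def gm11_fit(series):
    """Estimate the GM(1,1) development coefficient a and grey input b."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    if any(value <= 0 for value in series):
        raise ValueError("this sketch assumes a strictly positive series")
    # 1-AGO: accumulate the raw series into a running sum.
    cumulative = []
    total = 0.0
    for value in series:
        total += value
        cumulative.append(total)
    # Background values: mean of each pair of neighbouring cumulative sums.
    z = [0.5 * (cumulative[k] + cumulative[k - 1]) for k in range(1, len(series))]
    y = list(series[1:])
    # Ordinary least squares for y[k] = a * (-z[k]) + b.
    u = [-v for v in z]
    n = len(y)
    su, sy = sum(u), sum(y)
    suu = sum(v * v for v in u)
    suy = sum(p * q for p, q in zip(u, y))
    a = (n * suy - su * sy) / (n * suu - su * su)
    b = (sy - a * su) / n
    return a, b


def gm11_predict(series, steps=1):
    """Return (fitted values for the observed periods, forecasts for `steps` more)."""
    a, b = gm11_fit(series)
    n = len(series)

    def cumulative_hat(k):
        # Time response of the whitening equation dx1/dt + a * x1 = b.
        if abs(a) < 1e-12:
            return series[0] + b * k  # limiting case a -> 0
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: difference the cumulative estimates to recover the counts.
    restored = [series[0]] + [
        cumulative_hat(k) - cumulative_hat(k - 1) for k in range(1, n + steps)
    ]
    return restored[:n], restored[n:]


def relative_errors(series, fitted):
    """Relative error of each fitted value against the observed value."""
    return [abs(f - x) / x for x, f in zip(series, fitted)]


if __name__ == "__main__":
    # Hypothetical paper counts for five consecutive years (not thesis data).
    counts = [100.0, 110.0, 121.0, 133.1, 146.41]
    fitted, forecast = gm11_predict(counts, steps=1)
    print("forecast for the next year:", forecast[0])
    print("relative errors:", relative_errors(counts, fitted))
```

With five yearly observations, as in the 2014 to 2018 data mentioned above, the model fits two parameters and extrapolates one step ahead; the relative errors of the fitted sequence give the kind of check the abstract describes.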
Keywords/Search Tags: Big data, data mining, prediction of the number of scientific research papers, academic warning, the influencing factors of students’ grade performance