| Mastering the behavioral patterns and characteristics of students is an important channel for universities to improve their educational effectiveness. As universities have raised their level of information technology year by year, they now store students' complete training data from enrollment to graduation. These datasets are collected, stored, and maintained by different departments; they contain a large amount of regular, interrelated information and faithfully record students' growth footprints. Traditional educational decision-making has not yet fully explored and utilized the valuable information hidden in these data resources. Although some scholars have conducted research on campus data mining, many issues remain unresolved in applying it to decision-making in universities. This thesis is based on the idea of "data-driven decision-making" and combines the data characteristics of campus big data to establish a model solution for student behavior analysis and prediction by applying multiple machine learning algorithms. Furthermore, focusing on the application environment of data-driven decision-making and drawing on the theories of evidence-based analysis and behavioral analysis, the thesis explores the interpretability of the model algorithms. Finally, from the perspective of decision-making optimization, it proposes countermeasures and suggestions for the precise management and scientific decision-making of universities. The main research contents and results of this thesis can be summarized as follows: First, the thesis collected data on the growth process of approximately 18,000 undergraduate students in three consecutive grades, constructed a multi-source heterogeneous education dataset, and applied advanced machine learning methods based on this dataset to provide an analysis of the process of "basic information dimension--training process dimension--graduation behavior dimension" from the perspective of educational data
mining. Second, the thesis analyzed the academic behavior performance of students in the context of multiple categories of educational data. A clustering analysis framework for processing multi-type campus big data was proposed, and the clustering results were described using group features. Introducing the K-Prototypes algorithm solved the problem that traditional clustering algorithms (such as K-Means) cannot handle educational data with multiple attribute types. The analysis found that students can be divided into three groups: high academic performance, medium academic performance, and low academic performance. Further analysis of the characteristics of the three groups showed that students’ academic performance during their university years has little correlation with their admission scores in the college entrance examination; that academic performance tends strongly to persist from the first year of college onward; and that academic performance during the university years is not necessarily related to family economic status. Third, the thesis constructed an interpretable employment prediction model and analyzed the driving factors behind students’ employment choices. The employment prediction model is built using Bayesian parameter optimization and the XGBoost algorithm, and its prediction performance is good (an F1 value of 0.872, better than alternative models such as random forest and SVM). The SHAP method is then used to conduct interpretability analysis on the model. The results indicate that students with high grades at enrollment have stronger employment advantages and are less likely to face problems such as delayed graduation or employment difficulties; students who receive high total scholarships during their university years tend to pursue further education in China after graduation; students with average academic performance tend to seek
employment directly after graduating from undergraduate studies; and factors such as being a rural graduate and family economic difficulties are negatively correlated with pursuing further education abroad. Fourth, a heterogeneous integration (ensemble) method was used to construct a default repayment prediction model for students, which achieved good prediction of their post-loan default repayment risk from their behavior at school. First, the SMOTE-ENN method was used to balance the data, addressing the class imbalance in the real sample data. Then, four models (XGBoost, LightGBM, CatBoost, and random forest) are combined by Voting integration, and the results are compared with those of a Stacking experiment to obtain the best integration model. Next, the SHAP method is used to analyze the interpretability of the model. The results indicate that the total amount of scholarships obtained during college, academic performance during college, and college entrance examination scores are all significantly negatively correlated with students’ post-loan default; the impact of the types of candidates and of whether students have received school-level honors on the risk of student default cannot be ignored. Fifth, the thesis summarizes students’ behavior patterns and provides suggestions for guiding their behavior. The thesis conducts evidence-based analysis based on the data mining results, combined with practical decision-making experience, summarizes the general laws of student growth and development and the typical characteristics of different student groups, and provides strategies and suggestions for decision-making optimization. The main contributions of this thesis are summarized as follows: (1) This thesis constructs a multi-source heterogeneous education dataset, providing a process analysis of "basic information dimension--training process dimension--graduation behavior dimension" from the perspective of education data mining. (2) This thesis proposes a clustering framework for analyzing students’ academic
performance in the context of multiple types of campus data, and analyzes the characteristics of student groups. (3) This thesis uses interpretable machine learning methods to explore the factors influencing students’ employment choices and provides corresponding decision-making suggestions. (4) This thesis uses heterogeneous integration methods to explore the factors influencing students’ post-loan default repayment behavior, and accordingly proposes decision-making suggestions for student education management. (5) This thesis is based on the idea of "data prediction decision", which innovates the mode and method of educational decision-making and promotes applied research on big data science in the field of education. |
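
The abstract notes that K-Means cannot handle educational data with mixed attribute types, which is the gap the K-Prototypes algorithm fills. As a minimal illustration of why, the sketch below shows the standard K-Prototypes dissimilarity: squared Euclidean distance on numeric attributes plus a weighted count of mismatches on categorical attributes. It is not the thesis's implementation. The function names, the weight `gamma`, and the example attributes (standardized scores, major category, home region) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def kprototypes_dissimilarity(
    num_a: Sequence[float],
    num_b: Sequence[float],
    cat_a: Sequence[str],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Mixed-type dissimilarity in the style of K-Prototypes.

    Numeric attributes contribute squared Euclidean distance; categorical
    attributes contribute the number of mismatches, weighted by gamma.
    """
    if len(num_a) != len(num_b) or len(cat_a) != len(cat_b):
        raise ValueError("records must have matching attribute counts")
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def assign_to_prototype(
    num: Sequence[float],
    cat: Sequence[str],
    prototypes: List[Tuple[Sequence[float], Sequence[str]]],
    gamma: float = 1.0,
) -> int:
    """Return the index of the nearest prototype for one student record."""
    distances = [
        kprototypes_dissimilarity(num, p_num, cat, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))


# Hypothetical prototypes: (standardized numeric scores, categorical attributes).
example_prototypes = [
    ([1.0, 1.0], ["science", "urban"]),
    ([-1.0, -1.0], ["arts", "rural"]),
]
# A hypothetical student record is assigned to its nearest prototype.
nearest = assign_to_prototype([0.9, 1.1], ["science", "rural"], example_prototypes)
```

A full K-Prototypes run alternates this assignment step with a prototype update (the mean for numeric attributes, the mode for categorical ones) until assignments stop changing. The choice of `gamma` controls how much the categorical attributes weigh against the numeric ones.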