
Research On The Application Of Network Log In Student's Performance Prediction

Posted on: 2018-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Sun
Full Text: PDF
GTID: 2347330536469192
Subject: Computer Science and Technology
Abstract/Summary:
Big Data technology has become deeply integrated with many industries as it has developed. Educational data mining (EDM) is the application of data mining techniques to education. It draws on computer science, statistics, psychology, and education to solve practical problems in teaching. One of its popular research areas is student performance prediction.

This study is part of the performance prediction sub-module of the "Chongqing University student behavior analysis platform". The purpose of the module is to use network logs and previous grades to predict students' academic achievement. Earlier models were usually designed to predict performance in one particular course. This thesis proposes a way to predict whether a student will fail an exam without that limitation. To improve the model's performance, it also uses campus card transaction records.

The thesis focuses on preprocessing and feature extraction for previous grades and network logs. Analysis of the network logs showed that the original log files contain too much noise and too many domain names. To deal with the noise, two methods of identifying noise records are proposed: judging by the resource name extension of the URL, and judging by its host name. To deal with the large number of domain names, a URL classification library is introduced. Using this library, log records are associated with specific web site categories, which provides a basis for further analysis. The thesis also proposes a method for analyzing differences in students' visit patterns across web site categories. It overcomes the shortcomings of frequent pattern mining algorithms when they are applied to records with highly unbalanced distributions. The method divides web sites into groups by access support and then applies a frequent pattern mining algorithm and the k-Means clustering algorithm to those groups. To improve the prediction model and to learn how much time students spend on online videos and games, the thesis further proposes two methods of estimating online game and video time: one uses heartbeat information, and the other explores key patterns in URLs.

After cleaning, transformation, and feature selection on almost 50 billion records of network logs, previous grades, and campus card transactions, seven features were obtained: the number of previous examination failures; previous credit grades; web category visit counts; online video and game time; the count of lunches; the count of breakfasts and lunches; and the count of breakfasts, lunches, and suppers. Seven different feature combinations were generated and used as input features for training models with the logistic regression and AdaBoost algorithms. The results show that, compared with the base model trained with previous examination failure counts, models that also use features from network logs and campus card records perform better. Specificity reaches up to 74.07% and sensitivity up to 74.67%, and the GMean improves by 22.81% over the base model.

The proposed method is not limited to predicting performance in a particular course. It can be used to predict whether a student will fail in the current term, which gives it practical value. In addition, the network logs used here are general Internet access logs rather than logs of customized systems, which makes the proposed method more general.
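
The two noise checks mentioned in the abstract (resource name extension and host name) can be illustrated with a minimal sketch. The extension and host lists below are hypothetical placeholders, since the abstract does not state which values the thesis used, and the function names are chosen only for this example.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical lists for illustration only; the thesis's actual
# extensions and host names are not given in the abstract.
NOISE_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".gif", ".ico"}
NOISE_HOSTS = {"ads.example.com", "stats.example.net"}


def is_noise(url: str) -> bool:
    """Return True if the URL looks like noise by either check."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Extension of the requested resource, ignoring the query string.
    extension = posixpath.splitext(parts.path)[1].lower()
    return extension in NOISE_EXTENSIONS or host in NOISE_HOSTS


def clean(urls):
    """Keep only the URLs that are not flagged as noise."""
    return [url for url in urls if not is_noise(url)]


if __name__ == "__main__":
    sample = [
        "http://cdn.example.org/static/app.js?v=3",  # noise by extension
        "http://ads.example.com/click?id=1",         # noise by host name
        "http://news.example.org/article/123",       # kept
    ]
    print(clean(sample))
```

Checking the extension on the URL path rather than on the raw string keeps query strings such as `?v=3` from hiding a static-resource request.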
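
For readers unfamiliar with the reported metrics, the sketch below computes specificity, sensitivity, and GMean from confusion-matrix counts. It assumes the usual definition of GMean as the geometric mean of sensitivity and specificity and treats "fail" as the positive class. Both are assumptions, since the abstract does not define them, and the counts are made up for illustration rather than taken from the thesis.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives (assumed: failing students) that are caught."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives (assumed: passing students) that are cleared."""
    return tn / (tn + fp)


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))


if __name__ == "__main__":
    # Made-up counts, not results from the thesis.
    tp, fn, tn, fp = 30, 10, 120, 40
    print(sensitivity(tp, fn), specificity(tn, fp), g_mean(tp, fn, tn, fp))
```

Because GMean multiplies the two rates, a model that labels every student as passing scores zero, which is why it suits data where failures are rare.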
Keywords/Search Tags:EDM, Net Access Log, Performance Prediction, Data Preprocess, Online-Time Estimation