The EAST Experimental Data Access Log Analysis System Based On Big Data Technology

Posted on: 2020-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Zhang
GTID: 2428330575966253
Subject: Computer application technology
Abstract/Summary:
As experiments on the EAST device continue, the volume of experimental data they generate keeps growing. Most of this data is stored in MDSplus, whose total holdings have reached the petabyte (PB) scale. To help experimenters regulate user behavior and experimental data on the MDSplus server, the MDSplus experimental data must be monitored effectively. To this end, an access log analysis system for MDSplus experimental data is designed on top of the existing MDSplus server. The system consists of four main modules: a log improvement module, an offline processing module, a real-time processing module, and a monitoring data display module.

MDSplus's existing data management does not record comprehensive log information. The system therefore improves the MDSplus logging module so that it records user and data access information in real time, and uses the Logrotate mechanism to split and archive the logs by time. The large volume of generated logs is promptly backed up to a cloud server, where offline computation over the accumulated logs yields user-behavior and data-access statistics broken down by time period. The offline computation uses Hadoop, a traditional, highly available big data framework.

During an experiment, however, a single offline computation cannot provide timely MDSplus server status information, including but not limited to the server's inbound and outbound traffic. From this information, the server's load and any abnormal access can be determined, so real-time log analysis is also essential. The system's real-time log analysis is built on Spark Streaming, the real-time computing model of the Spark ecosystem, and uses Flume and Kafka for log monitoring, aggregation, and distribution. This makes it possible to handle the massive MDSplus log volume and to process thousands of raw MDSplus log messages at second-level latency. The entire EAST experimental data access log analysis system is developed on Linux and uses web front-end technologies to display monitoring information. The system has been used in experiments and meets the design requirements.
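The thesis page does not publish its log schema or job code, so the following is only a minimal single-process Python sketch, under stated assumptions, of the two computation styles the abstract describes: a MapReduce-style batch aggregation over rotated log files (the role Hadoop plays) and a fixed-window, micro-batch traffic computation (the role Spark Streaming plays). The log line format, every function name, the treatment of reads as outbound and all other operations as inbound traffic, and the alert threshold are hypothetical. It uses neither Hadoop, Spark, Flume, nor Kafka.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical access-log line format (an assumption, not the thesis's schema):
#   "<naive ISO-8601 timestamp> user=<name> op=<read|write> bytes=<n>"
EPOCH = datetime(1970, 1, 1)


def parse_line(line):
    """Return a record dict for a well-formed line, or None so bad lines are skipped."""
    parts = line.split()
    if not parts:
        return None
    try:
        ts = datetime.fromisoformat(parts[0])
        if ts.tzinfo is not None:
            raise ValueError("sketch expects naive timestamps")
        fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
        return {"ts": ts, "user": fields["user"],
                "op": fields.get("op", "read"), "bytes": int(fields["bytes"])}
    except (ValueError, KeyError):
        return None


def map_record(rec):
    """Map step: key each access by (hour, user); value is (request count, bytes)."""
    hour = rec["ts"].strftime("%Y-%m-%d %H:00")
    yield (hour, rec["user"]), (1, rec["bytes"])


def reduce_values(values):
    """Reduce step: sum request counts and bytes for one key."""
    return (sum(c for c, _ in values), sum(b for _, b in values))


def offline_job(lines):
    """Batch, MapReduce-style job over the lines of archived log files."""
    groups = defaultdict(list)  # stands in for the distributed shuffle/sort phase
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        for key, value in map_record(rec):
            groups[key].append(value)
    return {key: reduce_values(vals) for key, vals in groups.items()}


def stream_traffic(lines, batch_seconds=1, alert_bytes=10**9):
    """Micro-batch style job: bytes per fixed window, split into outbound (reads)
    and inbound (all other ops), flagging windows above a hypothetical threshold."""
    windows = defaultdict(lambda: [0, 0])  # window start -> [outbound, inbound]
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        secs = int((rec["ts"] - EPOCH).total_seconds())
        start = EPOCH + timedelta(seconds=secs - secs % batch_seconds)
        windows[start][0 if rec["op"] == "read" else 1] += rec["bytes"]
    return [(start, out, inb, out + inb > alert_bytes)
            for start, (out, inb) in sorted(windows.items())]
```

Both jobs share the same parser, so malformed lines are dropped consistently. `offline_job` yields per-hour, per-user request counts and byte totals, while `stream_traffic(lines, batch_seconds=1)` returns one (window start, outbound, inbound, alert) tuple per second that saw traffic. In a distributed deployment the dictionary-based grouping would be replaced by the framework's own shuffle, and the lines would arrive from the collection pipeline rather than from an in-memory list.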
Keywords/Search Tags: MDSplus, Logrotate, HadoopMR, SparkStreaming, Flume, Kafka