Research On HBase-Based Abnormal Log Analysis And Related Algorithms

Posted on: 2017-02-21  Degree: Master  Type: Thesis
Country: China  Candidate: Q X Meng  Full Text: PDF
GTID: 2308330485484504  Subject: Computer application technology
Abstract/Summary:
Many protective measures exist against attacks and intrusions from external networks, but there are few means of detecting threats posed by users inside a system. In the military, public security, and finance sectors in particular, unauthorized access, information theft, and other security problems originating within the system have become major threats. System logs record the events of a system completely and can be used to analyze its internal security situation. Most current work on log analysis relies on security audit technology based on expert systems, which can identify abnormal behavior when the volume of data is small but cannot cope effectively with massive data processing or with mining hidden relationships.

The purpose of this study is to collect, store, and analyze massive log data and to mine the latent rules in log information by combining data mining with distributed computing, in order to find system security holes and abnormal behavior by internal users.

The thesis analyzes the advantages of Hadoop and NoSQL databases for processing huge amounts of data and compares data mining techniques and log analysis methods. Based on the requirements of the study, the Apriori algorithm is chosen as the algorithm to improve. Apriori must scan the database repeatedly and generates too many candidate frequent itemsets. To address these problems, the thesis proposes converting the horizontal data structure into a vertical structure and optimizing the data storage format, which effectively reduces the number of candidate frequent itemsets and improves the efficiency of the algorithm.

The thesis designs a prototype log analysis system based on the distributed platforms Hadoop and HBase. Its functions include distributed log collection and processing, mass data storage, log rule mining, and abnormal behavior discovery. The system helps security managers find abnormal behavior within the system in a more timely and accurate way through three types of rule matching: attribute characteristics, time and location limits, and sequencing based on statistics. Tests on an experimental platform show that the efficiency of the algorithm in log data mining improves by about 20%.
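The abstract does not give the details of the thesis's improved algorithm, so the following is only a minimal sketch of the general idea of a vertical data layout, under stated assumptions. Each item is mapped to the set of transaction ids that contain it (a tid-set), so the support of a candidate itemset can be computed by intersecting tid-sets instead of rescanning the horizontal records. The function names (`to_vertical`, `frequent_itemsets`), the sample log records, and the `min_support` value are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def to_vertical(transactions):
    """Convert horizontal records (tid -> items) to vertical form (item -> tid-set)."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def frequent_itemsets(transactions, min_support):
    """Level-wise mining on the vertical layout (illustrative, not the thesis's code).

    Support of an itemset is the size of its tid-set, so the original
    records are read only once, when the vertical layout is built.
    """
    vertical = to_vertical(transactions)
    # Level 1: keep single items that meet the minimum support.
    current = {
        frozenset([item]): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        # Join pairs of frequent k-itemsets that differ in exactly one item.
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            tids = tids_a & tids_b  # tid-set of the union is the intersection
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return result


# Hypothetical log records, each reduced to a set of attribute=value items.
logs = [
    {"user=alice", "action=login", "host=db1"},
    {"user=alice", "action=login", "host=db2"},
    {"user=bob", "action=login", "host=db1"},
    {"user=alice", "action=export", "host=db1"},
]
result = frequent_itemsets(logs, min_support=2)
```

In this sketch infrequent candidates are discarded as soon as their tid-set intersection falls below `min_support`, which is one way a vertical layout can keep the candidate set small. Whether the thesis uses the same pruning strategy or storage format is not stated in the abstract.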
Keywords/Search Tags: HBase, abnormal log analysis, Apriori algorithm, data mining