The Research Of Inner Network Behavior Analysis Based On Spark

Posted on:2018-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:B S Li

Full Text:PDF

GTID:2348330542965261

Subject:Software engineering

Abstract/Summary:

The intranet trust mechanism assumes by default that the personnel within an institution who have access to the network are safe and trustworthy. However, it is common for outside staff to visit an institution and carry out work on its computers, which is one source of insecurity in the network. Intranet users are the main group on the network; their activities are flexible and difficult to predict, and many security incidents are caused by illegal operations performed by intranet users. So far there are few constraints that limit the behavior of internal users. To identify threats effectively in a large volume of user operation logs, we need the power of big data computation to analyze network behavior, rather than relying on intranet trust alone.

At present, the decision-tree-based algorithms available on the Spark platform are limited to Decision Tree, Random Forest (RF), and Gradient Boosting Decision Tree (GBDT). A single Decision Tree is prone to overfitting, so it is not suitable for intranet defense. Although Random Forest can take full advantage of Spark's parallel computing capacity in practice, the complexity of the algorithm remains high when rapid convergence of the model is required. GBDT is supported by a complete mathematical theory, but the dependency among the training data sets of successive iterations prevents it from fully exploiting parallelism in distributed computing.

Drawing on decision-tree ensemble methods and the idea behind the TF-IDF algorithm, this paper proposes the Frequency of Eigen (Eigen Frequency), the Frequency of Forest (Forest Frequency), and the Pseudo Boosting Decision Tree (PBDT) algorithm. The paper also addresses the problem in GBDT that, as the number of iterations increases, misclassified data may be marginalized. In PBDT, each decision tree is built independently from the original data set. There is no need to sample the data set in each iteration, which allows full use of parallelism in distributed computing.

The paper also reports intranet defense experiments with the proposed method on a distributed cluster. A series of experimental results for the RF and PBDT algorithms is obtained by varying the number of iterations and the scale of the training data set. The results indicate that PBDT achieves better prediction accuracy for training sets of a certain scale.
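The abstract refers to the TF-IDF idea without spelling it out. As background only, the textbook TF-IDF weighting can be sketched as below. This is not the thesis's definition of Eigen Frequency or Forest Frequency, which the abstract does not give. The function name `tf_idf` and the toy operation logs are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Textbook TF-IDF only; an illustrative helper, not code from the thesis.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy "logs": each document is a list of operation tokens.
logs = [
    ["login", "read", "read", "logout"],
    ["login", "copy", "logout"],
    ["login", "read", "logout"],
]
scores = tf_idf(logs)
```

In this toy input, a token that occurs in every log, such as `login`, receives weight 0, while a token that is rare across logs, such as `copy`, receives the highest weight within its own log. That is the general intuition behind weighting by frequency across a collection.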

Keywords/Search Tags:

PBDT, distributed cluster Spark, Inner Network Defense

Related items

1	A High-Performance Chinese Distributed Computing System (CH-Spark)
2	The Design And Implementation Of Honeypot System Based On Spark
3	Design And Implementation Of Big Data Resource Sharing Platform Based On Spark
4	Research On Data Stream Clustering Method Based On Spark
5	The Research Of Load Comprehensive Evaluation And Dynamic Resource Scheduling For Spark Cluster
6	Design And Implementation Of Distributed Network Intrusion Detection System Based On Spark
7	Research On Spark Oriented Fuzzy C-means Clustering Algorithm
8	Distributed Active Defense System
9	Research On Spark Shuffle Process Performance Optimization
10	Research On Big Data Distributed Storage Technology Based On Spark