Scalable Mining Of Contextual Outliers Based On Relevant Subspace

Posted on:2018-06-28

Degree:Master

Type:Thesis

Country:China

Candidate:X L Yu

Full Text:PDF

GTID:2348330536468014

Subject:Computer Science and Technology

Abstract/Summary:

Outlier mining is an important research topic in data mining. Outliers are data objects that are inconsistent with, and differ significantly from, the other objects in a given dataset. With the explosive growth of data volume and dimensionality, the shortcomings of traditional outlier mining algorithms have become increasingly apparent, and these algorithms adapt poorly to massive, high-dimensional data. Traditional methods also focus on efficiency and precision, while the interpretability and comprehensibility of the mining results are rarely addressed. This thesis studies a parallel contextual outlier mining algorithm based on relevant subspaces. The main contributions are as follows:

(1) A contextual outlier mining algorithm based on the MapReduce programming model is proposed. First, the relevant subspace of each data object is determined from the local sparse difference degree, and the outlier factor of the object is calculated in this relevant subspace. The outlier factor and the set of relevant attribute dimensions in the subspace are defined as the contextual information. Second, the N data objects with the largest outlier factors are selected as contextual outliers. Third, the algorithm is parallelized using the MapReduce programming model. Finally, experiments on a UCI dataset verify that contextual information improves the interpretability and comprehensibility of the outliers.

(2) A contextual outlier mining algorithm based on relevant subspaces is proposed for the in-memory computing platform Spark. The k-nearest neighbors (KNN), the local sparse degree matrix, and the local sparse difference degree are cached in memory as RDDs, which improves the efficiency of outlier mining and reduces I/O cost. Experiments on a stellar spectral dataset verify the scalability and extensibility of the algorithm.
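
The steps in contribution (1), namely relevant subspace, outlier factor, and top-N selection with contextual information, can be illustrated with a minimal single-machine Python sketch. The abstract does not give the formulas for the local sparse degree or the local sparse difference degree, so the sketch substitutes simple stand-ins that are assumptions, not the thesis's definitions: a dimension counts as relevant for an object when its neighbours are much more tightly packed there than the dataset as a whole, and the outlier factor is the mean deviation from the neighbours in those dimensions. The function names, the `ratio` threshold, and the toy data are hypothetical.

```python
import heapq
import math
from statistics import mean, pstdev


def knn(data, i, k):
    """Indices of the k nearest neighbours of data[i] (full-space Euclidean distance)."""
    dists = [(math.dist(data[i], p), j) for j, p in enumerate(data) if j != i]
    return [j for _, j in heapq.nsmallest(k, dists)]


def contextual_outliers(data, k=3, n=1, ratio=0.5):
    """Return the n objects with the largest outlier factor, each with its context.

    Assumed stand-ins (NOT the thesis's definitions):
    - a dimension is "relevant" for an object when the local std of its
      neighbours is below ratio * global std in that dimension;
    - the outlier factor is the mean deviation from the neighbours' mean,
      in units of global std, over the relevant dimensions.
    """
    dims = range(len(data[0]))
    global_std = [pstdev(p[d] for p in data) for d in dims]
    results = []
    for i, point in enumerate(data):
        nbrs = [data[j] for j in knn(data, i, k)]
        relevant, deviations = [], []
        for d in dims:
            if global_std[d] == 0:
                continue  # a constant attribute carries no information
            local_std = pstdev(q[d] for q in nbrs)
            if local_std < ratio * global_std[d]:
                relevant.append(d)
                centre = mean(q[d] for q in nbrs)
                deviations.append(abs(point[d] - centre) / global_std[d])
        factor = mean(deviations) if deviations else 0.0
        # Contextual information: the outlier factor plus the relevant dimensions.
        results.append({"index": i, "factor": factor, "dims": relevant})
    return heapq.nlargest(n, results, key=lambda r: r["factor"])


# Toy data: attribute 0 is tightly clustered except for the last object,
# attribute 1 is spread out everywhere.
data = [(1.0, 0.0), (1.1, 5.0), (0.9, 10.0), (1.0, 15.0), (1.05, 20.0), (5.0, 10.0)]
top = contextual_outliers(data, k=3, n=1)
print(top[0]["index"], top[0]["dims"])  # prints: 5 [0]
```

The returned `dims` list is what makes the result interpretable: it says in which attributes the object deviates from its neighbourhood, not only that it deviates. In a parallel setting, the per-object loop is the natural unit to distribute, although how the thesis partitions the work across MapReduce or Spark tasks is not stated in the abstract.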

Keywords/Search Tags:

Outlier, Contextual Information, Relevant Subspace, Comprehensibility, In-memory Computing

Related items

1	Contextal Outlier Mining And Parallelization Based On Weighted Probability Density
2	Based On Information Entropy And The Subspace Outlier Mining Algorithm
3	Research On Algorithms For Subspace Clustering And Outlier Mining Based-on Information-entropy
4	Research On Local Outlier Detection Algorithm Based On Subspace
5	Research On Outlier Detection Algorithm For High Dimensional Big Data
6	Research On Outlier Mining Algorithms Based On Subspace And Its Application
7	Outlier Mining Method Based On Gini Indexes And Sub-space Research
8	Research And Application Of Outlier Detection Algorithm Based On Subspace
9	Study On Outlier Detection In Subspace
10	Outlier Detection Methods For Complex Data Types