| With the development of information technology, large amounts of data have been stored in databases. Such data are generally high-dimensional, large in volume, and sparsely distributed, which poses a major challenge for outlier mining algorithms. Most traditional outlier mining methods identify outliers from a global point of view, which is inappropriate for high-dimensional, large data sets. In this paper, a subspace-based outlier mining algorithm is presented that uses attribute relevance analysis to find local outliers in low-dimensional subspaces. The main research work is as follows: (1) An outlier mining algorithm based on attribute relevance analysis is presented. First, the irrelevant attributes, which are the dimensions made up of dense regions of the data set, are removed from the data set by attribute relevance analysis, so that both the data set and its dimensionality are reduced effectively and outlier mining becomes more efficient. Second, sparse subspaces are searched by particle swarm optimization based on a sparsity coefficient threshold, and local outliers are identified in these sparse subspaces. Finally, experimental results on a star spectrum data set validate the correctness and effectiveness of the algorithm. (2) A parallel outlier mining algorithm based on attribute relevance analysis is presented. First, the main node distributes the attribute relevance analysis task; each sub-node then finds the irrelevant attributes of the data set in parallel and returns them to the main node, which removes them. Second, the main node assigns the search task, and each sub-node uses the particle swarm optimization algorithm to search for local outlier subspaces in parallel. The main node then processes the returned outlier subspaces to determine the global outliers.
Finally, experimental results on a star spectrum data set in a parallel computing environment validate the accuracy and effectiveness of the algorithm. (3) On this basis, an outlier mining system for star spectrum data based on attribute relevance analysis is designed and implemented, using VC++6.0 and Oracle 9i as development tools. The experimental results show that the system is feasible and valuable for mining outliers in star spectra. |
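
The abstract does not define the sparsity coefficient used to guide the subspace search. As a minimal sketch, assuming it follows the definition commonly used in subspace outlier detection (each attribute is split into phi equi-depth ranges, and a k-dimensional cube is scored by how far its point count falls from the count expected under independence), the score that the particle swarm search would compare against a threshold might look like the following. The function names, the `cube` representation, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math


def equi_depth_bins(values, phi):
    """Assign each value a range index in [0, phi) so that the ranges
    hold roughly equal numbers of points (equi-depth discretization)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * phi // len(values)
    return bins


def sparsity_coefficient(binned_rows, cube, phi):
    """Score a cube given as {dimension index: range index}.

    Assumed definition: S = (n_cube - N * f^k) / sqrt(N * f^k * (1 - f^k)),
    with f = 1 / phi, k = number of dimensions in the cube, N = number of
    points. Strongly negative values mark unusually sparse cubes.
    """
    n_points = len(binned_rows)
    fraction = (1.0 / phi) ** len(cube)
    # Count the points that fall inside the cube on every chosen dimension.
    count = sum(
        all(row[dim] == rng for dim, rng in cube.items())
        for row in binned_rows
    )
    expected = n_points * fraction
    std_dev = math.sqrt(n_points * fraction * (1.0 - fraction))
    return (count - expected) / std_dev


# Toy data (hypothetical): two perfectly correlated attributes.
columns = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 3, 4, 5, 6, 7, 8],
]
phi = 2
binned_rows = list(zip(*(equi_depth_bins(col, phi) for col in columns)))

# Low range on attribute 0 combined with high range on attribute 1 is empty,
# so this cube scores negative and would be a candidate sparse subspace.
print(sparsity_coefficient(binned_rows, {0: 0, 1: 1}, phi))
```

Under this assumption, a search procedure such as particle swarm optimization would encode each particle as a choice of dimensions and ranges, and keep the cubes whose score falls below the chosen threshold; points lying in those cubes are the local outlier candidates.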