
Study About The Identification Method Of Multi-dimensional Data's Abnormal Points

Posted on: 2011-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Gao
GTID: 2132330338478886
Subject: Humanities and sociology
Abstract:
The quality of statistical data has long troubled the statistical community, and it is a concern shared by the government and by society at large. We need quantitative methods that can inspect and audit the reliability and accuracy of data and that can also identify abnormal data. Abnormal or illogical statistical results occasionally appear during data processing, and they are caused by abnormal values contained in the processed data. However, the methods for identifying abnormal data offered in current surveys are all aimed at one-dimensional data; that is, they check the data against only one measurement index at a time. In statistical practice, checking against a single index often fails to reveal all the anomalies in the measured data. This thesis uses literature review, comparative analysis, and exploratory experiments to look for a way to solve this problem and to improve the reliability and accuracy of statistical data. Building on an analysis of common methods for identifying abnormal data, it seeks a method that applies to multiple indices at once and that provides a convenient aid to data analysis.

To set up such a method for identifying abnormal points, we first need a quantitative index that is easy to use and that expresses the relationship between points under multiple indices. If a point is abnormal, it must lie far from the other points in the sample: the farther it is from most of the points, the greater its average distance to them. Conversely, if a point is normal, it must lie close to some of the other points in the sample, so its distances to them are small and its average distance is not large.
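The average-distance idea above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the thesis's own procedure: it uses the Euclidean distance, scores each point by its mean distance to all other points, and flags a point when its score exceeds the mean score by more than k standard deviations. The cutoff rule and the names `average_distances` and `flag_outliers` are hypothetical; the abstract does not say how large an average distance must be before a point counts as abnormal.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_distances(points, dist=euclidean):
    """Score each point by its mean distance to every other point in the sample."""
    n = len(points)
    scores = []
    for i in range(n):
        total = sum(dist(points[i], points[j]) for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores


def flag_outliers(points, k=2.0, dist=euclidean):
    """Return indices whose score exceeds mean + k * (population) std of all scores.

    Hypothetical cutoff rule chosen for illustration; not taken from the thesis.
    """
    scores = average_distances(points, dist)
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
    return [i for i, s in enumerate(scores) if s > mu + k * sd]


if __name__ == "__main__":
    # Eight points clustered near (1, 2) and one far-away point.
    data = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.2), (1.0, 1.8),
            (0.8, 2.0), (1.1, 1.9), (1.0, 2.1), (8.0, 9.0)]
    print(flag_outliers(data))
```

With this data only the last point, at index 8, is flagged. For very small samples the mean-plus-k·sd rule is weak, because no score can lie more than (n - 1)/√n population standard deviations above the mean; a robust cutoff, such as a multiple of the median score, may suit such cases better.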
Based on this idea, the thesis uses the average distance from each point to the other points to identify abnormal values in samples described by multiple indices, and it tries three distances in turn: the Euclidean distance, the Mahalanobis distance, and the oblique space distance. Using distance to build a multi-index method for identifying abnormal points is only a first attempt, but it works well on simple, intuitive two-dimensional data. It also performs fairly well on multi-dimensional data, but because intuitive and effective evidence is lacking there, it deserves deeper study and continued improvement. In this example, the results from the three distances do not differ much; however, an analysis of how the distances are defined suggests that the Mahalanobis distance is considerably superior to both the Euclidean distance and the oblique space distance. The method is not yet perfect, and many problems remain to be resolved in further study.
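To see why the choice of distance matters, the following sketch (again hypothetical, standard-library Python, not the thesis's code) computes the same average-distance score with the Euclidean distance and with the Mahalanobis distance d(a, b) = sqrt((a - b)^T S^-1 (a - b)), where S is the sample covariance matrix of all points. The oblique space distance is omitted because the abstract does not define it. The Mahalanobis distance rescales each index and accounts for correlation between indices, so a point that breaks the usual relationship between two indices can stand out even when it is close in plain Euclidean terms. The helper names below are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Straight-line distance; every index gets equal weight."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def covariance(points):
    """Sample covariance matrix (n - 1 denominator) of p-dimensional points."""
    n, p = len(points), len(points[0])
    means = [sum(pt[k] for pt in points) / n for k in range(p)]
    return [[sum((pt[a] - means[a]) * (pt[b] - means[b]) for pt in points) / (n - 1)
             for b in range(p)] for a in range(p)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis(a, b, inv_cov):
    """sqrt((a - b)^T S^-1 (a - b)) given a precomputed inverse covariance."""
    d = [x - y for x, y in zip(a, b)]
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative rounding error


def average_distances(points, dist):
    """Mean distance from each point to every other point (the outlier score)."""
    n = len(points)
    return [sum(dist(points[i], points[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def mahalanobis_average_distances(points):
    """Average-distance scores using the Mahalanobis distance of the whole sample."""
    inv_cov = invert(covariance(points))
    return average_distances(points, lambda a, b: mahalanobis(a, b, inv_cov))


if __name__ == "__main__":
    # Nine points on the line y = x, plus one point just off the line.
    data = [(t, t) for t in range(9)] + [(4, 5)]
    e = average_distances(data, euclidean)
    m = mahalanobis_average_distances(data)
    print("Euclidean max score at", data[e.index(max(e))])
    print("Mahalanobis max score at", data[m.index(max(m))])
```

In this example the Euclidean score is largest for the end point (0, 0), while the Mahalanobis score is largest for (4, 5), the only point that breaks the correlation between the two indices. Because S is estimated from all points, large outliers can inflate S and partly mask themselves; this sketch does not address that.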
Keywords: Abnormal data, Multi-dimensional indicators, Inspection method, Distance