Depth Function Based Statistical Method For Establishing Multivariate Reference Ranges

Posted on: 2006-06-02  Degree: Doctor  Type: Dissertation
Country: China  Candidate: F B Xue  Full Text: PDF
GTID: 1104360152996098  Subject: Epidemiology and Health Statistics
Abstract/Summary:
Reference ranges are important tools for data analysis and decision support in medicine. Methods for establishing univariate reference ranges are well developed and have long played an important role in medical data analysis. Multivariate data are also common in this field and likewise need to be handled through some kind of reference range, so a statistical method for establishing multivariate reference ranges is an indispensable tool as well.

Among the methods currently used to establish multivariate reference ranges, the multivariate normal distribution method, which rests on mature theory and is easy to apply, is the most widely adopted. It requires, however, that the data follow, or can be transformed to follow, a multivariate normal distribution. This requirement often cannot be satisfied for multivariate medical data, so the scope of application of the multivariate normal distribution method is extremely limited. A common alternative today is the repeated use of univariate reference ranges, one for each variable. This approach is effective only on limited occasions, namely for multivariate data whose variables are only weakly associated, because it cannot account for the correlations between variables.

In computer simulation experiments and practical data analysis, we found that the repeated use of univariate reference ranges can be reasonable when the correlations between the variables are minor.
However, it is not strictly a valid method for establishing multivariate reference ranges, because it cannot meet some of the basic requirements for such ranges. As the standard method for data with a multivariate normal distribution, the multivariate normal distribution method appears reasonable and valid in many respects and can serve as the benchmark against which new methods for establishing multivariate reference ranges are compared.

Dimension reduction is a key step in establishing a multivariate reference range and is the most important problem to solve when constructing a method for this purpose. The concept of the statistical depth function, which has become popular in mathematics in recent years, offers a completely new route to solving this kind of problem. Because they are nonparametric, statistical depth functions make it possible to construct methods for establishing multivariate reference ranges with a wider scope of application. After screening candidates against certain requirements, the Mahalanobis depth function, abbreviated as MHD, was selected in this research as the basis of the new method. To improve robustness, the Mahalanobis depth function was revised, yielding two new depth functions, MDS and MDM, which were used to construct two further methods at the same time.

Through computer simulation experiments and practical data analysis, we found that all three types of statistical depth have positively skewed distributions for almost all of the multivariate data types involved in this research. To keep the new methods as robust as possible, we decided to use a nonparametric method, the percentile method, to establish the univariate reference ranges for the statistical depths obtained from the original multivariate data by transformation with each of the three depth functions.
In this way the overall structures of the new methods were constructed, and we then laid down some principles for their use. The structure of the new method can be briefly described as a process consisting of the following steps. First, the multivariate data of the reference sample are transformed into statistical depths, i.e. a univariate variable, using a chosen depth function; this accomplishes the dimension reduction. Second, a univariate reference range is established for the resulting statistical depths using the percentile method. Third, the observations in the reference sample are classified into two groups, nominally normal and nominally abnormal, according to the univariate reference range of the statistical depth; the proportion of normal observations can be calculated to validate the actual coverage of the reference range. Finally, to classify new observations, we compute their statistical depths with respect to the location and dispersion parameters of the reference sample and compare them with the reference range established above to see whether they fall inside it. The new methods based on the three statistical depth functions mentioned above, which differ only in the dimension reduction step, are named the MHD method, the MDS method and the MDM method.

We compared the new methods with the multivariate normal distribution method in different respects through computer simulation experiments and practical data analysis. The results showed that the false positive ratios of the new methods are more accurate than those of the multivariate normal distribution method in most cases. For the data produced by computer simulation, the actual false positive ratios of the three new methods fluctuate around, and average out exactly to, the expected levels regardless of the type of multivariate distribution; for the practical data, the differences between the actual and the expected false positive ratios are small in most cases.
As for the multivariate normal distribution method, its false positive ratios fluctuate around, and average out exactly to, the expected levels only for data with multivariate normal distributions. For data that are not multivariate normal, the average false positive ratios differ considerably from the expected ones. Thus, multivariate reference ranges established with the multivariate normal distribution method carry systematic errors when the data are not multivariate normal. In terms of discriminating ability, the three new methods perform almost the same as the multivariate normal distribution method, and in some situations they appear to be superior to it. The bivariate reference ranges established with the three new methods are nearly elliptical in shape, which shows that the new methods satisfy the basic requirements for methods of establishing multivariate reference ranges. For multivariate normal data, the reference ranges established with the three new methods are highly consistent with those established with the multivariate normal distribution method; e.g. the consistency rates were always higher than 98% in the iterative tests of the computer simulation experiments. All the results mentioned above show that the three new methods are effective and valid for establishing multivariate reference ranges. The results of a practical data analysis on the screening of soldier candidates showed that the new methods proposed in this research can be effective in practical work. For tasks without any special requirements for multivariate reference ranges, the new methods are fully adequate.
For certain special tasks, such as those involving one-sided reference ranges for certain variables, the new methods can at least be used together with the methods currently in use to improve the effectiveness of the outcomes.

In the comparisons and analyses, we did not find any distinct differences between the three new methods in any of the aspects of interest. Although the computer simulation experiments showed some trends of difference between the three new methods, the absolute differences were too small to matter. In the practical data analysis, not only were the differences between the three new methods minor, but none of the three was consistently superior or inferior to the others. These results show that the three new methods constructed in this research perform similarly and may each have their own advantages and disadvantages in different respects. The data types involved in this research, both simulated and practical, are far from covering the whole range of multivariate data types in medicine because of the limited scale of the research. Therefore, the results and conclusions are not fully representative of all multivariate data types in the medical field. This might also be one of the reasons why we did not find distinct differences between the three new methods and thus could not evaluate the effects of the revisions made to the Mahalanobis depth function. To obtain more reliable results and some guidance for improving the performance of the new methods, we will make a greater effort to collect practical data of more multivariate data types and to extend the research to a wider range of aspects.
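
The four steps summarized in the abstract (depth transformation, percentile cutoff, coverage check on the reference sample, classification of new observations) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation and rests on several assumptions not stated in the abstract: the Mahalanobis depth is taken in its common textbook form 1 / (1 + d²), where d² is the squared Mahalanobis distance; the sample mean and sample covariance stand in for the location and dispersion parameters; the reference range is one-sided (depth at or above the lower α percentile); and the data are simulated. The MDS and MDM variants are not defined in the abstract and are therefore not shown. All function names are hypothetical.

```python
import random


def mean_and_cov(data):
    """Sample mean vector and sample covariance matrix (assumed estimators)."""
    n, p = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in data) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_depth(x, mu, cov):
    """Step 1: reduce a multivariate observation to one depth value.

    Assumed form: depth = 1 / (1 + d^2), d^2 = (x - mu)' cov^{-1} (x - mu).
    """
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, d)  # z = cov^{-1} (x - mu)
    d2 = sum(di * zi for di, zi in zip(d, z))
    return 1.0 / (1.0 + d2)


def depth_cutoff(depths, alpha=0.05):
    """Step 2: percentile method, here the lower alpha percentile of depth."""
    s = sorted(depths)
    return s[int(alpha * len(s))]


def is_normal(x, mu, cov, cutoff):
    """Step 4: a new observation is inside the range if its depth >= cutoff."""
    return mahalanobis_depth(x, mu, cov) >= cutoff


# Simulated bivariate reference sample with correlated variables (assumption).
random.seed(0)
reference = []
for _ in range(500):
    u = random.gauss(0.0, 1.0)
    reference.append([u, 0.8 * u + 0.6 * random.gauss(0.0, 1.0)])

mu, cov = mean_and_cov(reference)
depths = [mahalanobis_depth(x, mu, cov) for x in reference]
cutoff = depth_cutoff(depths, alpha=0.05)

# Step 3: actual coverage of the reference range on the reference sample.
coverage = sum(dep >= cutoff for dep in depths) / len(depths)
print("depth cutoff:", cutoff, "coverage:", coverage)
print("new observation inside range:", is_normal([0.5, 0.5], mu, cov, cutoff))
```

Because the cutoff is an empirical percentile of the depths rather than a quantile derived from a distributional assumption, the coverage on the reference sample stays close to 1 − α whatever the shape of the data, which is the property the abstract attributes to the nonparametric percentile step.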
Keywords/Search Tags:multivariate reference range, computer simulation experiment, depth function, statistical method, multivariate normal distribution