Integrative Sliced Inverse Regression And Its Application

Posted on: 2022-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: H H Xie
Full Text: PDF
GTID: 2530306326474544
Subject: Statistics
Abstract/Summary:
With the advent of the big data era and the rapid development of science and technology, data collection and storage have become easier and more convenient. The high-dimensional character of collected datasets is increasingly prominent, and multiple datasets from different sources are often available for the same or similar research questions. How to reduce the dimension of multiple high-dimensional datasets from different sources, and, especially when the sample size of any single dataset is small, how to use as much of the information in those datasets as possible, is a central challenge in data analysis. Integrative analysis is an effective approach for handling multiple datasets simultaneously; it avoids the model instability caused by the small sample size of a single dataset or by differences between data sources. This thesis therefore proposes an integrative analysis method that reduces the dimension of multiple datasets simultaneously.

To retain as much information as possible for explaining variation in the response variable during dimension reduction, the thesis considers the integration of sliced inverse regression (SIR), one of the sufficient dimension reduction methods, and applies it to multiple datasets simultaneously. To realize information 'borrowing' and 'sharing' among datasets, the thesis assumes that the central dimension reduction subspaces of the datasets are similar to a certain degree. This similarity is measured by the projection distance between two central dimension reduction subspaces and is controlled by a tuning parameter. A penalty function is constructed that restricts the projection distances between the central dimension reduction subspaces to be small, which yields similar subspaces. The proposed model is solved by a gradient descent method based on curvilinear search, that is, an adaptive stochastic gradient descent algorithm that preserves the orthogonality constraints, so that the similar dimension reduction subspaces are obtained directly and simultaneously. A sequential χ² test is used to choose the dimension of the central dimension reduction subspaces.

Finally, the performance of the proposed method is evaluated through simulation experiments and an empirical analysis of breast cancer data from TCGA. The results show that when the central dimension reduction subspaces of the datasets are similar (the projection distance is small) and data are relatively scarce (the sample size is small), the sliced inverse regression method based on fusing multiple datasets outperforms both independent sliced inverse regression applied to each dataset separately and the method that estimates one common central dimension reduction subspace for all datasets. In particular, the correlation loss after dimension reduction is smaller.
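
The two building blocks named in the abstract, single-dataset SIR and the projection distance between subspaces, can be illustrated with a minimal Python sketch. It is not the thesis's integrative estimator: the function names `sir_directions` and `projection_distance`, the equal-count slicing, the default `n_slices`, and the choice of the Frobenius norm of the difference between projection matrices as the "projection distance" are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def sir_directions(X, y, n_slices=5, d=1):
    """Classical single-dataset SIR (illustrative sketch).

    Returns a p x d orthonormal basis estimating the central
    dimension reduction subspace.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)

    # Inverse square root of the covariance via eigendecomposition.
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

    # Standardize the predictors.
    Z = (X - mu) @ Sigma_inv_sqrt

    # Slice observations by the sorted response (roughly equal counts).
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Top-d eigenvectors of M (eigh returns ascending eigenvalues).
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :d]

    # Map back to the original predictor scale and orthonormalize.
    B = Sigma_inv_sqrt @ top
    Q, _ = np.linalg.qr(B)
    return Q

def projection_distance(B1, B2):
    """Assumed definition: Frobenius norm of the difference between the
    orthogonal projection matrices onto span(B1) and span(B2)."""
    Q1, _ = np.linalg.qr(B1)
    Q2, _ = np.linalg.qr(B2)
    return np.linalg.norm(Q1 @ Q1.T - Q2 @ Q2.T, ord="fro")
```

Under these assumptions, calling `sir_directions` on each dataset separately corresponds to the independent baseline mentioned in the abstract, and `projection_distance` applied to two of the resulting bases gives one way to quantify how similar the estimated subspaces are. The integrative method described above would instead penalize such distances inside a joint optimization, which this sketch does not attempt.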
Keywords/Search Tags:Sufficient Dimension Reduction, Sliced Inverse Regression, Central Dimension Reduction Subspace, Integrative Analysis