| With the rapid development of gene chip technology, tumor gene expression profile data can be obtained quickly and accurately. Feature selection and sample classification are the two basic problems in tumor classification based on gene expression profile data. Analysis of these data provides a powerful tool for early tumor diagnosis and for tumor research at the molecular level. Recently, classification techniques based on sparse representation have received increasing attention. However, sparse representation based classification still has several problems: (1) heavy dependence on sufficient training samples; (2) neglect of the information embedded in test samples; (3) instability of the reconstruction error under small disturbances. Moreover, designing gene selection methods that are both efficient and biologically meaningful is a current development trend. To address these problems, this paper presents two pieces of work. First, an inverse projection representation (IPR) based tumor classification method (IPRC) is presented, and its feasibility and stability are proved theoretically. Specifically, a new inverse projection representation model that exploits the information embedded in test samples is constructed to reduce the influence of the number of training samples; then, to complete classification with IPR, a novel classification criterion, the category contribution rate (CCR), is proposed; finally, a new statistical index, the classification stability index (CSI), is defined to quantify the stability of different classification criteria. Second, building on this work, a tumor classification method that combines two-stage hybrid gene selection with the IPR model is proposed. In the two-stage hybrid gene selection method, the first stage combines BW, SNR, and the F test for preliminary selection; in the second stage, the statistical Lasso method re-selects from the preliminary informative genes to obtain candidate pathogenic genes. The two-stage hybrid selection is then combined with the IPR model to complete classification. In the experiments for the first work, the effectiveness of IPR for small sample sizes is verified, the stability of the CCR-based criterion is evaluated using CSI, and the robustness of IPRC is analyzed. For the second work, the necessity of gene selection and the feasibility of Lasso are verified first; the efficiency of the two-stage hybrid method is then tested using visual projection distribution maps based on principal component analysis and the classification performance at different selection stages. Notably, with the help of this hybrid method, candidate pathogenic genes are identified and their biological analysis is given. |
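
The two-stage hybrid gene selection described in the abstract can be illustrated with a minimal sketch. Several choices here are assumptions rather than details taken from the paper: BW is read as the ratio of between-class to within-class sum of squares, the three filter scores are combined by taking the union of the top-k genes under each score, and Lasso is fitted by a plain coordinate descent on standardized data. The function names (`stage_one`, `stage_two`, `lasso_cd`), the parameters `k` and `alpha`, and the synthetic data are all hypothetical.

```python
import numpy as np

def bw_score(X, y):
    """Between-class / within-class sum of squares per gene (assumed meaning of BW)."""
    grand = X.mean(0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(0)
        between += len(Xc) * (mu - grand) ** 2
        within += ((Xc - mu) ** 2).sum(0)
    return between / (within + 1e-12)

def snr_score(X, y):
    """Two-class signal-to-noise ratio per gene: |mu0 - mu1| / (sd0 + sd1)."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(0) - b.mean(0)) / (a.std(0, ddof=1) + b.std(0, ddof=1) + 1e-12)

def f_score(X, y):
    """One-way ANOVA F statistic per gene (a rescaling of BW on a fixed dataset)."""
    n, k = len(y), len(np.unique(y))
    return bw_score(X, y) * (n - k) / (k - 1)

def stage_one(X, y, k):
    """Assumed combination rule: union of the top-k genes under each filter score."""
    keep = set()
    for score in (bw_score, snr_score, f_score):
        keep.update(np.argsort(score(X, y))[-k:].tolist())
    return np.array(sorted(keep))

def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * w[j]            # remove gene j's current contribution
            rho = X[:, j] @ resid / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            resid -= X[:, j] * w[j]            # add the updated contribution back
    return w

def stage_two(X, y, genes, alpha):
    """Keep the stage-one genes that receive a nonzero Lasso coefficient."""
    Z = X[:, genes]
    Z = (Z - Z.mean(0)) / (Z.std(0) + 1e-12)   # standardize each gene
    t = y.astype(float) - y.mean()             # centered class labels as the response
    return genes[lasso_cd(Z, t, alpha) != 0]

# Synthetic two-class data: 40 samples, 200 genes, the first 5 genes are informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :5] += 3.0

stage1 = stage_one(X, y, k=10)
final = stage_two(X, y, stage1, alpha=0.1)
print("stage one:", stage1)
print("stage two:", final)
```

The filter stage is cheap and univariate, so it can be run over all genes; the Lasso stage is multivariate and is applied only to the reduced set, which is one plausible reading of why the abstract orders the two stages this way. The IPR model and the CCR criterion are not sketched because the abstract does not define them.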