| With advances in data collection methods, ultrahigh dimensional data are becoming increasingly common. Unlike traditional low-dimensional data, ultrahigh dimensional data are difficult to process because of their large volume and sparsity. This thesis proposes feature screening methods for ultrahigh dimensional binary data and for multicategory data. It verifies their large-sample properties through theoretical proofs and their finite-sample properties through numerical simulations. The thesis is organized as follows. Chapter 1 introduces the theoretical background and practical significance of the thesis. It reviews domestic and international research on four topics: variable selection for high dimensional data, feature screening for ultrahigh dimensional data, discriminant analysis, and semi-supervised learning. It then outlines the overall research content and the innovations of each subsequent chapter. Chapter 2 proposes LDA-SIS, a feature screening method for ultrahigh dimensional binary data. Linear discriminant analysis (LDA) is one of the most widely used methods in discriminant classification and pattern recognition, but it fails when the dimensionality of the data becomes too high. We propose a screening procedure for ultrahigh dimensional binary classification problems based on Fisher's linear projection and a marginal score test. We establish its sure screening property, which ensures that important features are retained. The finite-sample performance of the procedure is evaluated through a Monte Carlo simulation study and a real data example. LDA-SIS does not perform well in the multiclass case, so Chapter 3 proposes WQ-SIS, a conditional interval quantile feature screening method for the multiclass feature screening problem. It combines three ideas: the conditional quantiles of the response Y given the predictor X, interval quantiles, and higher powers. The method eliminates the effect that perturbations at a single specified quantile have on screening accuracy, and it removes the limitations imposed by squaring. The chapter establishes the sure screening property of WQ-SIS and verifies its finite-sample performance through numerical simulations and an empirical example. Chapter 4 draws on semi-supervised learning to propose SSMV-SIS, a feature screening method for ultrahigh dimensional data in which only a small number of responses are labeled and the unlabeled observations far outnumber the labeled ones. The method combines ultrahigh dimensional feature screening with semi-supervised learning from machine learning and explores feature screening in the semi-supervised setting. It makes effective use of the information in a large amount of unlabeled data and satisfies the sure screening property. Monte Carlo simulations and an analysis of microblog comments illustrate its feasibility and application prospects. Chapter 5 systematically summarizes the research content and approach of the thesis. |
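The abstract describes all three methods as marginal screening procedures. Each one scores every feature separately and keeps the top-ranked features. It does not give the LDA-SIS statistic itself. The sketch below illustrates only the generic workflow for the binary case and is not the thesis's LDA-SIS. As an assumption, it uses the squared two-sample t statistic (Fisher's criterion in one dimension) as the marginal utility. It also assumes the cutoff d = ⌊n / log n⌋, a common choice in the sure independence screening literature. The function names `fisher_marginal_utility` and `marginal_screen` are hypothetical.

```python
import math
import random
from statistics import fmean, variance


def fisher_marginal_utility(x, y):
    """Squared two-sample t-type statistic for a single feature.

    x: feature values; y: 0/1 labels of the same length (each class needs >= 2 points).
    Between-class mean difference squared, divided by its pooled-variance scale.
    Illustrative stand-in only, NOT the LDA-SIS statistic from the thesis.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    n0, n1 = len(x0), len(x1)
    # Pooled within-class sample variance.
    pooled = ((n0 - 1) * variance(x0) + (n1 - 1) * variance(x1)) / (n0 + n1 - 2)
    diff = fmean(x1) - fmean(x0)
    return diff * diff / (pooled * (1 / n0 + 1 / n1))


def marginal_screen(X, y, d=None):
    """Rank features by marginal utility and keep the top d column indices.

    X: list of n rows, each a list of p feature values.
    d defaults to floor(n / log n) (an assumed, commonly used cutoff).
    """
    n, p = len(X), len(X[0])
    if d is None:
        d = int(n / math.log(n))
    scores = [fisher_marginal_utility([row[j] for row in X], y) for j in range(p)]
    order = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return order[:d]


if __name__ == "__main__":
    # Simulated example: p >> n, and only features 0..4 shift with the class label.
    random.seed(0)
    n, p = 100, 1000
    y = [i % 2 for i in range(n)]
    X = [[random.gauss(0, 1) + (1.5 if (j < 5 and y[i] == 1) else 0.0)
          for j in range(p)] for i in range(n)]
    kept = marginal_screen(X, y)
    print(len(kept), sorted(kept[:5]))
```

Screening only reduces p to a moderate d. A subsequent classifier such as LDA would then be fitted on the retained features. The multiclass and semi-supervised settings of Chapters 3 and 4 would require different marginal utilities than the one assumed here.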