Mass spectrometry data describe the mass-to-charge ratios and relative intensities of chemical compounds, and serum mass spectrometry data can provide crucial information for medical diagnosis. Existing methods for mass spectrometry data analysis are mainly single-label classification methods, which ignore the correlations between compounds and the presence of multiple features. Multi-label learning can address these issues and improve the accuracy and efficiency of mass spectrometry data analysis by exploiting the multiple features in the data. Therefore, this paper applies multi-label learning to mass spectrometry data, taking the characteristics of such data into account, and focuses on the following aspects: (1) To apply multi-label learning to mass spectrometry data analysis, this paper combines mass spectrometry data with physical examination data to construct a multi-label mass spectrometry dataset. The construction has two key parts: the feature set and the label set. The feature set is obtained by constructing features from the irregular mass spectrometry data, which involves row-column transposition, interval selection, and value filling; the label set is obtained from the physical examination data, which involves label selection and division. The impact of different processing methods on classification performance was compared experimentally using a multi-label model, and the optimal processing scheme was obtained. (2) To address the data imbalance that is common in multi-label and medical data, this paper uses the Multi-Label Synthetic Minority Over-sampling Technique (MLSMOTE) algorithm to resample the data. By redistributing the number of samples in the label space, MLSMOTE balances the number of samples for each label, avoiding the impact of data imbalance and improving the recall rate for imbalanced labels in
the mass spectrometry data. (3) For high-dimensional mass spectrometry data, multi-label classification often faces the curse of dimensionality: algorithms are vulnerable to redundant features, which reduces classification accuracy. This paper proposes two feature selection evaluation criteria, mean and standard deviation, to remove redundant features and effectively reduce the dimensionality of the feature space while improving the classification accuracy of multi-label classification algorithms. This makes the processing of high-dimensional mass spectrometry data faster and more accurate. (4) This paper proposes a mass spectrometry data classification model based on multi-label classification and applies it to serum mass spectrometry data. The model converts serum mass spectrometry data into classification results for physical examination indicators, providing new ideas and methods for the application of mass spectrometry data and new possibilities for classifying physical examination indicators and identifying abnormal indicators. In summary, this study proposes a new multi-label mass spectrometry data classification model that achieves effective multi-label classification of mass spectrometry data, balancing classification accuracy and recall while reducing computational complexity, with good stability. The model's performance was validated on serum mass spectrometry data; compared with the original multi-label model structure, the proposed model achieves higher classification accuracy by incorporating the characteristics of mass spectrometry data. This research has theoretical significance and practical application value for the further development of the field of mass spectrometry data classification.
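
The feature construction in (1) and the feature selection in (3) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how intervals are chosen, which fill value is used, or how the mean and standard deviation criteria are thresholded. The function names `bin_spectrum` and `select_by_mean_and_std`, the fixed-width m/z intervals, the use of the maximum intensity per interval, and the "keep features at or above both thresholds" rule are all assumptions made for illustration.

```python
import numpy as np

def bin_spectrum(mz, intensity, mz_min, mz_max, bin_width, fill_value=0.0):
    """Map one irregular spectrum onto a fixed m/z grid (hypothetical helper).

    Each interval keeps the maximum intensity of the peaks that fall inside
    it; intervals that contain no peak receive fill_value.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))

    # Keep only peaks inside the selected m/z interval [mz_min, mz_max).
    inside = (mz >= mz_min) & (mz < mz_max)
    idx = ((mz[inside] - mz_min) / bin_width).astype(int)
    idx = np.minimum(idx, n_bins - 1)  # guard against floating-point edge cases

    # Start from -inf so that empty intervals can be detected and filled.
    binned = np.full(n_bins, -np.inf)
    np.maximum.at(binned, idx, intensity[inside])
    binned[binned == -np.inf] = fill_value
    return binned

def select_by_mean_and_std(X, min_mean, min_std):
    """Return indices of feature columns that pass both criteria.

    Assumption: a feature is treated as redundant when its mean intensity
    or its standard deviation across samples falls below a threshold.
    """
    X = np.asarray(X, dtype=float)
    keep = (X.mean(axis=0) >= min_mean) & (X.std(axis=0) >= min_std)
    return np.flatnonzero(keep)

# Three toy spectra with different numbers of peaks (made-up values).
spectra = [
    ([100.5, 101.2, 105.0, 120.0], [3.0, 7.0, 2.0, 9.0]),
    ([100.1, 104.9, 109.0], [5.0, 2.0, 1.0]),
    ([101.9, 105.5], [9.0, 2.0]),
]
X = np.vstack([bin_spectrum(mz, inten, 100.0, 110.0, 2.0) for mz, inten in spectra])
selected = select_by_mean_and_std(X, min_mean=0.1, min_std=0.1)
X_reduced = X[:, selected]
```

In this sketch every spectrum becomes one row of a regular feature matrix regardless of how many peaks it contains, and constant or near-empty columns are dropped before a multi-label classifier is trained. The thresholds would need to be tuned on real data.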