| Molecular property classification is a central problem in modeling the relationship between molecular structure and properties. A model that associates a molecular structure with its physical and chemical properties makes it possible to mine structural information more effectively and to accelerate the development and application of molecules and drugs, thereby greatly reducing the research and development cost of new drugs. Approaching the problem from the perspective of deep learning, and of natural language processing in particular, this thesis proposes an XLNet_BiLSTM_CNN_Attention molecular property classification model that effectively mines molecular structure information and improves the accuracy of molecular property classification. Extensive comparative experiments show that the model's classification accuracy is generally better than that of molecular property classification models built on BERT pretraining, with an average improvement of about 1%; on the four datasets (unstructured SMILES text data), the model reaches an accuracy of 81.16% and an F1 score of 79.61%. The ablation experiments further verify that fusing the component models improves classification accuracy: compared with the basic XLNet_FC classification model, the proposed model improves accuracy by 3.38%. In addition, a Gibbs-SVM control-variable sampling model is constructed and used to extract a set of similar molecules from dataset 1 (structured molecular descriptor data), and these molecules are then clustered. The experimental results show, on the one hand, that the Gibbs-SVM model constructed in this thesis can identify the optimal variation range of the corresponding molecular descriptors, and that the molecules screened within this range are highly similar to one another. On the other hand, under the deep learning model, dimensionality reduction and clustering analysis of
the word-embedding vectors of a batch of molecules yields a clustering result highly similar to that of the Gibbs-SVM model, which indicates that the molecular embedding feature vectors learned by the pre-trained model represent molecular features well and can, to a considerable extent, replace molecular descriptor data in molecular clustering, classification, and other tasks. Experiments with classification models at different tokenization levels further show that atom-level tokenization outperforms functional-group-level tokenization, improving model accuracy by 4.6%, which indicates that the model captures interactions between atoms more effectively at this granularity. Data granularity therefore affects how difficult the data are for a given model to learn. Overall, the proposed model performs well on, and is well suited to, molecular property classification tasks. |
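
To make the notion of atom-level segmentation of SMILES text concrete, the following is a minimal Python sketch. The regular expression and the function name `tokenize_atom_level` are illustrative assumptions, not the tokenizer used in the thesis, and the functional-group-level segmentation is not reproduced here because the abstract does not specify it.

```python
import re
from typing import List

# Hypothetical atom-level pattern (an assumption, not taken from the thesis).
# Alternatives are tried left to right, so multi-character tokens come first.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"    # bracket atom such as [NH3+] or [C@@H]
    r"|Br|Cl"        # two-letter halogens, matched before single letters
    r"|[BCNOPSFI]"   # aliphatic organic-subset atoms
    r"|[bcnops]"     # aromatic atoms
    r"|%\d{2}"       # two-digit ring-closure labels
    r"|."            # any other single symbol: bonds, branches, ring digits
)


def tokenize_atom_level(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: the tokens must reassemble to the input exactly.
    assert "".join(tokens) == smiles
    return tokens


if __name__ == "__main__":
    # Aspirin; each atom, bond, branch and ring-closure symbol is one token.
    print(tokenize_atom_level("CC(=O)Oc1ccccc1C(=O)O"))
```

In a pipeline of the kind described, a token sequence like this would be mapped to vocabulary ids before being passed to the pre-trained encoder; that step depends on the specific vocabulary and is left out of the sketch.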