| Organic compounds are widespread in nature and closely tied to human life. Elucidating their molecular structure is of great practical value in organic chemistry, metabolic analysis, and drug synthesis. At present, mass spectra, infrared spectra, and nuclear magnetic resonance (NMR) spectra are the main tools for elucidating the molecular structure of organic compounds, and the key step is to identify, accurately and efficiently, the molecular structural fragments (i.e., functional groups) that correspond to the absorption peaks in the spectra. The identification of functional groups in NMR spectra has therefore attracted wide attention from researchers. In computer-assisted structure elucidation of organic compounds, NMR spectra are widely used because of their high reproducibility, simple sample preparation, and reusable samples. Functional groups are currently inferred mainly from the position and area of absorption peaks in NMR spectra. However, because of the large volume of NMR spectral data, the traditional approach of integrating absorption peaks manually demands considerable expertise in spectrum elucidation, consumes a great deal of time and effort, and yields low accuracy. Exploring methods for the automatic identification of functional groups therefore helps improve the efficiency and accuracy of organic compound structure elucidation by NMR spectra. Based on NMR spectra, this thesis focuses on the identification of functional groups in organic compounds and carries out the following studies: (1) A Long Short-Term Memory (LSTM) network and a Temporal Convolutional Network (TCN) are introduced into a one-dimensional Convolutional Neural Network (CNN) to identify the functional groups in NMR spectra, with good results. A comparison of different algorithms shows that using a CNN and a TCN together extracts more complete features of coupled split peaks than using a CNN alone, which improves the model's functional group identification. In addition, previous studies usually identified functional groups from NMR hydrogen spectra only. In this thesis, NMR hydrogen spectra and NMR carbon spectra are fused as training data, and the proposed model performs better on the fused data than on a single type of spectrum. The F1 score of the CNN-BiLSTM model reaches 95.75%, and the CNN-BOTCN model proposed in this thesis improves the F1 score for functional group identification in NMR carbon spectra by 6.66 percentage points, to 94.53%. CNN-BOTCN also achieves the best functional group identification performance on the fused NMR hydrogen and carbon spectra data, with an F1 score of 97.54%. (2) Using MestReNova software, 72,942 NMR spectra are generated to build a simulated NMR spectra dataset, and the functional group information contained in the dataset is labeled automatically. (3) A sorting-based labeling method for functional groups is designed, which effectively solves the problems of functional group inclusion and substructure overlap in the molecular structure of organic compounds. (4) Using the CNN-BOTCN model proposed in this thesis, the F1 score for functional group identification on the real NMR spectra dataset is 87.67%, which demonstrates the effectiveness of the model. The proposed method achieves good recognition results, offers a useful reference for the further application of computer-assisted structure elucidation in research on organic compound structure, and can serve as a basis for future work. |