
The Interpretation Of Organic Compounds Based On Support Vector Machine

Posted on: 2008-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Feng
Full Text: PDF
GTID: 2121360242963937
Subject: Analytical Chemistry
Abstract/Summary:
An important trend in the development of technology is the informatization of science and technology. Historically, the accumulation of scientific data has repeatedly led to the discovery of important scientific rules, which creates an opportunity for data mining in chemometrics. As infrared spectral databases grow and infrared instrumentation and computing advance, there is an urgent need to make fuller use of infrared spectra and to broaden their applications. With the computerization of commercial infrared spectrometers, many computer-assisted methods for interpreting infrared spectra have emerged. Automatic structure elucidation from infrared spectra generally falls into three groups: library search, knowledge-based systems, and pattern recognition. Within the last group, artificial neural networks (ANNs) and partial least squares (PLS) have been used most frequently. Automatic interpretation by pattern recognition techniques such as ANNs has focused mainly on predicting specific substructures, while the classification of whole organic compounds and of their absorption bands has been largely ignored. This thesis attempts to examine the regularities in the infrared spectra of organic compounds. Furthermore, ANNs have several major drawbacks: instability, local minima, and very slow convergence.
A recently popular machine-learning algorithm, the support vector machine (SVM), is introduced to build classifiers for a hierarchical classification structure of 6352 compounds.
In this system, the organic compounds were first separated into four classes: aromatic compounds, hydrocarbons, oxygen-containing compounds, and nitrogen-containing compounds. Each class was then subdivided according to the characteristics of its infrared spectra: aromatic compounds were divided into four kinds on the basis of the substitution type and the functional groups adjacent to the benzene ring; hydrocarbons were separated into saturated and unsaturated hydrocarbons; oxygen-containing compounds were separated into four classes (hydroxyl compounds, carbonyl compounds, ethers, and carboxylic acids); and nitrogen-containing compounds comprised aliphatic amines, aromatic amines, amides, and hydrazones. Finally, a more detailed separation was carried out for each kind of compound according to its characteristic absorptions in the infrared spectrum. The results from the support vector machine compared favorably with those obtained with artificial neural networks; the support vector machine showed better performance.
In addition, aromatic compounds were studied further with the support vector machine. Aromatic compounds show five characteristic infrared absorptions: the C-H stretching vibration, the overtone and combination bands of the benzene ring, the C=C stretching vibration, the C-H in-plane wagging vibration, and the C-H out-of-plane wagging vibration.
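The hierarchical scheme described above can be pictured as a tree in which each internal node has its own classifier. The following is only a minimal sketch of that routing idea, not the thesis's implementation: the names `HIERARCHY` and `classify_path` are hypothetical, the four aromatic subclasses are left out because the abstract does not name them, and the toy threshold functions merely stand in for trained SVM classifiers.

```python
from typing import Callable, Dict, List, Sequence

Spectrum = Sequence[float]
Classifier = Callable[[Spectrum], str]

# First two levels of the hierarchy as named in the abstract.
# "aromatic" has four subclasses, but the abstract does not name them,
# so that node is left without children here.
HIERARCHY: Dict[str, List[str]] = {
    "root": ["aromatic", "hydrocarbon", "oxygen-containing", "nitrogen-containing"],
    "hydrocarbon": ["saturated", "unsaturated"],
    "oxygen-containing": ["hydroxyl", "carbonyl", "ether", "carboxylic acid"],
    "nitrogen-containing": ["aliphatic amine", "aromatic amine", "amide", "hydrazone"],
}


def classify_path(spectrum: Spectrum, classifiers: Dict[str, Classifier]) -> List[str]:
    """Walk down the tree, asking the classifier at each node for a child label."""
    path: List[str] = []
    node = "root"
    while node in classifiers:
        label = classifiers[node](spectrum)
        if label not in HIERARCHY.get(node, []):
            raise ValueError(f"{label!r} is not a child of {node!r}")
        path.append(label)
        node = label
    return path


# Toy stand-ins for trained classifiers (assumption: in practice each entry
# would wrap one trained SVM model for that node).
toy_classifiers: Dict[str, Classifier] = {
    "root": lambda s: "oxygen-containing" if s[0] > 0.5 else "hydrocarbon",
    "oxygen-containing": lambda s: "carbonyl" if s[1] > 0.5 else "ether",
    "hydrocarbon": lambda s: "unsaturated" if s[1] > 0.5 else "saturated",
}

example_path = classify_path([0.9, 0.8], toy_classifiers)
print(example_path)
```

One design consequence of such a tree is that each classifier only has to separate the few sibling classes below its node, rather than every leaf class at once.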
The five segmental spectra of the aromatic compounds, and various combinations of them, were fed to the SVM to build the respective classifiers.
The results showed that in distinguishing the organic compounds, SVM performed appreciably better than ANN, which suggests that the SVM approach can be an efficient tool for extracting information from infrared spectra. Analysis of each segmental spectrum showed that, of the five absorptions of aromatic compounds, the C-H and C-C out-of-plane wagging vibration was the most important vibrational mode for determining the substitution type of ordinary benzene derivatives, which agrees with known research results. When the results from segmental and entire spectra were compared, some compounds were recognized well using only one or two segmental spectra. This means that some segmental spectra may carry the most significant structural information concealed in the entire spectrum. In other words, the best results in computer-assisted interpretation of infrared spectra are not always obtained from the entire spectrum.
As a tool for spectral interpretation, the support vector machine shows excellent performance in the field of infrared spectra. This thesis provides quantitative methods and introduces a new strategy for establishing an intelligent interpretation system for infrared spectra.
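The segmental-spectrum experiment can be illustrated by cutting a spectrum into the five aromatic regions and enumerating which regions are combined into one feature vector. This is a hedged sketch only: the wavenumber windows in `SEGMENTS` are approximate textbook ranges chosen for illustration, not the windows used in the thesis, and the function names are hypothetical. It also enumerates every non-empty combination, which may be more than the "various combinations" the thesis actually tested.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Approximate textbook windows in cm^-1 (assumption: the thesis's exact
# segment boundaries are not given in the abstract).
SEGMENTS: Dict[str, Tuple[float, float]] = {
    "C-H stretch": (3000.0, 3100.0),
    "overtone/combination": (1650.0, 2000.0),
    "C=C stretch": (1450.0, 1600.0),
    "C-H in-plane": (1000.0, 1300.0),
    "C-H out-of-plane": (675.0, 900.0),
}


def segment_features(
    wavenumbers: Sequence[float],
    absorbances: Sequence[float],
    names: Sequence[str],
) -> List[float]:
    """Concatenate the absorbance values that fall inside the chosen windows."""
    features: List[float] = []
    for name in names:
        low, high = SEGMENTS[name]
        features.extend(
            a for w, a in zip(wavenumbers, absorbances) if low <= w <= high
        )
    return features


def all_segment_combinations() -> List[Tuple[str, ...]]:
    """Every non-empty subset of the five segments."""
    names = list(SEGMENTS)
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


# Synthetic flat spectrum sampled every 25 cm^-1 from 400 to 4000 cm^-1.
wavenumbers = list(range(400, 4001, 25))
absorbances = [0.0] * len(wavenumbers)

out_of_plane_only = segment_features(wavenumbers, absorbances, ["C-H out-of-plane"])
combos = all_segment_combinations()
print(len(out_of_plane_only), len(combos))
```

Each resulting feature vector could then be passed to any SVM implementation to train one classifier per combination, so that accuracy from single segments, segment combinations, and the entire spectrum can be compared.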
Keywords/Search Tags: infrared spectra, support vector machine, organic compounds, aromatic compounds, information extraction