
Application Of Pattern Recognition Techniques In Bioinformatics And Infrared Spectra Analysis

Posted on: 2008-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Tan
GTID: 2120360242463794
Subject: Analytical Chemistry
Abstract/Summary:
With the development of computer and measurement technology, biological and chemical data are being produced in unprecedented quantities, resulting in thousands of databases. How to extract as much information as possible from these vast data is therefore a challenge faced by biologists and chemists. Pattern recognition, one of the main methods in data mining, has been widely used in industry, agriculture, national defence, biomedicine, aerography, astronomy and other fields. In this thesis, recognition systems were constructed for protein sequences and for infrared spectra, respectively, on the basis of optimal characteristics selected by pattern recognition techniques. The details are as follows.
Subcellular localization prediction is a cornerstone of bioinformatics and holds an indispensable position in the field. In this work, protein sequences were first transformed into numeric series by dipeptide composition; a genetic algorithm combined with partial least squares (GA-PLS) was then used to assess which dipeptide compositions of mitochondrial or potassium channel proteins were the most important; finally, prediction models were constructed using only these most important dipeptides. The results showed that not all dipeptides are informative for identifying proteins, and some dipeptides carry no useful information. This work describes a statistical prediction method that recognizes proteins using only raw sequence data. It can therefore help to annotate unknown proteins and predict their subcellular localization in the absence of experimental data. Further research based on the selected dipeptides may also advance the study of protein structure and biological function. It is anticipated that the method can play a supplementary role to biochemical experiments and, at minimal cost, help provide insights when selecting targets for drug design.
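As a rough illustration of the dipeptide composition encoding described above, the sketch below maps a protein sequence to a 400-dimensional vector of overlapping dipeptide frequencies. The normalization (dividing by the number of valid residue pairs), the function names and the `select_features` helper are assumptions for illustration only; the GA-PLS selection step itself is not reproduced, and the subset passed to `select_features` merely stands in for its output.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
# Position of each dipeptide in the feature vector.
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence: str) -> list[float]:
    """Return the 400-dimensional dipeptide composition of a sequence.

    Each entry is the count of one overlapping dipeptide divided by the
    number of valid pairs counted (assumed normalization). Pairs that
    contain non-standard residues such as 'X' are skipped entirely.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in DIPEPTIDE_INDEX:  # skip pairs with non-standard residues
            counts[DIPEPTIDE_INDEX[pair]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def select_features(vector: list[float], selected: list[str]) -> list[float]:
    """Keep only the dipeptides in `selected` (a hypothetical GA-PLS result)."""
    return [vector[DIPEPTIDE_INDEX[dp]] for dp in selected]


if __name__ == "__main__":
    dpc = dipeptide_composition("MKVLAAGLLA")
    print(len(dpc))  # 400
    print(select_features(dpc, ["LA", "AA"]))
```

A classifier would then be trained on the reduced vectors returned by `select_features`; which dipeptides to keep is exactly what the thesis determines with GA-PLS.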
Similarly, we extracted the local information of transmembrane segments to identify voltage-gated potassium channel (Kv) proteins using dipeptide composition. The results suggested that a protein can be identified using only its important functional regions and that, by leaving out redundant information, such a predictor performs better than a model based on global sequence information.
The topology of transmembrane segments is one of the most important aspects of protein secondary structure prediction. Focusing on potassium channel proteins, this work transformed the sequences into numeric series of hydrophobicity values, hydrophobicity being the main characteristic of transmembrane proteins. The discrete wavelet transform, a useful signal processing tool, was then employed to perform multi-resolution analysis by decomposing the original hydrophobicity series and reconstructing its approximation coefficients at each scale. The reconstructed spectra accurately indicated the number and location of the transmembrane domains of Kv proteins.
Infrared (IR) spectral analysis is an effective tool for the structural identification of organic compounds. However, IR spectra consist of numerous complicated absorption peaks, which makes their interpretation difficult and time-consuming. In this work, the wavelet transform (WT), a multi-resolution analysis approach, was used to develop a tentative alternative representation of traditional Fourier transform infrared (FTIR) spectra. For this purpose, a preliminary study compared how much information about various functional groups is retained when the original spectra are decomposed and their approximation coefficients are reconstructed with different wavelets at each decomposition scale.
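The hydrophobicity-plus-wavelet procedure can be sketched as follows: each residue is mapped to a hydropathy value, the series is decomposed with a discrete wavelet transform, and only the approximation at a chosen level is reconstructed (all detail coefficients set to zero), leaving a smoothed profile whose high plateaus suggest transmembrane segments. Several choices here are assumptions rather than the thesis's settings: the Kyte-Doolittle scale, the orthonormal Haar wavelet (chosen so the sketch needs no libraries), padding odd lengths by repeating the last sample, and the hypothetical `hydrophobic_segments` rule with its illustrative threshold and minimum length.

```python
import math

# Kyte-Doolittle hydropathy scale (assumed; the thesis does not name its scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
SQRT2 = math.sqrt(2.0)


def hydrophobicity_series(sequence):
    """Map each residue to its hydropathy value (unknown residues -> 0.0)."""
    return [KD.get(aa, 0.0) for aa in sequence.upper()]


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:  # pad odd-length input by repeating the last sample
        x = x + [x[-1]]
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def haar_approximation(x, level):
    """Reconstruct only the level-`level` approximation (details zeroed)."""
    lengths = []
    a = list(x)
    for _ in range(level):
        lengths.append(len(a))
        a, _detail = haar_step(a)
    for n in reversed(lengths):
        # Inverse Haar step with all detail coefficients equal to zero.
        up = []
        for c in a:
            up.extend([c / SQRT2, c / SQRT2])
        a = up[:n]  # drop the padding sample, if one was added
    return a


def hydrophobic_segments(smoothed, threshold=1.6, min_len=16):
    """Hypothetical rule: half-open (start, end) ranges where the smoothed
    profile stays above `threshold` for at least `min_len` residues."""
    segments, start = [], None
    for i, v in enumerate(smoothed + [float("-inf")]):  # sentinel closes runs
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


if __name__ == "__main__":
    toy = "K" * 16 + "L" * 32 + "K" * 16  # synthetic, not a real Kv protein
    profile = haar_approximation(hydrophobicity_series(toy), 3)
    print(hydrophobic_segments(profile))
```

With the Haar wavelet, the level-3 approximation is equivalent to averaging aligned 8-residue blocks, so segment boundaries are only resolved to about 8 residues; smoother wavelets, other levels and a tuned threshold would be needed on real Kv sequences.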
Balancing the retention of as much functional-group information as possible against the smallest number of characteristic dimensions, the dmey wavelet at scale 3 showed the best performance. On that basis, a novel compact library named the Fourier Transform Infrared Wavelet Coefficients Library (FTIR-WC) was constructed. Two tools, library search and structure elucidation, were developed to illustrate the advantages of the new library system; they compared favorably with the same tools applied to the original FTIR library and achieved satisfactory results in a short time. We therefore conclude that representing IR spectra by wavelet coefficients is feasible. Building this library effectively establishes a wavelet-transform IR spectra system, which is expected to serve as a cornerstone for further studies. It may challenge traditional FTIR approaches and offer a new strategy for building intelligent IR spectra interpretation systems based on the wavelet transform.
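The idea behind a wavelet-coefficient library can be sketched as storing, for each reference spectrum, only its level-3 approximation coefficients and searching by comparing those short vectors. This is a minimal illustration under stated assumptions, not the FTIR-WC implementation: the Haar wavelet stands in for dmey so that no libraries are needed, all spectra are assumed to share the same wavenumber grid, cosine similarity is an assumed match score, and `WaveletLibrary` is a hypothetical class name.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_approx_coeffs(x, level=3):
    """Level-`level` Haar approximation coefficients, used as a compact
    fingerprint (Haar is a dependency-free stand-in for dmey)."""
    a = list(x)  # copy so the caller's spectrum is never modified
    for _ in range(level):
        if len(a) % 2:
            a.append(a[-1])  # pad odd length by repeating the last sample
        a = [(a[2 * k] + a[2 * k + 1]) / SQRT2 for k in range(len(a) // 2)]
    return a


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(p * q for p, q in zip(u, v))
    nu = math.sqrt(sum(p * p for p in u))
    nv = math.sqrt(sum(q * q for q in v))
    return dot / (nu * nv) if nu and nv else 0.0


class WaveletLibrary:
    """Hypothetical compact spectral library keyed by compound name."""

    def __init__(self, level=3):
        self.level = level
        self.entries = {}

    def add(self, name, spectrum):
        # Store only the approximation coefficients, not the full spectrum.
        self.entries[name] = haar_approx_coeffs(spectrum, self.level)

    def search(self, spectrum, top=3):
        """Rank entries by cosine similarity of their wavelet coefficients."""
        query = haar_approx_coeffs(spectrum, self.level)
        scored = [(cosine(query, ref), name) for name, ref in self.entries.items()]
        scored.sort(reverse=True)
        return [(name, score) for score, name in scored[:top]]


if __name__ == "__main__":
    def peak(center, n=512, width=6.0):
        # Synthetic single-band "spectrum", for illustration only.
        return [math.exp(-((i - center) / width) ** 2) for i in range(n)]

    lib = WaveletLibrary(level=3)
    lib.add("compound A", peak(100))
    lib.add("compound B", peak(300))
    print(lib.search(peak(102), top=2))
```

In this toy setting each entry holds 64 coefficients instead of 512 samples, which is the kind of compactness the library relies on. With PyWavelets installed, `pywt.wavedec(spectrum, "dmey", level=3)[0]` returns the dmey approximation coefficients and could replace `haar_approx_coeffs`.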
Keywords/Search Tags: Characteristics Extraction, Signal Processing, Sequence Analysis, Infrared Wavelet Coefficients Library