
Studies Of New Informatics Methods For Biology Polymer System

Posted on: 2007-12-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H D Liu
GTID: 1100360185951904
Subject: Polymer Chemistry and Physics
Abstract/Summary:
Ⅰ. Background and significance

With the completion of the Human Genome Project (HGP), life science has entered the post-genome era. Millions of data items have been generated, which may lead to important discoveries. Extracting useful information from these data is the most important problem facing researchers, and the informatics of the complex systems associated with DNA requires researchers to analyze and mine the data with many kinds of informatics methods. Although much effort has been made and many discoveries have resulted, many questions remain unresolved, which calls for new methods and tools.

Aiming at biological polymer systems (such as DNA and protein), and using bioinformatics and chemometrics (principal component analysis (PCA), artificial neural networks (ANN), multianalysis, Fourier analysis and wavelet analysis), the following work was carried out:

1. new methods of gene finding;
2. studies of the interaction between DNA and target molecules;
3. new methods of frequency analysis for complex chemical and biological systems, comprising (1) resolving overlapped signals of complicated chemical systems; (2) a new frequency analysis method and phylogenetic studies of nucleic acid sequences based on the wavelet transform; (3) prediction of transmembrane proteins based on the maximum spectrum of the continuous wavelet transform;
4. recognition of nucleic acid sequences with a hidden Markov model.

Ⅱ. Contents and results

1. New method of gene finding

An integrated method of gene finding was proposed. First, four characteristics are computed: three-base periodicity, D value, GC content and New Z_Curve. Next, the possible protein coding regions are deduced from the four characteristics. Finally, the precise gene structure is established. New Z_Curve is a new characteristic derived from Z_Curve, by which the possible locations and number of coding regions can be estimated.
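As a rough illustration of three of the characteristics named above, the following Python sketch computes GC content, the original Z curve in its commonly cited form, and the spectral power at frequency 1/3 as one possible measure of three-base periodicity. The abstract does not define the D value or the New Z_Curve, so neither is attempted here. All function names are hypothetical, the choice of a Fourier measure for periodicity is an assumption, and the input is assumed to contain only the letters A, C, G and T.

```python
import cmath


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def z_curve(seq):
    """Cumulative Z-curve coordinates (x, y, z) after each base.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak pairs (A, T) minus strong pairs (G, C)
    """
    x = y = z = 0
    points = []
    for base in seq.upper():
        x += 1 if base in "AG" else -1
        y += 1 if base in "AC" else -1
        z += 1 if base in "AT" else -1
        points.append((x, y, z))
    return points


def period3_power(seq):
    """Power at frequency 1/3, summed over the four base-indicator
    sequences and divided by the sequence length (an assumed measure
    of three-base periodicity, not taken from the dissertation)."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total / len(seq)
```

A sequence with a strict three-base repeat, such as `"ATG" * 30`, gives a large `period3_power`, while a four-base repeat such as `"ACGT" * 24` gives a value close to zero. How these features would be combined into coding-region predictions is not stated in the abstract and is left open here.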
The predictions for five genes indicated that the proposed method was feasible and reliable.

2. Studies of the interaction between target molecules and DNA

The interaction between target molecules and DNA is important for understanding drug function, designing molecules and screening drugs. Existing studies have aimed to find the interaction mechanism with experimental techniques. With experimental methods alone, however, it is difficult to identify the structural factors that affect the interaction, and for batches of target molecules the interaction information is difficult to predict.

In this study, based on experimental data and quantified structure data, PCA and ANN were applied to study the interaction. The influencing factors were tested. Two models for predicting the interaction constant and one model for predicting the interaction mode were established. Of the 24 quantified parameters, 12 were found to affect the interaction markedly. The proposed models predicted both the constant and the mode with good accuracy. These studies could provide valuable information for screening drugs and designing molecules.

3. New frequency analysis for complex chemical and biological systems

(1) Resolving overlapped signals of complicated chemical systems

To estimate the number of peaks and find the individual peak positions in overlapped signals, a new method called the maximum spectrum of continuous wavelet transform (MSCWT) was developed by extracting the maximum coefficients of the continuous wavelet transform (CWT). Peak positions in the MSCWT were the same as those in the original signal. In this process, the CWT was performed not at a single dilation but over an appropriate dilation range. To obtain such a range, a new criterion was introduced to choose a center dilation, which was used to form the dilation range. If Cdilation denotes the center dilation, the proper dilation range is [Cdilation-6±2, Cdilation+1±1]. The Mexican Hat function was used as the analyzing wavelet.
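A minimal Python sketch of the maximum-spectrum idea follows, assuming that "extracting the maximum coefficients" means taking, at each position, the largest CWT coefficient over the chosen dilations. That reading, the direct-summation CWT, the 1/sqrt(dilation) scaling and all function names are assumptions made for illustration. The criterion for choosing the center dilation is not given in the abstract, so the dilation range is simply passed in by the caller.

```python
import math


def mexican_hat(t):
    """Mexican Hat wavelet (unnormalised)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt_row(signal, dilation):
    """CWT coefficients of `signal` at one dilation, by direct summation."""
    n = len(signal)
    half = int(5 * dilation)  # the wavelet is negligible beyond 5 dilations
    row = []
    for b in range(n):
        acc = 0.0
        for k in range(max(0, b - half), min(n, b + half + 1)):
            acc += signal[k] * mexican_hat((k - b) / dilation)
        row.append(acc / math.sqrt(dilation))
    return row


def max_spectrum(signal, dilations):
    """Position-wise maximum of the CWT coefficients over all dilations."""
    rows = [cwt_row(signal, a) for a in dilations]
    return [max(column) for column in zip(*rows)]


def local_maxima(values, threshold):
    """Indices of local maxima whose value exceeds `threshold`."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > threshold
            and values[i] > values[i - 1]
            and values[i] >= values[i + 1]]
```

For example, two Gaussian bands of width 6 centered at 45 and 56 merge into a single maximum in the raw signal, while `local_maxima(max_spectrum(signal, [1, 2, 3]), threshold)` with a threshold of one tenth of the spectrum maximum separates them into two peaks close to the true centers. The dilations `[1, 2, 3]` are a hypothetical choice for this toy signal, not values from the dissertation.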
Using the peak number and position information detected by MSCWT, a fitting routine was performed to recover the original signal. One simulated and four real overlapped signals, including high performance liquid chromatography (HPLC), ultraviolet-visible (UV) spectrum and differential pulse voltammetry (DPV) signals, were processed. The results indicated that MSCWT could detect the number and positions of overlapped peaks, and that curve fitting based on the MSCWT information had higher accuracy. The proposed method was efficient in resolving different types of overlapped signals.

(2) Frequency analysis and phylogenetic studies based on the wavelet transform

Two new tools, the wavelet frequency spectrum (WFS) and the wavelet transform Fourier spectrum (WTFS), were proposed for analyzing the frequency content of nucleic acid sequences. The results indicated that WFS and WTFS could detect the three-base periodicity of protein coding sequences, i.e., there was a signal at 0.333 Hz. Compared with the Fourier spectrum, WFS was free of noise and its frequency range could be changed freely to view information. In addition to the advantages of WFS, WTFS presented frequency information in line form, which made frequency peaks easy to detect. Using the results of WFS and WTFS together with other gene characteristics, a good gene finding tool could be developed.

WFS could also be used to represent nucleic acid sequences in the frequency domain. Based on such a representation, phylogenetic studies of the sequences of 11 SARS_CoV isolates were performed. The results indicated that the isolates tended to cluster in groups located on different branches, which suggested that they had followed different mutation paths.

(3) Prediction of transmembrane proteins based on MSCWT

MSCWT was proposed to predict the transmembrane segments of membrane proteins. Eight SARS_CoV membrane proteins were processed; the results were compared with those of the TMpred software and of single-scale CWT.
It was found that the proposed method had high accuracy, and its results could help in studying how transmembrane segments fold.

4. Recognition of nucleic acid sequences with a hidden Markov model

A hidden Markov model was established to recognize special segments of DNA or RNA. A practical procedure was designed to detect GC-rich and TA-rich regions. The results indicated that the model was successful. The model could be extended to the recognition of other special sequences.

Ⅲ. Innovation

The novelties of this work are as follows:

1. Based on Z_Curve, New Z_Curve was derived, which could be used to find the number of protein coding regions and their possible locations. The New Z_Curve and its properties have not been reported in the literature. A further novelty is that multiple characteristics were integrated to find genes, which can achieve higher prediction accuracy.
2. For the study of the interaction between DNA and molecules, this work proposed combining experimental data and structure data to find the influencing factors and to predict the interaction constant and mode by chemometric methods. This study could provide valuable information for screening drugs and designing molecules.
3. A new spectrum called the maximum spectrum of continuous wavelet transform (MSCWT) was developed to identify each single peak of overlapped signals, which provided a solid basis for fitting methods.
4. The wavelet frequency spectrum (WFS) was derived from the CWT and applied to frequency analysis and representation of nucleic acid sequences. Phylogenetic study based on the WFS representation has not been reported in the literature.
5. A new method called the wavelet transform Fourier spectrum (WTFS) was developed to analyze the three-base periodicity of coding regions. Its advantages were that the frequency region could be changed freely and that data were presented in line form, which gave it higher resolution.
6. In processing membrane proteins, MSCWT could provide information not only on the transmembrane segments but also on their folding.
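The hidden Markov model of section 4 is not specified in the abstract. As a hedged illustration of how GC-rich and TA-rich regions could be labeled, the sketch below uses a two-state model decoded with the standard Viterbi algorithm. The state names, the start, transition and emission probabilities, and the function name are all hypothetical values chosen for the example, not parameters from the dissertation, and the input is assumed to contain only A, C, G and T.

```python
import math

# Hypothetical two-state model; every probability here is an assumption.
STATES = ("GC_rich", "TA_rich")
START = {"GC_rich": 0.5, "TA_rich": 0.5}
TRANS = {
    "GC_rich": {"GC_rich": 0.9, "TA_rich": 0.1},
    "TA_rich": {"GC_rich": 0.1, "TA_rich": 0.9},
}
EMIT = {
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "TA_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq):
    """Most probable state path for `seq`, computed in log space."""
    seq = seq.upper()
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]])
             for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these example parameters, `viterbi("GCGCGCGCGC" + "ATATATATAT")` labels the first ten bases `GC_rich` and the last ten `TA_rich`, while a single stray A inside a GC block is not enough to trigger a state switch. In practice the probabilities would be estimated from annotated sequences rather than fixed by hand.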
Keywords/Search Tags: Biology polymer, Interaction with DNA, Gene prediction, Overlapped signal, Wavelet transform