
Research On Protein Sequence, Structure And Mass Spectrometry Data Based On Measure Of Information Discrepancy

Posted on: 2008-07-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z K Wu
GTID: 1100360218955530
Subject: Operational Research and Cybernetics
Abstract/Summary:
Proteins are a main component of living organisms and the main link between molecular processes and biological function. Studying proteins helps to further understand the molecular mechanisms and rules of life activity. Proteomics, which studies proteins using mathematics, informatics and computer technology, has become an unusually active research field.

In this thesis, several problems related to protein sequence, protein structure and the proteome of a cell or tissue are investigated using methods from informatics and mathematics. The main work includes research on protein sequence comparison and its applications, protein structure comparison, and mass spectrometry data classification. Our achievements are summarized as follows.

In chapter 2, we first formulate the protein multiple sequence alignment problem as an integer programming model and briefly prove the existence of an optimal solution. We also construct an optimization algorithm to solve the integer programming model. Second, we present a novel computational method for predicting phosphorylation sites based on the topological distribution of hydrophilic amino acids surrounding potential phosphorylation sites, in which the topological distribution is used to characterize the physicochemical environment of experimentally verified phosphorylation sites. Finally, a measure based on information discrepancy is applied to the discrimination of outer membrane proteins. Unlike previous methods based on amino acid composition, the approach focuses on comparing subsequence distributions, which takes into account the effect of residue order in the protein primary structure. The approach outperforms all previous methods on the same benchmark data set.

In chapter 3, the work focuses on the protein structure comparison problem. First, a novel representation of protein structure (subsequence distribution of C_α-C_α distances, SSD) is formulated.
Then an FDOD scoring scheme is developed to measure the discrepancy between two representations. Numerical experiments with the new method are conducted on four different protein data sets, and clustering analyses are given to verify the effectiveness of this new protein structure discrepancy measure. Second, a novel hybrid representation of protein structure is proposed that uses two sources of information. One is the distribution of C_α-C_α distances with sequence separation three, which describes local geometry and is used to identify the content of regular secondary structures; the other is the linear sequence distance distribution of medium- and long-range interactions, which represents the packing arrangement and topological connections between secondary structures. Furthermore, we introduce a new protein structure comparison method based on information theory. Cluster analysis and structure classification experiments on several data sets demonstrate its effectiveness in measuring protein fold similarity. Finally, based on the contact vector representation, we compare the FDOD function, cross entropy and the Euclidean metric in a function prediction experiment. The experimental results show that the FDOD function is more suitable for measuring the discrepancy between contact vector representations.

In chapter 4, a classifier based on FDOD is used to discriminate mass spectrometry data of cancer patients from those of normal individuals, with satisfactory performance. Because of the high dimensionality of mass spectrometry data and the need to find biomarkers, it is necessary to study the problem of feature selection from mass spectrometry data. The problem is modeled as a multi-objective program, which is then transformed into a single-objective programming model by the ε method. Finally, the existence of an optimal solution of this model is briefly analyzed.
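
As an illustration of comparing subsequence distributions with an information discrepancy measure, the following is a minimal sketch and not the thesis's implementation. It assumes the form of FDOD often given in the literature, B(U_1, ..., U_s) = Σ_k Σ_i u_ki log(u_ki / ((1/s) Σ_j u_ji)); the abstract itself does not state the formula. The function names `kmer_distribution` and `fdod` and the choice of fixed-length subsequences (k-mers) are hypothetical.

```python
import math
from collections import Counter


def kmer_distribution(seq, k):
    """Relative frequencies of the length-k subsequences (k-mers) of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def fdod(distributions):
    """Assumed FDOD form: sum over distributions k and symbols i of
    u_ki * log(u_ki / mean_i), where mean_i is the average of u_ji over j.
    Terms with u_ki == 0 contribute nothing."""
    s = len(distributions)
    if s == 0:
        raise ValueError("need at least one distribution")
    # Union of all symbols seen in any distribution (dicts iterate over keys).
    support = set().union(*distributions)
    total = 0.0
    for symbol in support:
        mean = sum(d.get(symbol, 0.0) for d in distributions) / s
        for d in distributions:
            p = d.get(symbol, 0.0)
            if p > 0.0:
                total += p * math.log(p / mean)
    return total


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    a = kmer_distribution("MKVLAAGIVGLLLA", 2)
    b = kmer_distribution("MKVLAAGIVGLLLA", 2)
    c = kmer_distribution("GSSGSSGSSGSSGS", 2)
    print(fdod([a, b]))  # identical distributions give zero discrepancy
    print(fdod([a, c]))  # different distributions give a positive value
```

Under this assumed form the measure is zero when all distributions coincide and grows as they diverge, and because it works on k-mer frequencies instead of single-residue composition, it is sensitive to residue order, which is the property the abstract emphasizes.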
Keywords/Search Tags: Bioinformatics, Function of Degree of Disagreement (FDOD), Optimization Model, Optimization Algorithm, Protein Structure Comparison