A Novel Graphical Representation Of DNA Based On Physico-chemical Properties Of Amino Acids And Similarity Analysis

Posted on: 2009-02-17  Degree: Master  Type: Thesis
Country: China  Candidate: F L Liu  Full Text: PDF
GTID: 2190360245987860  Subject: Operational Research and Cybernetics
Abstract/Summary:
DNA (deoxyribonucleic acid), RNA (ribonucleic acid) and proteins are all macromolecules: unbranched polymers built up from smaller units. In the case of DNA, these units are the four nucleotide residues A (adenine), C (cytosine), G (guanine) and T (thymine), while for RNA the units are the four nucleotide residues A, C, G and U (uracil). For proteins, the units are the twenty amino acid residues A (alanine), C (cysteine), D (aspartic acid), E (glutamic acid), F (phenylalanine), G (glycine), H (histidine), I (isoleucine), K (lysine), L (leucine), M (methionine), N (asparagine), P (proline), Q (glutamine), R (arginine), S (serine), T (threonine), V (valine), W (tryptophan) and Y (tyrosine). Thus, a DNA (RNA) sequence can be identified with a word over the alphabet N={A,C,G,T,(U)}, and a protein sequence can be taken as a string over twenty letters. The tools and methods of combinatorics and statistics therefore play important roles in studying linear sequences of biomolecular units.

In this thesis, we consider a nondegenerate representation of DNA sequences based on the physico-chemical properties of amino acids: the amino acid sequence encoded by a DNA sequence is converted into an H-curve, a C-curve, a P-curve and a G-curve. In general, only three of these curves are needed to represent the information in the DNA sequence. The method also yields a novel graphical representation of protein sequences, and similarity/dissimilarity analysis of sequences is carried out on this basis.

The main contents are as follows:

In Chapter 1, we introduce some basic knowledge of molecular biology. Most of the terms and concepts used in this thesis are explained briefly there.

In Chapter 2, we consider a graphical representation of DNA sequences based on the physico-chemical properties of amino acids, together with its numerical characterization. We also give a 2D graphical representation of nucleotide triplets and protein sequences. Based on this representation, we draw the 2D characteristic curves of the first exon of the beta-globin genes of 10 species.

In Chapter 3, we provide an invariant based on the C matrix, which overcomes the limitation of traditional matrices even for symmetrical curves. Using this method, we carry out similarity/dissimilarity analysis of the exon-1 sequences of the beta-globin genes of ten species and draw phylogenetic trees.

In Chapter 4, we construct a set of matrices, called related matrices, on the three different reading frames of a DNA sequence. The leading eigenvalues of the constructed matrices are selected as invariants. We carry out similarity/dissimilarity analysis of eight avian influenza sequences and draw phylogenetic trees using UPGMA.

In Chapter 5, we summarize the work and discuss prospects for future research.
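As a rough illustration of the first step described above (obtaining amino acid sequences from a DNA sequence in each of its three reading frames), a minimal Python sketch might look as follows. It assumes the standard genetic code and forward-strand frames only; the function names `translate` and `reading_frames` are hypothetical and are not taken from the thesis, and the construction of the H-, C-, P- and G-curves themselves is not reproduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
# '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA string into one-letter amino acid codes.

    `frame` (0, 1 or 2) is the offset of the first codon; trailing bases
    that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def reading_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, f) for f in range(3)]


if __name__ == "__main__":
    # Short illustrative fragment, not data from the thesis.
    for frame, protein in enumerate(reading_frames("ATGGTGCACCTG")):
        print(frame, protein)
```

Each resulting amino acid string could then be mapped, residue by residue, to numerical physico-chemical values to build a curve; which property scales are used is a choice this sketch leaves open.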
Keywords/Search Tags:sequence, graphical representation, C matrix, variation, related matrix, protein, similarity/dissimilarity analysis