
The Bioinformatics Analysis Of EST Related To Fiber Development In Gossypium

Posted on: 2008-07-12 | Degree: Master | Type: Thesis
Country: China | Candidate: C B Wang
GTID: 2143360245498825 | Subject: Crop Genetics and Breeding
Abstract/Summary:
As a major source of industrial fiber, cotton is an important economic crop and plays a significant role in the global economy. With the rapid development of sequencing technology and bioinformatics, the number of sequences in the major molecular biology databases has grown quickly. As of April 2007, 67,218,344 nucleotide sequences totaling 71,292,211,453 base pairs (bp) had been released in the three databases NCBI, EMBL and DDBJ, including 281,233 cotton ESTs. It is therefore important to analyze the released ESTs related to cotton fiber development at large scale with bioinformatics methods. In this research, 138,086 ESTs related to cotton fiber development were analyzed with bioinformatics methods, EST-SSR markers were developed, and sequence functions were analyzed with GO software. These studies lay a firm foundation for constructing high-density genetic maps, gene tagging, comparative genomics, gene network analysis and molecular evolution studies.

To develop functional EST-SSR markers in cotton, 63,485 EST sequences from Gossypium raimondii Ulbrich publicly released in NCBI were downloaded and characterized by bioinformatics. After redundancy was eliminated, 58,906 non-redundant sequences remained. Among them, 2,620 microsatellite-containing sequences carrying 2,818 EST-SSRs were found, i.e. 4.45% of all non-redundant sequences, equivalent to one EST-SSR per 14.8 kb of EST sequence in G. raimondii. Among motifs of 1 to 6 bp, trinucleotide repeats were the most abundant (38.31%), followed by dinucleotide (24.09%) and mononucleotide repeats (23.35%). Among all motif types, A/T had the highest frequency (18.67%), followed by AT/TA (14.83%). Among compound motifs, tandem trinucleotide motifs were the most frequent (48.65%). Using Primer3 software, 1,554 EST-SSR primer pairs were designed, and all of them were screened for polymorphism between G. hirsutum acc. TM-1 and G. barbadense cv. Hai7124, the two mapping parents used in our laboratory to construct the linkage groups of cultivated allotetraploid cotton. Of these, 744 (47.9%) primer pairs detected polymorphism between the two parents. These EST-SSRs have been used effectively to compare EST-SSR distributions among cotton species, to locate markers on chromosomes, and in related analyses.

To compare EST homology among Gossypium species, 138,086 EST sequences from 14 cDNA libraries constructed at different fiber development stages were used: A genome (G. arboreum, 39,117 ESTs, 7 to 10 DPA), D genome (G. raimondii, 32,316 ESTs, -3 to 3 DPA) and AD genome (G. hirsutum, 66,653 ESTs from 12 libraries covering -3 to 25 DPA). From these, 16,100 (A genome), 11,912 (D genome) and 14,307 (AD genome) unigenes were obtained, 42,319 in total. In the set notation below, ¬X means that no similar sequence was found in genome X. The numbers of similar sequences detected among the unigenes of the different genomes were 2,441, 937, 2,717 and 2,188 in the four sets A∩D∩AD, A∩D∩¬AD, A∩AD∩¬D and D∩AD∩¬A, respectively. The numbers of sequences expressed specifically in the A, D and AD genomes (A∩¬D∩¬AD, D∩¬A∩¬AD and AD∩¬A∩¬D) were 7,996, 6,346 and 4,467, respectively. These similar sequences were further analyzed for Gene Ontology terms and metabolic pathways, with the following results:

(1) The 27,092 similar sequences from the seven sets were classified into three main categories: Biological Process, Cellular Component and Molecular Function. The two most abundant classes in each category were the same across sets. In Biological Process, cellular process was the most abundant, followed by metabolic process; in Cellular Component, cell/cell part was the most frequent, followed by organelle; in Molecular Function, catalytic activity and binding were the two most abundant classes, indicating that catalytic-activity genes play an important role in fiber development across Gossypium species. Of the 27,092 similar unigenes, 13,845 had known functions and 13,247 had unknown functions. KEGG analysis assigned 2,543 of the known-function sequences to known metabolic pathways, of which 63.19% (1,607) belonged to Energy/Carbohydrate Metabolism and 28.23% (718) to Amino Acid Metabolism. Sequences related to expansin, cellulose synthase, sucrose synthase, UDP-related enzymes, phosphoenolpyruvate carboxylase, sugar transporters, the MYB family and vacuolar ATPase were found among the 13,845 known-function sequences. Although seeds of the D (D5) genome species bear only a useless undercoat, 38.86% ((2,188 + 2,441)/11,912) of D-genome unigenes had similar sequences in the AD (A2D5) genome. This indicates that many genes are expressed similarly in the D (D5) and AD (A2D5) genomes, and the high transcript-level homology between them implies that the D genome carries some genes related to fiber development.

(2) There were 5,158 similar unigenes between the A and AD genomes, of which 847 (16.42%) mapped to KEGG metabolic pathways, mostly Energy/Carbohydrate Metabolism and Amino Acid Metabolism. The 3,378 similar unigenes between the A genome (7 to 10 DPA) and the D genome (-3 to 3 DPA) were analyzed for function and metabolic pathways: 2,916 had known functions and 748 mapped to KEGG pathways. This indicates that genes with the same functions are expressed during both the fiber initiation and elongation stages.

(3) When the AD unigenes were compared with those of the A and D genomes, 4,522 unigenes (31.61%, A∩D∩AD) of the allotetraploid AD genome had similar transcripts in both the A and D genomes, indicating that these are conserved duplicated genes that evolved independently. 2,717 unigenes (18.99%, A∩AD∩¬D) were similar between the A and AD genomes but had no similar sequence in the D genome, indicating differential expression in the A genome or the At subgenome. Likewise, 2,601 unigenes (18.18%, D∩AD∩¬A) were similar between the D and AD genomes but had no similar sequence in the A genome, indicating differential expression in the D genome or the Dt subgenome. Finally, 4,467 unigenes (31.22%, AD∩¬A∩¬D) had no similar sequences in either the A or the D genome; these may be specific transcripts produced by polyploidization of the AD genome, whose functions may have changed.

(4) 4,797 A-genome unigenes (29.77%) were similar to 3,378 D-genome unigenes (28.36%), indicating that these unigenes are expressed during both fiber initiation and elongation. Sequences without similarity numbered 11,303 (70.20%) in the A genome and 8,534 (71.64%) in the D genome, indicating that they are specifically expressed or genome-specific genes.

These bioinformatics results lay a firm foundation for studying the mechanism of fiber development, further clarifying the relationships of fiber development among Gossypium species, and improving fiber quality.
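The EST-SSR search summarized above can be illustrated with a short sketch. The abstract does not name the search tool or the minimum repeat numbers used, so the thresholds below (10 repeats for mononucleotide motifs, 6 for dinucleotide, 5 for tri- to hexanucleotide) are assumptions modeled on the defaults commonly used with the MISA tool, and `find_ssrs` and `summarize` are hypothetical names. The sketch scans each sequence for perfect tandem repeats of 1 to 6 bp motifs, preferring the shortest period at each position, and reports the statistics quoted in the abstract: the share of sequences containing SSRs, kb of EST sequence per SSR, and the motif-length distribution.

```python
from collections import Counter

# ASSUMED minimum repeat numbers per motif length (MISA-like defaults);
# the thesis abstract does not state the criteria actually used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return (start, motif, repeats) for each perfect SSR found in seq.

    At each position the shortest motif length that meets its threshold wins,
    which also keeps non-primitive motifs (e.g. ATAT) from being reported.
    Compound and imperfect SSRs are not handled in this sketch.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif:
                break  # near the sequence end, or an ambiguous base
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= MIN_REPEATS[k]:
                found = (i, motif, reps)
                break
        if found:
            hits.append(found)
            i += found[2] * len(found[1])  # skip past the whole repeat
        else:
            i += 1
    return hits


def summarize(seqs):
    """SSR statistics over a list of (non-redundant) EST sequences."""
    n_ssr = 0
    n_with_ssr = 0
    total_bp = 0
    by_length = Counter()
    for s in seqs:
        hits = find_ssrs(s)
        total_bp += len(s)
        n_ssr += len(hits)
        n_with_ssr += bool(hits)
        by_length.update(len(motif) for _, motif, _ in hits)
    return {
        "ssrs": n_ssr,
        "pct_seqs_with_ssr": 100 * n_with_ssr / len(seqs) if seqs else 0.0,
        "kb_per_ssr": (total_bp / 1000) / n_ssr if n_ssr else float("inf"),
        "motif_length_pct": {k: 100 * v / n_ssr for k, v in sorted(by_length.items())},
    }


if __name__ == "__main__":
    demo = ["GGC" + "AT" * 7 + "CCGTA", "A" * 12 + "GCG", "TTG" + "CAG" * 5 + "T"]
    for s in demo:
        print(find_ssrs(s))
    print(summarize(demo))
```

With the abstract's own counts, 100 × 2,620 / 58,906 ≈ 4.45%, the share of non-redundant G. raimondii ESTs that `pct_seqs_with_ssr` would report. Note that a repeat such as (CAG)5 may be reported in a shifted phase (GCA) depending on where the scan enters it; real pipelines usually normalize motifs into complementary classes such as AT/TA before tallying.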
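The comparison of A, D and AD unigenes amounts to partitioning one genome's unigenes by which of the other two genomes contain a similar sequence. The sketch below assumes that the similarity search (for example a BLAST run whose identity and E-value cutoffs the abstract does not give) has already been done and is supplied as a mapping from unigene ID to the set of other genomes with a hit. `partition` and `percentages` are hypothetical helper names, ¬X marks the absence of a similar sequence in genome X, and labels are written from the query genome's perspective.

```python
from collections import Counter


def partition(hits, genome, others):
    """Count unigenes of `genome` by presence/absence of similar sequences.

    hits   -- dict: unigene ID -> set of other genomes with a similar sequence
              (ASSUMED to come from a prior similarity search)
    genome -- label of the query genome, e.g. "AD"
    others -- the two other genome labels, e.g. ("A", "D")
    """
    a, b = others
    counts = Counter()
    for found in hits.values():
        tag_a = a if a in found else "¬" + a  # ¬X: no similar sequence in X
        tag_b = b if b in found else "¬" + b
        counts[f"{genome}∩{tag_a}∩{tag_b}"] += 1
    return counts


def percentages(counts):
    """Express each category as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


if __name__ == "__main__":
    # Toy input: four hypothetical AD unigenes, one per category.
    toy_hits = {"u1": {"A", "D"}, "u2": {"A"}, "u3": {"D"}, "u4": set()}
    print(partition(toy_hits, "AD", ("A", "D")))

    # AD-genome category sizes as reported in the abstract.
    reported = {"AD∩A∩D": 4522, "AD∩A∩¬D": 2717, "AD∩¬A∩D": 2601, "AD∩¬A∩¬D": 4467}
    print(percentages(reported))
```

Fed the AD-genome counts from the abstract (4,522, 2,717, 2,601 and 4,467, which sum to 14,307), `percentages` reproduces the stated 31.61%, 18.99%, 18.18% and 31.22%. The same arithmetic from the D-genome side gives (2,188 + 2,441) / 11,912 ≈ 38.86%.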
Keywords/Search Tags:Bioinformatics, EST, Function analysis, Similarity, Metabolism Pathway, Unigene