
Novel Methods for Mining Polymorphic Microsatellites from Insect Omics Data and Characteristic Analyses of Genomic Microsatellites

Posted on: 2021-02-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Z Tian
GTID: 1363330620973186
Subject: Agricultural Entomology and Pest Control
Abstract/Summary:
With the development of sequencing technology, omics data are accumulating rapidly and provide rich resources for mining molecular markers. Microsatellites, also known as simple sequence repeats (SSRs), consist of a core repeat sequence and highly conserved flanking sequences. They are codominant Mendelian markers that are highly polymorphic, allow accurate allele characterization, and are abundant and evenly distributed in the genome, so they are widely used in population genetics and related fields. Efficient methods for mining or developing SSRs are therefore important. So far, however, no method has been available for mining polymorphic SSRs from genome and transcriptome data, and reports on the characteristics, distribution, and functions of SSRs in organisms are scarce. In this study, two novel methods were established for mining SSRs from genome data and from transcriptome data, respectively. The accuracy of the methods was validated by SSR genotyping of individual insects. The characteristics and functions of SSRs in different genomic regions of insects were then analyzed. We found that SSR mutations in gene coding regions can affect the structure and function of the encoded amino acid sequences, and that changes in the length of SSRs in the 5' flanking region can regulate gene expression. In addition, using gene expression profiling, we identified core genes that play important roles in aphid polyphenism and phenotypic plasticity.

1. Development and validation of a novel software for mining polymorphic SSRs from genome and resequencing data

A polymorphic SSR mining program named GSSRt (Genomic SSR Mining Tool), consisting of five subroutines, was developed in this study. The subroutines are responsible for SSR identification, screening of insertion-deletion (indel) information for tandem repeat sequences, mining of indel information for SSR sequences, removal of duplicate information, and calculation of the polymorphic information content (PIC) of each SSR. The software runs fast and is simple to use, and it accurately captures and integrates polymorphism information from the results of resequencing data analysis. To avoid the slow running speed caused by complex logic, the program was optimized during development: for example, tandem repeats and microsatellite sequences are identified almost without regular expressions, and more operations are performed within a single loop. To validate the efficiency of the software, 125,219 polymorphic microsatellite loci, including 32,408 highly polymorphic loci, were mined from the genome and resequencing data of Cydia pomonella. A total of 77 loci were selected from the 32,408 highly polymorphic loci for SSR genotyping in 12 C. pomonella individuals. Of the 77 loci, 73 (94.81%) were polymorphic in the samples. The mean PIC was 0.5610, and the PIC of 58 loci exceeded 0.5 (highly polymorphic). The new software can efficiently and accurately mine SSRs from genome data, which should promote the development and application of polymorphic SSRs from genome sequences.

2. Characteristics of SSRs in insect genomes

SSR mutations provide abundant variability and potential selective advantages, which promote the evolution of organisms. The newly developed GSSRt software was used to analyze genome and resequencing data from Apis mellifera, Drosophila simulans, Leptidea sinapis, Bombyx mandarina, Bombyx mori, and Cydia pomonella. Across the six species, dinucleotide SSRs had the highest percentage of polymorphic loci (12.15%~59.80%), followed by mononucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide SSRs (7.69%~63.91%, 10.10%~48.41%, 6.96%~41.75%, 2.31%~25.25%, and 0%~26.79%, respectively). Apart from dinucleotide SSRs, the percentage of polymorphic loci decreased as motif length increased in the genomes of the six species. C/G and CG/CG SSRs had the lowest percentage of polymorphic loci among mononucleotide and dinucleotide repeats, respectively, and CCG/CGG had the lowest among trinucleotide repeats. Of all the SSRs, 77.36%~93.21% were located in intergenic regions. Within genes, 3.51%~22.12% of SSRs were found in introns and 0.45%~7.05% in exons, and the percentage of polymorphic SSRs in coding regions was significantly lower than in intergenic or intron regions. When SSRs from azinphos-methyl-resistant and deltamethrin-resistant samples of C. pomonella were compared with those from insecticide-susceptible samples, the numbers of heterozygotes differed significantly between resistant and susceptible samples. A total of 59 polymorphic SSR loci were identified in the coding regions of the C. pomonella genome. These 59 SSRs were located on 24 chromosomes, with the largest number on chromosome 13. Among them, eight polymorphic microsatellites showed significant differences in genotype between the resistant and susceptible lines. These results suggest that changes in SSR length may play an important role in the regulation of the related proteins.

3. Analyses of SSR loci in the 5' flanking regions of a gene family

Previous studies found that SSRs in the 5' flanking regions of some specific insect genes can affect gene expression through changes in their length. Cytochrome P450s (CYPs) play a key role in the metabolism of many endogenous and exogenous substrates. In this study, insect CYP genes were used as an example for analyzing the characteristics of SSRs in the 5' flanking regions of a gene family. The CYP gene families of 27 insect genomes were annotated, and polymorphic SSRs in the 5' flanking regions with potential effects on the expression of these genes were screened. A total of 1216 SSRs were identified in the CYP genes, of which 643 (27.19%) were located in the 5' flanking regions. Among these SSRs, mononucleotide SSRs were the most numerous, reaching 860, followed by dinucleotide, trinucleotide, tetranucleotide, hexanucleotide, and pentanucleotide repeats. Previous studies have suggested that SSRs co-existing in the 5' flanking regions of allied species may act as an "adjuster knob" for the regulation of gene expression. To analyze such co-existing SSRs, the CYP gene families of the four aphid species among the 27 insects were examined further. Based on a phylogenetic tree and multiple sequence alignment, 16 orthologues of the CYP gene family were identified in Acyrthosiphon pisum, Myzus persicae, Myzus cerasi, and Diuraphis noxia, and three groups of co-existing SSR loci were obtained. In two of the three groups, the SSR loci matched the traditional definition of co-existing SSRs: the SSR motif type is the same and the flanking sequences of the loci are highly similar. In the third group, the relative positions in the 5' flanking region were highly consistent and the motif type was the same, but the flanking sequences differed greatly between the allied species, which departs from the traditional characteristics of co-existing SSRs. To study these special co-existing SSRs further, the orthogroups of the four aphids were screened using their protein sets. A total of 58 groups of co-existing SSRs were found. Of these, 55 were consistent with traditional co-existing SSRs and the other three were consistent with the special co-existing SSRs. Among the 58 SSR loci, trinucleotide microsatellites accounted for the largest proportion, 53.45% (31), followed by mononucleotide (16) and dinucleotide (11) SSRs. The dominant repeat types among trinucleotide, mononucleotide, and dinucleotide repeats were AAT/ATT, A/T, and CG/GC, respectively. These cross-species co-existing SSRs have persisted in different species over long-term evolution, indicating that they may play an important role in the regulation of gene expression.

4. Evaluation of the development efficiency of SSR loci from genome survey data

Genome survey data for Rhopalosiphum padi were obtained in this study using Illumina sequencing technology. Comparison of the SSRs from the survey data with those from the newly released whole-genome data confirmed that SSRs can be developed from genome survey data in the absence of a full genome sequence and sufficient transcriptome data.

5. Development and validation of a new method for polymorphic SSR mining from insect transcriptome data

In this study, we first developed the software PSSRdt (Polymorphic SSR digging tool) and then established a new method for mining polymorphic SSRs from insect transcriptome data. The method consists of three stages: raw data processing, application of the PSSRdt software, and sequence extraction and information checking. The PSSRdt software, which identifies SSRs and acquires their polymorphism information, is the core of the method. To validate the efficiency of the method, a total of 1940 unverified polymorphic SSRs were mined from the analysis results of 44 Acyrthosiphon pisum transcriptomes. To validate its accuracy, 52 loci were genotyped in 25 A. pisum individuals. At each of the 52 loci, between nine and 21 individuals were successfully genotyped. More than 92% of the loci were polymorphic and 73.1% were highly polymorphic, indicating that the method can efficiently and accurately identify polymorphic SSRs from transcriptome data. This novel method opens a new path for mining polymorphic SSRs from transcriptome data.

6. Characteristics, functions and SSRs of core genes in pea aphids using gene expression profiling

Gene expression profiles were constructed by analyzing 15 transcriptomes covering five aphid morphs from each of three pea aphid genotypes. A total of 418 specifically expressed genes were mined and annotated, of which 11 were verified by qPCR using RNA samples isolated from the five morphs of pea aphids. For these 11 genes, the trends in relative expression level measured by qPCR were consistent with the gene expression profiles, which confirmed the accuracy of the profile data. A dynamic regulatory network for aphid gene expression was constructed. A total of 263 core genes and 8 transcription factors were identified, and 223 SSR loci were extracted from the 5' flanking regions of the core genes. A gene co-expression network and molecular markers that could be used to predict the morph of pea aphids were obtained using the ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and Random Forest algorithms. The molecular markers screened by multiple methods may be important for analyzing aphid polyphenisms...
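
The polymorphic information content (PIC) that section 1 reports for each SSR locus is not defined in the abstract. The following is a minimal sketch, assuming the widely used definition of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i is the frequency of allele i at the locus. The function names and the sample genotypes are hypothetical illustrations and are not taken from GSSRt or PSSRdt; only the threshold of 0.5 for "highly polymorphic" comes from the text above.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes.

    Each genotype is a pair of alleles, here represented (as an
    assumption) by the fragment length in base pairs.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(frequencies):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (assumed definition)."""
    p = list(frequencies)
    homozygosity = sum(x * x for x in p)
    cross_term = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - homozygosity - cross_term


def locus_pic(genotypes):
    """Convenience wrapper: PIC of one locus from its genotype calls."""
    return pic(allele_frequencies(genotypes).values())


if __name__ == "__main__":
    # Hypothetical genotype calls for one locus in four individuals.
    calls = [(150, 152), (150, 150), (152, 154), (150, 154)]
    value = locus_pic(calls)
    label = "highly polymorphic" if value > 0.5 else "not highly polymorphic"
    print(f"PIC = {value:.4f} ({label})")
```

Under this definition a monomorphic locus has a PIC of 0, and the value rises as more alleles occur at similar frequencies, which is why a cut-off such as 0.5 can be used to rank candidate loci before genotyping.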
Keywords/Search Tags: Microsatellite, Polymorphic SSR loci, Coding region, 5' flanking region, Gene expression profile