Development Of A Novel Molecular Marker Optimized For Low Coverage Genomes And Molecular Identification Of Fruit Flies

Posted on: 2024-05-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R He
GTID: 1523307301979199
Subject: Agricultural Entomology and Pest Control
Abstract/Summary:
The family Tephritidae, known as fruit flies, is one of the largest groups in the order Diptera. Fruit flies have a wide host range and mainly damage fruits, vegetables, and flowers. Many species are highly invasive and threaten agricultural production and the ecological environment; many Tephritidae species are therefore listed as important quarantine pests. Although molecular techniques have brought significant advances, two major problems remain. First, the phylogenetic relationships of some key groups in the Tephritidae have not been clarified. At the higher level, it is unclear whether the subfamilies Dacinae and Trypetinae are monophyletic. At the genus level, the relationships among three genera, Zeugodacus, Bactrocera, and Dacus, need to be clarified. At the shallow level within the subgenus Bactrocera, it is unclear whether B. dorsalis is more closely related to B. tryoni or to B. latifrons. Second, accurate species identification is the basis on which customs and quarantine departments formulate quarantine measures, yet some closely related species and species complexes cannot be distinguished with current molecular techniques. It is therefore necessary to analyze the phylogenetic relationships of the Tephritidae systematically and to improve the identification of fruit flies.

In this study, the genomes of fruit flies were sequenced using the Illumina sequencing platform. A new molecular marker, the orthologous fragment (OGF), was developed and compared with three other molecular markers: Benchmarking Universal Single-Copy Orthologs (BUSCO), Anchored Hybrid Enrichment (AHE), and Ultraconserved Elements (UCE). All four markers were then used to reconstruct the phylogenomic tree of the Tephritidae. In addition, species-specific sequences of fruit flies were predicted from low-coverage genome data, and species-specific primers were designed and validated for the molecular identification of quarantine fruit flies. The main results are as follows:

(1) Low-coverage genome analysis of 17 species of fruit flies. Using Illumina sequencing technology, the low-coverage genomes of 10 species of fruit flies in three genera were sequenced: in the genus Bactrocera, B. correcta, B. invadens, B. philippinensis, B. rubigina, B. thailandica, B. zonata, and B. tsuneonis; in the genus Ceratitis, C. rosa; and in the genus Zeugodacus, Z. tau and Z. scutellata. Together with the low-coverage genome sequencing data of seven fruit flies previously reported by our group (Anastrepha ludens, A. suspensa, B. minax, Carpomya vesuviana, Dacus ciliatus, D. punctatifrons, and Rhagoletis cerasi), the genomes of 17 species in total were assembled and annotated. The assembled genome sizes ranged from 327 to 1,290 Mb, and the contig N50 lengths ranged from 3.72 to 96.93 Kb. BUSCO analysis was performed to estimate the gene completeness of these genome assemblies and showed that gene-space completeness ranged from 80.3% to 99.0%. The genomes of the 17 species were annotated de novo following the BRAKER pipeline, which predicted between 23,046 and 160,776 genes. BUSCO analysis of the annotated genes showed that gene-space completeness ranged from 77.4% to 96.2%. These results indicate that the low-coverage genomes contain relatively complete gene information and provide important data resources for subsequent research on the phylogeny of the Tephritidae and the mining of species-specific sequences of fruit flies.

(2) Extraction of the molecular markers BUSCO, AHE, and UCE from the low-coverage genome data of fruit flies. The genome data of eight species of fruit flies were downloaded from the NCBI genome database: B. dorsalis, B. latifrons, B. oleae, B. tryoni, R. zephyria, R. pomonella, C. capitata, and Z. cucurbitae. Two Drosophila species (Drosophila melanogaster and D. novamexicana) were downloaded as outgroups. Together with the 17 species of the Tephritidae mentioned above, the three molecular markers were extracted from the genome data of 25 species of Tephritidae and 2 species of Drosophila. For BUSCO, 983-1,631 loci were extracted from the 27 genome assemblies, corresponding to 59.29% to 98.37% of the Insecta OrthoDB v9 dataset, which targets 1,658 BUSCO loci. For AHE, 129-541 loci were extracted from the 27 genome assemblies, corresponding to 23.26% to 96.78% of the Diptera AHE probe set, which targets 559 AHE loci. For UCE, 573-1,842 loci were extracted from the 27 genome assemblies, corresponding to 21.14% to 67.95% of the Diptera-wide UCE2.7kv1 probe set, which targets 2,711 UCE loci.

(3) A novel molecular marker, OGF, was developed based on the low-coverage genome data of fruit flies. At present, it remains difficult to identify sufficient orthologous sequences from fragmented genomes. To retrieve as many orthologous loci as possible from low-coverage genomic data, a novel marker named OGF was developed; an OGF is a 1:1 protein-coding orthologous locus. Protein-coding sequences were cut into short fragments of 120 amino acids (aa) using a sliding-window strategy with a step of one aa. The fragments were then used as queries in BLASTP searches against the protein sequences of another species, and vice versa. The bidirectional hits were assembled and regarded as OGF target loci. A protein-coding gene that had at least one orthologous fragment was regarded as an OGF gene. A total of 575 OGF loci were obtained, spanning 332,764 amino acids and containing 101,822 parsimony-informative sites. Because the OGF loci were present in all test species, the comparison used only loci with no missing species from each of the four molecular markers. OGF contained the largest number of genes. In contrast, BUSCO had 33 loci, AHE had only one locus, and no UCE locus was available at 100% taxon occupancy. Compared with AHE and BUSCO, OGF at 100% species occupancy showed higher bootstrap values in the reconstructed phylogenetic tree and higher total phylogenetic informativeness across all time scales. These results show that OGFs have robust phylogenetic performance.

(4) Phylogenetic analyses and divergence time estimation for the Tephritidae. The four markers at different taxon occupancies were used to reconstruct phylogenetic trees of the Tephritidae with both the concatenation and the coalescent method, and the divergence times of the Tephritidae were estimated using a relaxed molecular clock method. The crown group of fruit flies (Tephritidae) originated about 132.61 Mya. The subfamilies Dacinae and Trypetinae began to diversify at 110.73 Mya and 102.98 Mya, respectively. The phylogenetic trees inferred from the 50% taxon-occupancy datasets of BUSCO, AHE, and UCE were generally congruent with the OGF tree at most nodes under both methods. For higher-level relationships among subfamilies, the family Tephritidae was recovered as having two main clades at the backbone nodes, the subfamily Dacinae ((Bactrocera + (Dacus + Zeugodacus)) + Ceratitis) and the subfamily Trypetinae (Anastrepha + (Rhagoletis + Carpomya)), confirming the monophyly of Dacinae and Trypetinae. At the genus level, Zeugodacus is sister to Dacus rather than to Bactrocera, supporting the elevation of Zeugodacus to the genus level. At the shallow level within the subgenus Bactrocera, B. dorsalis was closer to B. latifrons than to B. tryoni, which was basal within the subgenus Bactrocera. This result is incongruent with previous conclusions based on mitochondrial data and suggests that nuclear genome data tend to support a closer relationship between B. dorsalis and B. latifrons. In summary, the phylogenetic relationships of the Tephritidae were analysed at the genome-wide level. The controversial phylogenetic relationships at some higher and lower taxonomic levels in the family have been clarified, providing a theoretical basis for the classification and identification of the Tephritidae.

(5) Specific primers were developed for quarantine fruit flies based on genome-specific sequences. At present, some closely related species and species complexes of fruit flies cannot be distinguished with current molecular identification methods. To address this, the protein-coding gene sequences of 8 quarantine fruit fly species in 5 genera annotated in this study (A. ludens, A. suspensa, B. correcta, B. invadens, B. minax, C. rosa, D. punctatifrons, and Z. tau) were combined with the protein-coding gene sequences of 5 other species downloaded from NCBI (B. dorsalis, B. latifrons, B. oleae, C. capitata, and Z. cucurbitae). The protein-coding sequences of these 13 fruit fly species were searched with BLAST against the assembled genomes of all 25 fruit fly species mentioned above to predict species-specific sequences based on sequence similarity. A total of 150 pairs of primers were designed for PCR specificity verification. Ten pairs of specific primers, each producing a single target band, were successfully screened; they specifically distinguish 10 species of quarantine fruit flies: B. dorsalis, B. latifrons, B. oleae, C. capitata, Z. cucurbitae, Z. tau, A. ludens, D. punctatifrons, B. correcta, and A. suspensa. These results provide technical support for the molecular identification of quarantine fruit flies and for improving quarantine at ports.
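
The fragment-and-search procedure described in result (3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's pipeline. The 120 aa window and the one-aa step come from the abstract. The function names, the dictionary layout for best hits, the gene-level rule used to call a hit bidirectional, and the way overlapping windows are merged are all hypothetical. The BLASTP searches themselves are not run here; they are represented only by precomputed best-hit tables.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

WINDOW = 120  # fragment length in amino acids (from the abstract)
STEP = 1      # window step in amino acids (from the abstract)

# Hypothetical layout: (query gene, window start) -> best-hit gene in the other species
BestHits = Dict[Tuple[str, int], str]


def sliding_fragments(protein: str, window: int = WINDOW,
                      step: int = STEP) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) for every full-length window along a protein.

    Proteins shorter than the window yield nothing.
    """
    for start in range(0, len(protein) - window + 1, step):
        yield start, protein[start:start + window]


def reciprocal_fragments(a_hits: BestHits, b_hits: BestHits) -> List[Tuple[str, int]]:
    """Keep fragments of species A whose best-hit gene in species B has at
    least one fragment whose best hit is the original gene in species A.

    Assumption: bidirectionality is judged at the gene level.
    """
    back = {(gene_b, gene_a) for (gene_b, _), gene_a in b_hits.items()}
    return sorted(
        (gene_a, start)
        for (gene_a, start), gene_b in a_hits.items()
        if (gene_b, gene_a) in back
    )


def merge_windows(starts: Iterable[int], window: int = WINDOW) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent windows into (start, end) intervals,
    with end exclusive. Stands in for the 'assembly' of bidirectional hits."""
    merged: List[List[int]] = []
    for s in sorted(starts):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], s + window)
        else:
            merged.append([s, s + window])
    return [(lo, hi) for lo, hi in merged]
```

In a real run, each fragment from `sliding_fragments` would be written out as a query, searched against the other species' proteins, and the top hit per query loaded into the two best-hit tables before calling `reciprocal_fragments`. Because a one-aa step produces heavily overlapping windows, merging the retained windows per gene is one plausible way to turn many fragment hits into a single locus.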
Keywords/Search Tags: Fruit flies, Low-coverage genome, Molecular marker, Phylogeny, Molecular identification