| Pakistan plays a vital role in China’s "Belt and Road" initiative. The development and dissemination of traditional medicine resources in the new era are crucial components of this strategy. Traditional medicine resources in Pakistan are plentiful and highly sought after, making them an essential aspect of the country’s traditional medicine system. However, the dispersion of traditional drug resources and a lack of systematic collection and organization have led to confusion regarding drug varieties. This confusion not only affects drug safety but also impedes the import and export trade of traditional drugs. Hence, there is an urgent need for systematic organization and information management of traditional drugs in Pakistan and for the establishment of a standard and practicable method for identifying the origin of traditional drugs. This study conducted literature research and information retrieval, with a focus on the third part of the Hamdard Pharmacopoeia, to summarize the genetic information data of commonly used traditional drugs in Pakistan. Additionally, second-generation sequencing and bioinformatics techniques were employed, using the GetOrganelle, SPAdes, and CPGAVAS2 software, to supplement the database with the chloroplast genome sequences and annotation data of 17 species. Furthermore, comparative genomic analysis was conducted on the chloroplast genome data of these species. Finally, this study constructed a genome database of commonly used traditional drugs, the Traditional Pakistani Medicine Genome Database (TPMGD), based on the LAMP (Linux, Apache, MySQL, PHP) web server architecture. The data stored in this database are expected to promote the traditional drug trade, help identify new drug sources, and facilitate the exchange of traditional medicine culture. The main research content and results are as follows: (1) The database can be readily accessed through the user-friendly website http://www.tpmgd.com/, which is openly and freely available. This database comprises five sections, namely 
"Home" (website homepage), "Medicinal Species" (list of species), "Species Identification" (identification of species), "BLAST+" (comparison tool), and information search. Users can conveniently search and browse species introductions, DNA barcodes, and chloroplast genome (cp-G) data, and use the cp-G and DNA barcode data for efficient molecular identification. Additionally, users can employ the "BLAST+" tool for sequence alignment analysis of protein or nucleic acid data. The current version of the database comprises basic introductions of 128 medicinal species, 141 COI sequences, one mitochondrial genome sequence from 1 animal species, 1396 ITS2 sequences from 92 plant species, 1074 psbA-trnH sequences from 83 plant species, and 199 chloroplast genome sequences from 81 species. (2) The chloroplast genomes (cp-G) of 15 distinct species were sequenced, assembled, and annotated. The findings of the analysis are summarized as follows: (1) Chloroplast genome size and structure were mostly conserved: genome sizes ranged from 127,677 bp to 163,784 bp and the genomes had a quadripartite structure, except for Melilotus officinalis (L.) Pall., which had a circular genome of approximately 120 kb without the quadripartite structure, owing to the loss of the inverted repeat (IR) region. (2) The GC content varied from 33.63% to 39.19%, and the total number of genes ranged from 108 to 130, with Abroma augustum (L.) L.f. having the most (130) and M. officinalis the fewest (108). (3) Long repeats, mainly of the forward (F) and palindromic (P) types, were mostly 30-39 bp in length. (4) Gynocardia odorata R.Br. had the most SSRs (138), while Aucklandia costus Falc. and Phyla nodiflora (L.) E.L.Greene had the fewest (40 each). In most species, the SSRs were primarily mononucleotide A/T repeats. (5) Analysis of codon usage preference revealed that 27 codons were frequently used, two had no preference, and 35 were less commonly used. (3) This study involved the 
sequencing, assembly, and annotation of the chloroplast genomes from fresh leaves of Glycyrrhiza pallidiflora and Glycyrrhiza yunnanensis, along with a comparison of their chloroplast genome sequence characteristics with those of four congeneric species, namely Glycyrrhiza glabra, Glycyrrhiza inflata, Glycyrrhiza triphylla, and Glycyrrhiza uralensis. (1) The six Glycyrrhiza species possessed a circular chloroplast genome structure, with genome sizes of 127,362 bp to 128,148 bp, GC contents of 34.22% to 34.25%, and 108 shared genes, including 74 protein-coding genes, 4 rRNA genes, and 30 tRNA genes. (2) Among the repeat sequences, forward (F) and palindromic (P) types were more abundant than reverse (R) and complement (C) types, repeats were commonly 30-39 bp long, and the total number of SSRs ranged from 88 to 93. (3) Codon preference analysis identified 30 codons with preference, 32 codons with low preference, and 2 codons with no preference. (4) Global alignment analysis highlighted trnL-UAA-trnT-UGU, trnQ-UUG-psbK, trnG-GCC-psbZ, psbD-trnT-GGU, petN-trnC-GCA, and other intergenic regions as highly variable. (5) Nucleotide diversity (Pi) analysis identified six highly variable regions: trnF-GAA-trnL-UAA, trnL-UAA-trnT-UGU, trnC-GCA-rpoB, accD-psaI, ycf1, and ndhA. (6) The constructed CDS-ML and cpG-ML phylogenetic trees showed similar topologies with high bootstrap support at each node and divided the genus Glycyrrhiza into four branches (Clades I to IV). G. pallidiflora and G. yunnanensis fell in Clade III, and the four other Glycyrrhiza species (G. triphylla, G. glabra, G. inflata, and G. uralensis) from Pakistan and China fell in Clade II, within which G. uralensis formed a sister relationship. The ML phylogenetic tree based on six potential marker fragments identified the trnF-GAA-trnL-UAA, ycf1, and ndhA fragments as potential molecular markers for the identification of six Glycyrrhiza 
species (G. triphylla, G. yunnanensis, G. pallidiflora, G. inflata, G. glabra, and G. uralensis). Our study presents the first comprehensive analysis of the chloroplast genomes of 15 species, including an in-depth investigation of the chloroplast genomes of Glycyrrhiza, and establishes the first specialized genomic database for traditional Pakistani medicine. This database integrates species descriptions, information on medicinal properties, genetic information storage, and molecular identification capabilities, making it a versatile tool for information retrieval and molecular identification of traditional Pakistani medicine. Our research not only facilitates the retrieval of information on commonly used traditional Pakistani medicine but also provides a scientific basis for the safe use, molecular identification, and resource conservation of traditional medicine. |
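
Two of the per-genome statistics reported above, GC content and the count of mononucleotide SSRs, can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names, the toy sequence, and the threshold of 10 repeats for a mononucleotide SSR are all assumptions chosen for illustration, and the abstract does not state which tool or thresholds were used.

```python
import re


def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def find_mono_ssrs(seq: str, min_repeats: int = 10):
    """Find mononucleotide SSRs: one base repeated at least min_repeats times.

    min_repeats=10 is an assumed threshold, not one taken from the study.
    Returns a list of (start, base, length) tuples with 0-based starts.
    """
    # One captured base followed by at least (min_repeats - 1) copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_repeats - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical toy sequence, not real chloroplast data.
toy = "ATGC" + "A" * 12 + "GGCC" + "T" * 10 + "CG"
print(f"GC content: {gc_content(toy):.2f}%")
for start, base, length in find_mono_ssrs(toy):
    print(f"SSR ({base}){length} at position {start}")
```

On a real chloroplast genome the same two functions would be applied to the full assembled sequence, and the A/T dominance of mononucleotide SSRs noted above would show up as most tuples having `A` or `T` as the base.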