Pathogenic microorganisms cause a variety of infectious diseases in humans and other animals, which not only restrict economic and social development but also pose a serious threat to public health and safety. Most pathogenic microorganisms that cause zoonoses originate in animals, and recombination and variation of pathogens in these animal hosts are very active, which poses great challenges for pathogen detection and the diagnosis of zoonoses. Rapid detection of pathogenic microorganisms can effectively reduce the harm caused by infectious diseases. At present, detection of infectious diseases caused by pathogenic microorganisms relies mainly on traditional methods such as serological testing, smear microscopy, PCR amplification and gene chip methods. Although these methods play an important role in pathogen detection and disease diagnosis, they have limitations. For example, the pathogenic microorganisms in the sample to be tested need to be isolated and cultured, and the methods perform poorly on recombinant or mutated pathogens.

In recent years, pathogen detection methods based on high-throughput sequencing data have been reported continually; they detect known or unknown pathogenic microorganisms through bioinformatic analysis of the sequencing data. Because this approach does not rely on probe design or on the morphological characteristics of the sample, it makes up for the shortcomings of traditional detection methods and provides new impetus for the prevention and control of infectious diseases.

Our veterinary institute has long been responsible for major national projects on the prevention and control of zoonoses, so applying next-generation sequencing (NGS)-based pathogen detection to the rapid detection of pathogenic microorganisms is of great practical significance. In this paper, we chose RINS as the basic algorithm for 
rapid detection of zoonotic pathogenic microorganisms after comparing the advantages and disadvantages of several pathogen detection algorithms developed in China and abroad. RINS is characterized by rapid analysis: it uses prior knowledge to greatly reduce both the size of the pathogen database and the amount of computation. Finally, we improved the RINS algorithm by combining it with the advantages of the PathSeq and CaPSID algorithms, and built a rapid analysis platform for zoonotic pathogens.

The main contents of this paper are as follows.

First, we improved the RINS algorithm. Most processing steps of the original RINS algorithm run serially. By parallelizing the BLAT comparison step, we made this step more than 7 times faster. We combined FastQC and Trimmomatic for quality control, removing adapter and primer contamination on the basis of the visual reports to ensure the quality of the sequencing data. We improved the data comparison module by using recent software such as DIAMOND to analyze the assembled metagenomic data. We replaced Trinity with MEGAHIT for de novo assembly of metagenomic sequencing data to make the assembly more accurate. We also changed the way the BLAST database is built in the RINS algorithm so that the BLAST output contains annotation information.

Second, we built a visualized data analysis platform on a Linux server. Built around an automated analysis workflow, the integrated platform includes a dedicated database of genome reference sequences for the major zoonosis-related hosts (pigs, cows, pigeons, etc.) and pathogens (bacteria, viruses, fungi, etc.), as well as the optimized RINS algorithm for pathogen detection. Using the platform and pigeon transcriptome sequencing data, we constructed a pigeon EST database, whose accuracy and completeness were evaluated with BUSCO and other software. The analysis platform is visualized 
in three aspects: visualization of genome data, visualization of data analysis operations, and visualization of results, which greatly improves the practicality of the system and lowers the barrier to data analysis.

Third, we analyzed performance. The platform was verified with pigeon metagenome sequencing data on a 4-processor Linux server, and the whole analysis took 39 h. In this analysis, 16.1% of the sequences were annotated as bacterial, 2.2% as viral and 32.9% as fungal, while 48.8% were of unknown origin. Statistics on the bacteria carried in the sample showed that more sequences matched Pasteurella, Escherichia coli and Salmonella than other bacteria; this result is consistent with epidemiological studies of pigeon disease at the data source. The verification results show that the platform can quickly obtain species information on pathogenic microorganisms.

Fourth, we enabled rapid migration and deployment of the analysis platform. We packaged the entire platform with Docker container technology to make migration and deployment easy. The platform is now deployed at the Cloud Computing Center in Jilin Province for use by researchers; excluding the time spent copying data, it takes less than half an hour to go from deployment to actual use.

In this study, we completed the research and construction of a zoonosis detection and analysis platform based on high-throughput sequencing data, translating high-throughput sequencing technology into practical application.
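
As an illustration of the parallelized BLAT comparison mentioned under the first contribution, the following Python sketch shows one common way to parallelize an alignment step: split the query reads into chunks and align the chunks concurrently. It is a sketch under stated assumptions, not the platform's actual code. The function names (`read_fasta`, `split_into_chunks`, `parallel_align`) are hypothetical, and `toy_align_chunk` is a stand-in that does exact substring matching against a made-up reference; in a real pipeline that worker would presumably write its chunk to a temporary file and invoke the aligner on it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read name, sequence)
Hit = Tuple[str, int]     # (read name, match position)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            name, parts = line[1:], []
        else:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def split_into_chunks(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Split records into at most n_chunks contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(records)))
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def parallel_align(
    records: List[Record],
    align_chunk: Callable[[List[Record]], List[Hit]],
    workers: int = 4,
) -> List[Hit]:
    """Align chunks concurrently and merge the hits in input order."""
    chunks = split_into_chunks(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(align_chunk, chunks))  # map keeps chunk order
    return [hit for hits in per_chunk for hit in hits]


def toy_align_chunk(chunk: List[Record]) -> List[Hit]:
    """Hypothetical stand-in for an aligner: exact substring matching."""
    reference = "ACGTACGTTTGACCA"  # made-up reference sequence
    return [(name, reference.find(seq)) for name, seq in chunk if seq in reference]


if __name__ == "__main__":
    fasta = [">r1", "ACGT", ">r2", "GGGG", ">r3", "TTGA", "CC"]
    reads = list(read_fasta(fasta))
    print(parallel_align(reads, toy_align_chunk, workers=2))
```

Threads are used here because a worker that only waits on an external aligner process does little Python-level computation. The speedup of such a scheme depends on the number of cores and on how evenly the chunks are sized, so this sketch makes no claim about reproducing the more than 7-fold improvement reported above.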