The genomic composition of individual cells is lost in conventional bulk sequencing studies, so clear insights into many biological processes, from normal development to tumor evolution, can only be gained from a detailed understanding of the genome at the single-cell level. With the maturation of next-generation sequencing technology and the development of whole-genome amplification (WGA) technology, single-cell genome sequencing has become possible and is now widely used in biological research. Single-cell genomics is a powerful new tool for investigating evolution and diversity in cancer, detecting rare cell types, and tracking cell lineages. By revealing cell heterogeneity, it has had a broad impact on many diverse fields of biology, including microbiology, neurobiology, developmental biology, and cancer research.

In recent years, the amount of data generated by single-cell genome sequencing has continued to grow. Because each study is relatively independent, these data are mainly analyzed within their own research projects, and cross-project comparisons have not yet been carried out. There is therefore an urgent need for a resource-sharing database platform for these data, so that researchers can efficiently download and reuse them, compare different projects, cell types, and WGA methods, and uncover valuable biological insights and patterns. This study collected a large amount of single-cell genome data, constructed a human single-cell whole-genome sequencing database, built a corresponding website, and carried out related application studies. The main work is as follows.

Based on the characteristics of single-cell sequencing data, we constructed a single-cell genome database with a three-layer project-sample-cell structure. Beyond the key metadata, the corresponding fields of its 7 tables record distinctive attributes such as the WGA method, literature information, and the sequencing depth of each project. We collected, organized, and analyzed 26,019 single-cell genome sequencing datasets and their metadata from 40 research projects, comprising 6,121 raw sequencing files with a total data volume of 15.43 Tb.

The corresponding human single-cell whole-genome sequencing database website, HSCGD, is built on a Linux + Apache + MySQL + PHP (LAMP) technology stack. Designed around actual research needs, it provides six functional modules in a user-friendly interface, allowing users to browse data by WGA method or cell type, search and view data, visualize CNV profiles, compare amplification results online, and view statistics on current cancer studies. Read mapping, quality control, coefficient of variation (CV) calculation, and CNV analysis are performed in advance, and the result files are stored on the local server. This reduces the computational load and waiting time of interactive analysis on the web server, so researchers can directly obtain CNV landscapes and efficiently compare amplification uniformity and coverage. HSCGD can be accessed at https://10.193.176.3/hscgd/home.php.

Using the HSCGD website, we performed cross-project comparisons of sequencing depth and scale, comparisons of WGA methods, CNV comparisons, and SNV analysis of different types of tumor cells. These analyses confirmed several known conclusions and revealed new ones: the scale of single-cell genome sequencing is limited by sequencing depth and is gradually increasing as the technology is optimized; the DOP-PCR method has the best coverage uniformity but the lowest coverage; and CNVs in cancer cells of the same type share similar characteristics, yet large differences remain between individual cells. These applications can be performed easily through the HSCGD website, demonstrating the accuracy and usability of the database. In summary, the human single-cell whole-genome sequencing database provides a new perspective for multi-dimensional joint analysis in the field of single-cell genomics.
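The abstract does not state exactly how amplification uniformity and coverage are computed, so the following is only a minimal sketch of one common approach, not HSCGD's actual pipeline. It assumes a per-position read-depth profile for one cell (for example, the depth column produced by `samtools depth -a`, which also reports zero-depth positions). It averages depth over fixed-size, non-overlapping bins and takes the coefficient of variation (population standard deviation divided by mean) of those bin values as a uniformity score, where lower means more uniform. It also reports coverage breadth, the fraction of positions with at least one read. The function names `bin_depths`, `coefficient_of_variation`, and `coverage_breadth`, the `bin_size` and `min_depth` parameters, and the toy depth values are all hypothetical.

```python
import statistics
from typing import Sequence


def bin_depths(depth: Sequence[int], bin_size: int) -> list[float]:
    """Mean per-position depth in consecutive, non-overlapping bins.

    Assumption (hypothetical choice): a trailing partial bin is dropped so
    every bin spans the same number of positions.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    n_full = len(depth) // bin_size
    return [
        sum(depth[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_full)
    ]


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (lower = more uniform)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return statistics.pstdev(values) / mean


def coverage_breadth(depth: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of positions covered by at least `min_depth` reads."""
    if not depth:
        raise ValueError("empty depth profile")
    return sum(1 for d in depth if d >= min_depth) / len(depth)


if __name__ == "__main__":
    # Toy per-position depths for two hypothetical cells (not real data).
    even_cell = [2, 3, 2, 3, 2, 3, 2, 3]
    uneven_cell = [10, 0, 0, 0, 9, 1, 0, 0]
    for name, depth in [("even", even_cell), ("uneven", uneven_cell)]:
        cv = coefficient_of_variation(bin_depths(depth, bin_size=2))
        print(name, round(cv, 3), coverage_breadth(depth))
    # prints:
    # even 0.0 1.0
    # uneven 1.0 0.375
```

Using mean depth per bin instead of raw read counts per bin gives the same CV, because the two differ only by a constant factor. The two metrics are reported separately because they can diverge, as in the DOP-PCR result above: a method can spread reads evenly across bins while still leaving many positions uncovered.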