| With the rapid progress of personal genetic sequencing in recent years, people are paying increasing attention to the ancestral information revealed by their own genetic data. Genetic affinity between strangers provides clear evidence of a common ancestor, no matter how far apart they now are in time or space. However, current ancestry-tracing methods have two significant problems. First, methods based on haplogroup analysis can only be used for human ancestry tracing and provide limited information about personal ancestry over the past century. Second, genetic datasets are sparse and long-tailed, so classical machine learning methods cannot accurately trace family origins. Building on existing research, an ancestry-tracing method based on Genetic Similarity Network (GSN) analysis is proposed to solve these problems. The main contributions are as follows: (1) An ancestry-tracing method based on the GSN and adaptive learning is proposed. After the GSN is constructed, a node information selection aggregation layer and a neighborhood-oriented adaptive learning module adaptively learn the GSN's topology information and node feature information. Based on the resulting embedding representations, a node classification task infers each user's ancestral location label, thereby completing the ancestry tracing accurately. (2) A multi-class data augmentation framework for imbalanced GSNs is proposed. Taking the GSN embeddings as model input, it uses multiple independent generative adversarial networks to augment the minority-class samples, which effectively alleviates the convergence and data skew problems in generative adversarial network training, further enriches the information in the genetic dataset, and improves the accuracy of ancestry tracing. (3) A visualization system for ancestry tracing is designed and implemented. The system displays the user's current kinship distribution, performs ancestry inference on the data-augmented genetic dataset, and visualizes the user's ancestry-tracing results. |