
Phenotype-genotype Association Study Of Congenital Skeletal Malformations Based On Genetic Data And Artificial Intelligence

Posted on: 2024-02-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z F Chen
GTID: 1524306938457824
Subject: Surgery (orthopedics)

Abstract/Summary
Background
Congenital skeletal disorders (CSDs) are a genetically and clinically heterogeneous group of bone and cartilage disorders. They manifest not only as bone and joint deformities but also affect other organs and systems. Molecular diagnosis of CSDs helps identify causative genes, guide treatment planning, and improve clinical prognosis. Recently, whole-exome sequencing (WES) has revolutionized the diagnosis and research of CSDs by enabling rapid, high-throughput analysis of genetic data. However, even after quality control and variant filtering, analyzing a typical WES dataset involves interpreting hundreds of variants, so researchers must identify, among many candidates, the variant that explains the patient's phenotype. Given the rapidly growing number of known genotype-phenotype associations in CSDs, identifying the causative variants in patients with CSDs remains labor-intensive and challenging. Artificial intelligence (AI) has immense potential for handling massive amounts of data and identifying complex patterns, offering an effective means of improving the efficiency of WES data analysis. The scientific problem to be addressed is therefore how to use AI to analyze large-scale CSD phenotype-genotype data in order to provide molecular diagnoses for CSD patients, discover novel CSD genes, reduce the cost and turnaround time of molecular diagnosis, and improve the diagnostic yield for CSDs.

Objectives
This project proposes the following hypothesis: phenotype-based molecular diagnosis of CSDs and the discovery of novel causative genes for CSDs can be achieved through AI-assisted analysis of CSD phenotype-genotype associations and phenotype-first analysis of WES data. Based on this hypothesis, the research comprises three aims:
1. Develop a phenotype-based gene prioritization model, PhenoApt, using a graph-embedding algorithm.
2. Develop a molecular diagnostic system for CSDs, PhenOrtho, using large language models (LLMs).
3. Use phenotype-first analysis of WES data to discover novel CSD genes.

Materials and Methods
Patients with CSDs were recruited through the Deciphering disorders Involving Scoliosis and COmorbidities (DISCO, https://discostudy.org/) study group. WES was performed, and the data were analyzed using the Peking Union Medical College Pipeline (PUMPipeline).
1. A directed graph of CSD phenotype-genotype associations was constructed from public databases and vectorized with a graph-embedding algorithm. Intrinsic weights were assigned to phenotypes based on their term frequency-inverse document frequency (TF-IDF) scores. The likelihood that a gene is causative was measured by calculating a score between a set of phenotypes and that gene. Finally, the performance of PhenoApt was validated in multicenter, multi-ethnic cohorts, and the clinical application of the weighting function was explored.
2. LLMs were used to retrieve CSD-related literature from the PubMed database and to extract phenotype-genotype associations, from which a phenotype-based CSD molecular diagnostic system, PhenOrtho, was developed on the basis of PhenoApt. Its performance was validated in CSD patients from the DISCO cohort in both "phenotype-based mode" and "phenotype+WES mode".
3. Phenotype-first analysis was employed to identify novel causative genes and variants of CSDs and to establish novel phenotype-genotype associations. Immunoblotting, immunofluorescence, co-immunoprecipitation, and in silico simulations were used to investigate the mechanisms underlying the variants.

Results
1. Using graph-embedding algorithms to analyze the phenotype-disease-genotype associations of Mendelian disorders in public databases, this study developed a phenotype-based gene prioritization tool, PhenoApt. Baseline analysis indicated that PhenoApt improved performance by 22.7-140.0% compared with previous orthogonal approaches in three independent, real-world, multicenter cohorts (Cohort 1, N=185; Cohort 2, N=784; Cohort 3, N=208). In the evaluation of the weighting function, increasing the weight of the clinical indications improved performance by a further 37.3% (Cohort 2, N=471) and 21.4% (Cohort 3, N=208).
2. This study used LLMs to retrieve CSD-related literature from the PubMed database and developed a phenotype-based CSD molecular diagnostic system, PhenOrtho. In "phenotype-based mode", PhenOrtho ranked the causative gene within the top 5 for 51.96% of CSD patients, about 89.2% higher than the second-ranked software. In "phenotype+WES mode", PhenOrtho ranked the causative gene within the top 5 for 80.39% of CSD patients, an improvement of 17.01-192.86% over previous orthogonal approaches, and reduced the WES analysis time by 97.15%.
3. Twelve de novo H3-3A and H3-3B variants were identified in a multicenter, multi-ethnic cohort. Patients carrying these mutations exhibited typical dysmorphic features, as well as developmental delay, short stature, hypotonia, visual impairment, and abnormal brain structures. Immunoblotting experiments revealed that H3-3A and H3-3B mutations might affect protein stability. Co-immunoprecipitation experiments suggested that some mutations could also alter the binding strength between H3.3 and its chaperone protein DAXX.

Conclusions
1. PhenoApt can identify causative genes based on patient phenotypes. By integrating clinical expertise with AI algorithms through the weighting function, the performance of PhenoApt can be further improved.
2. LLMs can accurately retrieve CSD-related literature from PubMed. Combined with WES data, PhenOrtho can automatically identify causative genes for CSDs, thereby shortening data analysis time.
3. H3-3A and H3-3B are novel genes causing dysmorphic features and can lead to clinical phenotypes such as developmental delay, short stature, and hypotonia. H3-3A and H3-3B mutations may affect protein stability and alter the binding strength between H3.3 and its chaperone DAXX. Through phenotype-first analysis of WES data from the CSD research cohort, this study found that H3-3A and H3-3B are novel genes causing congenital craniofacial deformities.
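The abstract describes PhenoApt only at a high level: a graph embedding of phenotype-genotype associations, TF-IDF phenotype weights, a score between a set of phenotypes and a gene, and a weighting function that up-weights clinical indications. The following is a minimal Python sketch of how such pieces could fit together. It is not PhenoApt's actual implementation. The gene names, phenotype terms, and 3-dimensional vectors are hypothetical stand-ins for learned graph embeddings. The score is assumed to be an IDF-weighted mean cosine similarity, and `boost` is an assumed form of the weighting function. The sketch also computes a top-k hit rate, the kind of metric behind the "top 5" figures reported for PhenOrtho.

```python
import math
from collections import Counter

# Hypothetical toy annotations: gene -> phenotype terms (stand-in for public databases).
GENE_PHENOTYPES = {
    "GENE_A": {"scoliosis", "short_stature", "hypotonia"},
    "GENE_B": {"short_stature", "developmental_delay"},
    "GENE_C": {"scoliosis", "visual_impairment"},
}

# Hypothetical precomputed vectors; a real system would learn these from the
# phenotype-genotype graph with a graph-embedding algorithm.
EMBEDDINGS = {
    "scoliosis": [1.0, 0.0, 0.0],
    "short_stature": [0.0, 1.0, 0.0],
    "hypotonia": [0.0, 0.0, 1.0],
    "developmental_delay": [0.0, 0.7, 0.7],
    "visual_impairment": [0.6, 0.0, 0.8],
    "GENE_A": [0.6, 0.6, 0.5],
    "GENE_B": [0.0, 0.9, 0.4],
    "GENE_C": [0.9, 0.0, 0.4],
}


def idf_weights(gene_phenotypes):
    """Smoothed IDF per phenotype, treating each gene as a 'document'.
    Each term occurs at most once per gene (TF = 1), so TF-IDF reduces to IDF:
    terms annotated to fewer genes receive larger weights."""
    n_genes = len(gene_phenotypes)
    doc_freq = Counter(t for terms in gene_phenotypes.values() for t in terms)
    return {t: math.log(n_genes / df) + 1.0 for t, df in doc_freq.items()}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_gene(gene, patient_terms, weights, boost=None):
    """Weighted mean cosine similarity between a gene and a patient's phenotypes.
    `boost` (hypothetical) multiplies the weight of selected terms,
    e.g. the clinical indication."""
    boost = boost or {}
    total = weight_sum = 0.0
    for term in patient_terms:
        w = weights.get(term, 1.0) * boost.get(term, 1.0)
        total += w * cosine(EMBEDDINGS[gene], EMBEDDINGS[term])
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


def rank_genes(patient_terms, weights, boost=None):
    """Return all genes sorted from highest to lowest score."""
    return sorted(sorted(GENE_PHENOTYPES),
                  key=lambda g: score_gene(g, patient_terms, weights, boost),
                  reverse=True)


def top_k_rate(cases, weights, k=5):
    """Fraction of (patient_terms, causative_gene) cases ranked within the top k."""
    hits = sum(gene in rank_genes(terms, weights)[:k] for terms, gene in cases)
    return hits / len(cases)


if __name__ == "__main__":
    w = idf_weights(GENE_PHENOTYPES)
    print(rank_genes(["scoliosis", "short_stature"], w)[0])  # GENE_A
    print(rank_genes(["scoliosis", "short_stature"], w,
                     boost={"scoliosis": 10.0})[0])  # GENE_C
```

In this toy data, raising the weight of one phenotype (here scoliosis, standing in for the clinical indication) moves GENE_C ahead of GENE_A. This illustrates how a weighting function can change a ranking, but it does not reproduce the effect sizes reported for real cohorts.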
Keywords/Search Tags: congenital skeletal disorders, whole-exome sequencing, artificial intelligence, phenotype-driven analysis, H3-3 gene