| Background: Infantile epileptic spasms syndrome (IESS) is a common developmental epileptic encephalopathy of infancy. Most IESS patients have a poor prognosis and suffer from varying degrees of developmental delay. Recent studies have revealed that the etiology of IESS is an important predictor of prognosis, yet nearly one fifth to one third of IESS patients still have an unknown cause, which most researchers believe to be related to genetic factors. Currently, trio-based reanalysis of whole-exome sequencing (WES) data from patients with previously negative results can improve the diagnostic yield from 5% to 20%. Whole-genome sequencing (WGS) is believed to increase variant detection, especially of non-coding variants, and transcriptome sequencing (TS) can improve the interpretation of non-coding variants, thereby increasing the diagnostic yield for genetic etiologies in IESS patients. Objectives: To explore the genetic etiology of IESS of unknown cause and to improve the diagnostic yield using WES, WGS, and TS; and to investigate the utility of machine learning in discovering candidate IESS-causative genes. Methods: (1) Clinical information was retrospectively collected for IESS patients of unknown cause who met the inclusion criteria and were hospitalized in the Department of Pediatrics, Xiangya Hospital, Central South University, from January 1, 2010 to June 1, 2021. General information, seizure semiology, abnormal clinical features, neuroimaging characteristics, anti-seizure treatment strategies, and prognosis were described and summarized. The relationships among seizure outcome, mental-motor development, and social-living ability were analyzed with Pearson’s chi-square test. (2) Quality control, variant detection and annotation, and kinship validation were performed on trio-WES data from IESS patients of unknown cause. Based on OMIM and PubMed, a virtual epilepsy gene panel was compiled and used to filter variants and evaluate the
pathogenicity of variants. Tissue expression features of epilepsy genes were analyzed using the GTEx database. Variants from the ClinVar database were used to build epilepsy-causative variant prediction models based on logistic regression, classical decision trees, conditional inference trees, random forests, and support vector machine algorithms. Features of tissue expression, variant pathogenicity prediction, and population frequency were integrated into the prediction models. The optimal prediction model was selected to identify candidate disease-causative genes from the trio-WES data of IESS patients of unknown cause. GO and KEGG enrichment analyses were performed on the candidate genes. (3) IESS patients of unknown cause with negative WES reanalysis were prospectively enrolled, and their peripheral blood was collected. DNA was extracted from peripheral blood, and PCR-free DNA libraries were prepared. WGS was performed, and the sequence data were analyzed with the Dragen Bio-platform. RNA was also extracted from peripheral blood for TS analysis. Human Phenotype Ontology (HPO) phenotypes of these patients were summarized, and genotype-phenotype matching, variant filtering, and variant pathogenicity rating were then performed with the ISogenetic software. TS results were used to verify the functional impact of non-coding variants from WGS and to adjust the pathogenicity ratings. Results: (1) A total of 216 IESS patients of unknown cause were enrolled, with a male-to-female ratio of 1.73:1. The average age at onset of epileptic spasms was 7 months, and multiple seizure types were present in nearly half of the patients. Microcephaly (9.7%) and cafe-au-lait spots (6.9%) were the most common clinical features. Enlarged extracerebral spaces or ventricles (29.6%) and brain atrophy or dysplasia (15.3%) were the most common non-epileptogenic imaging changes. Multiple anti-seizure treatment strategies were used in most children (83.3%). ACTH, topiramate, valproate, vigabatrin, and levetiracetam were the most
common anti-seizure treatments. First-line drugs for epileptic spasms (ACTH or vigabatrin) combined with topiramate or valproate were the most effective regimens. At last follow-up, 29.6% of patients had achieved seizure control or remission, 13.6% had normal mental-motor development or mild delay, and 14.3% had an mRS score of no more than 2 points. Lennox-Gastaut syndrome developed in 12.0% of IESS patients. Patients with controlled or remitted seizures had a better prognosis for mental-motor development (P<0.001) and social-living ability (P<0.001) than those with active or relapsed seizures. (2) The trio-WES data from the 216 IESS patients of unknown cause passed quality control and kinship validation. A virtual epilepsy-related gene panel of 1739 genes was compiled from searches of the OMIM and PubMed databases. Reanalysis of WES data identified disease-causative variants matching the clinical phenotype in 17 patients, a positive diagnosis rate of 7.87%. The diagnostic yield of reanalysis after more than three years was higher than that within three years (P=0.004). Epilepsy-related genes were specifically and highly expressed in brain tissues, with the highest specificity in the frontal cortex (BA9). Epilepsy-causative variant prediction models were constructed using logistic regression, classical decision trees, conditional inference trees, random forests, and support vector machine algorithms; the random forest model had the highest accuracy. The random forest variant prediction model identified 41 candidate disease-causative genes. GO and KEGG enrichment analyses showed that these 41 genes were enriched in Ca2+ signaling and lysosome pathways. (3) A total of 103 IESS patients of unknown cause with negative WES reanalysis were prospectively enrolled. The PCR-free DNA libraries and WGS data of these 103 patients met quality requirements. The size of the WGS data correlated with the average coverage of the genome (Spearman’s
r=0.994, P<0.001) and of the mitochondrial genome (Spearman’s r=0.689, P<0.001), as well as with the numbers of detected complex structural variant breakpoints, insertions, and deletions, with Spearman’s correlation coefficients of 0.901 (P<0.001), 0.890 (P<0.001), and 0.899 (P<0.001), respectively. On average, 453 de novo low-frequency single-nucleotide and insertion-deletion variants, 20 de novo copy number variants, and 86 de novo structural variants were detected per sample. In addition, an average of 144 low-frequency truncating variants, 56 canonical splice-site variants, and 1058 intronic variants near exons were detected per sample. Forty-four different HPO phenotypes were extracted from the 103 IESS patients, and genotype-phenotype matching was performed with the ISogenetic software. With the TS data incorporated, pathogenic disease-causative variants were identified in only 3 patients: a de novo missense variant of ATP6V0A1, compound heterozygous intronic variants of AFG3L2, and a de novo missense variant of MT-ATP6, giving a positive diagnosis rate of 2.9%. In addition, variants of uncertain significance were detected in 28 patients, involving 50 variants in 23 different genes. Of these genes, 73.9% were expressed at fewer than 5 transcripts per million (TPM) in whole blood. Among the 50 variant sites, only one variant was confirmed by whole-blood TS to affect transcript modification. Conclusions: Periodic reanalysis of trio-WES, or trio-WGS combined with TS, can improve the genetic diagnostic yield in IESS of unknown cause and provide clues for identifying further candidate IESS-causative or risk gene variants. A random forest prediction model based on gene-tissue expression features can help discover candidate IESS-causative genes. |
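
The two positive diagnosis rates reported in the Results follow directly from the patient counts given in the abstract (17 of 216 for trio-WES reanalysis, 3 of 103 for WGS with TS). The short sketch below reproduces that arithmetic; the function name `diagnostic_yield` is a hypothetical helper introduced here for illustration and is not part of the study's pipeline.

```python
def diagnostic_yield(diagnosed: int, cohort: int) -> float:
    """Return the positive diagnosis rate as a percentage of the cohort."""
    if cohort <= 0:
        raise ValueError("cohort size must be positive")
    return 100.0 * diagnosed / cohort


# Counts taken from the abstract's Results section.
wes_yield = diagnostic_yield(17, 216)   # trio-WES reanalysis
wgs_yield = diagnostic_yield(3, 103)    # WGS combined with TS

print(f"trio-WES reanalysis: {wes_yield:.2f}%")
print(f"WGS with TS: {wgs_yield:.1f}%")
```

The two cohorts are nested: the 103 WGS patients were enrolled from those whose WES reanalysis was negative, so the two rates describe successive steps of one diagnostic workflow and are not directly comparable.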