
Sequence And Structure Analysis Of Tumor Fusion Genes And Proteins

Posted on: 2017-04-15    Degree: Master    Type: Thesis
Country: China    Candidate: D D Wang
GTID: 2334330554950021    Subject: Food Engineering
Abstract/Summary:
Gene fusion associated with chromosomal translocation is common in tumors and is considered one of the major causes of cancer. Fusion genes are translated into fusion proteins, which are significantly enriched in disordered regions, and these disordered regions are involved in important biological processes. However, what drives the formation of disordered structure in fusion proteins, and how fusion protein structure affects function, remain unresolved. It is therefore important to explore the factors that give rise to disordered regions and the contribution of these regions to carcinogenicity. To address these questions, we investigated the nucleotide preference at the breakpoints of fusion partner genes and fusion genes, predicted the secondary structure preference of fusion proteins, and explored the post-translational modification preference of fusion proteins. We also created a fusion protein database, based on RNA-Seq data and experimentally identified fusion proteins, for use in identifying fusion proteins.

First, experimentally derived fusion gene data were used to examine sequence features near the breakpoints of the fusion partner genes and to analyze the relationship between these features and chromosomal fragility. Within a ten-nucleotide window around the breakpoints of the fusion partner genes, nucleotide combinations showed a clear sequence preference: compared with the whole human genome, the dinucleotide GG occurred significantly more often than other combinations, and AGG was preferred before the breakpoint. By analogy with the AG/CT cleavage site of Alu sequences, the sequence at fusion gene breakpoints may be more easily recognized and cleaved by certain enzymes, which could further contribute to gene disruption.

Second, we analyzed the structural features of fusion proteins using all collected fusion protein data. The structural features near the breakpoints of the fusion proteins were predicted with IUPred, and the results suggest that breakpoints in fusion proteins tend to be located in disordered regions. Post-translational modification preferences were analyzed with the NetPhos 2.0 Server and MeMo: phosphorylation sites on serine and threonine were enriched in disordered regions, and arginine methylation sites were much more frequent in disordered regions than in structured regions. As an example, we examined how the fusion protein EML4-ALK gives rise to protein disorder and contributes to carcinogenicity; the results illustrate that the disordered region at the EML4-ALK breakpoint triggers autophosphorylation of the kinase domain, leading to oncogenic potential in non-small cell lung cancer. Taken together, these results suggest that the formation of fusion proteins may increase disordered structure, and that the enrichment of post-translational modification sites in disordered regions may promote post-translational modification of fusion proteins. Both play important roles in carcinogenesis.

Third, novel fusion genes were identified from colon cancer RNA-seq data using the TopHat-Fusion tool. Analysis of the fusion gene breakpoints in the two tissue types showed that dinucleotide occurrence at the breakpoint also had a preference, with GA being preferred. This result is consistent with the findings for the experimentally derived fusion genes.

Finally, building on these preliminary studies, we constructed a database to serve as a reference for identifying fusion proteins. We integrated all fusion genes identified by TopHat-Fusion with those selected from the COSMIC and FusionCancer databases and translated them into fusion proteins according to the frameshift rule. These were combined with experimentally identified fusion proteins from NCBI to build a background database containing 31,847 cancer-related fusion proteins. This background database provides support for further research.
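The breakpoint sequence analysis described above, dinucleotide frequencies in a short window around each breakpoint compared against a genome-wide background, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's own code: the function names are hypothetical, the window is assumed to be 5 nt on each side of the break (10 nt in total), and enrichment is expressed as a simple observed/expected frequency ratio. A real analysis would also need a significance test and a genome-scale background sequence.

```python
from collections import Counter
from itertools import product

# All 16 DNA dinucleotides; pairs containing N or other symbols are ignored.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
_VALID = set(DINUCLEOTIDES)


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides in seq (case-insensitive)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in _VALID:
            counts[pair] += 1
    return counts


def breakpoint_window(seq: str, bp: int, flank: int = 5) -> str:
    """Return up to `flank` nt on each side of a breakpoint.

    Assumed convention (hypothetical): `bp` is the 0-based index of the
    first base after the break, so the window is seq[bp - flank : bp + flank].
    """
    return seq[max(0, bp - flank):bp + flank]


def dinucleotide_enrichment(windows, background: str) -> dict:
    """Observed/expected ratio of each dinucleotide in breakpoint windows.

    Expected frequencies come from `background` (e.g. a genome sample).
    A ratio > 1 means over-representation at breakpoints; None means the
    ratio is undefined (no background occurrences or no valid windows).
    """
    observed = Counter()
    for w in windows:
        observed.update(dinucleotide_counts(w))  # Counter.update adds counts
    expected = dinucleotide_counts(background)
    obs_total = sum(observed.values())
    exp_total = sum(expected.values())
    ratios = {}
    for d in DINUCLEOTIDES:
        if obs_total == 0 or expected[d] == 0:
            ratios[d] = None
        else:
            ratios[d] = (observed[d] / obs_total) / (expected[d] / exp_total)
    return ratios
```

The same functions apply unchanged to the RNA-seq breakpoints from TopHat-Fusion; only the input windows differ.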
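The disorder and modification-site comparisons can be illustrated in a similar way. The sketch below assumes per-residue disorder scores of the kind IUPred reports (values between 0 and 1, with residues scoring above 0.5 treated as disordered) and a list of predicted modification-site positions such as those produced by NetPhos or MeMo. Parsing the tools' actual output files is left out, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def disordered_mask(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark residues whose disorder score exceeds the threshold as disordered."""
    return [s > threshold for s in scores]


def breakpoint_in_disorder(scores: Sequence[float], bp: int,
                           threshold: float = 0.5) -> bool:
    """True if the residue at the 0-based breakpoint index is predicted disordered."""
    return disordered_mask(scores, threshold)[bp]


def site_fraction(sequence: str, site_positions, mask: Sequence[bool],
                  residues: str = "ST") -> Tuple[Optional[float], Optional[float]]:
    """Fraction of candidate residues carrying a predicted site, by region.

    Returns (disordered_fraction, ordered_fraction); a value is None when
    that region contains no candidate residues.
    """
    sites = set(site_positions)
    # region (True = disordered) -> [predicted sites, candidate residues]
    tally = {True: [0, 0], False: [0, 0]}
    for i, aa in enumerate(sequence.upper()):
        if aa in residues:
            tally[mask[i]][1] += 1
            if i in sites:
                tally[mask[i]][0] += 1

    def frac(region: bool) -> Optional[float]:
        hits, total = tally[region]
        return hits / total if total else None

    return frac(True), frac(False)
```

For arginine methylation, pass `residues="R"` together with the predicted methylation sites. Positions here are 0-based, whereas prediction tools usually number residues from 1.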
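The database step translates each fusion gene into a fusion protein. The abstract does not spell out its frameshift rule, so the sketch below rests on an assumption. The 5' partner's coding sequence up to the breakpoint is joined to the 3' partner's coding sequence from its breakpoint. The fusion is flagged as in-frame when both partners are cut at the same codon phase, and the joined sequence is translated with the standard genetic code (NCBI table 1) up to the first stop codon. All names and inputs are hypothetical.

```python
from typing import Tuple

# Standard genetic code (NCBI translation table 1), codons ordered by TCAG.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(cds: str) -> str:
    """Translate from the first base until a stop codon or the sequence ends.

    Codons containing non-ACGT characters become 'X'; a trailing partial
    codon is ignored.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_in_frame(bp5: int, bp3: int) -> bool:
    """Assumed rule: in-frame if both partners are cut at the same codon phase.

    bp5: coding nt of the 5' partner kept before the break.
    bp3: coding nt of the 3' partner lost before the break.
    """
    return bp5 % 3 == bp3 % 3


def fusion_protein(cds5: str, bp5: int, cds3: str, bp3: int) -> Tuple[str, bool]:
    """Join cds5[:bp5] with cds3[bp3:], translate it, and report the frame status."""
    fused = cds5[:bp5] + cds3[bp3:]
    return translate(fused), is_in_frame(bp5, bp3)
```

Out-of-frame fusions are still translated here, continuing in the shifted frame until a stop codon, so they can be kept or filtered depending on the rule adopted. Both offsets must be measured from each partner's start codon, with UTRs excluded.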
Keywords/Search Tags: fusion genes, fusion proteins, disordered region, protein post-translational modification, fusion gene identification