
Identification Of PAP1 And PAP2 Gene And Their Correlative Bioinformatics Analysis

Posted on: 2007-01-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K X Shu
Full Text: PDF
GTID: 1100360215499108
Subject: Physiology
Abstract/Summary:
[Objective] The tumor suppressor p53 is a transcription factor that plays a critical role in coordinating the cellular response to a diverse range of stress conditions, such as oncogenic activation, hypoxia, and DNA damage. It mediates its different downstream functions by activating or repressing a large number of target genes. p53 and its downstream genes form a complicated gene network. Understanding the p53 gene regulatory network is important for understanding the physiological functions of p53 and for drug discovery and gene therapy in cancer. The ultimate challenge in defining the complete p53 gene regulatory network is to identify the p53 downstream genes. This study aimed to identify novel p53 downstream genes and explain their functions by molecular approaches, and to predict p53 downstream genes across the whole human genome by bioinformatics methods, in order to study the p53 gene regulatory network further.
[Methods] We established a new p53-inducible expression system with the Tet-On™ Gene Expression System, in which the exogenous p53 gene is overexpressed in medium containing doxycycline (Dox) but not in medium without Dox. A cDNA library was constructed from cells overexpressing p53. Novel p53 downstream genes were obtained by DD-PCR, sequencing, BLASTn searches against GenBank, and screening of the cDNA library. The structures and functions of the novel genes were predicted by bioinformatics analysis, and their expression patterns during mouse embryonic development were examined by Northern blot and in situ hybridization. We then collected the p53 downstream genes and the DNA binding sequences for wild-type p53 protein published in PubMed and analyzed the characteristics of the consensus sequences statistically.
A model for predicting p53 downstream genes based on logistic regression analysis was proposed, in which the candidate primary-sequence features are calculated from several models: a PWM model, a frequency distribution model, a consensus sequence model, and the length of the insert sequence in the motif. We predicted p53 downstream genes in human genomic DNA using the conservative consensus binding sequence, the consensus binding sequence, and the logistic regression model, and then classified them according to GO (Gene Ontology).
[Results] The results are divided into five parts:
Ⅰ. We established a new p53-inducible expression system, the U251-pTet-p53 cell line, with the Tet-On™ Gene Expression System, in which the exogenous p53 gene is overexpressed in medium containing doxycycline (Dox) but not in medium without Dox. Comparison of the random-primer RT-PCR products showed that exogenous p53 expression leads to differential expression of many genes, some up-regulated and others down-regulated. All of these differentially expressed genes may be p53 downstream genes. Eleven ESTs from the differentially expressed genes were sequenced; 2 of them had not been reported before.
Ⅱ. We constructed a cDNA library from p53-overexpressing cells and screened it for the complete nucleotide sequences of the two novel genes, named PAP1 (p53 activated protein 1, GenBank number: AF497245) and PAP2 (p53 activated protein 2, GenBank number: AY093673), respectively.
Ⅲ. The structure and function of PAP1 are as follows:
1. The results of the bioinformatics analysis of the PAP1 gene:
(1). The PAP1 gene was localized to human chromosome 16p12-13 and has six exons and five introns.
(2). There are many p53 binding sites in the PAP1 gene promoter and in introns 1-3.
(3). The complete nucleotide sequence of the PAP1 cDNA is 2779 bp long and contains a long open reading frame of 849 bp that starts at the first methionine codon (nt 282) and ends with the stop codon TAA (nt 1130). The protein sequence predicted from the open reading frame is a 282-amino-acid polypeptide with a calculated molecular mass of 32.9 kD and a theoretical isoelectric point of 5.81. Its molecular formula is C1505H2309N385O421S11.
(4). The secondary structure of the PAP1 protein comprises 40% alpha-helix, 17% beta-pleated sheet, and 43% other structures. The PAP1 protein is a hydropathic protein, and no signal peptide was found.
(5). The PAP1 gene is a novel member of the immunoglobulin superfamily (IGSF). Alignment of the predicted protein sequences from human, Pan troglodytes, Canis, Mus musculus, and Gallus gallus showed that it is highly conserved.
2. The results of the molecular experiments:
(1). A p53 binding site, GAGCTTGTCCcccGAtCAAGCCC, in intron 2 of the PAP1 gene indicates that PAP1 is a p53 downstream gene.
(2). Immunohistochemistry and TUNEL showed that 9-10 dpc is the phase of primitive organ formation in embryonic development, during which cell proliferation was dominant and apoptosis was scarce, and that 11-14 dpc is the phase in which proliferation and apoptosis are kept in balance.
(3). The PAP1 gene (in fact its homologue, the IGSF6 gene) is possibly involved in mouse embryonic development. An IGSF6-specific transcript was detected by Northern blot in RNA extracted from embryos at 11-14 days post-conception (dpc). PAP1 expression differs among mouse embryos of different ages.
(4). In situ hybridization on sections of mouse embryos at 11-14 dpc showed the differential presence of PAP1 (in fact its homologue, the IGSF6 gene) in the developing lung, kidney, intestine, and vertebral column, indicating that PAP1 is possibly involved in mouse embryonic development.
Comparison with the proliferation and apoptosis of the developing cells suggests a functional involvement in embryonic development, perhaps in cell apoptosis.
Ⅳ. The results of the bioinformatics analysis of the PAP2 gene are as follows:
1. The PAP2 gene was localized to human chromosome 17.
2. The complete nucleotide sequence of the PAP2 cDNA is 2007 bp long and contains a long open reading frame of 510 bp that starts at the first methionine codon (nt 952) and ends with the stop codon TGA (nt 1461).
3. The protein sequence predicted from the open reading frame is a 169-amino-acid polypeptide with a calculated molecular mass of 19.2 kD and a theoretical isoelectric point of 12.56. Its molecular formula is C818H1355N317O208S9. No signal peptide was found, so it may be a non-secretory protein.
4. The PAP2 protein was localized to the nucleus.
5. The secondary structure of the PAP2 protein comprises 20.71% alpha-helix, 4.14% beta-pleated sheet, and 75.15% other structures.
Ⅴ. A total of 49 p53 downstream genes and 72 human DNA binding sequences for wild-type p53 published in PubMed were collected.
1. The results of the statistical analysis are as follows:
(1). The sequences are consistent with the consensus binding sequence for wild-type p53 defined by El-Deiry et al., but mismatches occur at most positions in the decamers, at a frequency of 10-20%.
(2). Among all decamers, 34.4% have three mismatches, 12.7% have four mismatches, and 6.35% have five mismatches. These data show that a criterion for computer analysis of p53 downstream genes should allow at least four mismatches.
2. The results of establishing the logistic regression model are as follows:
(1). Two position weight matrices (PWMs) were used to model the two decamers separately, and cross-validation was used to confirm the motif in every known binding sequence. The features of these motifs were then taken as the inputs to the logistic regression analysis.
A model for predicting p53 downstream genes based on logistic regression analysis was proposed; the optimal features, including the PWM scores of the two decamers, were chosen from the candidate feature sets by the stepwise selection procedure in SPSS. The model is:
p = exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc) / (1 + exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc))
The decision region is p ≥ 0.1076, where hpwmsc and tpwmsc are the PWM scores of the head decamer and the tail decamer in the motif, respectively.
(2). The DNA binding sequences for wild-type p53 published in PubMed were used as the positive dataset, and randomly selected human gene CDS sequences were used as the negative dataset. The model was trained and tested on these datasets by the jackknife method, and the average prediction accuracy was 93.91%.
(3). The p53 downstream genes in the human genome were analyzed with the prediction model, implemented in Perl, and the results were compared with those of the consensus sequence model. The comparison indicated that our model is a general algorithm that outperforms the traditional consensus sequence model. Furthermore, the framework of the model is extensible and can accept new features to improve the prediction results.
3. The results of predicting p53 downstream genes in human genomic DNA are as follows:
(1). 1693 p53 downstream genes were predicted by the conservative consensus binding sequence.
(2). 22107 p53 downstream genes were predicted by the consensus binding sequence (allowing four mismatches).
(3). 15182 p53 downstream genes were predicted by the logistic regression model.
4. The results of classifying the p53 downstream genes according to GO are as follows:
(1). Cellular Component: mainly cell, organelle, and protein complex.
(2). Molecular Function: mainly binding, catalytic activity, enzyme regulator activity, signal transducer activity, structural molecule activity, transcription regulator activity, transporter activity, and obsolete molecular function.
Many of the p53 downstream genes in the groups transcription regulator activity, transporter activity, and obsolete molecular function have not been identified until now.
(3). Biological Process: mainly cellular process, physiological process, regulation of biological process, and response to stimulus. Many of the p53 downstream genes in the groups development and obsolete biological process have not been identified until now.
[Conclusion] The main conclusions are:
1. We established a new p53-inducible expression system, the U251-pTet-p53 cell line, in which the exogenous p53 gene is overexpressed in medium containing doxycycline (Dox) but not in medium without Dox.
2. We constructed a cDNA library from cells overexpressing p53.
3. PAP1 is a novel p53 downstream gene localized to human chromosome 16p12-13, with six exons and five introns. The predicted PAP1 protein is a novel member of the immunoglobulin superfamily (IGSF) and is highly conserved. The differential presence of PAP1 in the developing lung, kidney, intestine, and vertebral column indicates that PAP1 is possibly involved in mouse embryonic development, perhaps in cell apoptosis.
4. PAP2 is a novel p53 downstream gene localized to human chromosome 17. The predicted PAP1 protein is highly conserved.
5. The statistical analysis shows that a criterion for computer analysis of p53 downstream genes should allow at least four mismatches.
6. A model for predicting p53 downstream genes based on logistic regression analysis was proposed:
p = exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc) / (1 + exp(-4.655 + 0.457×hpwmsc + 0.421×tpwmsc))
The decision region is p ≥ 0.1076, where hpwmsc and tpwmsc are the PWM scores of the head decamer and the tail decamer in the motif, respectively. This model identified 15182 p53 downstream genes.
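The logistic regression model and the mismatch criterion stated in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation (the thesis reports using Perl and SPSS). The coefficients, the threshold 0.1076, and the PAP1 intron 2 site are taken from the abstract; the function names are hypothetical, the PWM scores are treated as given inputs because the abstract does not say how they are computed, and the decamer consensus RRRCWWGYYY is the commonly cited El-Deiry form, which the abstract itself does not spell out.

```python
import math

# Coefficients and decision threshold quoted in the abstract.
INTERCEPT = -4.655
HEAD_COEF = 0.457   # weight of hpwmsc (PWM score of the head decamer)
TAIL_COEF = 0.421   # weight of tpwmsc (PWM score of the tail decamer)
THRESHOLD = 0.1076

# Assumption: the El-Deiry decamer consensus in IUPAC codes
# (R = A/G, W = A/T, Y = C/T); not written out in the abstract.
CONSENSUS = "RRRCWWGYYY"
IUPAC = {"R": "AG", "W": "AT", "Y": "CT",
         "A": "A", "C": "C", "G": "G", "T": "T"}


def binding_probability(hpwmsc, tpwmsc):
    """Evaluate p = exp(z) / (1 + exp(z)) for the model in the abstract."""
    z = INTERCEPT + HEAD_COEF * hpwmsc + TAIL_COEF * tpwmsc
    # 1 / (1 + exp(-z)) is algebraically the same as exp(z) / (1 + exp(z)).
    return 1.0 / (1.0 + math.exp(-z))


def is_predicted_site(hpwmsc, tpwmsc):
    """Apply the decision region p >= 0.1076."""
    return binding_probability(hpwmsc, tpwmsc) >= THRESHOLD


def count_mismatches(decamer):
    """Count positions where a 10-bp sequence departs from the consensus."""
    decamer = decamer.upper()
    if len(decamer) != len(CONSENSUS):
        raise ValueError("expected a 10-bp decamer")
    return sum(base not in IUPAC[code]
               for code, base in zip(CONSENSUS, decamer))


if __name__ == "__main__":
    # PAP1 intron 2 site from the abstract: head decamer, spacer, tail decamer.
    site = "GAGCTTGTCCcccGAtCAAGCCC"
    head, tail = site[:10], site[-10:]
    print("head mismatches:", count_mismatches(head))
    print("tail mismatches:", count_mismatches(tail))
    # The PWM scores below are made-up inputs, used only to exercise the formula.
    print("p =", binding_probability(5.0, 5.0), is_predicted_site(5.0, 5.0))
```

Under these assumptions, the head decamer of the PAP1 site matches the consensus at every position and the tail decamer has a single mismatch (the lowercase t), which is well within the four-mismatch criterion.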
Keywords/Search Tags: p53 gene, p53 downstream gene, PAP1 gene, PAP2 gene, logistic regression analysis