Research On Molecular Mechanism Of Non-small Cell Lung Cancer (NSCLC) Based On Bioinformatics

Posted on: 2010-03-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q L Chen
Full Text: PDF
GTID: 1114360278976329
Subject: Electronic biotechnology equipment
Abstract/Summary:
Lung cancer is the most common malignant tumor of the lung. In recent years, under the influence of many environmental factors, the morbidity and mortality of lung cancer have risen rapidly worldwide, especially in industrialized countries. However, the molecular mechanism of lung cancer remains unclear, and early diagnosis and therapy remain difficult. In view of this, this dissertation uses bioinformatics methods to examine the mechanism of non-small cell lung cancer (NSCLC) from three angles: data mining of differentially expressed genes, prediction of protein-protein interactions (PPI), and construction of a PPI network. Some of the bioinformatics results were also validated by molecular biology experiments. Because NSCLC is the main type of lung cancer, the data used in this dissertation come from the squamous carcinoma and adenocarcinoma datasets in GEO. The main work of this dissertation is as follows.

Firstly, to elucidate the changes in gene expression patterns, the metabolic pathways of the differentially expressed genes (DEGs), and their possible roles in NSCLC development, the lung squamous carcinoma dataset (GDS1312) and the adenocarcinoma dataset (GDS1650) were each mined using BRB-Array Tools and MATLAB.

The GDS1312 dataset includes 5 lung squamous carcinoma tissue samples and 5 normal paracancerous tissue samples. In squamous carcinoma, 409 DEGs were screened as up-regulated and 877 as down-regulated. Gene Ontology (GO) comparison yielded 95 GO categories from 1730 genes, mainly involving the cellular cytoskeleton, cell cycle regulation, programmed cell death, immune response, protein enzymes, and so on. The KEGG pathways mainly involved metabolism, the cell cycle, and disease-related pathways.
The BioCarta pathways mainly involved cell adhesion, cell cycle regulation, immunology, cell signaling, and metabolism.

The GDS1650 dataset includes 10 lung adenocarcinoma tissue samples and 10 normal paracancerous tissue samples. Here, 632 DEGs were up-regulated and 975 were down-regulated. From 1358 genes, 63 GO categories were selected, mainly involving cellular cytoskeleton biogenesis, regulation of cell adhesion, cell recognition, blood vessel development, protein-kinase binding, and so on. The KEGG pathways involved were the cell adhesion molecules (CAMs) pathway, the leukocyte transendothelial migration pathway, the VEGF signaling pathway, the mTOR signaling pathway, and the cell cycle pathway. The BioCarta pathways resembled those related to lung squamous carcinoma, likewise involving cell adhesion, cell cycle regulation, immunology, cell signaling, and metabolism.

Secondly, protein-protein interactions (PPI) were predicted with a support vector machine (SVM). The properties of any two consecutive amino acids were used as a descriptor (a two-amino-acid unit), and the frequency of each such unit was counted. A binary space (V, F) was then constructed to represent a protein sequence, and the PPI information of the protein sequences was mapped into a vector space. The PPI prediction models were built by SVM learning with the radial basis function kernel. To validate the reliability of the predictions, 10-fold cross-validation repeated 10 times was used. This method yields a stable PPI prediction model with an accuracy exceeding 83%.

Thirdly, the lung cancer PPI network was constructed. A database of lung-cancer-related proteins was built from the up-regulated DEGs, yielding 95 proteins highly related to squamous carcinoma and 178 proteins highly related to adenocarcinoma.
Comparison of the two sets also yielded 19 proteins co-expressed in both types of lung cancer. The complete PPI data for these proteins were retrieved from the HPRD database and integrated with the PPI information predicted by the SVM. After self-interactions and redundant entries were removed, the PPI network of lung carcinogenesis was constructed with Cytoscape. Using a program that sorts proteins by degree, the hub proteins of the PPI network were identified: 19 proteins in lung squamous carcinoma and 35 in adenocarcinoma. The possible roles of the hub proteins in the molecular mechanism of lung cancer are discussed, and a "molecular group" hypothesis for lung carcinogenesis is proposed.

Finally, to validate the bioinformatics results, 6 genes were selected from the co-expressed genes, and semi-quantitative RT-PCR was used to verify their expression in squamous carcinoma and adenocarcinoma cell lines. The results show that 5 genes were expressed in both types of lung cancer cell lines, indicating that the expression of these genes is likely "correlated" in cancer cell lines. Moreover, SOX4 was highly expressed, indicating that this gene may be associated with lung carcinogenesis. SOX4 mutations were then detected in a subset of 90 NSCLC tissue samples using PCR-SSCP and DNA sequencing. The tertiary structure of SOX4 was modeled and predicted by combining MATLAB with SwissPdbViewer. The results indicate that the mutation changes the side-chain conformation of SOX4 and may affect its interactions with other molecules. This also suggests that SOX4 mutation may be a potential factor in lung carcinogenesis.

In summary, one or a few genes or proteins cannot determine the molecular mechanism of lung carcinogenesis; it is more likely associated with a complex regulatory system formed by many carcinogenesis-related "molecular groups".

The main innovations of this dissertation:
1. MATLAB and BRB-Array Tools were combined to mine the differentially expressed gene data of NSCLC, which provides a new method for data mining of microarray data; the possible molecular mechanism of lung cancer is discussed from the gene expression level.
2. A PPI prediction method based on a support vector machine (SVM) was designed, using the properties of any two consecutive amino acids as a descriptor. This method preserves, as far as possible, the integrity of the amino acid information of protein pairs. Using MATLAB as the experimental platform reduced, as far as possible, the difficulty of implementing the algorithm.
3. Protein data highly related to lung cancer were obtained from the data mining results of gene expression. By integrating the PPI information from the database, the PPI network of lung cancer was constructed. Based on the hub proteins of the PPI network, a "molecular group" hypothesis for carcinogenesis is proposed, which provides a new research clue for studying the mechanism of lung carcinogenesis.
4. SOX4 mutations in non-small cell lung cancer (NSCLC) tissues are reported. MATLAB and SwissPdbViewer were combined to model and predict the tertiary structure of the SOX4 protein, which provides a new method for homology modeling of proteins.
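
The two-amino-acid descriptor used for PPI prediction can be illustrated with a minimal Python sketch. The dissertation worked in MATLAB and describes the descriptor only in outline, so everything below is an assumption made for illustration: the function names, the use of raw residue identities (rather than residue properties) as units, the normalization to relative frequencies, and the concatenation of the two proteins' vectors to represent a pair.

```python
from itertools import product

# Assumption: units are built from the 20 standard residues (400 possible units).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {unit: i for i, unit in enumerate(DIPEPTIDES)}


def dipeptide_frequencies(sequence):
    """Return a 400-dimensional vector of relative two-residue unit frequencies."""
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Slide a window of width 2 over the sequence, one residue at a time.
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip units containing non-standard symbols
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def pair_vector(seq_a, seq_b):
    """Hypothetical pair encoding: concatenate both descriptors (800 dimensions)."""
    return dipeptide_frequencies(seq_a) + dipeptide_frequencies(seq_b)


# Toy sequences, not real proteins.
example = pair_vector("MKVLAAGIV", "ACDACD")
```

Vectors of this kind, labeled as interacting or non-interacting, could then be passed to any SVM implementation with a radial basis function kernel and evaluated with repeated 10-fold cross-validation, as the abstract describes.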
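
The network-cleaning and hub-selection step can be sketched in the same hedged way. The abstract states only that self-interactions and redundant entries were removed and that proteins were sorted by degree; the function names, the placeholder protein labels, and the degree cutoff below are all hypothetical, since the dissertation's actual threshold is not given here.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map, dropping self-interactions and duplicates."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:  # self-interaction: ignore
            continue
        # Sets absorb redundant entries, including reversed pairs such as (B, A).
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def rank_by_degree(neighbors):
    """Sort proteins by number of distinct partners, highest first, then by name."""
    degrees = {protein: len(partners) for protein, partners in neighbors.items()}
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))


def hub_proteins(neighbors, min_degree):
    """Return proteins whose degree reaches the (assumed) cutoff."""
    return [protein for protein, degree in rank_by_degree(neighbors) if degree >= min_degree]


# Toy interaction list with placeholder names, not data from the dissertation.
toy = [("P1", "P2"), ("P2", "P1"), ("P1", "P3"), ("P1", "P4"),
       ("P3", "P3"), ("P2", "P3"), ("P1", "P2")]
network = build_network(toy)
ranking = rank_by_degree(network)
hubs = hub_proteins(network, min_degree=3)
```

In this toy network P1 has three distinct partners and is the only protein that reaches the cutoff of 3; with real data the cutoff would have to be chosen and justified separately.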
Keywords/Search Tags: Non-small cell lung cancer (NSCLC), BRB-Array Tools, SVM, Differentially expressed genes, Protein-protein interaction (PPI) network, Tertiary structure, SOX4 mutation