
A Study On The Research Of Network Inference In Biological Networks

Posted on: 2020-12-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yan
GTID: 1360330596481235
Subject: Applied Statistics
Abstract/Summary:
The rapid advancement of high-throughput technologies provides huge amounts of information on gene expression and protein activity at the genome-wide scale. With this unprecedented growth of biological data, making effective use of the data and uncovering the biological networks behind them has become one of the research hotspots of systems biology. Complex network theory provides a new perspective for exploring various complex systems. Researchers have gradually realized that research cannot be limited to a single gene, but should explore the interactions between biomolecules from a systematic perspective, and thereby study the operating mechanism of the entire biological system. The purpose of biological network inference is to construct, from biological data, a network structure composed of the interactions between biological molecules. Research on biological network inference is therefore of great significance.

Proteins participate in and control most life activities in living organisms. The analysis of protein-protein interaction (PPI) networks has become an important way to study the functional properties of proteins. It not only provides an effective method for understanding the mechanisms of life activities in cells, but has also played an important role in the diagnosis and treatment of diseases, in development, and in other applications.

Triple-negative breast cancer (TNBC) refers to breast cancer with negative immunohistochemistry results for estrogen receptor (ER), progesterone receptor (PR) and the proto-oncogene Her-2. TNBC tends to be more aggressive, is associated with a poorer prognosis than receptor-positive subtypes, and is more common among young and African American women. Breast cancer is one of the most common life-threatening diseases in women worldwide, and its diverse genetic indicators have been examined in great detail. According to statistics, one-third of breast cancer patients later experience recurrence or metastasis. Although detection and emerging therapies have made great progress, further improvement in early diagnosis is needed to reduce the chance of metastasis. Monitoring and early diagnosis are of great importance for better prognosis of the disease. Understanding the body at the protein level may lead to a new predictive model of how cancer works. Because the actual functional properties of cells are carried out by proteins, proteomics has been studied by cancer researchers, but mostly using cell lines or at low analytical depth because of technological challenges. More than 80 percent of breast cancers can be treated by targeted therapies, but TNBC remains an important unmet clinical problem.

In this paper, the proteomics data of triple-negative breast cancer are studied through the specific pathways involved in cell proliferation within the MAPK signaling pathway, including the MAP kinase, JNK kinase and p38 kinase pathways. By constructing a network of protein interactions for these specific pathways, mining the interactions between proteins, and detecting the key proteins of the dynamic processes, we aim to identify key proteins that can provide suggestions for many biological and medical problems, such as medical diagnosis and the monitoring of diagnostic effects.

In recent years, correlation metrics based on information theory have been widely used to infer biological networks. Some scholars have proposed conditional mutual information (CMI) as a measure of the correlation between network nodes, and have used the path consistency algorithm (PCA) to delete network edges and so construct a network. This algorithm can detect nonlinear dependence, is simple to compute, and runs quickly, which makes it suitable for constructing complex biological networks. We therefore selected 88 samples from different stages of breast cancer reported by Yair Pozniak et al. (2016), functionally analyzed the gene set by GO enrichment, and selected 90 proteins associated with the Ras protein and response-to-cytokine functions. Using the PCA-CMI algorithm, we constructed four different networks for the selected 90 proteins in different states, and compared the topology and characteristics of the constructed networks.

When constructing a protein-protein interaction network for specific proteins of the MAPK signaling pathway, we first processed the protein data of the MAPK pathway and selected 60 proteins of the classical pathway for study, based on the biological background. However, because a large proportion of the data were missing, we removed the proteins whose missing rate reached 50% and imputed the missing values of the remaining 27 proteins. The 44 non-time-series samples of these 27 proteins were pseudo-time ordered using diffusion maps and the Wanderlust algorithm, and the data for the 27 proteins were then smoothed by Gaussian process regression. We found that the smoothed data of some proteins contained a lot of "noise" compared with the original data, so these proteins were removed. The remaining 16 proteins were used to construct the dynamic protein-protein interaction network.

Next, we inferred the protein-protein interaction network for the 16 proteins that remained after selection and data processing. The inference process is divided into two main parts. First, a static network is constructed for the 16 proteins by a top-down method (i.e., a Gaussian graphical model). Then, based on the topology of the inferred static network, we construct a dynamic network through a bottom-up approach (i.e., ordinary differential equation modeling). In the ordinary differential equation modeling, we used the pseudo-time-ordered and smoothed protein data. The parameters of the differential equations are estimated by the rejection algorithm of approximate Bayesian computation. When constructing the mathematical model of differential equations, we assume that the edges between nodes in the inferred network topology are bidirectional (i.e., both positive and negative). Using the robustness property proposed by Professor Kitano to test the robustness of the differential equation model, we carefully removed the directed edges one by one. Finally, a dynamic network of 12 protein interactions was inferred.

Finally, this paper studies network inference algorithms that combine conditional mutual information (CMI) with the path consistency algorithm (PCA), i.e., the PCA-CMI algorithm and the similar PCA-PMI algorithm. We found that the path consistency algorithm produces different results depending on the order of the input variables, especially when dealing with high-dimensional data. To solve this problem, we first combined a statistical approach with the PCA-CMI algorithm: through many random experiments we obtained the frequency matrix of the network edges and constructed the network from it. Experiments show that the network constructed from the edge frequency matrix is not ideal. We then experimented with the conditional mutual information matrices (edge weight matrices) of the 0th, 1st and 2nd order of the PCA-CMI algorithm. In our Matlab program, we found that constructing the network from the mean matrix of the edge weight matrix (i.e., the second-order conditional mutual information matrix) gives higher precision. We therefore propose a new method for constructing networks based on the mean matrix of edge weights (the 2nd-order CMI matrix), referred to as EWMM. A comparison of ROC curves shows that the proposed EWMM algorithm performs better than the PCA-CMI algorithm.

The main innovations of this paper are as follows. First, we construct a dynamic network based on non-time-series triple-negative breast cancer data, which is the first dynamic network study on non-time-series data. Second, we propose a new mathematical model for building the dynamic network. This model allows us to explore the interactions between proteins and offers higher flexibility in determining the protein interaction relationships. Third, based on protein data from patients with triple-negative breast cancer, we constructed static and dynamic networks for the proteins of specific pathways. The dynamic network thus constructed shows both similarities to and differences from the network model inferred from normal cells, so the protein-protein interaction network we constructed for triple-negative breast cancer patients has some predictive significance for future experimental research. Fourth, for correlation-based inference of static networks, we propose a new algorithm built on the PCA-CMI algorithm. It addresses the problem that the path consistency (PC) algorithm gives different results for different orders of the input variables. Specifically, we propose a statistical approach in which the mean matrix of edge weights is obtained through many experiments and the network is inferred from this mean matrix. The new algorithm has certain advantages over existing algorithms.
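
The idea of averaging edge weights over many random experiments can be illustrated with a minimal Python sketch. It is not the dissertation's Matlab program. It assumes that CMI is estimated under a joint Gaussian model from covariance determinants, that an edge is tested against the largest CMI over conditioning sets drawn from the common neighbours of its endpoints, and that each random experiment is a random permutation of the variable order. The function names `gaussian_cmi`, `pca_cmi_weights` and `mean_edge_weights`, and the threshold value 0.03, are hypothetical choices made for illustration.

```python
from itertools import combinations

import numpy as np


def gaussian_cmi(data, i, j, cond=()):
    """Estimate CMI(X_i; X_j | X_cond), assuming jointly Gaussian columns."""

    def logdet(idx):
        if not idx:
            return 0.0  # an empty covariance contributes log(1) = 0
        cov = np.atleast_2d(np.cov(data[:, idx], rowvar=False))
        return np.linalg.slogdet(cov)[1]

    cond = list(cond)
    return 0.5 * (logdet([i] + cond) + logdet([j] + cond)
                  - logdet(cond) - logdet([i, j] + cond))


def pca_cmi_weights(data, threshold=0.03, max_order=2):
    """Simplified PCA-CMI-style pruning (assumed variant, see lead-in).

    Returns the adjacency matrix and the last CMI value computed per edge.
    Edges are deleted in place, so the result can depend on column order.
    """
    p = data.shape[1]
    adj = ~np.eye(p, dtype=bool)  # start from the complete graph
    weights = np.zeros((p, p))
    for order in range(max_order + 1):
        for i in range(p):
            for j in range(i + 1, p):
                if not adj[i, j]:
                    continue
                common = [k for k in range(p)
                          if k not in (i, j) and adj[i, k] and adj[j, k]]
                if len(common) < order:
                    continue
                cmi = max(gaussian_cmi(data, i, j, s)
                          for s in combinations(common, order))
                weights[i, j] = weights[j, i] = cmi
                if cmi < threshold:
                    adj[i, j] = adj[j, i] = False
    return adj, weights


def mean_edge_weights(data, n_runs=20, seed=0, **kwargs):
    """Average the edge weight matrix over random variable orderings."""
    rng = np.random.default_rng(seed)
    p = data.shape[1]
    total = np.zeros((p, p))
    for _ in range(n_runs):
        perm = rng.permutation(p)
        _, w = pca_cmi_weights(data[:, perm], **kwargs)
        inv = np.argsort(perm)  # map weights back to the original order
        total += w[np.ix_(inv, inv)]
    return total / n_runs
```

In this sketch, a network would be read off by thresholding the matrix returned by `mean_edge_weights`, so a single unlucky variable ordering no longer decides whether an edge is kept.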
Keywords: network inference, protein-protein interaction network, Bayesian inference, path consistency algorithm, conditional mutual information