| In bioinformatics, the analysis, reconstruction, and prediction of gene regulatory networks is an important area of research. Inferring a plausible gene regulation model from gene expression data provides an important reference for understanding the nature of life. Successfully predicting a gene regulatory network in a biological system also has practical significance: it supports more accurate prediction of gene function and the mining of information about cooperative relationships among genes, the spatiotemporal patterns of gene expression, and information transmission within the network. Gene regulatory network research has therefore attracted the attention of many research teams at home and abroad. However, many aspects of the field are still at an exploratory stage, and building databases and developing high-quality software remain urgent and arduous tasks. The difficulties are several. The data are noisy, which raises the question of robustness, and the reliability of data collection and analysis and the compatibility of data sets are uncertain. Studies that try to build models suitable for large-scale networks end up with models too coarse to reflect reality well. Studies that try to build accurate simulation models find that the number of parameters, and with it the scale and complexity of the computation, surges until the model is difficult or impossible to handle; the modeling scope must then be narrowed, so the model reflects only a small part of the network. This makes it hard for research to achieve its intended objectives and results. The biological interpretation of simulation results also presents problems. Such research requires selecting a suitable model, designing a good optimization algorithm and supporting it with rigorous mathematical analysis and proof, and verifying by experiment the stability and reliability of the simulated regulatory network model; the complexity of realistic models also places higher demands on computing power.
However, current measurement technology still cannot provide data on all biological macromolecules and related substances. Research on molecular-biological networks therefore suffers from a lack of data at this stage. At the same time, the number of genes, proteins, and other substances within a cell is very large, so constructing such a huge network is also a challenge for network theory and computing performance. Understanding and interpreting these massive, coarse biological data still poses many challenges. Nevertheless, such research has very important practical significance for identifying genes and proteins related to human disease. The research and experiments in this article were carried out against this background, and they have limitations. For example, because of the efficiency of the algorithm and limits on computing capacity, we took only a small-scale regulatory network model as the object of study in the experiments. In addition, the experiment is not perfect: it is limited by the available computer and by the extent of my own knowledge, and the experimental data are not necessarily the best. With further adjustment of the parameters, prediction accuracy and speed may improve. The full text is divided into five parts. The first part is the introduction. It covers the background, current status, and significance of the topic and the related knowledge of molecular biology, and briefly introduces the main work of this paper. It introduces basic concepts of molecular biology, such as DNA, chromosomes, proteins, genes, and gene regulatory networks, and their roles. The second part states the problem. From a biological perspective it presents the basic concepts of gene regulatory networks, including their biological background and two important aspects of their study: gene expression networks and gene transcription regulatory networks.
Building on this basic biological understanding of gene transcription regulatory networks, we establish the concept of the network. In addition, we briefly introduce the biological and computational tools that bioinformatics researchers currently use for network analysis and reconstruction. The third part covers the acquisition of gene data and the algorithm for the regulatory network: a gene regulatory network is built on the Pearson similarity measure. We first introduce the source of the raw data. The raw data are noisy and contain many missing values, and in any specific biological process a large number of genes are not expressed, so the raw gene expression data must be filtered. We select the 2,000 genes with the largest variance to form a 2000 × 77 matrix as the experimental data, and standardize it so that all values lie on the same scale: after standardization, each gene expression profile has mean 0 and standard deviation 1. We then apply the Pearson algorithm to obtain the Pearson correlation coefficient matrix. Next we step the admission threshold WO from small to large, compare the thresholded correlation matrices, and choose a suitable value by tracking how the largest cluster in the network changes with WO. As WO increases, the largest cluster in the network shrinks; once WO reaches a certain value, the size of the largest cluster no longer changes sharply, and we take that value as the final threshold WO. The fourth part is the statistical analysis of the gene regulatory network. It first introduces three basic statistical properties of complex networks: the average path length, the clustering coefficient, and the degree distribution.
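
The third part's pipeline (variance filtering, standardization, Pearson correlation, and the threshold scan over the largest cluster) can be illustrated with a minimal sketch. It is not the original implementation: all function names are hypothetical, the data are a toy example, and linking two genes when the absolute correlation reaches the threshold is an assumption, since the text does not say whether negative correlations count.

```python
import math
from collections import deque


def variance(row):
    """Population variance of one expression profile."""
    mean = sum(row) / len(row)
    return sum((x - mean) ** 2 for x in row) / len(row)


def select_top_variance(matrix, k):
    """Keep the k genes (rows) with the largest variance."""
    return sorted(matrix, key=variance, reverse=True)[:k]


def standardize(row):
    """Rescale a profile to mean 0 and standard deviation 1."""
    mean = sum(row) / len(row)
    std = math.sqrt(variance(row))
    if std == 0:
        raise ValueError("constant profile; filter it out before standardizing")
    return [(x - mean) / std for x in row]


def pearson_matrix(matrix):
    """Pearson correlation of every pair of rows.

    For standardized profiles, r_ij is the mean of the elementwise product.
    """
    z = [standardize(row) for row in matrix]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def build_network(corr, threshold):
    """Adjacency sets; genes i and j are linked when |r_ij| >= threshold.

    Using the absolute value is an assumption of this sketch.
    """
    adj = {i: set() for i in range(len(corr))}
    for i in range(len(corr)):
        for j in range(i + 1, len(corr)):
            if abs(corr[i][j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def largest_cluster_size(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node] - seen:
                seen.add(nb)
                queue.append(nb)
        best = max(best, size)
    return best


if __name__ == "__main__":
    # Toy data: 5 genes x 5 samples (the text uses a 2000 x 77 matrix).
    toy = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1],
           [1, 3, 2, 5, 4], [2, 1, 4, 3, 5]]
    corr = pearson_matrix(toy)
    for step in range(1, 10):
        w = step / 10
        print(w, largest_cluster_size(build_network(corr, w)))
```

Scanning the threshold and printing the largest cluster size at each step gives the curve from which a plateau value would be read off; how the plateau is judged is left to the reader here, as the text does not specify a criterion.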
For the statistics of the gene regulatory network, we first compute the average shortest path length and the degree of the maximum-degree node, and output both for each threshold. We then compute the clustering coefficient; taken with the average shortest path length, it indicates that the yeast regulatory network has small-world characteristics. From the degree distribution of the nodes we learn that the yeast gene regulatory network has scale-free properties. This agrees with existing biological knowledge: in a given biological process, the genes that actually participate are few and highly correlated. The fifth part is the summary. It briefly reviews the work done and its significance, the problems that remain, and the work still needed in the future. |
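
The three statistics named in the fourth part can be computed directly from an adjacency structure. The following is a minimal sketch under stated assumptions rather than the original code: the function names are hypothetical, the graph is an undirected toy example stored as a dictionary of neighbor sets, unreachable node pairs are skipped when averaging path lengths, and nodes with fewer than two neighbors are given a clustering coefficient of 0.

```python
from collections import Counter, deque


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average of the local clustering coefficients of all nodes."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # convention assumed in this sketch
            continue
        # Count edges that exist between pairs of neighbors.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


def degree_distribution(adj):
    """Map each degree k to the fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def max_degree(adj):
    """Degree of the maximum-degree node."""
    return max(len(nbrs) for nbrs in adj.values())


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with one extra node 3 attached to node 2.
    toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(average_path_length(toy))
    print(clustering_coefficient(toy))
    print(degree_distribution(toy))
    print(max_degree(toy))
```

A short average path length together with a high clustering coefficient is the usual small-world signature, and a heavy-tailed degree distribution is what a scale-free claim rests on; checking the latter on real data would need a fit on a log-log scale, which this sketch does not attempt.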