Identifying Essential Proteins Based On Domain Information

Posted on: 2013-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Cheng
GTID: 2230330374988812
Subject: Computer technology
Abstract/Summary:
Essential proteins are indispensable for sustaining life. Research on essential proteins provides an important basis for drug target design, disease treatment, the definition of minimal genomes in synthetic biology, and related work. This thesis starts from a particular structural feature of proteins: domains, which are functional, structural and evolutionary units. It analyzes the structural features that distinguish essential proteins and proposes new features for identifying them. The main work is as follows.

Because the removal of an essential protein leads to infertility or lethality, the thesis first analyzes the relationship between the number of domain types (NDT) in a protein and protein essentiality. Essential proteins have more types of domains than non-essential proteins. Across 13 species, the average NDT of essential proteins is more than 1.5 times that of all proteins. PPV curves obtained by identifying essential proteins with NDT show that it is not by chance that essential proteins have a higher NDT, or that proteins with a higher NDT tend to be essential. The experiments also show that, among candidates identified by NDT, essential proteins are selected with high priority, and the CC between protein essentiality and the identification results is higher than that of random selection. By contrast, neither the number of domains nor protein length, which has been regarded as a factor in essentiality, shows an obvious distinction between essential and non-essential proteins.

The thesis also analyzes the relationship between the frequency of domains (FD) and protein essentiality. Proteins composed of less frequent domains are more likely to be essential. PPV curves obtained by identifying essential proteins with FD show that this tendency is not random.
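The two domain-based features can be illustrated with a small sketch. The abstract does not give exact formulas, so the code below rests on assumptions: NDT is taken as the count of distinct domain types in a protein, the frequency of a domain is taken as the number of proteins that contain it, and the hypothetical `rarity_score` (sum of inverse frequencies) is only one possible way to rank proteins built from less frequent domains first. PPV is read here as the fraction of the top-k ranked proteins that are essential. The protein names, domain names and essentiality labels are toy data.

```python
from collections import Counter


def ndt(domains):
    """Number of distinct domain types in one protein (assumed NDT definition)."""
    return len(set(domains))


def domain_frequency(protein_domains):
    """For each domain type, count how many proteins contain it at least once."""
    counts = Counter()
    for domains in protein_domains.values():
        counts.update(set(domains))
    return counts


def rarity_score(domains, freq):
    """Hypothetical FD-based score: sum of 1/frequency over distinct domain types.

    Proteins made of rare domains get a higher score. This is an illustrative
    choice, not a formula taken from the thesis.
    """
    return sum(1.0 / freq[d] for d in set(domains))


def ppv_at_k(ranked, essential, k):
    """Fraction of the top-k ranked proteins that are known to be essential."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked[:k]
    return sum(1 for p in top if p in essential) / len(top)


# Toy data: protein -> list of domain annotations (all names are made up).
protein_domains = {
    "P1": ["kinase", "SH2", "SH3"],
    "P2": ["kinase"],
    "P3": ["kinase", "kinase"],
    "P4": ["zinc_finger", "helicase"],
    "P5": [],
}
essential = {"P1", "P4"}

# Rank by NDT, highest first.
ndt_ranking = sorted(
    protein_domains, key=lambda p: ndt(protein_domains[p]), reverse=True
)

# Rank by the hypothetical rarity score, highest first.
freq = domain_frequency(protein_domains)
fd_ranking = sorted(
    protein_domains,
    key=lambda p: rarity_score(protein_domains[p], freq),
    reverse=True,
)

print(ndt_ranking, ppv_at_k(ndt_ranking, essential, 2))
print(fd_ranking, ppv_at_k(fd_ranking, essential, 2))
```

Plotting `ppv_at_k` over increasing k for a ranking, and comparing it with the same curve for a random ordering, gives the kind of PPV curve the abstract refers to.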
When candidate essential proteins are identified by FD, essential proteins are likewise selected with high priority, and the CC between protein essentiality and the identification results is higher than that of random selection.

Based on these two rules, a network weighted by domain information is designed, and degree and edge clustering coefficient are applied to it. The experimental results show that, on six evaluation measures, the weighted network identifies essential proteins better than the unweighted network. The top 100 proteins identified by methods based purely on topology differ little from one another, whereas methods that use domain information differ from those that do not. When domain information is introduced, more edges between two essential proteins are included, and the number of edges between essential and non-essential proteins decreases. Integrating domain information improves the accuracy of essential protein identification.

From a new perspective, this work highlights the structural features of essential proteins rather than the topological features of the protein-protein interaction network. It introduces new ideas for identifying essential proteins, constructing minimal genomes, and related tasks.
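A minimal sketch of how domain information might be attached to a protein-protein interaction network follows. The abstract does not state how edge weights are derived, so `edge_weight` (the mean of the two endpoints' domain scores) is a hypothetical choice, and the `score` values are toy numbers standing in for a domain-based feature such as NDT. The edge clustering coefficient uses the common definition: triangles through the edge divided by the smaller of the two endpoint degrees minus one.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) interactions."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_weight(u, v, score):
    """Hypothetical weighting: mean of the endpoints' domain-based scores."""
    return (score.get(u, 0.0) + score.get(v, 0.0)) / 2.0


def weighted_degree(adj, score):
    """Sum of incident edge weights for every node."""
    return {
        u: sum(edge_weight(u, v, score) for v in nbrs)
        for u, nbrs in adj.items()
    }


def edge_clustering_coefficient(adj, u, v):
    """Triangles containing edge (u, v) over min(deg(u) - 1, deg(v) - 1)."""
    triangles = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / denom if denom > 0 else 0.0


# Toy network and toy domain scores (all values are made up).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
score = {"A": 3, "B": 2, "C": 1, "D": 0}

adj = build_adjacency(edges)
plain_degree = {u: len(nbrs) for u, nbrs in adj.items()}
wd = weighted_degree(adj, score)

print(plain_degree)
print(wd)
print(edge_clustering_coefficient(adj, "A", "B"))
```

In this toy network, plain degree ranks `C` first, while the weighted degree ranks `A` first, which shows how domain information can reorder candidates that topology alone would rank differently.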
Keywords/Search Tags: essential proteins, domain, protein-protein interaction network, essential genes