| Protein and proteomics research on animals, plants, and other eukaryotes is becoming increasingly important in the post-genomic era. With the completion of genome sequencing projects for a variety of organisms, including fruit trees such as apple and grape, the focus of research has begun to shift toward determining the function of the protein products of genes. Protein subcellular localization is an important research topic in the proteomics, cell biology, and molecular bioinformatics of fruit trees. The biological function of a fruit tree protein is closely related to biological processes such as metabolism and signal transduction. Moreover, a protein must be located in a specific subcellular region to exercise its biological function. For a fruit tree protein of unknown function, obtaining its subcellular location is therefore essential to further study of its molecular function. Subcellular localization information is usually obtained through biological experiments, but this approach is time-consuming and costly. Because the number of fruit tree protein sequences is growing rapidly, large-scale localization information (for example, genome-wide subcellular localization of apple proteins) can be obtained in a short period of time only by bioinformatics methods.
On the other hand, from the perspective of biological data, bioinformatics research can be divided into three areas: the generation and management of large amounts of biological sequence data, the use and analysis of biological data, and the research and development of tools and platforms for biological data analysis. With the generation of large amounts of bioinformatics data and the rapid development of the life sciences, both research and production practice need biological data analysis tools and platforms that meet users' demands. 
In some research areas, biological data analysis tools and platforms have even become a bottleneck restricting in-depth study of a problem. Meanwhile, developing such tools and platforms often requires knowledge from biology, mathematics, physics, chemistry, information science, and other fields, which adds to the complexity of their research and development. Therefore, it is necessary to carry out in-depth research on biological data analysis tools and platforms for fruit trees. Such research also has important practical value, and it is one of the purposes of our study.
In this paper, based on quantum algorithms, we analyze and study in depth the bioinformatics problems of PCD protein subcellular localization prediction and the realization of apple protein subcellular localization prediction. Combining biophysical and physical knowledge, we put forward specific solutions and implementations. The main work and innovations are summarized as follows:
1. Starting from the amino acid composition of protein sequences and borrowing the idea of granularity from physics, we propose the concept of the granularity of a protein's amino acid sequence (protein granularity) and use it to analyze amino acid sequences. The concepts of protein granularity order, protein granularity bound, protein granularity limit, and protein granularity increment are defined. We found several useful phenomena: protein granularity is unevenly distributed along the protein sequence; each protein sequence has its own protein granularity limit; and, for all proteins, each protein granularity has a common bound. With respect to prediction applications, it can also be concluded that protein granularity contains amino acid composition information, sequence-order information, same-amino-acid ‘neighbor’ information, and sequence length information. 
In this paper, we describe in detail a concrete construction method, together with its related parameters, for building feature vectors of protein sequences from the theory of protein granularity. Using protein granularity increment information, we made predictions on standard datasets of protein secondary structure classes and of sub-chloroplast localization of plant proteins. The results are better than those of previous studies, which further illustrates that protein granularity is a useful indicator of protein attributes.
2. The ZD98, ZW225, and CL317 standard apoptotic protein datasets were selected. Using protein granularity to extract features from apoptotic protein sequences, we obtained a 38-dimensional feature vector for each sequence. Apoptotic protein subcellular localization was then predicted with an improved quantum algorithm (QNN). The overall prediction accuracies were 87.8%, 83.1%, and 85.5%, respectively. These accuracies are equal to or higher than those reported by the original authors, indicating that the protein granularity method combined with QNN is valid for apoptotic protein subcellular localization prediction.
3. From the published apple genome-wide protein sequences, we obtained protein granularity feature vectors, including second-order protein granularity composition, third-order protein granularity composition, and an integration of the multi-granularity space. Then, based on the superposition of wave functions in quantum mechanics, a new quantum algorithm (QSVM) was developed. Subcellular localization was predicted for 63,541 amino acid sequences of the apple genome-wide proteins, and the prediction results are presented. The apple genome-wide protein subcellular sites database 1 was obtained.
4. 
A high-quality plant protein dataset with multiple localizations, constructed by Chou, was selected. We present a prediction mode in which multi-tagged and single-tagged proteins are processed and predicted separately. At the same time, GO annotations are used for feature extraction from the protein sequences. The predictions achieve higher prediction accuracy and provide a new method for protein localization prediction.
5. Based on the apple genome-wide protein datasets, GO annotations were used for feature extraction from those apple protein sequences that have GO annotations. Using a new quantum algorithm (SQSVM) combined with the proposed theory of protein granularity, subcellular localization was predicted for 15,297 amino acid sequences of apple genome-wide proteins that have GO annotations, and the prediction results are presented. On this basis, the apple genome-wide protein subcellular sites database 2 was constructed.
6. Based on the conclusions of this paper, two subcellular localization websites were built as biological data analysis platforms: an apple protein subcellular localization system and a plant protein subcellular multi-localization system. The websites will be launched to provide free services to users in China and abroad. |
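
The separate handling of single-tagged and multi-tagged proteins described in item 4, and the overall accuracy figures quoted in item 2, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `split_by_label_count` and `overall_accuracy`, the dictionary layout, the example protein identifiers, and the use of exact set matching as the correctness criterion for multi-tagged proteins are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Labels = Dict[str, Set[str]]  # protein id -> set of subcellular locations


def split_by_label_count(labels: Labels) -> Tuple[Labels, Labels]:
    """Partition proteins into single-tagged and multi-tagged groups.

    Assumption: a protein is "single-tagged" when it has exactly one
    annotated location and "multi-tagged" when it has more than one.
    """
    single = {pid: locs for pid, locs in labels.items() if len(locs) == 1}
    multi = {pid: locs for pid, locs in labels.items() if len(locs) > 1}
    return single, multi


def overall_accuracy(true: Labels, pred: Labels) -> float:
    """Fraction of proteins whose predicted location set equals the true set.

    Assumption: exact set match counts as correct; the abstract does not
    state which criterion is used for multi-tagged proteins.
    """
    if not true:
        raise ValueError("no proteins to evaluate")
    correct = sum(1 for pid, locs in true.items() if pred.get(pid) == locs)
    return correct / len(true)


if __name__ == "__main__":
    # Hypothetical toy data; identifiers and locations are made up.
    annotated = {
        "P1": {"nucleus"},
        "P2": {"chloroplast", "mitochondrion"},
        "P3": {"cytoplasm"},
    }
    single, multi = split_by_label_count(annotated)
    predicted = {"P1": {"nucleus"}, "P3": {"nucleus"}}
    print(sorted(single), sorted(multi))
    print(overall_accuracy(single, predicted))
```

Each group can then be passed to its own classifier and scored on its own, which is the point of processing the two kinds of protein separately.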