| The invasion of plant pests and diseases has become a serious problem. For crops in particular, such invasions not only reduce yields or cause total crop failure over large areas, but also have a devastating impact on the economy. Studying the interactions between plants and their pests and diseases in order to extract the features of known plant resistance genes, discover further unknown resistance genes, and cultivate resistant plants therefore contributes to genetic improvement and is of considerable value for biological breeding. The recognition of plant resistance genes with an ab initio method can be formalized as a classification problem. Usually, both labeled positive and labeled negative samples are needed to train the classifier. However, the only available information is a small set of manually curated R-genes. Negative training samples are commonly drawn from gene families that do not include any known resistance genes, but some of those genes may in fact have a resistance function, which seriously affects the results. To overcome the low recognition rate caused by the scarcity of positive samples and by false negative samples, a novel sample selection method is proposed, based on the distance between genes and the curated R-genes in the protein-protein interaction network. A classifier trained on the samples selected by our method outperforms one trained on samples selected by the general method, which indicates the effectiveness of our method. Then, considering the contribution of gene sequence characteristics and of the physical and chemical characteristics of the corresponding proteins to gene identification, we extract 113-dimensional features covering both the gene sequences and the protein physicochemical properties from the data obtained by the above method, and analyze experimentally the contribution of each feature group to the identification of resistance genes. 
Finally, through experiments and theoretical analysis, the radial basis function is selected from four common kernel functions as the kernel of the support vector machine, which improves the sensitivity and specificity of the predictions. The Web version of the resistance gene identification system provides a convenient platform for researchers and also supports our own exploration and research of resistance genes. |
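
The abstract describes selecting training samples by their distance to curated R-genes in the protein-protein interaction network, without giving the exact rule. The sketch below is one plausible reading, not the authors' implementation: it assumes that genes lying far from every curated R-gene are more reliable negative samples. The function names, the edge-list input format, the `min_distance` threshold, and the treatment of unreachable genes as infinitely distant are all assumptions made for illustration.

```python
from collections import deque


def distances_to_r_genes(ppi_edges, r_genes):
    """Multi-source BFS: hop distance from each gene to its nearest curated R-gene.

    ppi_edges is an iterable of (gene_a, gene_b) pairs (hypothetical input format).
    Returns the distance map and the set of all genes seen in the network.
    """
    adjacency = {}
    for a, b in ppi_edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Every curated R-gene present in the network starts at distance 0.
    dist = {gene: 0 for gene in r_genes if gene in adjacency}
    queue = deque(dist)
    while queue:
        gene = queue.popleft()
        for neighbor in adjacency[gene]:
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist, set(adjacency)


def select_negatives(ppi_edges, r_genes, min_distance=3):
    """Keep genes at least min_distance hops from every curated R-gene.

    Assumption: genes with no path to any R-gene count as infinitely far.
    """
    dist, all_genes = distances_to_r_genes(ppi_edges, r_genes)
    return sorted(
        gene for gene in all_genes
        if dist.get(gene, float("inf")) >= min_distance
    )


if __name__ == "__main__":
    # Toy network with made-up gene identifiers.
    edges = [("R1", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
    print(select_negatives(edges, {"R1"}, min_distance=3))
```

In this toy network, genes `A` and `B` sit within two hops of the curated gene `R1` and are excluded, while the more distant or disconnected genes are kept as candidate negatives. A different threshold, or a weighted distance, could be substituted if the network carries interaction confidence scores.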