Antioxidant Protein Identification Based On Support Vector Machine

Posted on: 2020-01-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y K Xu  Full Text: PDF
GTID: 2370330578462778  Subject: Applied statistics
Abstract/Summary:
Free radicals are unstable molecules. Once a free radical binds to an atom in the body, a chain reaction occurs, which causes DNA damage in cells, aging, and various diseases. Antioxidant proteins are substances that protect cells from free radical damage. Accurate identification of antioxidant proteins is important for understanding their anti-aging effects and for the treatment and prevention of disease. Traditional biological methods of analyzing protein function are time consuming and labor intensive, so a computational method for identifying antioxidant proteins is highly desirable and urgently needed. This thesis proposes a method for identifying antioxidant proteins based on a support vector machine. The main work on protein sequence prediction is as follows:
1. To mine the feature information in the protein sequence effectively, the amino acid composition is combined with the g-gap dipeptide composition to describe the protein sequence on the basis of its primary structure. This feature extraction method is simple and fast to compute and requires no additional information, which helps the classifier achieve better classification performance.
2. The number of antioxidant proteins is balanced by oversampling to match the number of non-antioxidant proteins, and the data are normalized, so that class imbalance does not lead to an overestimate of classification accuracy.
3. Principal component analysis is used to reduce the feature dimension from 420 to 230. The reduced data are input into the support vector machine to identify the proteins, and 20 experimentally verified antioxidant proteins are used to validate the model and rule out overfitting.
The prediction accuracy (Acc) in this experiment reached 98.38%, the recall on positive samples (Sn) reached 99.27%, the recall on negative samples reached 97.54%, and the MCC value was 0.9678. The classifier's recognition performance on antioxidant proteins is better than that of existing classifiers.
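The 420-dimensional feature vector mentioned in the abstract is consistent with 20 amino acid composition values plus 400 g-gap dipeptide composition values. The following is a minimal sketch of such a feature extractor, not the thesis's actual implementation. The function names are hypothetical, the gap value g is not stated in the abstract (g=1 is an arbitrary illustration), and skipping non-standard residues is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 residue frequencies (non-standard residues are ignored)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def g_gap_dipeptide_composition(seq, g=1):
    """Return 400 frequencies of residue pairs separated by g positions.

    g=0 gives the ordinary (adjacent) dipeptide composition.
    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in counts]


def extract_features(seq, g=1):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + g_gap_dipeptide_composition(seq, g)


features = extract_features("MKTAYIAKQR", g=1)  # illustrative sequence only
print(len(features))
```

Vectors built this way could then be oversampled, normalized, reduced with principal component analysis, and passed to a support vector machine, as the three steps above describe.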
Keywords/Search Tags:Antioxidant proteins, Amino acid composition, g-gap dipeptide composition, Principal component analysis, Support Vector Machines