HIV Protease Cleavage Site Prediction Based On Feature Selection And Biological Similarity

Posted on: 2014-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: G T Chou
GTID: 2234330398950533
Subject: Biomedical engineering
Abstract/Summary:
HIV is the pathogen that causes AIDS. Understanding the specificity of HIV protease (HIV-PR) is crucial for designing HIV-PR inhibitors, and predicting HIV-PR cleavage sites with pattern recognition methods can help clarify that specificity. In this thesis, feature selection is used to investigate which sites determine the cleavability of an octapeptide, and it improves prediction performance while keeping good generalization ability. The modeling of octapeptides based on similarity is also investigated: a new similarity based on non-gap total sequence alignment is proposed to represent the relationship between octapeptides, and HIV-PR cleavage sites are predicted from this similarity. The work is divided into three parts, described in the following paragraphs.

First, a feature selection method named CAFS is improved specifically for the HIV-PR cleavage site prediction task. Feature selection is combined with structure optimization of a neural network, so the improved method reduces the dimensionality of the feature space and determines the number of hidden-layer nodes automatically. After feature selection, the structure of the neural network is simplified in order to improve prediction performance. Accuracy, sensitivity, specificity, MCC and AUC are used to evaluate prediction performance. The experimental results show that the feature subsets obtained by feature selection achieve good prediction performance, and that decision fusion of the subsets improves it significantly. Analysis of the subsets shows that the sites near the scissile bond, namely P1, P1', P2 and P2', play more important roles in determining the cleavability of an octapeptide.

Second, a feature selection method named BPFS is improved specifically for HIV-PR cleavage site prediction. The feature space is reduced and the classifier structure is simplified, which supports generalization ability. Parameter optimization is also conducted for the SVM in order to improve prediction performance. The feature subsets obtained by feature selection are fused, and prediction is performed with the parameter-optimized classifier; this method shows excellent prediction performance. The experimental results show that it achieves better prediction ability than state-of-the-art HIV-PR cleavage site prediction studies based on feature extraction.

Finally, a similarity based on non-gap total sequence alignment is proposed for HIV-PR cleavage site prediction. This similarity describes the relationship between sequence samples well. Different similarity matrices can be calculated from different substitution matrices. Combining the similarity matrix with an SVM yields good prediction performance, which indicates that predicting HIV-PR cleavage sites based on similarity is effective.
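As a rough illustration of the similarity idea described in the abstract, the sketch below scores two octapeptides position by position with no gaps and normalizes the result so that a peptide has similarity 1.0 with itself. The abstract does not give the exact formula, so the normalization, the function names, and the toy substitution scores (2 for identity, 1 for the same physicochemical group, -1 otherwise) are all assumptions made for this example; a real substitution matrix such as BLOSUM62 could be passed in through the `score` argument instead.

```python
from math import sqrt

# Hypothetical physicochemical grouping of the 20 standard amino acids,
# used only to build a toy substitution score for this sketch.
GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def toy_substitution_score(a, b):
    """Toy score (an assumption): 2 for identity, 1 for same group, -1 otherwise."""
    if a == b:
        return 2
    return 1 if GROUP_OF[a] == GROUP_OF[b] else -1


def ungapped_alignment_score(p, q, score=toy_substitution_score):
    """Sum substitution scores over aligned positions; no gaps are allowed."""
    if len(p) != len(q):
        raise ValueError("peptides must have the same length")
    return sum(score(a, b) for a, b in zip(p, q))


def similarity(p, q, score=toy_substitution_score):
    """Normalized similarity (assumed form): s(p, q) / sqrt(s(p, p) * s(q, q))."""
    s_pq = ungapped_alignment_score(p, q, score)
    s_pp = ungapped_alignment_score(p, p, score)
    s_qq = ungapped_alignment_score(q, q, score)
    return s_pq / sqrt(s_pp * s_qq)


def similarity_matrix(peptides, score=toy_substitution_score):
    """Pairwise similarity matrix for a list of equal-length peptides."""
    return [[similarity(p, q, score) for q in peptides] for p in peptides]


if __name__ == "__main__":
    # Illustrative octapeptides only; they are not data from the thesis.
    peptides = ["SQNYPIVQ", "SQNFPIVQ", "GGGGGGGG"]
    for row in similarity_matrix(peptides):
        print([round(value, 4) for value in row])
```

A matrix built this way could, for example, be handed to an SVM that accepts a precomputed kernel, although a matrix derived from an arbitrary substitution matrix is not guaranteed to be a valid (positive semidefinite) kernel.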
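For readers unfamiliar with the evaluation measures named in the abstract, the following minimal sketch computes accuracy, sensitivity, specificity and MCC from binary labels using their standard confusion-matrix definitions (1 = cleavable, 0 = non-cleavable). The function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example; AUC is left out because it requires classifier scores rather than hard labels.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and MCC from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cleaved, predicted cleaved
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # uncleaved, predicted uncleaved
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # uncleaved, predicted cleaved
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cleaved, predicted uncleaved

    total = tp + tn + fp + fn
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Assumed convention: report 0.0 when MCC is undefined.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels for illustration; not results from the thesis.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

MCC is useful here because cleavage-site datasets are typically imbalanced, and accuracy alone can look high even when few cleavable peptides are detected.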
Keywords/Search Tags:HIV-PR, Pattern Recognition, Feature Selection, Dimensionality Reduction, Similarity