| Identifying the genes related to important human diseases from genomic and proteomic data can provide valuable clues for the discovery of potential therapeutic targets. However, discovering disease-related genes by traditional experimental methods is usually laborious and time-consuming. It is therefore necessary to develop a powerful computational approach to improve the effectiveness of disease-related gene identification. In this study, 674 sequence features of known disease-related genes from 62 kinds of diseases were extracted. The features were then further optimized by one of three methods: mRMR, usage bias, or the F-statistic algorithm. Finally, the selected features were used to predict disease-related genes. Leave-one-out cross-validation tests demonstrated that 55% of the 373 disease-related genes could be ranked within the top 10 of the prediction results, which confirmed the reliability of this sequence-feature-based approach. Breast cancer is one of the most common malignant tumours in women and poses a serious threat to their health. The offspring of breast cancer patients have a higher incidence rate of 10%. Gene therapy is an advanced cancer treatment, and it has been effective in the treatment of breast cancer. Because disease genes are important for gene therapy, the prediction of breast cancer related genes is very meaningful. In this study, a multi-feature fusion method integrating gene sequence features, GO terms and DO terms was established based on 31 breast cancer related genes. The results showed that 22 of these genes were ranked in the top five, demonstrating that the system has good feasibility. Finally, an SVM was integrated, giving a prediction accuracy of 74.19%. The breast cancer gene prediction system, BCPred, and a web service were made available. |
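
The leave-one-out ranking test mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (cosine similarity to the centroid of the remaining known genes), the function names, and the toy feature vectors are all assumptions chosen only to show how a "ranked within the top k" rate is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def loo_top_k_rate(known, candidates, k=10):
    """Fraction of known genes ranked within the top k when held out.

    known      -- dict: gene name -> feature vector (known disease genes)
    candidates -- dict: gene name -> feature vector (background genes)
    Scoring by similarity to a centroid is an illustrative assumption.
    """
    hits = 0
    for held_out, vector in known.items():
        # Build the reference profile from all known genes except the held-out one.
        training = [v for g, v in known.items() if g != held_out]
        reference = centroid(training)
        held_score = cosine(vector, reference)
        # Rank = 1 + number of background genes that score strictly higher.
        rank = 1 + sum(
            cosine(c, reference) > held_score for c in candidates.values()
        )
        if rank <= k:
            hits += 1
    return hits / len(known)
```

With real data, `known` would hold the selected sequence features of the known disease genes and `candidates` the background genes they are ranked against; the returned fraction corresponds to a figure such as "55% ranked within the top 10".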