Research On Feature Processing Algorithms In Protein Sequence Recognition

Posted on:2023-12-03

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L M Chao

Full Text:PDF

GTID:1520307319494374

Subject:Computer Science and Technology

Abstract/Summary:


Feature processing is an important part of machine-learning-based protein sequence identification. By generating optimal feature sets, improving recognition accuracy, and discovering important feature segments, it provides a strong reference and support for traditional experiments. This thesis proposes three protein sequence feature processing algorithms from the perspectives of multi-task learning, combinatorial optimization, and directed graph decomposition.

(1) Multi-task protein feature selection algorithm based on data set structure information: To address the optimal feature selection problem in constructing protein sequence recognition models, a multi-task feature selection algorithm is proposed. During feature selection, the algorithm constructs multiple SVM models with different objective functions according to the structure information of the data set, then trains and optimizes the models through parameter sharing to determine the optimal feature set. Under leave-one-out cross-validation, the algorithm recognized cell lyase with Accuracy, Sensitivity, Specificity, Matthews correlation coefficient, and AUC values of 0.93, 0.853, 0.948, 0.775, and 0.9, respectively.

(2) Protein feature subset search algorithm based on elimination strategy: To prevent the combinatorial explosion of feature combinations, two subset search algorithms based on an elimination strategy are proposed: one based on direct elimination and one based on cache elimination. Both use the elimination strategy to find new feature combinations, avoiding the human factors in current mainstream feature selection methods and their over-reliance on feature ranking results. On 21 feature ranking data sets, the algorithms achieve high model evaluation scores with low-dimensional optimal feature sets.

(3) Protein feature ranking algorithm based on ranking integration strategy: Based on quantifying the global and local ranking factors of the base rankings, a feature ranking integration algorithm based on weight quantification is proposed. Specifically, following the central limit theorem, the normality of the feature score distribution in each base ranking to be integrated is quantified to generate a weight for that ranking. A weighted directed graph is then generated, and HodgeRank is used to obtain the final global ranking. Across 56 experiments, the algorithm outperformed comparable algorithms in two-thirds of the cases.

The three proposed algorithms all aim at generating optimal feature sets and training models with high recognition rates. Feature selection builds on feature ranking, and feature subset search is an important part of feature selection. The algorithms can address feature processing in protein sequence recognition either individually or in concert.
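
The ranking integration idea in item (3) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give the weighting formula or the graph construction, so the per-ranking weights are simply passed in, the pairwise preference is assumed to be the weighted mean difference in rank position, and the function names `pairwise_flow` and `hodge_rank` are hypothetical. The least-squares step is the standard HodgeRank formulation, solved here with plain Gauss-Seidel sweeps to avoid external dependencies.

```python
from itertools import combinations


def pairwise_flow(rankings, weights):
    """Build a weighted pairwise comparison graph from base rankings.

    rankings: list of lists, each ordering feature names from best to worst.
    weights:  one positive weight per base ranking (assumed given here; the
              thesis derives them from the normality of the score data).
    Returns {(a, b): (y, w)}: y is the weighted mean of pos[a] - pos[b]
    (y > 0 means b is preferred to a), w is the total weight on the edge.
    """
    sums = {}
    for order, wt in zip(rankings, weights):
        pos = {feature: k for k, feature in enumerate(order)}
        # sorted() gives every edge one canonical orientation (a < b).
        for a, b in combinations(sorted(pos), 2):
            y_sum, w_sum = sums.get((a, b), (0.0, 0.0))
            sums[(a, b)] = (y_sum + wt * (pos[a] - pos[b]), w_sum + wt)
    return {e: (y_sum / w_sum, w_sum)
            for e, (y_sum, w_sum) in sums.items() if w_sum > 0}


def hodge_rank(flow, sweeps=500):
    """Minimize sum over edges of w * (s[b] - s[a] - y)**2 by Gauss-Seidel.

    Returns mean-centered global scores; a higher score means a better rank.
    """
    neighbours = {}
    for (a, b), (y, w) in flow.items():
        neighbours.setdefault(a, []).append((b, -y, w))  # s[a] ~ s[b] - y
        neighbours.setdefault(b, []).append((a, y, w))   # s[b] ~ s[a] + y
    scores = {feature: 0.0 for feature in neighbours}
    for _ in range(sweeps):
        for feature, edges in neighbours.items():
            total_w = sum(w for _, _, w in edges)
            # Weighted average of what each neighbour "votes" for this node.
            scores[feature] = sum(
                w * (scores[g] + d) for g, d, w in edges) / total_w
    mean = sum(scores.values()) / len(scores)
    return {feature: v - mean for feature, v in scores.items()}


if __name__ == "__main__":
    # Three toy base rankings of three features, with illustrative weights.
    base = [["f1", "f2", "f3"], ["f1", "f3", "f2"], ["f2", "f1", "f3"]]
    scores = hodge_rank(pairwise_flow(base, [0.5, 0.3, 0.2]))
    print(sorted(scores, key=scores.get, reverse=True))  # ['f1', 'f2', 'f3']
```

When every base ranking covers every feature, all edges carry the same total weight and the result reduces to a weighted average of rank positions; the graph formulation matters mainly when base rankings are partial or edge weights differ.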

Keywords/Search Tags:

protein sequence recognition, feature processing, multi-task learning, subset search, ranking integration


Related items

1	Research On Protein Fold Recognition Based On Multi-view Learning Algorithm
2	ECG Biometric Recognition Based On Feature Learning And Multi-feature Fusion
3	Sparsity Optimization Study In Multi-task Feature Learning And Disease Classification
4	Research On Thermocline Processing Method Based On Deep Learning
5	Pattern Analysis And Recognition Of Image-based Protein Subcellular Location
6	Research On Multi-task Learning Change Detection Method And Application Based On Spatio-temporal Features Fusion
7	Research On Multi-task Learning Models For Survival Analysis With High-Dimensional Censored Data
8	Research On Biological Sequence Analysis And RNA-binding Protein Recognition Based On Sequence Feature
9	Multi-task Learning And Its Application In Spectral Multivariate Calibration
10	Prediction Method Research Of Special Protein Recognition Based On Protein Sequence Information