
Screening Reliable Negative Samples For Non-coding RNA-Protein Interaction Based On Reinforcement Learning

Posted on: 2024-07-04  Degree: Master  Type: Thesis
Country: China  Candidate: H Sun  Full Text: PDF
GTID: 2530307064985309  Subject: Computer Science and Technology
Abstract/Summary:
Non-coding RNA (ncRNA) is RNA that does not encode protein. Up to 70% of the human genome is transcribed into ncRNA. In recent years, a growing number of ncRNAs have been shown to play important roles in a variety of biological regulatory processes and to be closely associated with human diseases such as tumors and cancers. Studying the interactions between ncRNAs and proteins is one of the main ways to infer ncRNA function. With the development of machine learning, many effective models for predicting ncRNA-protein interactions have emerged. However, most machine learning models need both positive and negative samples for training, whereas ncRNA-protein interaction databases record only experimentally verified positive samples and no negative samples. As a result, supervised machine learning models cannot be applied directly, and the absence of negative samples poses a major challenge for predicting ncRNA-protein interactions. Most existing prediction models simply treat unlabeled samples as negative samples during training, but such unreliable negative samples can bias the trained model.

How to design an effective scheme for the missing negative samples therefore remains a difficult open problem. The key lies in how to fuse the multi-modal features of non-coding RNA and protein, and how to screen negative samples from unlabeled samples. To this end, we designed a model named Capsule-LPI (Capsule lncRNA-protein interaction) and a model named SURE (Screen unlabeled samples for reliable negative samples). Capsule-LPI is the multi-modal feature extraction and fusion module of the negative sample screening architecture: it extracts and fuses the multi-modal features of non-coding RNA and protein to produce their feature representation. SURE is the negative sample screening module of the architecture: training it with reinforcement learning yields an agent that screens negative samples. Experiments show that when positive samples make up less than 20% of the unlabeled samples, the accuracy of the negative samples selected by SURE exceeds 90%, and the selected reliable negative samples effectively improve the accuracy of ncRNA-protein interaction prediction.

The main contributions of this study are as follows:
(1) Multi-feature extraction and fusion. A capsule neural network framework is used to design a deep learning network suited to predicting ncRNA-protein interactions, since capsule networks readily retain and integrate diverse features. The optimal feature combination was selected from four multi-modal features through ablation experiments. Experiments showed that Capsule-LPI outperforms current ncRNA-protein interaction prediction models. It effectively extracts and fuses the multi-modal features of non-coding RNA and protein and provides a reliable feature representation for negative sample screening.
(2) Negative sample screening model based on reinforcement learning. Negative sample screening is formulated as a Markov decision process, and a new deep reinforcement learning model, named SURE, is proposed to screen reliable negative samples from unlabeled samples. SURE addresses the lack of negative samples in ncRNA-protein interaction tasks through two strategies. Experiments show that SURE can screen out reliable negative samples. In addition, SURE is portable to other tasks: this thesis also reports its performance in preparing negative samples for a text classification task.
(3) Successful application case study. A good model should not only score well on evaluation metrics but also help solve practical problems. In this study, Capsule-LPI is used in a case study exploring the relationship between ncRNAs and diseases. The case study shows that Capsule-LPI is practically effective and can support scientific research.
(4) Online service platforms. Two online service platforms for predicting ncRNA-protein interactions were developed, and supplementary negative ncRNA-protein interaction samples are provided.
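The abstract does not say how Capsule-LPI passes information between capsules. As a rough illustration only, the sketch below implements the squash nonlinearity and routing-by-agreement from the original capsule network paper (Sabour et al., 2017), under the hypothetical assumption that each input capsule carries one modality's prediction vector for each output capsule. The learned transformation matrices are omitted, and the names `squash`, `dynamic_routing` and the toy `u_hat` values are illustrative assumptions, not taken from the thesis.

```python
import math


def squash(s):
    """Capsule nonlinearity: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def length(vec):
    return math.sqrt(sum(x * x for x in vec))


def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement.

    u_hat[i][j] is the prediction vector that input capsule i (hypothetically,
    one feature modality) makes for output capsule j. Returns output vectors v[j].
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for it in range(iterations):
        c = [softmax(b[i]) for i in range(n_in)]  # coupling coefficients over outputs
        v = []
        for j in range(n_out):
            # Weighted sum of the predictions routed to output capsule j.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        if it < iterations - 1:
            # Raise logit (i, j) when input i's prediction agrees with output v[j].
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


if __name__ == "__main__":
    # Three hypothetical modalities agree on output capsule 0 but disagree on capsule 1.
    u_hat = [
        [[1.0, 0.0], [1.0, 0.0]],
        [[1.0, 0.0], [-1.0, 0.0]],
        [[1.0, 0.0], [0.0, 1.0]],
    ]
    v = dynamic_routing(u_hat)
    print([round(length(vec), 3) for vec in v])
```

Because the three toy modalities agree on capsule 0, routing concentrates their coupling there and capsule 0 should come out noticeably longer than capsule 1; the vector length acts as the capsule's confidence, which is one reason capsule layers are a natural fit for fusing several feature views.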
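The abstract frames negative sample screening as a Markov decision process solved with reinforcement learning, but it does not describe SURE's states, actions, rewards, or its "two strategies". The toy sketch below is therefore not SURE. It is a one-step MDP (each episode looks at a single unlabeled sample and decides keep or discard) solved with plain REINFORCE and a logistic policy, using a hypothetical reward that penalizes keeping samples close to the centroid of the labeled positives. The 1-D synthetic data, the reward, and every function name are assumptions for illustration. It also shows one plausible reading of "accuracy of the selected negative samples": the fraction of kept samples whose hidden label is negative.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_data(rng, n_pos=100, n_unlabeled=500, pos_frac=0.15):
    """Hypothetical 1-D data: positives near +2, true negatives near -2."""
    labeled_pos = [rng.gauss(2.0, 0.5) for _ in range(n_pos)]
    unlabeled = []  # (feature, hidden_is_positive); the label is used only for evaluation
    for _ in range(n_unlabeled):
        if rng.random() < pos_frac:
            unlabeled.append((rng.gauss(2.0, 0.5), True))
        else:
            unlabeled.append((rng.gauss(-2.0, 0.5), False))
    return labeled_pos, unlabeled


def positive_likeness(x, pos_centroid):
    """Proxy score in (0, 1]: closeness to the labeled-positive centroid."""
    return math.exp(-((x - pos_centroid) ** 2) / 2.0)


def train_screening_policy(labeled_pos, unlabeled, epochs=20, lr=0.05, seed=0):
    rng = random.Random(seed)
    centroid = sum(labeled_pos) / len(labeled_pos)
    w, b = 0.0, 0.0  # policy: P(keep | x) = sigmoid(w * x + b)
    for _ in range(epochs):
        order = list(unlabeled)
        rng.shuffle(order)
        for x, _hidden in order:  # the hidden label is never read during training
            p_keep = sigmoid(w * x + b)
            action = 1 if rng.random() < p_keep else 0  # 1 = keep as a negative
            # Hypothetical reward: keeping a positive-looking sample is penalized,
            # discarding is neutral.
            reward = (1.0 - 2.0 * positive_likeness(x, centroid)) if action else 0.0
            # REINFORCE for a Bernoulli policy: grad log pi = (action - p_keep) * [x, 1].
            w += lr * reward * (action - p_keep) * x
            b += lr * reward * (action - p_keep)
    return w, b


def screen(unlabeled, w, b):
    """Greedy screening: keep a sample when the policy's keep-probability exceeds 0.5."""
    return [(x, is_pos) for x, is_pos in unlabeled if sigmoid(w * x + b) > 0.5]


def negative_precision(selected):
    """Fraction of selected samples that are truly negative."""
    if not selected:
        return 0.0
    return sum(1 for _, is_pos in selected if not is_pos) / len(selected)


if __name__ == "__main__":
    data_rng = random.Random(42)
    labeled_pos, unlabeled = make_toy_data(data_rng)
    w, b = train_screening_policy(labeled_pos, unlabeled)
    selected = screen(unlabeled, w, b)
    print(f"kept {len(selected)} of {len(unlabeled)} unlabeled samples")
    print(f"fraction of kept samples that are truly negative: {negative_precision(selected):.3f}")
```

The agent never sees the hidden labels; it only learns from the proxy reward, which is the situation the thesis describes, where unlabeled data mixes unknown positives with true negatives. A real screening agent would replace the 1-D feature with learned representations (for example, the fused features the thesis obtains from Capsule-LPI) and a more carefully designed reward.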
Keywords/Search Tags: Non-coding RNA, ncRNA-protein interaction, sample imbalance, capsule network, reinforcement learning