| Long non-coding RNAs (lncRNAs) are involved in many important biological activities, including imprinting genomic loci, adjusting chromosome conformation, and regulating allosteric enzyme activity, and they play an indispensable role in the occurrence and development of human diseases. Although the functional mechanisms of most lncRNAs are not clear, studies show that lncRNAs often function by physically binding related proteins, so studying lncRNA-protein interactions (lncRPIs) is important for a deep understanding of the functional effects of lncRNAs. Compared with costly wet-experiment methods, computational methods can identify lncRNA-protein interactions faster, saving human and material resources. This study proposes a deep learning model, LPI-CSFFR, that combines serial fusion with feature reuse to predict lncRPIs. The fused features include the original sequences, secondary structures, and physicochemical properties. The original sequences and secondary structures are encoded by a joint 1-K-mer coding method, and the physicochemical properties are extracted by the Pse-in-One tool. The overall network adopts a serial mode to connect the feature tensors. As each tensor is input, the data pass in succession through a convolution operation, a pooling operation, and a densely connected convolutional block (Blocks). The sequence tensor of protein and lncRNA is input first, the physicochemical property tensor is input after these operations, and the secondary structure features are spliced onto the end of the feature maps. After multi-layer feature fusion, the resulting high-level representation is flattened to obtain the overall feature map of the mini-batch samples. The feature map is passed through two fully connected layers, which compute weighted sums, and classification is performed with a softmax activation function. In the convolutional process, batch normalization is used to
accelerate the convergence of training, while dropout randomly discards neural units to prevent overfitting of the network. Five-fold cross-validation results show that LPI-CSFFR achieves accuracies of 83.7% and 98.1% on the RPI1460 and RPI1135 datasets, respectively. Compared with previous methods, the results on the RPI1460 dataset highlight the good classification performance of LPI-CSFFR. Furthermore, to test the generalization of the model, we independently tested samples of five model organisms on the RPI9369 dataset. The prediction accuracy for H. sapiens, D. melanogaster, S. cerevisiae, and E. coli is 97.5%, 96.5%, 96.5%, and 96.2%, respectively. The prediction accuracy for M. musculus is the highest, at 99.5%. The subsequently constructed interaction network of M. musculus showed visually that LPI-CSFFR accurately captured multiple hotspot proteins. This result provides guidance for understanding the biological signaling pathways of lncRPIs and for disease-related studies. The overall results indicate that LPI-CSFFR predicts lncRPIs with high accuracy and good robustness. The primary source code and the datasets used are stored at https://github.com/JianjunTan-Beijing/LPI-CSFFR. |
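
The abstract does not spell out the joint 1-K-mer coding, so the following is only a minimal sketch under one plausible reading: the normalized k-mer frequency vectors for k = 1 to K are concatenated into a single feature vector. The function names, the RNA alphabet `"ACGU"`, and the default `max_k=3` are illustrative assumptions, not taken from the LPI-CSFFR source code.

```python
from itertools import product


def kmer_frequencies(seq, alphabet, k):
    """Normalized frequency of every length-k word over `alphabet`.

    Words are ordered as produced by itertools.product, i.e. in
    lexicographic order of the alphabet string. Windows containing
    symbols outside the alphabet are skipped.
    """
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return [counts[w] / total if total else 0.0 for w in words]


def joint_kmer_features(seq, alphabet="ACGU", max_k=3):
    """Assumed 'joint' encoding: concatenate k-mer frequencies for k = 1..max_k."""
    features = []
    for k in range(1, max_k + 1):
        features.extend(kmer_frequencies(seq, alphabet, k))
    return features


# Hypothetical toy lncRNA fragment; 4 + 16 = 20 features for max_k=2.
example = joint_kmer_features("ACGUACGU", max_k=2)
print(len(example))
```

The same helper could be applied to a protein sequence or a secondary-structure string by passing a different `alphabet`; how the actual model groups amino acids or structure symbols is not stated in the abstract.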