| Breakthrough research, which includes both transformative research and significant advances in incremental research, plays a leading role in scientific and technological innovation. Research results are usually published in the literature, and academic papers are one of their most important forms. Identifying the major breakthroughs reported in papers soon after publication would help in selecting and cultivating breakthrough research. In the past, bibliometric indicators were mostly used as the criteria for judging the academic value of papers, and the evaluation of their knowledge content was neglected to some extent. Given the need for breakthrough paper identification and the limitations of existing methods, this study takes the biomedical field as an example and proposes a dual-feature method for identifying breakthrough papers from two perspectives: the linguistic characteristics of abstracts and the external characteristics of papers. First, breakthrough linguistic features of abstracts are extracted by manual interpretation of gold-standard breakthrough papers and a control group, and a deep neural network is used to quantify these features. Second, based on a review of the related literature, external features for breakthrough paper identification are selected from the perspectives of novelty and influence. Third, logistic regression, decision tree, and random forest algorithms are used to build breakthrough paper recognition models based on the dual features. Finally, the field of T cells is chosen as an empirical case to evaluate how well the models perform in a specific medical field. The results show that the abstracts of breakthrough papers exhibit linguistic characteristics of new discoveries and innovative contributions, and that authors are more likely to describe breakthrough achievements or contributions in the purpose, result, or conclusion of the abstract. The external features selected in this study, including the time novelty of references, the time novelty of MeSH word pairs, and academic and technical impact, are applicable to the identification of breakthrough papers. A comparison of models built with different algorithms shows that the random forest model is better suited to the binary task of breakthrough recognition and can, to a certain extent, identify breakthrough papers early. A comparison of models with different input features shows that using dual features outperforms using external features alone. |
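
To make one of the external features above concrete, the following is a minimal sketch of how a "time novelty of MeSH word pairs" score might be computed. The abstract does not give the study's exact definition, so the scoring rule here is an assumption: the share of a paper's MeSH term pairs that have either never co-occurred in earlier papers or first co-occurred within a short window before publication. The function names, the `window` parameter, and the toy corpus with its years are all hypothetical and serve only as illustration.

```python
from itertools import combinations


def first_seen_years(corpus):
    """Map each unordered MeSH term pair to the earliest year it co-occurs.

    `corpus` is an iterable of (year, terms) tuples for earlier papers.
    """
    first_seen = {}
    for year, terms in corpus:
        for pair in combinations(sorted(set(terms)), 2):
            if pair not in first_seen or year < first_seen[pair]:
                first_seen[pair] = year
    return first_seen


def pair_time_novelty(year, terms, first_seen, window=3):
    """Share of a paper's MeSH pairs that are unseen or at most `window` years old.

    Assumed scoring rule for illustration, not the study's definition.
    """
    pairs = list(combinations(sorted(set(terms)), 2))
    if not pairs:
        return 0.0
    recent = 0
    for pair in pairs:
        seen = first_seen.get(pair)
        if seen is None or year - seen <= window:
            recent += 1
    return recent / len(pairs)


# Hypothetical toy corpus of earlier papers: (publication year, MeSH terms).
corpus = [
    (2005, ["T-Lymphocytes", "Immunotherapy"]),
    (2012, ["T-Lymphocytes", "Receptors, Chimeric Antigen"]),
]
first_seen = first_seen_years(corpus)

# Score a hypothetical focal paper published in 2014.
score = pair_time_novelty(
    2014,
    ["T-Lymphocytes", "Immunotherapy", "Receptors, Chimeric Antigen"],
    first_seen,
)
```

In a dual-feature setup such as the one described, a score like this would be one column in the external feature vector, alongside the reference-based novelty and impact features and the quantified linguistic features, before being passed to a classifier. The corpus passed to `first_seen_years` should contain only papers published before the focal paper.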