| At present, mainstream literature databases at home and abroad, such as Web of Science, Wanfang, CSSCI and CNKI, can recommend highly cited and hot articles, but they offer no function for identifying and recommending potential high-quality articles. The model for identifying and recommending potential "treasures" implemented in this paper therefore has considerable application value and promotion prospects. The main research contents of this paper are as follows: (1) Classifiers and a model framework for identifying potential "treasures" in sci-tech journals are designed and implemented based on traditional machine learning models and deep learning models, and the original-text and citation features of articles from journals of different influence in the fields of artificial intelligence, library and information science, and management are calculated. The feature vector space of high-quality articles in sci-tech journals is constructed through bibliometrics, feature engineering and correlation analysis. (2) Traditional machine learning models, namely random forest, decision tree, support vector machine (SVM) and probabilistic neural network, and deep learning models, namely deep belief network and multilayer perceptron, are used to design experiments on the automatic identification of potential high-quality articles, and an index system based on the ROC curve and the confusion matrix is built to evaluate the recognition performance of the models. (3) The paper makes a comprehensive bibliometric comparison of the similarities and differences among high-quality articles, uncited articles and the total sample, and explores how the bibliometric characteristics of high-quality articles differ across disciplines. The results show that: (1) The deep learning models recognize potential "treasures" poorly, while the traditional machine learning models perform better, among which the random forest model and the support vector machine
model perform best on potential "treasures", followed by the decision tree model and the probabilistic neural network model. (2) Based on the F1 value, journals with higher influence in the fields of artificial intelligence and of library and information science show better identification of potential "treasures". Based on the AUC, the recognition of high-quality articles in low-influence journals in library and information science and in journals of average influence in management is below 70%, while that of all other journals is above 70%. (3) Compared with the total sample, common articles and zero-citation articles, the bibliometric feature values of high-quality articles differ greatly: high-quality articles mostly have a short first response time, large-scale author collaboration, long abstracts and long articles, and more keywords, references and citations. In general, machine learning methods can quickly identify potential "treasures" in sci-tech journals, raise the degree of automation of this identification, and provide a theoretical reference and methodological support for the automatic identification, dissemination and utilization of potential high-quality articles in sci-tech journals. The innovations of this paper are as follows. First, based on the original-text and citation databases of Web of Science, a sample library for original text-citation analysis is constructed, containing the bibliometric characteristics of the literature and the annual distribution of its citation frequency, which enriches the data sources for scientific research. Second, taking the feature vectors of highly cited and hot literature recognized by the scientific community as the reference standard for identifying potential high-quality literature, bibliometric methods, correlation analysis and feature engineering are combined to construct the sci-tech journal high-quality article feature vector
space, and machine learning methods are then used in experiments to identify potential "treasures", which provides a new perspective for research on identifying high-value literature. Third, machine learning methods such as random forest, decision tree, support vector machine (SVM), probabilistic neural network, deep belief network and multilayer perceptron are introduced to solve the problems of model building, training and test application in identifying potential "treasures" from massive literature, realizing the interdisciplinary integration of machine learning models with the task of identifying potential high-value articles, which provides a new approach and method for research on identifying potential high-value literature. This study enriches and expands the theoretical and methodological system in the fields of information resources management, library, information and archives management, scientometrics, and science and technology evaluation, excavates and utilizes the scientific value of potential "treasures" in massive literature, and allows potential "treasures" to play a greater driving role in the development of science and technology. |
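
The evaluation indices named above (confusion matrix, F1 value and AUC) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made up for illustration, with label 1 standing for a high-quality article. The AUC is computed as the share of (positive, negative) pairs that the classifier ranks correctly, which is equivalent to the area under the ROC curve.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels, where 1 marks a high-quality article."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_value(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

    print("TP, FP, FN, TN:", confusion_counts(y_true, y_pred))
    print("F1:", f1_value(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
```

Under this reading, an AUC reported as "below 70%" corresponds to `roc_auc(...) < 0.70` for the journal group in question.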