The information age has brought digitization and networking, and the volume of information of all kinds is growing explosively. The rising number of academic papers increases the chance of finding valuable content, but it also increases the effort needed to discover it. Innovation is the soul of an academic paper: it is a key factor in evaluating a paper's academic value and the attribute scholars weigh most when using academic literature. If the innovation points of papers could be presented to users in the papers' original wording, the efficiency of screening and using academic papers would improve greatly. On this basis, we take library and information science as our scope and academic value as our guide, and collect highly cited papers published by 51 university libraries in China. Building on text extraction technology, we adopt two extraction methods and establish a workflow for extracting innovation points from full texts. We also attempt to build a dedicated academic query platform from the research results, using front-end/back-end web communication technology.

First, this paper examines why full-text extraction from academic papers is necessary in the current academic environment, reviews research progress on text extraction technology in China and abroad, and compares the performance and advantages of text extraction techniques from the earliest to the most recent. Based on this literature review, we define the innovation-point sentence and formulate classification criteria for it. With the intended usage scenarios in mind, we then construct two methods: a compound matching extraction method and a deep-learning-based classification extraction method. The matching extraction method combines two basic text extraction methods into a practical approach to innovation-point sentence extraction, whose effectiveness we then examine. The classification extraction method starts from the characteristics of innovation-point sentences themselves and takes sentence feature extraction as its starting point: two deep learning networks with different feature-extraction capabilities are selected, and the two trained models are combined to find the best combination for practical extraction.

Next, from the highly cited library and information science papers collected earlier, we selected a set of highly cited papers from the 51 university libraries, obtained 1,319 full texts manually, and then cleaned and manually annotated them. We then defined metrics for evaluating extraction performance and validated both extraction methods on the manually annotated dataset. Based on the test results of the matching extraction method, we proposed three optimizations targeting the matching process and the matching results; the optimized matching extraction method reaches an accuracy of 78.7%. For the classification extraction method, we adjusted the model parameters and retrained the models according to the training results, and found that the BERT-based BiLSTM model performs best overall, with an accuracy of 77.6%. We then extracted 4,348 innovation points from the remaining dataset and combined the results with the papers' catalog information for use in building the platform.

Finally, based on the extraction results and using front-end/back-end web data exchange together with question-answering system construction techniques, we designed and deployed a practical online query platform for academic innovation points. The platform supports searching innovation points by keywords such as title, publication date, and author, and can also apply the matching extraction method to mine innovation points from user-supplied editable text. The results show that both pattern matching and classification extraction can effectively extract innovation-point sentences, each in a different usage scenario. The pattern matching method is accurate and fast, and mainly serves the platform's innovation-point mining function; classification extraction can handle large-scale datasets and underpins the platform's retrieval function.
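The abstract does not name the two basic methods that the compound matching extraction method combines. As a minimal sketch, assuming they are a cue-phrase (keyword) filter followed by sentence-template (regular-expression) confirmation, a two-stage matcher could look like the following. The cue list `CUE_PHRASES`, the patterns in `TEMPLATES`, and the function names are hypothetical English stand-ins, not the study's lexicon, which would target Chinese-language papers.

```python
import re

# Hypothetical lexicon: English stand-ins for the study's (unpublished) cue words.
CUE_PHRASES = ("this paper", "this study", "we propose", "for the first time", "novel")

# Hypothetical sentence templates that a cue-bearing sentence must also match.
TEMPLATES = [
    re.compile(
        r"\b(this (paper|study)|we)\b.*\b(propose[sd]?|construct(s|ed)?|develop(s|ed)?|introduce[sd]?)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\bfor the first time\b", re.IGNORECASE),
]


def split_sentences(text: str) -> list[str]:
    """Split text after English or Chinese sentence-final punctuation."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def extract_innovation_sentences(text: str) -> list[str]:
    """Two-stage match: cue-phrase filter, then template confirmation."""
    results = []
    for sent in split_sentences(text):
        lowered = sent.lower()
        # Stage 1: cheap keyword filter discards most sentences.
        if not any(cue in lowered for cue in CUE_PHRASES):
            continue
        # Stage 2: keep only sentences whose structure fits a template.
        if any(t.search(sent) for t in TEMPLATES):
            results.append(sent)
    return results


if __name__ == "__main__":
    sample = (
        "Libraries hold many papers. This paper is organized as follows. "
        "This paper proposes a compound matching method. "
        "For the first time, full texts are used."
    )
    for s in extract_innovation_sentences(sample):
        print(s)
```

Requiring both stages trades some recall for precision: in the sample, "This paper is organized as follows." passes the cue filter but is rejected by the templates, so only the second and fourth sentences are returned.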
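The abstract reports accuracy figures measured against the manual annotations but does not define the metric. A minimal sketch of sentence-level scoring, assuming each sentence has a binary gold label (innovation point or not) and a predicted label from either extraction method; the `evaluate` function and `Scores` type are illustrative names, and which of these quantities the study calls "accuracy" is not stated.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(gold, pred):
    """Score predictions; gold and pred are equal-length sequences of bools
    (True = innovation-point sentence)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))        # correctly extracted
    fp = sum((not g) and p for g, p in zip(gold, pred))  # wrongly extracted
    fn = sum(g and (not p) for g, p in zip(gold, pred))  # missed
    tn = len(gold) - tp - fp - fn                         # correctly ignored
    accuracy = (tp + tn) / len(gold) if len(gold) else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(accuracy, precision, recall, f1)
```

Because innovation-point sentences are a small minority of a paper's sentences, plain accuracy can look high even for a weak extractor; precision and recall on the positive class are usually the more informative numbers.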