| New demands and the improvement of tools underlie the basic activities of society. We can regard new demands as problems and the improvement of tools as methods. The same is true of scientific research, a practical activity that runs from asking questions to designing methods and using those methods to solve problems. Problems and methods are therefore an essential part of scientific research. Researchers usually communicate and disseminate their work through academic papers, monographs, patents, or reports. This vast body of academic literature is an explicit expression of researchers' tacit knowledge. Extracting "problem-method" relations makes it possible to organize and manage academic literature. "Problem-method" relation extraction from academic literature is a practice of knowledge discovery, classification, and organization in management science and engineering, and it is also a basis for scientific research evaluation. Discovering problems and methods in academic literature and uncovering the relations between them therefore has theoretical and practical significance for in-depth knowledge management and evaluation studies. Academic papers are the main form of scientific research results, so we take the academic paper as the research object. The rapid development of the Internet has promoted the digitization and open access of academic papers, which provides a foundation for text extraction from academic papers. The "problem-method" relation extraction task for academic papers differs from general text extraction tasks in three respects. First, the task requires defining problems, methods, and the relations between them. Second, academic papers have distinctive expression characteristics, such as scientific rigor and logical structure. Third, the abstract and the full text of an academic paper differ in content level and in processing difficulty. Existing research on the automatic extraction of "problem-method" relations from academic papers ignores the
above points to varying degrees. First, existing studies have seldom drawn on theories from the philosophy of science when defining methods, problems, and their relations. Second, some studies have noticed the differences between academic and general texts in the automatic extraction task and have improved performance by using pretrained language models, such as SciBERT, trained on a large number of academic papers; however, these studies seldom designed and optimized the extraction algorithm by analyzing the difficulties unique to this task. Third, full-text extraction from academic papers has become a research trend. The abstract of an academic paper is easy to process and has refined content, whereas the full text is difficult to process and has diverse content. Nevertheless, there is a lack of research explaining the value of extracting "problem-method" relations from the full text. Taking these three points into account, we first define problems, methods, and the relations between them, then design automatic relation extraction methods based on the characteristics of academic papers, and finally analyze and compare the results extracted from the abstracts and full texts of academic papers. This thesis covers the following four aspects: (1) Because the definitions of problems, methods, and their relations in academic papers are seldom integrated with theories such as problemology in the philosophy of science, we combine theoretical, literature, and data research to design a two-level "problem-method" relation system. The first level is the general scientific research "problem-method" relation framework, and the second level is the field-specific "problem-method" relation framework. The general "problem-method" relation system is based on the common ground shared by different scientific research fields. The domain-specific "problem-method" relation system is based on the general system, which is subdivided and
expanded by combining the characteristics of specific research fields. According to the principle of realizability, we select natural language processing as the research field. (2) Since existing studies seldom design algorithms for the difficulties unique to the "problem-method" relation extraction task, we follow Dewey’s five-step thinking method: we first analyze the unique difficulties of the task in the natural language processing domain, then design and optimize the algorithm for "problem-method" relation extraction, and finally verify the effectiveness of the proposed algorithm. To reduce the noise introduced by the full text, we divide the "problem-method" relation extraction task into three steps: problem and method sentence classification, problem and method word identification, and "problem-method" relation extraction. We find that the difficulties of automatic sentence classification include the influence of formulaic expressions and dependence on contextual information. To alleviate these two difficulties, this thesis proposes a data augmentation method based on formulaic expression replacement and a model that incorporates contextual information. The difficulties of word identification include the influence of formulaic expressions and boundary recognition errors. To alleviate them, this thesis proposes a data augmentation method based on formulaic expression replacement and a model combined with pointer networks. The difficulties of relation extraction include incorrect coreference resolution, incorrect prediction of the relation direction, and semantic understanding problems caused by the imbalanced number of instances across relation categories. To address these three difficulties, we first use a Transformer module to learn the relations between words and better capture coreference. Then, we use a Cross-Encoder to predict the direction of the extracted relations. Finally, we use active learning to expand the dataset and thereby enrich the semantic
information of each relation category. This thesis verifies the effectiveness of the proposed methods on two manually labeled English experimental datasets. (3) To analyze the value of the full text in the "problem-method" relation extraction task, we extract, analyze, and compare the "problem-method" relation extraction results from the full texts and abstracts of academic papers in a specific field. We take natural language processing as the research field and select the ACL conference proceedings from 1979 to 2020, an English corpus, for analysis. We use the "problem-method" relation extraction model proposed in this thesis for relation extraction. Moreover, we divide the corpus into three periods: the semantic period, the traditional machine learning period, and the deep learning period. To compare the relations extracted from abstracts and full texts, we construct an abstract corpus and a full-text corpus. Then, we design numerical indicators and content indicators to analyze and compare the extracted results. (4) We construct a "problem-method" relation automatic extraction system and a "problem-method" relation analysis system. The automatic extraction system can be used to mine problems and methods in academic papers, and the analysis system can provide a basis for knowledge organization and evaluation in academic papers. The automatic extraction system has three modules: the problem and method sentence classification module, the problem and method identification module, and the "problem-method" relation extraction module. The analysis system has two modules: the relation summary module and the problem and method content analysis module. |
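
The three-step decomposition described above (sentence classification, word identification, relation extraction) can be illustrated with a small Python sketch. This is only a toy, rule-based stand-in meant to show how the three modules could be chained; it is not the thesis's approach, which relies on trained models (contextual classifiers, pointer networks, a Transformer module, and a Cross-Encoder). The cue phrases, regular expressions, the `used-for` label, and all function and class names are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical cue phrases standing in for a trained sentence classifier.
PROBLEM_CUES = ("the task of", "the problem of")
METHOD_CUES = ("we propose", "we use")

# Hypothetical toy patterns standing in for a trained term identifier.
METHOD_RE = re.compile(r"we (?:propose|use) (?:an? |the )?(.+?) (?:for|to)\b", re.I)
PROBLEM_RE = re.compile(r"the (?:task|problem) of (.+?)[.,;]?$", re.I)


@dataclass(frozen=True)
class Relation:
    method: str
    problem: str
    label: str  # illustrative label; the thesis defines its own relation system


def classify_sentence(sentence):
    """Step 1: tag a sentence as mentioning a problem and/or a method."""
    lower = sentence.lower()
    tags = set()
    if any(cue in lower for cue in PROBLEM_CUES):
        tags.add("problem")
    if any(cue in lower for cue in METHOD_CUES):
        tags.add("method")
    return tags


def identify_terms(sentence):
    """Step 2: extract candidate method and problem phrases from one sentence."""
    sentence = sentence.strip()
    methods = [m.group(1) for m in METHOD_RE.finditer(sentence)]
    problems = [m.group(1) for m in PROBLEM_RE.finditer(sentence)]
    return methods, problems


def extract_relations(text):
    """Step 3: pair methods with problems found in the same sentence."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Sentences with no problem or method cue are dropped early,
        # which mirrors the noise-reduction role of the first step.
        if not classify_sentence(sentence):
            continue
        methods, problems = identify_terms(sentence)
        for method in methods:
            for problem in problems:
                # Direction is fixed here as method -> problem.
                relations.append(Relation(method, problem, "used-for"))
    return relations
```

Calling `extract_relations` on a short passage returns a list of `Relation` objects, one per method and problem pair found in the same sentence. In a realistic system each of the three functions would be replaced by a learned model, while the chaining between them could stay roughly the same.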