| With the rapid development of large-scale pre-trained language models (PLMs), the research and application of intelligent question answering (QA) systems have attracted considerable attention from academia and industry. As a fundamental component of the intelligent QA task, machine reading comprehension (MRC) has also been studied in the field of natural language processing. From an engineering perspective, bridges are important transportation infrastructure and play a key role in economic and social development. Therefore, developing MRC approaches with domain characteristics based on textual data from electronic reports on bridge management and maintenance, and thereby promoting intelligent information interaction in this field, is an urgent problem with significant application prospects. To meet these requirements, this paper uses the QA corpus for bridge management and maintenance constructed by our research group as its data source. Taking practical scenarios into account, it focuses on the key problems of few-shot MRC for bridge management and maintenance and carries out the following research: (1) Based on the information interaction requirements of bridge management and maintenance, the characteristics of the domain-specific text and the MRC task are analyzed. Textual data in this domain has distinctive characteristics in content organization, domain terminology, and grammatical expression. Domain-specific paragraphs and questions also contain a large amount of professional vocabulary, which poses a great challenge to the context understanding ability of MRC models. In addition, because of data confidentiality restrictions, it is difficult to obtain a large-scale unlabeled corpus and to build a pre-trained language model for the field from scratch. Moreover, since constructing a labeled 
corpus requires the collaboration of domain experts and the participation of a large number of annotators, the relative scarcity of labeled data also poses a great challenge to the adaptive fine-tuning of pre-trained language models for the domain-specific MRC task. (2) For the practical few-shot scenario in bridge management and maintenance, a few-shot MRC approach based on self-supervised post-training of PLMs and heuristic prompt tuning is proposed to bridge the gap between general PLMs and the domain-specific MRC task. First, several prompt templates are defined according to the characteristics of the domain-specific context and the MRC task. Next, self-supervised post-training is performed on a general-domain PLM using unlabeled textual data on bridge management and maintenance, yielding a post-trained model with domain adaptability. Then, taking the domain-specific QA corpus as input, the corresponding prompt template is heuristically matched via question classification and suffix identification. Finally, the heuristic prompt template and the domain-specific context are concatenated and fed into the post-trained model for fine-tuning to achieve fine-grained QA. The experimental results show that this approach outperforms the baseline models. With 1024 fine-tuning samples, the F1 and EM values are 86.38% and 72.9%, respectively. (3) To address two remaining problems, namely that the heuristic prompt templates are mainly defined manually from expert experience and that the self-adaptability of the post-training approach still needs improvement, a novel few-shot MRC approach with data-augmented pre-tuning is proposed, and a domain-specific few-shot MRC model with better domain and task adaptability is constructed. Starting from a general PLM, this approach first uses unlabeled textual data on 
bridge management and maintenance to automatically generate domain-specific questions and corresponding answers, and combines them with the text paragraphs to construct pseudo-labeled data for the MRC task. Then, the general PLM is pre-tuned on the pseudo-labeled data to improve its domain and task adaptability. Finally, in the few-shot setting, the model is fine-tuned on the real labeled QA corpus for bridge management and maintenance. Experiments show that the constructed model outperforms the other baseline models and is better adapted to practical application scenarios in bridge management and maintenance. With 1024 fine-tuning samples, the F1 and EM values are 86.42% and 74.65%, respectively. In summary, this paper closely integrates MRC research with the urgent requirements of bridge management and maintenance. Based on an analysis of the characteristics of the domain-specific text and the MRC task, two few-shot MRC approaches are proposed: one via post-training and prompt tuning, and one via data-augmented pre-tuning. Together they improve the performance of few-shot MRC for bridge management and maintenance, and they also provide a reference for research on few-shot MRC in other fields. |
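
The abstract reports F1 and EM values but does not state how they are computed. As a rough illustration only, the sketch below assumes the convention common in extractive MRC evaluation: EM checks whether the predicted answer string equals the reference, and F1 is the harmonic mean of token-level precision and recall. The function names `exact_match` and `f1_score` and the whitespace tokenization are assumptions made for this example, not details taken from the paper; for Chinese text, character-level tokens would likely replace `str.split`.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the two answers are identical after trimming whitespace."""
    return float(prediction.strip() == reference.strip())


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Assumption: tokens are whitespace-separated words.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most
    # as many times as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction that contains the full reference plus one extra token receives an EM of 0 but a nonzero F1, which is consistent with the reported F1 values being higher than the EM values.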