The rapid progress of natural language processing is largely attributed to the emergence of large-scale pre-trained language models, which have even surpassed human performance on some related tasks. However, researchers have not reached a consensus on whether machines truly understand human language or how they achieve such strong performance. As a subtask of natural language understanding, Machine Reading Comprehension (MRC) integrates basic tasks such as part-of-speech analysis, named entity recognition, relation extraction, anaphora resolution, and textual entailment, and thus provides a comprehensive measure of a machine's ability to understand human language. Owing to its diversity and complexity, MRC has attracted growing attention in recent years. Span-based reading comprehension is one of the most fundamental MRC tasks: the model must extract the answer to a given question from a given context, where the answer is a contiguous substring of that context. Although some models have outperformed humans on English span-based reading comprehension, their interpretability remains low; that is, they cannot provide explanations to support their answers. Research on Chinese span-based reading comprehension is also relatively scarce compared with English, and there is still considerable room for improvement in the performance of related models. Evidence undoubtedly plays an important role in explainability, but accurate human-labeled evidence for training is currently lacking, and manual annotation incurs a huge cost.

The work of this study mainly includes the following aspects:

(1) This study proposes a Semantic Textual Similarity (STS) optimization model. First, a label reconstruction method is used to compute a fine-grained loss. To alleviate the shortage of similarity corpora, data augmentation is applied to expand the corpus, and a Chinese version of the STSb dataset, the Chinese-STSb dataset, is built by data crawling. The T5 (Text-to-Text Transfer Transformer) model is used to verify the effectiveness of the proposed method. On the English STS task, measured by the Spearman correlation coefficient, performance improves by 1.0% and 0.6% on the STSb development and test sets and by 1.0% and 0.6% on the SICK development and test sets, respectively. On the Chinese STS task, improvements of 0.4% and 0.9% are obtained on the Chinese-STSb development and test sets, respectively.

(2) This study proposes an interpretable reading comprehension framework based on the T5 model, the InterMRC framework, which provides the answer to a question and the evidence explaining that answer at the same time. The framework consists of an evidence-extraction module and a question-answering module, each built from a shared encoder and an independent decoder. To train the model end to end in a differentiable way, this study uses Gumbel-Softmax to connect the evidence-extraction and question-answering modules. In addition, a threshold-based method is proposed to filter out abnormal evidence losses and reduce the impact of incorrect evidence generated by the model on training.
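The abstract does not include implementation details; the following minimal sketch (an assumption, not the authors' code) illustrates the two mechanisms just described: Gumbel-Softmax passing a hard but differentiable evidence selection from the evidence-extraction module to the question-answering module, and a threshold that excludes examples whose evidence loss is abnormally large. Tensor shapes, the threshold value, and the gating step are assumed for illustration.

```python
# Minimal sketch (assumed shapes and names, not the authors' implementation).
import torch
import torch.nn.functional as F

def select_evidence(evidence_logits, tau=1.0):
    """evidence_logits: (batch, seq_len, 2) per-token drop/keep scores."""
    # hard=True yields one-hot samples in the forward pass while the
    # straight-through estimator keeps gradients flowing in the backward pass.
    sample = F.gumbel_softmax(evidence_logits, tau=tau, hard=True, dim=-1)
    keep_mask = sample[..., 1]                 # (batch, seq_len), values in {0, 1}
    return keep_mask

def thresholded_evidence_loss(evidence_logits, evidence_labels, threshold=4.0):
    """Per-example cross-entropy; examples whose loss exceeds `threshold`
    (likely noisy auto-annotated evidence) are excluded from the average."""
    per_token = F.cross_entropy(
        evidence_logits.transpose(1, 2), evidence_labels, reduction="none")
    per_example = per_token.mean(dim=1)        # (batch,)
    keep = (per_example <= threshold).float()
    return (per_example * keep).sum() / keep.sum().clamp(min=1.0)

# Hypothetical use inside a training step:
# keep_mask = select_evidence(evidence_logits)           # gate context tokens
# qa_inputs = encoder_states * keep_mask.unsqueeze(-1)   # feed to the QA decoder
# loss = qa_loss + thresholded_evidence_loss(evidence_logits, evidence_labels)
```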
With the help of the STS optimization model, this study further proposes a more accurate evidence annotation method and uses it to annotate training evidence for the span-based MRC task. Reading comprehension models are then trained on the Chinese and English corpora, and the framework is evaluated on both evidence extraction and answer generation on the CMRC2018 and SQuAD datasets, respectively. On the English SQuAD1.1 dataset, compared with the previous best model, the base-level T5-InterMRC model proposed in this study improves the F1-answer, F1-evidence, and F1-overall metrics by 1.6, 4.1, and 4.1 percentage points, respectively, and the large-level T5-InterMRC model improves F1-evidence and F1-overall by 3.3 and 1.1 percentage points, respectively. On the Chinese CMRC2018 dataset, compared with the previous best model, the proposed Randeng-T5-InterMRC model improves F1-answer, F1-evidence, and F1-overall by 6.0, 4.7, and 7.5 percentage points, respectively.
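The annotation procedure itself is not spelled out in this abstract; one plausible realization, sketched below under that assumption, scores each context sentence against the question with the fine-tuned STS model and keeps the best-scoring sentence that contains the gold answer. The `sts_score` helper is hypothetical.

```python
def annotate_evidence(context_sentences, question, answer, sts_score):
    """Pick one context sentence as pseudo-evidence for (question, answer).
    `sts_score(a, b)` is a hypothetical wrapper returning a similarity score."""
    # Prefer sentences that actually contain the answer span.
    candidates = [s for s in context_sentences if answer in s] or context_sentences
    scores = [sts_score(question, s) for s in candidates]
    return candidates[scores.index(max(scores))]
```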
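For reference, F1-answer and F1-evidence in results of this kind are usually the standard SQuAD-style token-overlap F1 between predicted and gold spans; a simplified sketch follows (the official evaluation scripts additionally normalize punctuation and articles, and CMRC2018 scores Chinese text at the character level). How F1-overall combines the two is not specified in this abstract, so it is omitted here.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted span and a gold span."""
    pred_tokens, ref_tokens = prediction.split(), reference.split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# f1_answer   = token_f1(predicted_answer,   gold_answer)
# f1_evidence = token_f1(predicted_evidence, gold_evidence)
```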