| Machine reading comprehension is an important topic in artificial intelligence and natural language processing. It is also a key technology for question answering, with both research significance and wide application value. As the technology has progressed, researchers have proposed a variety of reading comprehension models, some of which surpass human performance on certain datasets. However, many problems in machine reading remain unsolved, and further optimization of machine reading comprehension models matters to both academia and industry. Thanks to improvements in computing speed, large-scale pre-trained language models have been proposed and perform outstandingly on many natural language understanding tasks. However, these models are still applied to extractive reading comprehension in a simple way, which leaves room for optimization in both model structure and data composition. This paper starts from a pre-trained language model, ALBERT, and optimizes it from both the model and the data perspectives so that it performs better on the extractive reading comprehension task. First, this paper verifies the performance of ALBERT on extractive reading comprehension. The characteristics of ALBERT are analyzed, and experiments are carried out on the SQuAD 2.0, NewsQA, and QUOREF datasets. The experimental results show that the model performs well on these datasets: it surpasses human performance on SQuAD and NewsQA and achieves state-of-the-art results on NewsQA and QUOREF. Then, this paper optimizes ALBERT from the modeling perspective. To address ALBERT's lack of explicit modeling of the relationship between passage and question, an interaction layer based on bidirectional attention is introduced. To address the fact that the two answer boundaries are decoded independently of each other, a connection between the answer boundaries is established. The resulting model is called ALBERT-CA. The
experimental results show that ALBERT-CA has clear advantages over ALBERT in extractive reading comprehension. Additional ablation experiments show that the two main optimizations in the model are both effective, and that it is necessary to add the original information to the bidirectional attention. Finally, this paper optimizes from the data perspective. Based on the observation that there can be multiple correct answers in an extractive reading comprehension dataset, data augmentation based on soft labels can be used to improve the model. Two soft label construction methods, label smoothing and distribution prediction, are tried; both assign some probability to other words in the passage. Training ALBERT-CA on the augmented datasets shows that both methods help. The soft labels obtained by label smoothing cannot identify other correct answers, yet they are still useful, which we believe indicates that multiple answers are very common in reading comprehension. A more accurate distribution prediction further improves model performance. With the optimizations described in this paper, the model improves markedly, which provides a stronger basis for follow-up research. These ideas are not limited to ALBERT and can also be applied to other models, which should help future research. |
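
To make the soft-label idea concrete, the following is a minimal sketch of label smoothing applied to an answer-boundary target (for example, the start position). The abstract does not specify the smoothing scheme, so uniform smoothing with a hypothetical parameter `epsilon` is assumed here, and the function names `smoothed_labels` and `soft_cross_entropy` are illustrative only, not taken from the paper.

```python
import math


def smoothed_labels(num_tokens, gold_index, epsilon=0.1):
    """Build a soft target over passage tokens (assumed uniform smoothing).

    The gold boundary token keeps probability 1 - epsilon; the remaining
    epsilon is shared equally by all other tokens in the passage.
    """
    if num_tokens < 2:
        raise ValueError("need at least two tokens to smooth over")
    if not 0 <= gold_index < num_tokens:
        raise ValueError("gold_index out of range")
    off_gold = epsilon / (num_tokens - 1)
    labels = [off_gold] * num_tokens
    labels[gold_index] = 1.0 - epsilon
    return labels


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(logits, soft_labels):
    """Cross-entropy between a soft target distribution and model logits."""
    log_probs = log_softmax(logits)
    return -sum(q * lp for q, lp in zip(soft_labels, log_probs))


# Hypothetical start-position logits for a five-token passage.
start_logits = [0.2, 2.5, 0.1, -1.0, 0.3]
targets = smoothed_labels(len(start_logits), gold_index=1, epsilon=0.1)
loss = soft_cross_entropy(start_logits, targets)
```

Under the same assumptions, the distribution prediction variant would replace the uniform off-gold mass with probabilities produced by a separate predictor, while `soft_cross_entropy` would stay unchanged.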