| As one of the candidate technical approaches for question answering systems, machine reading comprehension uses multi-layer networks to model text semantics, mine the relationship between questions and paragraphs, and return accurate answers. With the continued development of machine reading comprehension, the accuracy of answers returned by question answering systems based on it is approaching the level of human experts, and the technology is now widely used in fields such as intelligent education and question answering assistants. In petroleum exploration, operators who encounter operational problems need to sort out the best answers from a large amount of unstructured text. However, current machine reading comprehension models cannot effectively handle a large amount of paragraph information and do not fully mine the interaction information between questions and paragraphs, so machine reading comprehension performs unsatisfactorily in practical petroleum exploration question answering. To address this, this paper focuses on multi-paragraph preprocessing and question-paragraph interactive representation. The specific work is as follows. First, a candidate paragraph extraction model based on Transformer is proposed. For word embedding, to address the ambiguity of the cw2vec word vector model in learning Chinese stroke structure, this paper combines n-gram stroke features with position information to improve the model's representation of words. For feature extraction, the idea of sparse matrices is used to reduce the computational complexity of the self-attention matrix in the Transformer, the deep feature extractor, and thereby speed up the model's computation. Second, the shortcomings of the baseline model Bi-DAF are addressed to improve its performance. For embedding representation, because Bi-DAF cannot effectively handle polysemous words, this paper integrates the Chinese pre-trained language model RoBERTa-wwm-ext to enrich the semantic representation of words. For question-paragraph interaction, because the question and paragraph vectors lack reasonable relevance when the similarity matrix is computed, this paper proposes a word frequency similarity algorithm to strengthen the relevance and interaction between question and paragraph. Because redundant information in the interaction vectors interferes with the model, a self-gating mechanism is used to filter it out. In addition, because paragraph context information is lost after the interaction between question and paragraph, this paper computes self-attention over the paragraph to supply context information for the question-paragraph interactive representation, improving the accuracy of answer prediction. Finally, this paper takes the improved candidate paragraph extraction and machine reading comprehension models as the core of a petroleum exploration question answering system and implements question answering with a NAO robot as the carrier. The system framework and workflow are designed, and by deploying each module a complete question answering system for the petroleum exploration field is integrated, putting machine reading comprehension question answering technology into practice. The experimental results show that the machine reading comprehension question answering method proposed in this paper is feasible. |