| Acronyms are widely used in scientific papers, patents, books, and other scientific documents to abbreviate methods and technical vocabulary, simplify expressions, and save space. However, as the scientific literature grows, the number of acronyms shows an exponential growth trend, and the same acronym may have completely different interpretations, which seriously hinders the understanding of scientific literature. Research on machine understanding of acronyms is therefore of great significance to information management. The core of machine comprehension of acronyms is identifying the exact meaning of an abbreviation. Depending on how the interpretation is obtained, the task divides into two types: acronym identification and acronym disambiguation. In acronym identification, both the abbreviation and its definition appear in the original text, and the definition is usually obtained through named entity recognition. In acronym disambiguation, only the acronym appears in the text, and the appropriate definition must be selected from an auxiliary dictionary using contextual information. Existing research shows that the BERT model can learn from unlabeled data and has advantages in machine understanding of abbreviations. However, existing research does not consider the feature relationship between abbreviations and their definitions. Accordingly, this paper builds on the BERT model to study acronym identification and disambiguation as follows. (1) Acronym identification. This task targets text that contains both abbreviations and their definitions, so machine comprehension of the abbreviations can be achieved with named entity recognition alone. Although the BERT model has achieved strong results on this task, the standard BERT model uses a random masking method and does not consider the continuity of abbreviation occurrences. Accordingly, this paper proposes using the SpanBERT-CRF model for acronym identification and compares it experimentally with existing models to demonstrate the method's effectiveness. The results show that the SpanBERT-CRF model is significantly better than the comparison models. (2) Dictionary-based acronym disambiguation. When an abbreviation appears alone, disambiguation usually has to rely on an abbreviation dictionary. However, existing dictionary-based approaches usually use a classifier to directly score the concatenated input of the candidate definition and the original text, without considering how well the candidate definition matches the context of the corresponding abbreviation. To address this, this paper proposes replacing the acronym in the original text with each candidate definition to obtain a set of new sentences, and then using a BERT Siamese network to encode the original sentence together with the new sentence or the candidate definition. The model can thus learn definition features from the perspective of both the new sentence and the candidate definition itself. The experimental results show that the proposed method achieves a higher F1 score than the existing state-of-the-art model. In addition, this paper proposes verifying the model's robustness by expanding the candidate definition set. The results show that acquiring features from both the new sentence and the candidate definition simultaneously is more stable. Overall, to address the insufficient acquisition of feature information in existing methods for machine understanding of acronyms, this paper proposes new solutions for the two critical tasks of acronym identification and disambiguation, and formulates plans for follow-up research. |
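
The candidate-substitution step of the dictionary-based disambiguation method can be illustrated with a minimal Python sketch. It only shows how a set of new sentences might be built by replacing the acronym with each candidate definition. The function name `substitute_candidates`, the toy dictionary, and the example sentence are hypothetical and are not taken from the paper. Encoding and scoring the resulting pairs with a BERT Siamese network is not shown.

```python
import re
from typing import List, Tuple


def substitute_candidates(
    sentence: str, acronym: str, candidates: List[str]
) -> List[Tuple[str, str]]:
    """Return (candidate, new_sentence) pairs, one per candidate definition.

    Hypothetical helper: an assumed reading of the substitution step,
    not the paper's implementation.
    """
    # Match the acronym only as a whole token, so "CNN" is not
    # replaced inside a longer token such as "CNNs".
    pattern = re.compile(r"(?<!\w)" + re.escape(acronym) + r"(?!\w)")
    pairs = []
    for candidate in candidates:
        # Replace the first occurrence only; a callable replacement
        # avoids backslash interpretation in the candidate text.
        new_sentence = pattern.sub(lambda _m: candidate, sentence, count=1)
        pairs.append((candidate, new_sentence))
    return pairs


# Toy acronym dictionary (assumed for illustration only).
toy_dictionary = {
    "CNN": ["convolutional neural network", "Cable News Network"],
}

sentence = "We train a CNN on image patches."
pairs = substitute_candidates(sentence, "CNN", toy_dictionary["CNN"])
for candidate, new_sentence in pairs:
    print(candidate, "->", new_sentence)
```

Under this reading, each new sentence (or the candidate definition itself) would be paired with the original sentence and passed to the two branches of the Siamese encoder, and the highest-scoring candidate would be selected.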