Textual entailment holds when a hypothesis sentence H can be reasonably inferred from a premise sentence P; in that case, P entails H. Recognizing Textual Entailment (RTE) aims to determine the entailment relationship between a given premise sentence and a hypothesis sentence. RTE helps computers understand the deeper meaning of text and language, and it is applied to many natural language processing tasks, such as question answering, machine translation, information retrieval, text summarization, and relation extraction. Compared with RTE in English, research on Chinese textual entailment recognition has progressed slowly. This thesis addresses three problems. First, because the knowledge features of the data itself are not rich enough, it uses the synonyms in Cilin to expand the datasets. Second, because existing models capture sentence semantic information only weakly, it proposes the RoCo model, which combines a pre-trained model with an attention mechanism to strengthen the model's ability to learn the semantic relationship between sentences. Finally, because sentence encoding ability is limited, it improves the encoding layer of the model by using vector representations that integrate the sememe information in HowNet with context. The main work is as follows:

(1) A textual entailment recognition method based on synonym expansion is proposed. The method first expands the CNLI and XNLI-ZH datasets with Cilin synonyms, yielding the datasets CNLI-m-p and XNLI-m-p, in which both monosemous and polysemous words have been expanded. The RoCo model is then proposed; it combines a pre-trained model with an attention mechanism to jointly enhance the model's ability to capture the semantic information of sentences. The experimental results show that every model identifies textual entailment relations more effectively on the synonym-expanded datasets. The RoCo model also outperforms models that use only the attention mechanism or only the pre-trained model, achieving 80.50% and 80.02% accuracy on the CNLI-m-p and XNLI-m-p datasets, respectively. Using the RoCo model on the synonym-expanded datasets therefore helps identify textual entailment relations.

(2) A textual entailment recognition method based on sememe enhancement is proposed. To further enhance sentence encoding ability, this thesis presents the RoCo-Sem model. The encoding layer of the model encodes each sentence with sense vectors and word vectors, which are based on the sememe information in HowNet and on context, respectively, and then aggregates the attention results computed for the premise and hypothesis sentences. The RoCo-Sem model achieves good results on the original datasets, indicating that vector representations based on the sememe information in HowNet can enhance the encoding ability of the model. The model achieves 81.33% and 80.74% accuracy on the CNLI-m-p and XNLI-m-p datasets, respectively, indicating that applying sememe-based vector representations to the knowledge-enriched datasets can effectively improve recognition accuracy.
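
To make the idea of synonym-based dataset expansion concrete, the following is a minimal sketch and not the procedure used in this thesis. It assumes that expansion means generating additional premise-hypothesis pairs by swapping one word at a time for a synonym, and that the entailment label is unchanged by the swap. The table `TOY_SYNONYMS` is a hypothetical stand-in for a thesaurus lookup, and the names `substitute` and `expand_example` are illustrative only.

```python
from typing import Dict, List, Tuple

# (premise tokens, hypothesis tokens, label)
Example = Tuple[List[str], List[str], str]

# Hypothetical toy synonym table; a real system would query a thesaurus instead.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "高兴": ["开心", "愉快"],
    "医生": ["大夫"],
}


def substitute(tokens: List[str], synonyms: Dict[str, List[str]]) -> List[List[str]]:
    """Return every variant of `tokens` with exactly one token swapped for a synonym."""
    variants = []
    for i, token in enumerate(tokens):
        for synonym in synonyms.get(token, []):
            variants.append(tokens[:i] + [synonym] + tokens[i + 1:])
    return variants


def expand_example(example: Example, synonyms: Dict[str, List[str]]) -> List[Example]:
    """Return the original example plus its single-swap variants.

    Assumption: a synonym swap does not change the entailment label.
    """
    premise, hypothesis, label = example
    expanded = [example]
    # Vary the premise while keeping the hypothesis fixed, then the reverse.
    for new_premise in substitute(premise, synonyms):
        expanded.append((new_premise, hypothesis, label))
    for new_hypothesis in substitute(hypothesis, synonyms):
        expanded.append((premise, new_hypothesis, label))
    return expanded


if __name__ == "__main__":
    sample: Example = (["医生", "很", "高兴"], ["医生", "不", "难过"], "entailment")
    for premise, hypothesis, label in expand_example(sample, TOY_SYNONYMS):
        print("".join(premise), "".join(hypothesis), label)
```

Swapping a single word per variant keeps each generated sentence close to the original, which limits the risk that a substitution changes the label. Polysemous words are the harder case, since a synonym that fits one sense may not fit the sense used in the sentence.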