With the rapid development of modern science and technology, the amount of information on the Internet has grown at an alarming rate. Under the pressure of this explosion of network information and the resulting data overload, locating valuable information efficiently and accurately has become extremely important. Text summarization, a technology in the field of Natural Language Processing, is an effective means of analyzing and processing network information: it extracts and condenses text to convey the content or meaning of an article in concise, refined words. However, the coverage of text disseminated on the network keeps widening and its volume keeps growing, involving ever more entity nouns such as person names and place names. Current summarization models struggle to capture the key information and deep meaning of medium-length texts and begin to face the problem of long-distance dependency. In view of these issues, this research is carried out from the perspective of semantic analysis; the main research contents and contributions are as follows:

(1) To address the problem that generated summaries are not semantically fluent, we construct an abstractive summarization model based on bidirectional decoding, Bidecoder. Building on the sequence-to-sequence framework, the model improves the decoder by adopting a bidirectional decoding structure, which predicts in both directions and continuously refines the output by combining the two sets of predictions, thereby alleviating the error accumulation caused by a unidirectional structure. An attention mechanism is introduced during decoding, and attention is rationally allocated to improve the semantic consistency of the generated summary.

(2) To address the inaccurate recognition of entity information in medium-length texts, we construct an abstractive summarization model based on NER (Named Entity Recognition) tags
and bidirectional decoding, NER-Bidecoder. The model uses NER techniques to mark entities in the original text and divides them into four categories, PERSON, ORG, GPE, and MISC, representing person names, organization names, place names, and numbers, respectively. The NER-tagged text retains entity information after vectorization, which effectively alleviates the problem of incomplete entity information. The encoder uses the entity information to generate the intermediate semantic vector, while the decoder adopts the bidirectional decoding structure. An attention mechanism is introduced to capture deep semantic relationships from the time-series information, improving the entity completeness and coherence of the summary.

(3) To address the inaccurate understanding of sentence-level words in medium-length texts, we construct an abstractive summarization model based on BERT (Bidirectional Encoder Representations from Transformers) vectorization and bidirectional decoding, BERT-Bidecoder. The pre-trained BERT model is used in the vectorization stage, which makes full use of the contextual information of the vocabulary and yields a more global vector representation. This helps the encoder and decoder understand the full text, and an attention mechanism is introduced to strengthen the semantic relationships. The bidirectional decoding structure of the decoder further reduces error accumulation, alleviates the bias caused by unidirectional errors, and improves the coherence and generality of the summary.
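The combination step at the heart of bidirectional decoding can be sketched in miniature. The toy tokens, scores, and the keep-the-higher-score merge rule below are illustrative assumptions, not the thesis's actual combination method; they only show how a backward pass can correct a low-confidence forward prediction at the same position.

```python
def combine_bidirectional(forward, backward):
    """Merge two decoding passes over the same positions.

    forward, backward: lists of (token, score) pairs per position,
    with the backward pass already re-aligned left-to-right.
    Keeping the higher-scoring token at each position is one simple
    way to let the two directions correct each other's errors.
    """
    merged = []
    for (f_tok, f_score), (b_tok, b_score) in zip(forward, backward):
        merged.append(f_tok if f_score >= b_score else b_tok)
    return merged


# Toy example: the forward pass is unsure at position 1,
# where the backward pass is more confident.
fwd = [("the", 0.9), ("cat", 0.4), ("sat", 0.8)]
bwd = [("the", 0.6), ("dog", 0.7), ("sat", 0.9)]
print(combine_bidirectional(fwd, bwd))  # ['the', 'dog', 'sat']
```

A real implementation would combine full probability distributions rather than single tokens, but the position-wise reconciliation idea is the same.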
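The NER preprocessing step of contribution (2) can likewise be sketched: entities are marked in the source text so the category information survives vectorization. A real system would use a trained NER model (e.g. spaCy); the tiny gazetteer and tag format below are stand-ins for illustration only.

```python
import re

# Illustrative gazetteer mapping surface forms to the four entity
# categories named in the text; a trained NER model replaces this.
GAZETTEER = {
    "Alice": "PERSON",
    "Google": "ORG",
    "Paris": "GPE",
}
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")  # numbers -> MISC


def tag_entities(text):
    """Wrap each recognized entity in its category tag."""
    for name, label in GAZETTEER.items():
        text = text.replace(name, f"<{label}>{name}</{label}>")
    # Tag bare numbers as MISC, per the four-category scheme.
    return NUMBER.sub(lambda m: f"<MISC>{m.group(0)}</MISC>", text)


print(tag_entities("Alice joined Google in Paris in 2020"))
# <PERSON>Alice</PERSON> joined <ORG>Google</ORG> in <GPE>Paris</GPE> in <MISC>2020</MISC>
```

After tagging, the encoder sees the entity boundaries and categories explicitly, which is what helps keep names and numbers intact in the generated summary.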
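Finally, the contrast that motivates contribution (3) is that a static embedding assigns one fixed vector per word, whereas a contextual model such as BERT gives the same word different vectors in different sentences. The 2-d vectors and the neighbor-averaging scheme below are made up purely to illustrate that contrast; BERT itself computes context through self-attention, not averaging.

```python
# Toy static word vectors (illustrative values only).
STATIC = {
    "bank": (1.0, 0.0),
    "river": (0.0, 1.0),
    "money": (1.0, 1.0),
}


def contextual(word, sentence, window=1):
    """Mix a word's static vector with its neighbors' vectors.

    This crude stand-in shows the key property of contextual
    representations: the output depends on the surrounding words.
    """
    i = sentence.index(word)
    neighbors = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
    vecs = [STATIC[word]] + [STATIC[w] for w in neighbors if w in STATIC]
    n = len(vecs)
    return tuple(sum(component) / n for component in zip(*vecs))


# The same word "bank" gets two different vectors in two contexts.
print(contextual("bank", ["river", "bank"]))  # (0.5, 0.5)
print(contextual("bank", ["money", "bank"]))  # (1.0, 0.5)
```

It is this context sensitivity, supplied at the vectorization stage, that the abstract credits with helping the encoder and decoder understand the full text.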