| Text semantic understanding is the task of recovering the meaning of language from text: it analyzes the semantics of words, phrases, and discourse to extract information and meaning. Topic models capture the global semantics of a text by clustering its words into topics. The attention mechanism deepens a model’s understanding of text semantics by learning to focus on the key information in the text. Building on semantic understanding techniques and combining topic models with attention mechanisms, this thesis studies three tasks: named entity recognition, keyphrase extraction, and text summarization generation. The main work of this thesis includes: Ⅰ. A named entity recognition enhancement method based on the attention mechanism. To address the problem of poor semantic representation in named entity recognition, a named entity recognition enhancement method based on the attention mechanism is proposed. First, POS tag embeddings and word embeddings are fused, and an attention mechanism is introduced to obtain a contextual representation of the input sequence. Then, sentences and entity triggers are semantically matched, and the trained trigger vector representation is used as the query vector of the attention mechanism to generate a new sentence vector representation. Finally, a conditional random field decodes the entity labels of the whole sentence. Experimental results show that the method enhances the semantic representation. It provides a basis for candidate phrase block processing in the keyphrase extraction task. Ⅱ. A keyphrase extraction method based on topic and semantics. To address the problem that existing keyphrase extraction methods consider only a single perspective and are prone to extracting synonyms, a keyphrase extraction method based on topic and semantics is proposed. First, the text is preprocessed to obtain phrase blocks as well as word embeddings and topic 
models. Second, the Manhattan distance between the masked document embedding and the original document embedding is calculated as the semantic importance feature, and the likelihood that a candidate phrase belongs to a topic is calculated as the topic diversity feature. Then, position weights are used as the position feature. Finally, the phrase importance score is calculated from semantic importance, topic diversity, and position information. The results show that the method avoids extracting synonyms, so the extracted keyphrases cover the main idea more comprehensively. It provides fine-grained semantic elements for the subsequent text summarization generation task. Ⅲ. A Graph2Seq text summarization generation method based on the attention mechanism. The Seq2Seq model has difficulty capturing long-distance relationships in long texts, which leads to topic deviation. To address this problem, and treating keyphrases as fine-grained semantic elements that guide text summarization generation, a Graph2Seq text summarization generation method based on the attention mechanism is proposed. First, coreferent phrases in the text are aggregated and a semantic graph is constructed from them to convey the rich relationships between phrases. Second, a graph encoder models the relationships between phrases and captures the global structure of the semantic graph, so that long sequences are encoded efficiently. Finally, keyphrase-based semantics guide the assignment of attention weights, steering the sequence decoder toward the main idea of the document. The results show that the method generates more accurate and comprehensive summaries. In conclusion, this thesis builds on text semantic understanding techniques and combines topic models with the attention mechanism to understand text. It studies a named entity recognition enhancement method based on the attention mechanism, a keyphrase extraction method based on topic and semantics, and 
a Graph2Seq text summarization generation method based on the attention mechanism. These methods achieve more effective results than existing small-model methods on three tasks: named entity recognition, keyphrase extraction, and text summarization generation. |
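
The keyphrase scoring step in part Ⅱ can be illustrated with a minimal sketch. The abstract names the three features (semantic importance via Manhattan distance between the masked and original document embeddings, topic likelihood, and position weight) but not how they are computed in detail or combined, so everything concrete here is an assumption: the document embedding is a plain average of word vectors, masking simply drops the phrase's tokens, the position weight is `1 / (1 + first_index)`, the topic probability is supplied by the caller, and the final score is a weighted sum. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_embedding(tokens: Sequence[str], vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of known tokens (assumed stand-in for a real document encoder)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token in the sequence has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def manhattan(a: Vector, b: Vector) -> float:
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def semantic_importance(doc: Sequence[str], phrase: Tuple[str, ...],
                        vectors: Dict[str, Vector]) -> float:
    """Distance between the original embedding and the embedding with the phrase masked.

    Assumption: masking is approximated by dropping the phrase's tokens.
    """
    masked = [t for t in doc if t not in phrase]
    return manhattan(mean_embedding(doc, vectors), mean_embedding(masked, vectors))


def position_weight(doc: Sequence[str], phrase: Tuple[str, ...]) -> float:
    """Assumed position feature: earlier first occurrence gives a larger weight."""
    first = phrase[0]
    if first not in doc:
        return 0.0
    return 1.0 / (1.0 + list(doc).index(first))


def phrase_score(doc: Sequence[str], phrase: Tuple[str, ...],
                 vectors: Dict[str, Vector], topic_prob: float,
                 weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Assumed combination: weighted sum of the three features."""
    w_sem, w_topic, w_pos = weights
    return (w_sem * semantic_importance(doc, phrase, vectors)
            + w_topic * topic_prob
            + w_pos * position_weight(doc, phrase))


# Toy data, invented purely for illustration.
vectors = {
    "the": [0.0, 0.0],
    "graph": [1.0, 0.0],
    "encoder": [0.0, 1.0],
    "works": [0.5, 0.5],
}
doc = ["the", "graph", "encoder", "works"]

score_graph = phrase_score(doc, ("graph",), vectors, topic_prob=0.9)
score_the = phrase_score(doc, ("the",), vectors, topic_prob=0.1)
print(score_graph > score_the)
```

In practice the averaged word vectors would be replaced by a contextual document encoder and the topic probability by the output of a trained topic model; the weighted sum is only one plausible way to merge the three features.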