
Research On Machine Writing Based On Deep Learning

Posted on: 2021-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: K L Xiong
Full Text: PDF
GTID: 2435330626954093
Subject: Engineering
Abstract/Summary:
Thesis-abstract writing is an assistive technology that generates a paper's abstract from its title, helping humans write abstracts more effectively and professionally. For this task, neural networks can learn robust models from large text corpora. Although such networks have achieved good results, major challenges remain, including errors in the processing of the model's input corpus and deficiencies in model performance. Solving these problems also promotes the rapid development of abstract-writing technology in the field of machine writing. This thesis targets the main problems in the abstract-writing task: the recurrent neural network cannot accurately convey the original information; insufficient topic information makes machine learning difficult; the previously generated abstract is not fully exploited; and, despite the differences between Chinese and English corpora and the practical value of Chinese-corpus research, the Chinese corpus had not yet been studied. A series of solutions is proposed to improve the performance of the recurrent sequence model and the search ability of the optimized model, in the following four respects.

First, to address the recurrent network's inability to accurately convey the original information, a topic-enhancement mechanism is proposed on top of a Seq2Seq model with soft attention. Exploiting the way human writers repeatedly restate the topic, the model works with two distributions: a topic-vocabulary probability distribution and an ordinary vocabulary probability distribution. At each decoding step, a topic-enhancement weight forms a weighted sum of the two, dynamically adjusting the predicted-word probabilities. This alleviates the problem of topic words failing to appear in the summary and so strengthens the transmission of the original information. Experiments on Chinese and English corpora show that the topic-enhancement model significantly improves performance scores over a typical sequence model.

Second, to address the difficulty that the lack of topic information in recurrent networks causes for learning, this thesis uses multi-model fusion to combine the topic-enhancement mechanism with an editing mechanism, again on a Seq2Seq model with soft attention, in order to obtain more topic information. Mimicking the human habit of editing while writing, the model continually draws additional topic information from the abstract generated so far, on top of the topic-enhancement mechanism, and feeds it into generation at the current step. Experiments show that, compared with the existing sequence model, the proposed model raises the average METEOR and ROUGE_L scores by 2.75 and 2.2 percentage points, respectively.

Third, to address the underuse of the previously generated abstract, a method is proposed that combines title vocabulary and abstract vocabulary inside the topic-enhancement gate. At each generation step, the vocabulary of the previous draft abstract is fed into the gate; the attention-weight distribution over title words and the attention-weight distribution over draft-abstract words are merged according to a certain weight, and the merged distribution serves as the topic-information probability distribution used to update the predicted vocabulary distribution at the current step. Experiments show that this method outperforms many current state-of-the-art machine-writing text-generation methods on the abstract-writing task.

Fourth, motivated by the differences between Chinese and English corpora and the practical value of Chinese-corpus research, which had not yet been carried out, this thesis studies how the smallest text-processing unit of a Chinese corpus affects the performance of the proposed models, starting from the differences between Chinese and English preprocessing. By analyzing the advantages and disadvantages of word-level and character-level processing units, and implementing the models with the PyTorch framework and GPU acceleration (using matrix and tensor operations inside the softmax), it shows that character-level processing of Chinese text yields better abstract-writing performance than word-level processing.
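All of the proposed models build on a Seq2Seq backbone with soft attention. As a rough illustration of that base component only, here is a minimal dot-product soft-attention sketch in plain Python; the function name and the dot-product scoring are illustrative assumptions, not the thesis's exact formulation:

```python
import math

def soft_attention(decoder_state, encoder_states):
    """Weight each encoder state by its alignment with the decoder state
    and return the weighted-sum context vector plus the attention weights."""
    # Dot-product alignment score between the decoder state and each encoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    # Numerically stable softmax turns scores into an attention distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [x / z for x in exps]
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return context, weights
```

In a full model the context vector would be concatenated with the decoder state before predicting the next word; here it simply shows how the attention distribution is formed.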
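The topic-enhancement mechanism of the first contribution can be sketched as the weighted summation described above: at each decoding step, a gate mixes the topic-vocabulary distribution with the ordinary vocabulary distribution. A minimal sketch, with the caveat that the function and variable names are hypothetical and the thesis's actual gate is learned rather than passed in as a scalar:

```python
import math

def topic_enhanced_distribution(vocab_logits, topic_logits, mix_logit):
    """Mix a topic-word distribution into the decoder's vocabulary distribution."""
    def softmax(xs):
        m = max(xs)
        es = [math.exp(x - m) for x in xs]
        s = sum(es)
        return [e / s for e in es]

    p_vocab = softmax(vocab_logits)   # ordinary vocabulary distribution
    p_topic = softmax(topic_logits)   # distribution concentrated on topic words
    # Sigmoid maps the gate logit into (0, 1): the topic-enhancement weight.
    lam = 1.0 / (1.0 + math.exp(-mix_logit))
    # Weighted summation boosts topic words without discarding the base model.
    return [lam * pt + (1.0 - lam) * pv for pt, pv in zip(p_topic, p_vocab)]
```

Because both inputs are valid distributions and the weights sum to one, the mixture is itself a valid distribution over the vocabulary.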
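The fourth contribution compares word-level and character-level units for Chinese corpora. The contrast can be illustrated with a toy segmenter; the greedy forward maximum-matching routine and the tiny word list below are illustrative assumptions, not the preprocessing pipeline the thesis actually used:

```python
def char_level(text):
    """Character-level units: every Chinese character is its own token."""
    return list(text)

def word_level(text, vocab, max_len=4):
    """Greedy forward maximum matching against a toy word list (illustrative only)."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + j]
            if j == 1 or piece in vocab:
                tokens.append(piece)
                i += j
                break
    return tokens
```

Character-level processing avoids segmentation errors and shrinks the vocabulary, at the cost of longer sequences, which is the trade-off the thesis's experiments evaluate.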
Keywords/Search Tags: paper abstract writing, sequence-to-sequence model, attention mechanism, topic enhancement mechanism, editing mechanism