Design And Software Development Of Couplet Generation Model Based On Word Vector And Attention Mechanism

Posted on: 2023-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Fang
GTID: 2555306812475694
Subject: Engineering
Abstract/Summary:
The couplet is an art form unique to Chinese culture, with a long history and a rich cultural heritage. Its distinctive rhythm and concise language are widely loved. Because couplets demand strict antithesis and concise wording, automatic couplet generation is a difficult research problem. In recent years, deep-learning-based methods have improved generation speed and the readability of the results, but open problems remain, such as poor generation quality for long couplets and poor antithesis for idioms. To improve both the neatness of the antithetical structure in generated couplets and the semantic relevance between the first and second lines, this thesis presents an automatic couplet generation method based on word vectors and the multi-head attention mechanism. A couplet generation system built on this method automatically generates a neater and more semantically appropriate second line for a given first line.

Existing methods use single Chinese characters as corpus units, so multi-character phrases and idioms in the first line cannot be strictly matched in the generated second line. To address this, the thesis proposes training the generation model on a "word corpus" and designs a joint word segmentation algorithm, tailored to the couplet style, that extracts corpus units reasonably. To address the low generation quality for long couplets and even-length couplets caused by the uneven sample distribution of the dataset, a data augmentation strategy based on the antithesis constraint is designed and combined with the EDA data augmentation method to balance the sample distribution. To further strengthen the semantic connection between the generated second line and the given first line, the generation model is built on the Transformer, whose multi-head attention mechanism strengthens the semantic correlation between the two lines and thereby improves the quality of the generated second line to a certain extent.

A GRU model with attention served as the baseline, and BLEU, METEOR and Perplexity served as the evaluation metrics. Two comparisons were made: the baseline trained on the character corpus versus the word corpus, and the baseline versus the proposed model when both were trained on the same word corpus. The results show that, compared with the character corpus, the word corpus produced by the word segmentation and data augmentation of this thesis improved the baseline's BLEU by 4.31, METEOR by 2.7, and Perplexity by 4.69. Under the same experimental conditions, the proposed model increased BLEU by 0.82 and METEOR by 1.69 and decreased Perplexity by 3.35 relative to the baseline. These results indicate that word-level corpus units and the proposed method can greatly improve the quality of automatic couplet generation.
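
To make the role of multi-head attention concrete, the following is a minimal pure-Python sketch of how positions of a second line could attend over the encoded first line. It is an illustration of the standard mechanism, not the thesis implementation: the function names, the toy 4-dimensional vectors, and the identity projection matrices are all assumptions chosen for readability, whereas a trained Transformer would learn the projections.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Return (outputs, weights) for softmax(q k^T / sqrt(d_k)) v."""
    d_k = len(k[0])
    weights = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        weights.append(softmax(scores))
    return matmul(weights, v), weights


def split_heads(x, num_heads):
    """Slice each row into num_heads equal-width chunks, one matrix per head."""
    width = len(x[0]) // num_heads
    return [[row[h * width:(h + 1) * width] for row in x] for h in range(num_heads)]


def multi_head_attention(queries, memory, w_q, w_k, w_v, w_o, num_heads):
    """Project, attend per head, concatenate the heads, then project the result."""
    d_model = len(w_q[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    q = split_heads(matmul(queries, w_q), num_heads)
    k = split_heads(matmul(memory, w_k), num_heads)
    v = split_heads(matmul(memory, w_v), num_heads)
    head_outputs, head_weights = [], []
    for q_h, k_h, v_h in zip(q, k, v):
        out, weights = scaled_dot_product_attention(q_h, k_h, v_h)
        head_outputs.append(out)
        head_weights.append(weights)
    # Concatenate the heads position by position before the output projection.
    concat = [[x for head in head_outputs for x in head[i]]
              for i in range(len(queries))]
    return matmul(concat, w_o), head_weights


def identity(n):
    """Identity matrix, used here as a stand-in for learned projections."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical toy inputs: three encoded first-line tokens and two
# second-line positions, each a 4-dimensional vector.
first_line_states = [[1.0, 0.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0, 0.0]]
second_line_queries = [[1.0, 0.0, 0.0, 1.0],
                       [0.0, 1.0, 1.0, 0.0]]
eye = identity(4)
output, weights = multi_head_attention(
    second_line_queries, first_line_states, eye, eye, eye, eye, num_heads=2)
```

Each head yields one row of weights per second-line position, and every row sums to 1 across the first-line tokens. Because the heads work on different slices of the vectors, they can focus on different first-line tokens for the same position, which is the property the abstract relies on to tie the second line to the first.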
Keywords/Search Tags:Couplet generation, Word corpus, Joint word segmentation, Attention mechanism, Transformer model