
The Construction And Statistical Analysis Of The Image Annotation Corpus Of "Three Hundred Poems Of Tang Dynasty"

Posted on: 2020-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: S J Ge
GTID: 2515305777471844
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Imagery is an important element in both the creation and the understanding of poetry, because it conveys profound thoughts and emotions. Since imagery carries multi-layered and metaphorical meanings, understanding it is critical to mastering a poem. Traditional studies of imagery mostly take a literary or aesthetic perspective and focus on a particular poet, poem, or image. Macro-level quantitative research is lacking on many issues, such as the distribution of specific images in ancient poetry, the ways images combine with emotions, and the distribution of images across authors. The reason may be that the traditional research paradigm does not use computational methods or database technology, while the newly emerging digital humanities mostly glean shallow, literal information.

This thesis builds a deeply annotated corpus using Three Hundred Tang Poems as a sample. First, a deep learning method is applied to word segmentation and part-of-speech tagging, and the output of both is proofread manually. Then the literal and deep meanings of the images are annotated. These data make it possible to analyze the compositional characteristics of the literal information, to attempt multi-level statistical analysis from literal to deep meanings, and to explore the relationships among images, the styles of poets, and the themes of poems. The image distribution data and semantic knowledge obtained from these statistics, such as the compositional characteristics of lexical information and deep emotions, can serve applications such as automatic poetry generation.

The work mainly covers the following aspects.

First, a corpus with word segmentation and part-of-speech information, totalling 17,718 words, is established. All data are annotated automatically and then proofread manually. In the automatic word segmentation and part-of-speech tagging experiments on Three Hundred Tang Poems, the F-scores are 85.59 and 77.47 respectively. These experiments can be used for machine-assisted annotation and for analyzing how factors such as the textual particularity of ancient poetry and corpus size affect annotation quality.

Second, a semantic annotation scheme for imagery is designed with its multi-layered and metaphorical meanings in mind. The scheme annotates the themes of poems, the literal and deep meanings of images, the composition of images, and so on. HowNet is used as the classification scheme for annotating the meanings of static images in poems that have already been segmented and part-of-speech tagged. In other words, this work tries to identify, from a cognitive perspective, the semantic relationships between the literal and deep meanings of images.

Third, statistical analyses are carried out from multiple perspectives. This thesis annotates 4,496 images from 320 Tang poems. Based on theoretical research on imagery, it analyzes the annotated data and draws the following conclusions:

(1) The occurrences of images follow a long-tailed distribution, consistent with Zipf's law. Common natural terms such as 月 (the moon), 夜 (night), 风 (wind), and 山 (mountain) are the main images, while the distribution over semantic categories is more even, with components of physical objects and characters as the main categories. In addition, frequently used images have obvious metaphorical meanings.

(2) In the poems of Li Bai, Du Fu, and other well-known poets, images are not densely distributed. Images are used frequently in nostalgic poems and war poems, which means that imagery can reflect the styles of poets and the themes of poems to a certain extent.

(3) The internal composition of images follows certain rules. Single-word images are about as numerous as multi-word images. Within multi-word images, the component words are often in a coordination or modification relation; the modifiers mainly denote color and season, and they mainly modify mountains, wind, the moon, and the like.

(4) For the statistics on deep meanings, the deep meanings of static images are divided into three categories according to their parts of speech: the cognitive attribute category, the metaphor and metonymy category, and the event category. The cognitive attributes of the images indicate that the overall emotion of the poems is negative. Metonymy is used more frequently than metaphor. Flowers and components of physical objects are often used as vehicles in metaphor, while in metonymy the whole-part and category-member relationships are the two most frequent types, showing a cognitive tendency to fan out from point to area. The events represented by images are mainly reminiscence, war, seclusion, and separation, reflecting the themes and contents of the poems.

To sum up, the imagery corpus constructed in this thesis can fully represent the multi-level, multi-dimensional information of images at the level of language form, the surface level, and the deep level. Using quantitative statistics, it can comprehensively show and compare the distribution characteristics of images across semantic categories, poets, and ways of expression. It can also provide a useful supplement to research on both imagery and automatic poetry generation.
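
The abstract reports F-scores for automatic word segmentation but does not say how they were computed. As a rough illustration only, the sketch below uses the common span-matching convention, in which a predicted word counts as correct when its character span also appears in the gold segmentation. The function names `to_spans` and `segmentation_prf` and the toy segmentation are hypothetical and do not come from the thesis.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmented line into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 over a list of segmented lines."""
    correct = gold_total = pred_total = 0
    for gold_words, pred_words in zip(gold, pred):
        # Both segmentations must cover exactly the same characters.
        if "".join(gold_words) != "".join(pred_words):
            raise ValueError("gold and predicted segmentations differ in text")
        gold_spans = to_spans(gold_words)
        pred_spans = to_spans(pred_words)
        correct += len(gold_spans & pred_spans)
        gold_total += len(gold_spans)
        pred_total += len(pred_spans)
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy segmentation, for illustration only (not thesis data).
gold = [["床", "前", "明月", "光"]]
pred = [["床前", "明月", "光"]]
p, r, f = segmentation_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.67 R=0.50 F1=0.57
```

Part-of-speech scores are often computed the same way, with the tag added to each span so that a word counts as correct only when both its boundaries and its tag match. Whether the thesis follows this convention is an assumption.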
Keywords/Search Tags: Three Hundred Tang Poems, imagery, digital humanities, Chinese information processing