
Building a Dialect Multimedia Corpus with ELAN and Research on Its Application

Posted on: 2014-02-02 Degree: Doctor Type: Dissertation
Country: China Candidate: B Li Full Text: PDF
GTID: 1225330398967087 Subject: Chinese Philology
Abstract/Summary:
ELAN is an annotation tool that allows you to create, edit, visualize and search annotations for video and audio data. It was developed at the Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, with the aim of providing a sound technological basis for the annotation and exploitation of multimedia recordings. This paper describes in detail the process of building a Chinese dialect multimedia corpus and, taking the modal particles of the Shuangfeng dialect as an example, discusses how to conduct research based on such a corpus. The paper has seven parts.

The first chapter is the introduction. It first introduces ELAN's functions and characteristics and gives a brief overview of ELAN's applications around the world. It also describes the advantages of building a multimedia corpus with ELAN and presents some results of the author's own research on ELAN since 2011. The chapter then covers the research objects, methods, and significance, the state of research on the Shuangfeng dialect, the source of the corpus, the main speaker, and the phonology of the Shuangfeng (Huamen) dialect.

The second chapter covers corpora and multimedia corpora. The concept of a corpus is defined differently across books and papers, so this chapter first defines the concept and then discusses corpus design and processing. The multimedia corpus is a new kind of corpus that has emerged in recent years. The second section defines the multimedia corpus and surveys the construction of multimedia corpora around the world, with particular attention to the construction of Chinese dialect corpora.

The third chapter, on building a multimedia corpus of Chinese dialects based on ELAN, is a main part of this paper. It introduces the operation and use of ELAN, shows how to use ELAN to build a multimedia corpus, and explains how to use ELAN's powerful search function to search for keywords, word collocations, and sentences. To improve the efficiency of audio segmentation in ELAN and to speed up data processing and conversion, we developed two auxiliary programs for ELAN: an automatic audio segmentation tool and a batch EAF file conversion tool. Their usage is also briefly introduced here.

The fourth chapter covers word segmentation, word tagging, and related statistics for the corpus. It first introduces the corpus's source and distribution. Because of ELAN's open structure, we added some functions to ELAN. Using the CIPP Chinese corpus processing and application tools made by Mr. HeSheng, a teacher at Nanjing Normal University, together with a custom user dictionary of the Shuangfeng dialect, we segmented all the sentences in the corpus, tagged all the words in the sentences, and counted sentence frequency, word frequency, and character frequency, along with other analyses.

The fifth chapter covers the search for modal particles based on the Shuangfeng dialect corpus. All modal particles were given part-of-speech tags. Combining these tags with ELAN's search function, we found all the sentences containing modal particles and extracted 18 single modal particles, 31 double-linked modal particles, and 6 triple-linked modal particles. Finally, we study all three categories of modal particles, examining their meaning and mood function in context.

The sixth chapter is the conclusion. It presents the conclusions of this paper, its shortcomings, and the plans and arrangements for future research.

The appendix contains ELAN's commonly used terms in Chinese and English, text translations (a total of 7; the transcriptions of natural spoken Shuangfeng dialect amount to about 1.8M words), and a snapshot of discussions on the ELAN technology forum.
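
As a rough illustration of the keyword search and frequency counting summarized above, the following Python sketch reads annotations from an EAF document (ELAN's XML file format) and looks for a particle in them. It is not the dissertation's tooling: the tier name `sentence`, the function names, and the placeholder tokens (`w1`, `pa`, and so on) are all assumptions made for this example. The sketch also assumes that the annotation text has already been word-segmented with spaces.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, hand-written EAF fragment. Real files written by ELAN contain
# more elements (header, linguistic types, ...). Tokens are placeholders.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3200"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="4800"/>
  </TIME_ORDER>
  <TIER TIER_ID="sentence">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>w1 w2 pa</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>w3 pb</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>w1 pa</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_annotations(eaf_text, tier_id):
    """Return (start_ms, end_ms, text) tuples for one tier of an EAF document."""
    root = ET.fromstring(eaf_text)
    # Map each time slot ID to its millisecond value (unaligned slots are skipped).
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    annotations = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            annotations.append((start, end, text))
    return annotations


def search_keyword(annotations, keyword):
    """Keep the annotations whose space-separated tokens include the keyword."""
    return [a for a in annotations if keyword in a[2].split()]


def word_frequencies(annotations):
    """Count how often each space-separated token occurs across annotations."""
    counts = Counter()
    for _, _, text in annotations:
        counts.update(text.split())
    return counts


if __name__ == "__main__":
    anns = read_annotations(SAMPLE_EAF, "sentence")
    for start, end, text in search_keyword(anns, "pa"):
        print(start, end, text)
    print(word_frequencies(anns).most_common(3))
```

Matching on whole tokens rather than substrings keeps a short particle from matching inside a longer word. That only works once the text has been segmented, which is the kind of preprocessing the fourth chapter describes.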
Keywords/Search Tags: ELAN, Multimedia Corpus, Shuangfeng Dialect, Modal Particles