Research On Chinese Abbreviation Based On Word Frequency Statistics

Posted on: 2018-11-09, Degree: Master, Type: Thesis
Country: China, Candidate: L F Xu
GTID: 2335330515979614, Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Abbreviation is a common lexical phenomenon in the development and use of Chinese. An abbreviation is shortened from a longer language unit, its original word, which has the same meaning but more syllables. Current research on Chinese abbreviations centers on four issues: the definition of abbreviations, their timeliness, the precision of their meaning, and their frequency of use. Guided by these issues, this thesis takes the abbreviations and original words in the Chinese abbreviations dictionary as its research object, performs word frequency counts and comparative analysis on a large-scale corpus, and summarizes the synchronic relationship between abbreviations and word frequency. It also analyzes in depth, from a diachronic perspective, how Chinese abbreviations and their word frequencies evolve. Finally, on the basis of this research, it identifies the main motivation for Chinese abbreviation and summarizes the principle of abbreviation. The abbreviation principle of Chinese is found to be similar to the Huffman coding principle.

The thesis is divided into six chapters. The first chapter reviews the state of research on modern Chinese abbreviations and sorts out several key problems in that research. On this basis, it introduces the purpose, significance, approach, and methods of the thesis. The second chapter examines the relationship between Chinese abbreviations and their original words and the principles of Chinese abbreviation. It analyzes the relationship between the two in terms of semantics, time of occurrence, and construction, and summarizes the principles that abbreviation needs to follow.

The third chapter analyzes the frequencies of Chinese abbreviations and their original words. It has four parts. The first part briefly explains the source of the Chinese abbreviations and the construction of the corpus. The second part summarizes the state of research on word frequency statistics and, in an experiment, compares the results of computer word segmentation statistics with those of manual segmentation statistics to verify the accuracy of the computer statistics. The third part uses the word frequency data to analyze the relationship between the frequency of an abbreviation and the total frequency of its original words, and concludes that the higher a word's frequency, the fewer its syllables and the more readily it is abbreviated. The fourth part analyzes some special cases that do not fit this conclusion.

The fourth chapter is a diachronic analysis of Chinese abbreviations and word frequency. It first introduces the current state of diachronic research in linguistics and then describes the contents and characteristics of the large-scale flow corpus. Finally, because the frequencies of Chinese abbreviations and their original words change from year to year, it takes a decade as one stage, analyzes the frequency changes of the abbreviations and the original words, and sums up the relationship between abbreviations and original words at the current stage.

The fifth chapter argues for and summarizes the motivation and rules of Chinese abbreviation. First, existing research results and the experimental data suggest that the "Principle of Least Effort" is the main motivation for abbreviation. Then, drawing on the characteristics of Huffman coding, the chapter analyzes the abbreviation principle using the experimental data from the previous two chapters. The generation of abbreviations is found to resemble Huffman coding in information theory: in both, high frequency goes with short codes. The frequency of abbreviations is also related to the number of syllables, which indirectly supports the applicability of the Huffman coding principle in linguistics.

The sixth chapter is the conclusion. It reviews the main contents of the thesis, points out where the research could be improved, and looks ahead to future work. Using an experimental method, the thesis analyzes the frequency characteristics of original words and abbreviations and explores the relationship between information theory and the abbreviation principle from a novel perspective, offering a strong complement to applied research on abbreviations and to computational linguistics.
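
As a minimal illustration of the Huffman analogy drawn in the abstract (this is not the thesis's own code or data), the Python sketch below builds a Huffman code for a small frequency table and reports the code length assigned to each item. The function name `huffman_code_lengths`, the labels `w1` to `w4`, and the frequency values are all hypothetical and chosen only for illustration.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a frequency table.

    Standard Huffman construction: repeatedly merge the two least
    frequent subtrees; every merge adds one bit to each symbol inside.
    """
    if len(freqs) == 1:
        # A single symbol still needs one bit to be written down.
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # unique counter so dicts are never compared
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, depths1 = heapq.heappop(heap)
        f2, _, depths2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical frequencies, NOT taken from the thesis corpus.
toy_freqs = {"w1": 50, "w2": 25, "w3": 15, "w4": 10}
lengths = huffman_code_lengths(toy_freqs)

for word in sorted(toy_freqs, key=toy_freqs.get, reverse=True):
    print(word, toy_freqs[word], lengths[word])
```

In this toy table the most frequent item receives the shortest code and the least frequent items the longest, which is the "high frequency, short coding" property the abstract compares to abbreviation. Whether real abbreviation data follow the same pattern is the thesis's empirical claim and is not demonstrated by this sketch.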
Keywords/Search Tags: Chinese abbreviations, Original words, Word frequency, Principle