| Since the 1990s, people have come to recognize the value of national language and culture, and especially of national corpus resources. Against this background, academic institutions have proposed building corpora of various types in order to explore and preserve the value and usage norms of the national language. A corpus is an important tool for linguistic research in the era of big data. The field of teaching Chinese as a second language has already established several interlanguage corpora and the HSK Dynamic Composition Corpus, which have played a major role in Chinese teaching practice and in the study of the Chinese language itself. We therefore began to consider how corpus construction could better serve researchers of Chinese, and set out to build a corpus of Chinese children’s books. Through careful design and compilation, we plan, as a first stage, to build a corpus of Chinese children’s books of about 7 million words; the development process includes preliminary corpus design, data collection, database programming, text entry, and proofreading and correction. The basic starting point of the design is that language surveys based on the corpus should be sound and reliable. Corpus linguistics relies on frequency-related statistical information drawn from texts and uses statistical principles to measure complex linguistic phenomena as they occur in actual language use in specific contexts. Because corpus statistics come from real language, and corpus-based data can largely rule out chance, the conclusions are more persuasive. Research on the written language of Chinese children’s books can deepen our understanding of children’s language acquisition and is also a valuable reference for teaching Chinese as a second language. First language acquisition and second language teaching are closely related: children’s books reflect some of the regularities of real Chinese acquisition, and these regularities can guide domestic Chinese teaching as well as the compilation of textbooks for teaching Chinese as a foreign language; in particular, they provide valuable material for developing reading materials for learners of Chinese as a foreign language. As a powerful tool for linguistic research in the age of big data, a corpus can provide strong technical support for such research. We therefore designed the corpus around Chinese children’s books, drawn mainly from popular domestic titles and translations of well-regarded foreign works, with target reading ages from 1 to 12, divided into three age groups. The corpus is intended for investigating and analyzing the language features of children’s books and their written vocabulary. Based on this self-built corpus, we collect texts for different ages and on different themes, segment them with a word segmentation program, and compile character-frequency and word-frequency statistics for texts of different ages and types. This allows a thorough investigation of how Chinese characters, vocabulary, and sentences are actually used in children’s books, and a comprehensive analysis of character frequency, word frequency, and sentence length. |
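
The statistics named in the abstract (character frequency, word frequency, and sentence length per age group) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `profile`, the group labels, and the sample sentences are hypothetical, and the text is assumed to be already segmented into space-delimited words, since the abstract does not say which segmentation program was used.

```python
import re
from collections import Counter

# Hypothetical sample data: the group labels and sentences are illustrative only.
# Text is assumed to be pre-segmented, with words separated by spaces.
SAMPLE = {
    "group_1": "小猫 喜欢 鱼 。 小狗 喜欢 骨头 。",
    "group_2": "春天 来 了 ， 花 开 了 ！",
}

CJK = re.compile(r"[\u4e00-\u9fff]")   # matches one Chinese character
SENT_END = re.compile(r"[。！？]")      # sentence-final punctuation


def profile(text):
    """Return (character counts, word counts, mean sentence length in words)."""
    tokens = text.split()
    # Keep only tokens containing at least one Chinese character (drops punctuation).
    words = [t for t in tokens if CJK.search(t)]
    chars = [c for t in words for c in t if CJK.match(c)]
    # Split into sentences and discard empty trailing fragments.
    sentences = [s for s in SENT_END.split(text) if CJK.search(s)]
    mean_len = len(words) / len(sentences) if sentences else 0.0
    return Counter(chars), Counter(words), mean_len


if __name__ == "__main__":
    for group, text in SAMPLE.items():
        chars, words, mean_len = profile(text)
        print(group, chars.most_common(3), words.most_common(3), mean_len)
```

Running the same function over each age group's texts gives directly comparable frequency tables, which is the kind of cross-group comparison the abstract describes.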