A Corpus-based Study On English Vocabulary For Primary And Secondary Schools And Automatic Test Questions Generation

Posted on: 2020-03-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Y Xiao
GTID: 1365330575956264
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Vocabulary is the cornerstone of language and plays an important role in language learning. Vocabulary teaching and testing have therefore been receiving increasing attention from language educators and academic researchers at home and abroad. With the advancement of informatization and globalization, it is worth exploring whether the vocabulary in English textbooks used in Chinese primary and secondary schools can meet students' needs for learning and information exchange. The advent of the era of intelligent education not only raises new demands on language testing but also sheds light on its future direction. Based on dynamic systems theory, Nation's principles of vocabulary teaching and learning, and communicative language testing theory, this study combines a corpus-linguistic approach, qualitative and quantitative analysis, and interdisciplinary research methods to analyze the vocabulary in English textbooks and to study the automatic generation of test questions. First, this study develops an English webpage corpus (EWC) consisting of the latest language materials by employing relevant natural language processing (NLP) techniques. Second, based on multiple reference corpora, this study examines the vocabulary in English textbooks using objective quantitative indicators and proposes supplementary vocabulary for English learning, which also provides data support for textbook compilation and text selection. Finally, by adopting NLP techniques and a machine learning method, this study achieves a preliminary implementation of the automatic generation of lexical multiple-choice items for learners of different language proficiency.

The dissertation consists of the following six chapters.

The first chapter serves as the introduction. It introduces the research background and main contents, defines the lexical counting unit, and then sets out the theoretical basis, the research methods, and the significance and innovations of this research.

The second chapter first introduces the definition, history and different types of corpora, and reviews the applications of corpora in NLP and language research, especially in textbook vocabulary research and language testing. It then gives an overall literature review on automatic test question generation and sentence difficulty evaluation. The chapter argues that it is necessary to analyze the vocabulary in English textbooks and to study automatic test question generation from a dynamic perspective by employing multiple representative corpora and relevant NLP techniques.

The third chapter describes the construction process of the EWC and introduces the other corpora used in this study. First, it outlines the five issues that need to be considered in constructing the EWC and proposes the overall design. Web crawler technology is then used to collect all the text data from the past two years from two mainstream English websites (BBC News and America Online) and a very popular English web forum (Delphi Forum). Relevant NLP techniques are used to process and annotate the corpus, yielding an EWC of more than 100 million tokens and nearly 370,000 types that represents the authentic language people use in communication. Finally, the British National Corpus (BNC) and the English subtitle corpus used in this study are introduced. This part of the study provides a reliable data basis for vocabulary analysis and automatic test question generation.

The fourth chapter gives a corpus-based quantitative and qualitative analysis of the vocabulary in English textbooks used in Chinese primary and secondary schools. The statistical indexes include word frequency, distribution rate, the Zipf value and coverage rate. Several supplementary word lists are suggested for primary and secondary English learning through a series of comparative experiments. The analysis shows that language develops dynamically: the vocabulary of English textbooks in primary and secondary schools can lay a good foundation for cultivating students' communicative competence, but it is far from sufficient to guarantee effective communication. Regarding word selection, the majority of the textbook words have a high word frequency and distribution rate in the reference corpora, while very few have a relatively low word frequency and distribution rate. Based on these findings, we suggest that the textbook vocabulary be expanded properly and updated regularly by adding frequently and widely used core vocabulary. The evaluations based on the coverage rate and the New General Service List (new-GSL) indicate that the supplementary word lists proposed in this study can effectively improve lexical coverage.

The fifth chapter attempts to automatically generate multiple-choice items for prepositions. First, to measure sentence difficulty, six features at the lexical, syntactic and semantic levels are used: sentence length, minimum word frequency, syntactic depth, number of word senses, length of dependency links, and language model probability. Sentences of a given difficulty are selected from the corpus as carrier sentences by using a similarity measure. Then, under the principle of semantic similarity, Word2vec, an efficient prediction model, is used to automatically generate distractors. The evaluations indicate that the proposed approach can, at a preliminary level, generate test questions targeting learners at different language levels and can generate distractors with relatively high reliability and plausibility. This part of the study contributes to the further development of computer-aided language testing and provides some inspiration for the practical application of artificial intelligence in English teaching and learning. The interdisciplinary research method used has merits over a single language research method and offers guidance for future language research.

The sixth chapter summarizes the main research contents and findings, sets out the theoretical implications and practical suggestions on word selection for English textbooks, vocabulary learning and testing, and finally points out the limitations of the study as well as future work.
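
As an illustration of two of the statistical indexes named in the abstract, the following is a minimal Python sketch and not the dissertation's own code. It assumes that "coverage rate" means the share of running tokens in a reference text that belong to a given word list, and that "Zipf value" refers to the Zipf scale (log10 of a word's frequency per billion words). The tokenizer, the function names and the toy sentence are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage_rate(word_list, tokens):
    """Fraction of running tokens that appear in word_list (assumed definition)."""
    if not tokens:
        return 0.0
    vocab = set(word_list)
    covered = sum(1 for token in tokens if token in vocab)
    return covered / len(tokens)


def zipf_value(count, corpus_size):
    """Zipf-scale value: log10 of frequency per billion words (assumed definition)."""
    return math.log10(count) - math.log10(corpus_size) + 9


# Toy reference text and toy textbook word list, for illustration only.
reference_text = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(reference_text)
frequencies = Counter(tokens)
textbook_words = {"the", "cat", "dog", "sat"}

coverage = coverage_rate(textbook_words, tokens)
print(f"coverage rate: {coverage:.3f}")
print(f"Zipf value of 'the' in the toy text: {zipf_value(frequencies['the'], len(tokens)):.2f}")
```

On a real reference corpus the same two functions would be applied to the corpus frequency list; comparing the coverage of the textbook list alone with the coverage of the list plus a supplementary list is one way to quantify the improvement described for the fourth chapter.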
Keywords/Search Tags: Corpus, Textbook Vocabulary, Coverage Rate, Word Embedding, Automatic Test Questions Generation