
Analysis Of Language Features Of English Abstracts And Construction Of A Knowledge Base

Posted on: 2017-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: M T Sun
GTID: 2355330491956273
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
As an emerging interdisciplinary field, computational linguistics combines linguistics, mathematics and computer science, and it accelerates international language communication and technological development. The English abstract is the opening part of a computational linguistics paper and a condensed summary of it. It gives readers a quick and clear understanding of the research background, purpose, methodology, results and application value, and so serves both as a bridge for international academic communication and as a carrier of technological achievements. The English abstract also strongly affects whether a paper is accepted or reprinted by major conferences and core journals at home and abroad. Writing a standard and precise English abstract is therefore important for the exchange of linguistic and technological achievements and for the growth of computational linguistics.

Our research objects are the 756 abstracts of long papers from the Association for Computational Linguistics (ACL) conference from 2010 to 2014 and the 284 English abstracts of computational linguistics papers in the Journal of Chinese Information Processing from 2010 to 2014. We build them into two corpora, which we annotate, count and analyze, in order to help non-native speakers of English in China write more standard English abstracts for computational linguistics papers and to extend the international impact of Chinese linguistic and technological achievements.

Using corpus methods, statistical methods and a combination of qualitative and quantitative analysis, we derive the following international writing norm for the English abstract of a computational linguistics paper: (1) the length of an abstract is generally 90 to 140 words in 4 to 6 sentences; (2) the abstract should be of the informative type; (3) the abstract should contain an introduction, methodology and results, with attention to introducing the research background, problem, topic and purpose; (4) the verb tense is generally the simple present, the voice is usually active, and the first-person pronoun is usually "we".

Based on the two corpora, we construct three knowledge bases for writing English abstracts of computational linguistics papers: an English-Chinese common terms knowledge base with 269 entries, an English-Chinese common verbs knowledge base with 116 entries and a common sentence patterns knowledge base with 325 entries. They were built with a corpus tool, manual sorting and Chinese translation. We also design English abstract writing software that helps non-native speakers of English write more standard and idiomatic English abstracts with less effort.

In addition, the common terms knowledge base and the common sentence patterns knowledge base can be applied, respectively, to the content module and the syntactic variety module of E-rater, an automated scoring system developed abroad, to help it assess the content quality and language quality of papers related to computational linguistics.
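The length norm in point (1) can be checked mechanically. The following is a minimal Python sketch, not the thesis's own tool: the function names `abstract_length_stats` and `within_norm` are hypothetical, and the regex-based word and sentence splitting is a simplifying assumption that may differ from the counting used in the study. Only the 90 to 140 word and 4 to 6 sentence ranges come from the abstract.

```python
import re

# Ranges taken from norm (1) in the abstract above.
WORD_RANGE = (90, 140)
SENTENCE_RANGE = (4, 6)


def abstract_length_stats(text):
    """Return (word_count, sentence_count) for an abstract.

    Assumption: words are runs of letters/digits, optionally joined by
    a hyphen or apostrophe; sentences end at '.', '!' or '?' followed
    by whitespace. Abbreviations such as "e.g. " will be miscounted.
    """
    words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return len(words), len(sentences)


def within_norm(text):
    """True if both counts fall inside the ranges from norm (1)."""
    n_words, n_sentences = abstract_length_stats(text)
    return (WORD_RANGE[0] <= n_words <= WORD_RANGE[1]
            and SENTENCE_RANGE[0] <= n_sentences <= SENTENCE_RANGE[1])
```

Calling `abstract_length_stats(draft)` on a draft abstract gives the two counts, and `within_norm(draft)` reports whether both fall inside the stated ranges. A real checker would likely need a proper sentence tokenizer to handle abbreviations and citations.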
Keywords/Search Tags: English abstract, Language features, Term extraction, Knowledge base, Computational linguistics