| Linearity and hierarchy are fundamental properties of human language. As a result, the linear and hierarchical features of languages at the syntactic level and their relations with language processing difficulty have attracted the interest of researchers in many fields. However, previous studies on the subject have mostly investigated the effects of a single syntactic feature on language processing using a single research method. This has not only hindered our overall understanding of the structural patterns of natural language, but also slowed efforts to establish a systematic view of human language processing and its underlying cognitive mechanisms. Against this background, the current study focuses on the linear and hierarchical syntactic properties of languages and the relation of these properties to cognitive difficulty. We first investigated different types of syntactic structures and syntactic processing within one language, and then the syntactic structures of different types of languages. Specifically, we attempt to address the following research questions: (1) What similarities and differences do English sentences of varying complexity have in terms of their quantitative syntactic features, such as the distributions of dependency distance (DD), dependency direction (DDir) and hierarchical distance (HD)? (2) What are the regularities related to DD, DDir and HD in language production? (3) For an arbitrary word or sentence, can its linear and hierarchical syntactic features predict its reading time (RT) in language comprehension? (4) What are the commonalities and differences in the DD distributions of different languages? 
Among them, questions (1) to (3) focus on the syntactic structures of one language and their processing mechanisms, and question (4) takes the subject further by exploring the differences in syntactic structures across languages and their motivations. Studying these questions contributes to a deeper understanding of the linear and hierarchical properties of human sentence structures and their relation to human cognition. To answer the above questions, we combined the methods of quantitative linguistics and psycholinguistics and applied them to large-scale dependency-annotated corpora (treebanks). For question (1), two reference treebanks were built from the British National Corpus and then compared quantitatively with the English reading time treebank. For questions (2) and (3), both quantitative analyses and psychological experiments were carried out based on the English reading time treebank. For question (4), Chinese, English, Czech and Japanese treebanks of the news genre were used to study the similarities and differences of syntactic structures across languages and their internal and external motivations. For the first question, it was found that the English reading time treebank is not significantly different from the reference treebanks in terms of general syntactic features. This indicates that the reading time treebank represents the sentence structures of English well. As sentence length increases, however, the reading time treebank, which contains more complex structures, also differs from the reference treebanks in some respects, reflecting the quantitative characteristics of English complex sentences. Specifically, the reading time treebank has a lower proportion of head-initial dependencies and a lower degree of dependency distance minimization (DDM), and a higher degree of hierarchical distance minimization (HDM), than the reference treebanks. This may be attributed to a desire to avoid 
recursion. Regarding the second question, we found that languages exhibit a tendency to minimize hierarchical distance. While HDM is similar to DDM in that both reflect general cognitive constraints on language processing, their manifestations differ in some respects, which suggests that the cognitive mechanisms driving linear and hierarchical language processing may not be identical. In addition, English shows an increasing preference for head-initial structures as dependency distance and sentence length increase. This implies that different dependency directions may place different demands on working memory. Concerning the third question, it was found that DD and RT are positively correlated for a word with a preceding head (distance effects), but negatively correlated for a word with a following head (anti-distance effects), when DD is within the span of working memory (approximately DD ≤ 4). Furthermore, the HD of an arbitrary word in the corpus is positively correlated with its RT (hierarchy effects), and the significance of this effect is not limited by the capacity of working memory. These findings not only provide large-scale, broad-coverage validation of related sentence processing models and hypotheses, but also suggest that linear and hierarchical language processing may rely on different cognitive mechanisms. Lastly, sentence length, mean dependency distance, dependency length, and mean hierarchical distance for an arbitrary sentence in the corpus can all be used to predict its reading time. With respect to the fourth question, we found that different languages exhibit different degrees of DDM. This phenomenon can be captured by the functions fitted to their DD distributions. Among the four languages studied, languages with higher syntactic complexity tend to have lower morphological complexity. This reflects the dynamic synergy between the sub-systems of a language, and implies that the overall complexity of languages as a tool for 
communication and expression may be similar under human cognitive constraints. Using large-scale, authentic corpora and interdisciplinary methods, the present study offers a systematic and comprehensive investigation into the linear and hierarchical properties of human language syntax and their relations with language processing difficulty. With respect to the subject of the study, we integrated three interrelated syntactic features, namely dependency distance, dependency direction and hierarchical distance. In terms of research methodology, we combined quantitative linguistic and psycholinguistic methods to avoid the limitations associated with a single method, and provided new insights into the cognitive mechanisms underlying human language processing. The current study is therefore relevant to studies in many disciplines related to language and cognition, such as quantitative linguistics, psycholinguistics, natural language processing and cognitive science. |
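
The three features named in the abstract can be illustrated on a single dependency-annotated sentence. The sketch below is only a minimal illustration under stated assumptions, not the procedure used in the study: it assumes the definitions common in quantitative linguistics (dependency distance as the absolute difference between the positions of a word and its head, a head-initial dependency as one whose head precedes the dependent, and hierarchical distance as the number of edges between a word and the root), and it averages over non-root words. The function name `syntactic_features` and the input encoding are hypothetical.

```python
from statistics import mean


def syntactic_features(heads):
    """Summarise DD, DDir and HD for one sentence.

    Assumed encoding: heads[i] is the 1-based position of the head of
    word i + 1, and 0 marks the root. The sentence must contain at least
    one non-root word.
    """
    distances = []    # absolute dependency distances of non-root words
    head_initial = 0  # dependencies whose head precedes the dependent
    depths = []       # hierarchical distances (edges to the root)

    for dep, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the root has no incoming dependency
        distances.append(abs(head - dep))
        if head < dep:
            head_initial += 1
        # Walk up the tree, counting edges until the root is reached.
        depth, node = 0, dep
        while heads[node - 1] != 0:
            node = heads[node - 1]
            depth += 1
        depths.append(depth)

    return {
        "mdd": mean(distances),                              # mean dependency distance
        "head_initial_ratio": head_initial / len(distances),  # share of head-initial dependencies
        "mhd": mean(depths),                                 # mean hierarchical distance
    }


# "She gave him a book": gave is the root; She, him and book depend on gave; a depends on book.
features = syntactic_features([2, 0, 2, 5, 2])
print(features)
```

For this sentence the dependency distances are 1, 1, 1 and 3, two of the four dependencies are head-initial, and the hierarchical distances are 1, 1, 2 and 1.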