First, we show that all the world’s languages that we can currently analyze minimize syntactic dependency lengths to some degree, as would be expected under information-processing considerations. Next, we consider communication-based origins of the lexicons and grammars of human languages. Chomsky has famously argued that this is a flawed hypothesis because of the existence of phenomena such as ambiguity. Contrary to Chomsky, we show that ambiguity out of context is not a problem for an information-theoretic approach to language; it is a feature. Furthermore, word lengths are optimized on average according to predictability in context, as would be expected under an information-theoretic analysis. Then we show that language comprehension appears to function as a noisy-channel process, in line with communication theory. Given s_i, the intended sentence, and s_p, the perceived sentence, we propose that people maximize P(s_i | s_p), which is equivalent to maximizing the product of the prior P(s_i) and the likelihood of the noise process P(s_i → s_p). We discuss how thinking of language as communication in this way can explain aspects of the origin of word order, most notably that most human languages are SOV with case-marking, or SVO without case-marking.
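As a rough illustration of the noisy-channel proposal, the following minimal Python sketch picks the intended sentence that maximizes the product of the prior and the noise likelihood. Everything in it is an assumption made for illustration: the function names, the toy noise model (each inserted or deleted word costs a fixed factor `p_edit`, left unnormalized), the candidate sentences, and the prior values are hypothetical and are not taken from the work summarized here.

```python
from collections import Counter


def toy_noise(intended, perceived, p_edit=0.1):
    """Hypothetical, unnormalized noise likelihood for s_i -> s_p.

    Counts the words that would have to be inserted or deleted to turn
    one sentence into the other (ignoring word order) and charges a
    factor of p_edit for each such edit.
    """
    a, b = Counter(intended.split()), Counter(perceived.split())
    edits = sum(((a - b) + (b - a)).values())
    return p_edit ** edits


def most_likely_intended(perceived, prior, noise=toy_noise):
    """Return the candidate s_i maximizing prior[s_i] * noise(s_i, s_p)."""
    return max(prior, key=lambda s_i: prior[s_i] * noise(s_i, perceived))


# Illustrative priors only: the literal reading is implausible, while the
# version with "to" is plausible but requires assuming one deleted word.
prior = {
    "the mother gave the candle the daughter": 0.001,
    "the mother gave the candle to the daughter": 0.1,
}
perceived = "the mother gave the candle the daughter"
best = most_likely_intended(perceived, prior)
```

With these made-up numbers, the plausible candidate wins even though it requires assuming one deleted word, because its higher prior outweighs the single-edit penalty. A less probable edit or a more plausible literal reading would shift the outcome toward the perceived string.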
Information theoretic approaches to language universals
Finding explanations for the observed variation in human languages is the primary goal of linguistics, and promises to shed light on the nature of human cognition. One particularly attractive set of explanations is functional in nature, holding that language universals are grounded in the known properties of human information processing. The idea is that grammars of languages have evolved so that language users can communicate using sentences that are relatively easy to produce and comprehend. In this talk, I summarize results from explorations into several linguistic domains, from an information-processing point of view.