Language is extremely complex, extremely useful, found in all human populations and fundamentally different from any system of communication in other animals. Although there is a recognizable language processing network in the adult brain, it does not seem to rely on anatomical or neural structures, cell types, proteins or genes that differ fundamentally from those found in closely related species without language. This set of properties poses a major challenge for the language sciences: how do we reconcile observations about the uniqueness of language with those about the biological continuity of the underlying neural and genetic mechanisms? In our research we face this challenge head-on. While much of the debate in linguistics has emphasized either uniqueness or continuity, we try to develop models that do justice to insights from both traditions.
We believe an important part of the solution to the puzzle is the mechanism of cultural evolution, as demonstrated by models in the iterated learning framework (Ferdinand & Zuidema, 2009). The key idea is that in the cultural transmission from generation to generation, language adapts to the peculiarities of pre-existing learning and processing mechanisms of the brain. These peculiarities might be there for reasons that have nothing to do with language or communication, but by adapting to them, languages can become much more complex than they could have otherwise (under constraints of expressivity, learnability and processability). This explains how small changes in the biology underlying language can have major effects on the languages that we are able to learn. However, research still has to elucidate what those small but essential biological changes are.
Many linguists suspect that one of the key biological innovations making language possible is the ability to hierarchically combine words into phrases, small phrases into larger phrases and ultimately into sentences and discourse, and to compute the meaning of such hierarchical combinations. We termed this property 'hierarchical compositionality' (HC), and collected evidence from comparative biology research indicating that HC is a central difference between human language and various animal communication systems (Zuidema, 2013). Interestingly, the difficulty that neural network models have in accounting for HC provides additional evidence that a qualitative rewiring of the neural architecture is required to deal with the hierarchical structure of natural language (Jackendoff, 2002). HC is therefore a prime candidate for the biological innovations in humans that we are after. An important focus of our research has accordingly become the possible neural representation of hierarchical structure in language.
Given the ability to represent hierarchical structures, a new question arises: how do humans navigate the space of all possible hierarchical structures that could be assigned to a specific sentence? In computational linguistics this problem is usually conceived of as a problem of probabilistic inference: how do we learn a probabilistic grammar from examples (a training set) such that the probability it assigns to the correct parse of a previously unseen sentence (from a test set) is higher than the probabilities assigned to incorrect parses? Training and test sets are usually very large (tens or hundreds of thousands of sentences) and derived from naturally occurring written text. This approach is very different from that of psycholinguistic studies, which focus on small sets of carefully constructed sentences and examine the neural activation profile while humans interpret those sentences.
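The inference problem described above can be made concrete with a toy example. The sketch below uses a plain probabilistic context-free grammar, which is simpler than the tree-fragment models discussed in this text. The grammar, its probabilities, the example sentence and the function names are all invented for illustration. It scores two competing parses of an ambiguous sentence and picks the more probable one.

```python
# Toy probabilistic context-free grammar (PCFG).
# ASSUMPTION: all rules and probabilities are invented for illustration;
# the probabilities for each left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("she",)): 0.3,
    ("NP", ("stars",)): 0.3,
    ("NP", ("telescopes",)): 0.2,
    ("V", ("saw",)): 1.0,
    ("PP", ("P", "NP")): 1.0,
    ("P", ("with",)): 1.0,
}


def parse_probability(tree, rules=RULES):
    """Probability of a parse: the product of the probabilities of its rules.

    A tree is a tuple (label, child, ...) where each child is either a
    word (str) or another tree.
    """
    label, *children = tree
    # The right-hand side consists of child labels, or the word itself.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    probability = rules[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            probability *= parse_probability(child, rules)
    return probability


def best_parse(candidates, rules=RULES):
    """Structural disambiguation: choose the most probable candidate parse."""
    return max(candidates, key=lambda tree: parse_probability(tree, rules))


# Two parses of "she saw stars with telescopes".
# The PP modifies the verb phrase (seeing by means of telescopes):
VP_ATTACH = ("S", ("NP", "she"),
             ("VP", ("VP", ("V", "saw"), ("NP", "stars")),
                    ("PP", ("P", "with"), ("NP", "telescopes"))))
# The PP modifies the noun phrase (stars that have telescopes):
NP_ATTACH = ("S", ("NP", "she"),
             ("VP", ("V", "saw"),
                    ("NP", ("NP", "stars"),
                           ("PP", ("P", "with"), ("NP", "telescopes")))))
```

Calling `best_parse([VP_ATTACH, NP_ATTACH])` selects the verb-phrase attachment under these made-up probabilities. In a realistic setting the rule probabilities would be estimated from a treebank, and the candidate parses would be enumerated by a parser rather than written by hand.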
We developed a framework to solve the probabilistic inference task, called Parsimonious Data-oriented Parsing (DOP; Zuidema, 2007), which is based on the (not so parsimonious) data-oriented parsing framework (Bod, 1992; Scha, 1999). We have had some great successes, in particular some of the best parsing results (with a pure, generative model) in English (on the Penn WSJ parsing benchmark test) with a model called Double-DOP (Sangati & Zuidema, 2011), which was also key in obtaining state-of-the-art results in parsing German and Dutch (van Cranenburgh & Bod, 2013). However, much work remains in connecting the insights we have obtained in this framework (about the actual probabilistic dependencies observable in large corpora) to research in the psycholinguistic tradition (cf. Sangati & Keller, 2013).
Recursive Neural Networks
Recursive Neural Networks (RxNNs) provide a first step toward bridging probabilistic grammars and neurally more plausible models of language processing. Unlike recurrent neural networks (RNNs), recursive neural networks are hybrid symbolic-connectionist models that rely on a symbolic control structure to represent hierarchical structure. That said, they do provide a connectionist alternative to the treatment of syntactic and semantic categories as well as to probabilities in probabilistic grammars. In that sense, they complement our work on the neural representation of hierarchical structure by providing a neural account of structural disambiguation and semantic composition. Recently, RxNNs have been successfully applied to a range of different tasks in computational linguistics and formal semantics, including constituency parsing, language modelling and recognizing logical entailment (e.g., Socher et al., 2013).
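As a minimal sketch of the hybrid character described above: a recursive network applies one learned composition function at every node of a given parse tree, so the tree supplies the symbolic control structure and the vectors supply the connectionist content. The vector size, the random untrained weights, the example words and all function names below are assumptions made for illustration. The composition function tanh(W[left; right] + b) is the basic form commonly used in this family of models, not a specific published configuration.

```python
import math
import random

DIM = 4  # ASSUMPTION: hypothetical vector size, chosen for readability


def make_params(dim, seed=0):
    """Random, untrained weights: a dim x 2*dim matrix W and a bias vector b."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
    b = [0.0] * dim
    return W, b


def compose(left, right, W, b):
    """Parent vector = tanh(W [left; right] + b)."""
    x = left + right  # list concatenation of the two child vectors
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def encode(tree, embeddings, W, b):
    """Bottom-up pass over a binary tree given as nested 2-tuples of words."""
    if isinstance(tree, str):
        return embeddings[tree]  # leaf: look up the word vector
    left, right = tree
    return compose(encode(left, embeddings, W, b),
                   encode(right, embeddings, W, b), W, b)


# Hypothetical word vectors for a three-word example.
_rng = random.Random(42)
EMBEDDINGS = {word: [_rng.uniform(-1.0, 1.0) for _ in range(DIM)]
              for word in ("old", "men", "sleep")}
W, b = make_params(DIM)

# The same words under two different bracketings.
left_branching = encode((("old", "men"), "sleep"), EMBEDDINGS, W, b)
right_branching = encode(("old", ("men", "sleep")), EMBEDDINGS, W, b)
```

The two bracketings yield different sentence vectors even though the words are identical, which is how such a network can be sensitive to hierarchical structure. In a trained model, W, b and the word vectors would be learned from data rather than drawn at random.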
We have worked out an extension of the standard RxNN that we call Inside-Outside Semantics (Le & Zuidema, 2014). This extension adds to every node in a parse tree, next to the commonly used content vector, a second vector representing the node's context. Like content vectors, these context vectors can be computed compositionally. With this major innovation we obtained promising results on tasks to which RxNNs could not previously be applied: among these are semantic role labeling, missing word prediction, supervised dependency parsing and unsupervised dependency parsing. On the last two tasks we have obtained state-of-the-art results: above 93% accuracy on supervised parsing (Penn WSJ benchmark) and 66% accuracy on unsupervised parsing (the previous best was 64%). Currently, we are investigating whether such networks can learn logical reasoning.
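The idea of a compositionally computed context vector can be sketched as follows. Content vectors are computed bottom-up from the children. Context vectors are computed top-down: a node's context is derived from its parent's context and its sibling's content. This is a simplified illustration under stated assumptions, not the published model: the weights are random and untrained, a single matrix is shared by left and right children, the root context is a zero vector, and all names are hypothetical.

```python
import math
import random

DIM = 3  # ASSUMPTION: hypothetical vector size
_rng = random.Random(1)


def rand_matrix(rows, cols):
    return [[_rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def layer(W, x):
    """One tanh layer without bias: tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


W_IN = rand_matrix(DIM, 2 * DIM)   # content from [left content; right content]
W_OUT = rand_matrix(DIM, 2 * DIM)  # context from [parent context; sibling content]
ROOT_CONTEXT = [0.0] * DIM         # ASSUMPTION: fixed context for the root
EMB = {w: [_rng.uniform(-1.0, 1.0) for _ in range(DIM)]
       for w in ("the", "cat", "dog", "sleeps")}


def content(tree):
    """Bottom-up: content vector of a word or of a binary (left, right) tree."""
    if isinstance(tree, str):
        return EMB[tree]
    left, right = tree
    return layer(W_IN, content(left) + content(right))


def contexts(tree, ctx=ROOT_CONTEXT, path="root", out=None):
    """Top-down: map each node's path (e.g. 'root.L.R') to its context vector."""
    out = {} if out is None else out
    out[path] = ctx
    if not isinstance(tree, str):
        left, right = tree
        # A child's context combines the parent's context with the
        # sibling's content; the child's own content is never used.
        contexts(left, layer(W_OUT, ctx + content(right)), path + ".L", out)
        contexts(right, layer(W_OUT, ctx + content(left)), path + ".R", out)
    return out


ctx_cat = contexts((("the", "cat"), "sleeps"))
ctx_dog = contexts((("the", "dog"), "sleeps"))
```

Because a node's context never depends on its own content, the context at the noun position (`root.L.R`) is identical in both trees. That property is what makes a context vector usable for a task such as missing word prediction: it summarizes everything in the sentence except the word being predicted.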
Predicting Brain Activity
In a new line of research, we take a closer look at how and where language processing is implemented in the brain. Specifically, we combine word embedding models, which map words onto numerical vectors, with human brain imaging data. This type of research was pioneered by Mitchell and colleagues (2008), who first used corpus-derived word representations to predict neural activation patterns when subjects are exposed to word stimuli. In a recent study (Abnar et al., 2018), we systematically compared how well a range of distinct word embedding models predict patterns of brain activity evoked by different classes of nouns. The underlying assumption is that the better a model predicts the evoked brain activity, the more likely it is that it reflects the mechanism applied by the brain to represent words. The highest accuracies, of up to 80%, were obtained for word embedding models that reconstruct the context of a given word.
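One common way to turn "how well a model predicts brain activity" into an accuracy figure is a pairwise matching test in the style of Mitchell and colleagues (2008): for two held-out words, the predicted activation patterns should be closer to their own observed patterns than to each other's. The sketch below shows only that matching step. It assumes the predicted patterns are already available (for example from a regression model mapping embeddings to voxel activations, which is not shown), and the data, vector sizes and function names are invented. Whether the accuracies reported above were computed in exactly this way is not stated in this text.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_matched(pred_a, pred_b, obs_a, obs_b):
    """True if the correct pairing is more similar than the swapped pairing."""
    correct = cosine(pred_a, obs_a) + cosine(pred_b, obs_b)
    swapped = cosine(pred_a, obs_b) + cosine(pred_b, obs_a)
    return correct > swapped


def two_vs_two_accuracy(predicted, observed):
    """Fraction of word pairs that are matched correctly.

    `predicted` and `observed` map each word to an activation vector.
    In a full experiment the predictive model would be retrained for
    every pair with those two words held out; here the predictions
    are simply given.
    """
    pairs = list(combinations(sorted(predicted), 2))
    hits = sum(pair_matched(predicted[a], predicted[b], observed[a], observed[b])
               for a, b in pairs)
    return hits / len(pairs)


# ASSUMPTION: tiny made-up "activation patterns" over three voxels.
OBSERVED = {"dog": [1.0, 0.0, 0.0],
            "hammer": [0.0, 1.0, 0.0],
            "house": [0.0, 0.0, 1.0]}
GOOD_PREDICTIONS = {"dog": [0.9, 0.1, 0.0],
                    "hammer": [0.1, 0.8, 0.1],
                    "house": [0.0, 0.2, 0.9]}
# The same predictions with "dog" and "hammer" confused.
CONFUSED_PREDICTIONS = {"dog": GOOD_PREDICTIONS["hammer"],
                        "hammer": GOOD_PREDICTIONS["dog"],
                        "house": GOOD_PREDICTIONS["house"]}
```

With these made-up numbers, `two_vs_two_accuracy(GOOD_PREDICTIONS, OBSERVED)` matches every pair, while the confused predictions fail on the dog/hammer pair only. Chance level for this test is 50%, which is the baseline against which a figure such as 80% would be read.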
Artificial Language Learning
In the research discussed so far, we investigate complex patterns in natural language use. A final strand of our research interests concerns Artificial Language Learning, where greatly simplified 'languages' are designed and presented to participants in controlled experiments. One intriguing aspect of such experiments is that they can involve not only human adults, but also prelinguistic infants and non-human animals. Experiments in this tradition have provided evidence for human-specific biases and skills in discovering patterns. Our work in this area has focused on using insights from modelling and formal language theory to (re)interpret experimental findings.
One eye-catching result from this field is that experiments claimed to show that humans and starlings can learn context-free languages, while tamarin monkeys cannot (van Heijningen et al., 2009). Another finding is that while humans appear to be capable of both 'statistical learning' and 'rule learning', the latter cannot be found in rats (Toro & Trobalón, 2005). We have worked out a unified model that provides an excellent fit with both human and rat data (Alhama, Scha & Zuidema 2014). The model involved quantitatively different parameters for the two species but no qualitative difference in the underlying mechanisms.
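The formal-language distinction behind such claims can be illustrated with a contrast commonly used in this literature: strings of the form (ab)^n, which a finite-state device can recognize, versus strings of the form a^n b^n, which require unbounded counting and are therefore context-free but not regular. The sketch below uses the letters 'a' and 'b' as stand-ins for the stimulus classes used in experiments; the function names are hypothetical.

```python
def is_ab_n(string):
    """Recognize (ab)^n for n >= 1 with a two-state finite automaton."""
    state = 0  # 0: expecting 'a', 1: expecting 'b'
    for symbol in string:
        if state == 0 and symbol == "a":
            state = 1
        elif state == 1 and symbol == "b":
            state = 0
        else:
            return False
    # Accept only non-empty strings that end after a complete 'ab' pair.
    return state == 0 and len(string) > 0


def is_an_bn(string):
    """Recognize a^n b^n for n >= 1 using a counter (unbounded memory)."""
    count = 0       # number of 'a's not yet matched by a 'b'
    seen_b = False  # once a 'b' has been seen, no further 'a' is allowed
    for symbol in string:
        if symbol == "a":
            if seen_b:
                return False
            count += 1
        elif symbol == "b":
            seen_b = True
            count -= 1
            if count < 0:
                return False
        else:
            return False
    return seen_b and count == 0
```

For example, `is_ab_n("abab")` and `is_an_bn("aabb")` both hold, while each recognizer rejects the other's string. The recognizers also show why interpreting the experiments is delicate: with the short strings that can be used in practice, a participant could pass a test on a^n b^n items with a simpler strategy than full context-free processing, such as counting or detecting a single transition from a's to b's.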