LIRa session: Mehrnoosh Sadrzadeh

Date and Time: Thursday, November 5th 2020, 16:30-18:00, Amsterdam time.

Venue: online.

Title: Linguistic Random Matrix Theory

Abstract.

Constructions in type-driven compositional distributional semantics associate large collections of D × D matrices with linguistic corpora. Following the work of Wigner and Dyson, who developed random matrix theory to analyse the energy levels of atomic nuclei through distributions of matrices of random variables, we propose analysing the statistical characteristics of linguistic data in the framework of permutation invariant matrix models. The observables in this framework are permutation invariant polynomial functions of the matrix entries, each of which corresponds to a directed graph. Using the recently solved general 13-parameter permutation invariant Gaussian matrix models and a dataset of matrices constructed with standard techniques from distributional semantics, we find that the expectation values of a large class of cubic and quartic observables show high gaussianity, at levels between 90 and 99 percent. We also find evidence that observables with similar gaussianity characteristics in the matrix model have highly correlated ranked lists of associated words. This is joint work with Ramgoolam and Sword of the String Theory group at QMUL, and with Kartsaklis of Apple.
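As a toy illustration of what a permutation invariant observable is, the sketch below sums products of matrix entries over all index values, reading each index as a vertex and each factor M[i][j] as a directed edge i -> j. It then checks that each value is unchanged when the same permutation is applied to rows and columns. The function names, the toy matrix size and the random integer entries are hypothetical choices for illustration only; the sketch does not reproduce the 13-parameter Gaussian model, the linguistic dataset, or the gaussianity measurements described in the abstract.

```python
import itertools
import random
from typing import List

Matrix = List[List[int]]


def relabel(M: Matrix, perm) -> Matrix:
    """Apply the same permutation to rows and columns: M'[i][j] = M[perm[i]][perm[j]]."""
    n = len(M)
    return [[M[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


# Each observable sums a product of entries M[i][j] over all index values.
# Every index is a vertex and every factor M[i][j] is a directed edge i -> j.

def self_loop(M: Matrix) -> int:
    # one vertex with a self-loop: sum_i M_ii (the trace)
    return sum(M[i][i] for i in range(len(M)))


def single_edge(M: Matrix) -> int:
    # one edge i -> j: sum_{i,j} M_ij
    return sum(sum(row) for row in M)


def two_cycle(M: Matrix) -> int:
    # directed 2-cycle i -> j -> i: sum_{i,j} M_ij M_ji = tr(M^2)
    n = len(M)
    return sum(M[i][j] * M[j][i] for i in range(n) for j in range(n))


def out_star(M: Matrix) -> int:
    # two edges leaving one vertex: sum_{i,j,k} M_ij M_ik = sum_i (row sum_i)^2
    # permutation invariant, yet not a trace of a matrix power
    return sum(sum(row) ** 2 for row in M)


def three_cycle(M: Matrix) -> int:
    # directed 3-cycle: sum_{i,j,k} M_ij M_jk M_ki = tr(M^3), a cubic observable
    n = len(M)
    return sum(
        M[i][j] * M[j][k] * M[k][i]
        for i in range(n) for j in range(n) for k in range(n)
    )


OBSERVABLES = [self_loop, single_edge, two_cycle, out_star, three_cycle]


def is_permutation_invariant(f, M: Matrix) -> bool:
    """True if f(M) is unchanged under every simultaneous row/column relabelling."""
    value = f(M)
    return all(f(relabel(M, p)) == value for p in itertools.permutations(range(len(M))))


def sample_mean(f, matrices: List[Matrix]) -> float:
    """Empirical expectation value of observable f over a collection of matrices."""
    return sum(f(M) for M in matrices) / len(matrices)


if __name__ == "__main__":
    random.seed(0)
    D = 4  # toy size chosen for illustration only
    M = [[random.randint(-3, 3) for _ in range(D)] for _ in range(D)]
    for f in OBSERVABLES:
        print(f.__name__, f(M), is_permutation_invariant(f, M))
```

A single entry such as M[0][1] is not invariant, which is why only such graph-indexed sums serve as observables. Averaging them with `sample_mean` over a collection of matrices gives the kind of empirical expectation value that a fitted Gaussian model could then be compared against.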

See here for the recording of the talk.