Speaker: Toni Martí,
Universitat de Barcelona, General Linguistics,
Date: Monday, December 15th, 2014
Time: 11:00 – 12:30
Room: 3.2, Computer Science Faculty (UPV/EHU)
Title: “A Vector Space Model approach to discover constructions”
Abstract:
In cognitive models, a construction is a conventional symbolic unit that pairs form and meaning. These units can be of different types depending on their complexity: morphemes, words, compound words, collocates, idioms and more abstract/schematic patterns. Cognitive Linguistics hypothesizes that these constructions are learned from usage and stored in human memory, where they are accessed during both the production and comprehension of language. Constructions are therefore fundamental linguistic units for inferring the structure of language and speakers’ knowledge of language, and their identification is crucial for language understanding. Although a broad range of these linguistic structures has been subjected to linguistic analysis, we assume that a huge number of constructions are still to be discovered. There are different approaches to the task of identifying and discovering them, depending on the type of construction being sought. This allows for a wide range of methods for treating this kind of linguistic unit.
From the point of view of the methodology and knowledge applied in the automatic detection of constructions, we can distinguish two main approaches: those that are guided by previously gathered empirical data, and those that learn constructions from plain text or automatically annotated text, that is, methods that neither use manually annotated data nor rely on ad hoc linguistic knowledge.
Our proposal is based on the Harris distributional hypothesis, which states that semantically related words (or other linguistic units) will share the same context. We propose a new specific hypothesis within the family of distributional hypotheses: the pattern-construction hypothesis, which states that those contexts that are relevant to the definition of a cluster of semantically related words tend to be (part of) a lexico-syntactic construction. Following this hypothesis, we implemented a methodology that uses Vector Space Models (VSM) to discover candidates for consideration as constructions from a large automatically processed corpus. This approach is in line with the idea proposed by Landauer et al. (2007), who state that VSMs are plausible models of some aspects of human cognition.
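To make the distributional idea concrete, the following is a minimal sketch and not the methodology presented in the talk. It assumes a toy three-sentence corpus, whitespace tokenization and a one-word context window; the function names (context_vectors, cosine, shared_contexts) are hypothetical. Each word is represented by the positional contexts it occurs in, words are compared by cosine similarity, and the contexts shared by two similar words are listed as the kind of pattern that the pattern-construction hypothesis treats as a candidate.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=1):
    """Map each word to a Counter of (relative position, neighbour) contexts."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenization
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][(j - i, tokens[j])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shared_contexts(u, v):
    """Contexts that two words have in common, in sorted order."""
    return sorted(u.keys() & v.keys())


# Toy corpus, invented for illustration only.
toy_corpus = [
    "she drinks hot coffee daily",
    "she drinks hot tea daily",
    "he reads old books daily",
]

vectors = context_vectors(toy_corpus)
sim_coffee_tea = cosine(vectors["coffee"], vectors["tea"])
sim_coffee_books = cosine(vectors["coffee"], vectors["books"])
pattern = shared_contexts(vectors["coffee"], vectors["tea"])

print(sim_coffee_tea, sim_coffee_books, pattern)
```

In this toy setting "coffee" and "tea" occur in identical contexts and so score higher than "coffee" and "books"; their shared contexts (preceded by "hot", followed by "daily") outline the pattern "hot _ daily". A real system would work on a large, automatically processed corpus, with weighting and clustering steps that this sketch leaves out.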