Distributional Semantics meets Embodied Cognition: Flickr® as a database of semantic features

Distributional models such as Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) generate semantic spaces from the co-occurrences of words in linguistic contexts. The semantic representations that emerge from these models rest solely on linguistic information and leave aside the information we retrieve from perceptual experience. The proposed approach applies the methods of distributional semantics to Flickr®, a corpus of images annotated with metadata (tags). These tags express a wide range of concepts, including perceptual features triggered by the experiences captured in the photographs. A case study on the domain of colors shows that a distributional analysis based on Flickr® can produce semantic representations of color terms that resemble human similarity judgments more closely than those that emerge from distributional models based on linguistic information alone.

Key words: distributional semantics, grounded cognition, corpus analysis, annotated images.
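The core idea of treating tags as linguistic contexts can be illustrated with a minimal sketch. It is not the authors' pipeline. The tag lists are invented toy data. The model is a plain count of how often two tags annotate the same photo, compared with cosine similarity, and it deliberately omits the SVD dimensionality reduction that LSA applies. The function names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(tag_sets):
    """For each tag, count how often every other tag annotates the same photo.

    Each photo's tag list is treated as one context (an assumption of this
    sketch); duplicate tags on a photo are counted once.
    """
    vectors = {}
    for tags in tag_sets:
        unique = sorted(set(tags))
        for a, b in combinations(unique, 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical tag sets standing in for annotated photographs.
photos = [
    ["red", "sunset", "sky"],
    ["orange", "sunset", "sky"],
    ["red", "rose", "flower"],
    ["blue", "sky", "sea"],
    ["orange", "fruit"],
]

vecs = cooccurrence_vectors(photos)
print(round(cosine(vecs["red"], vecs["orange"]), 3))  # 0.577
print(round(cosine(vecs["red"], vecs["blue"]), 3))    # 0.354
```

In this toy corpus "red" and "orange" come out closer than "red" and "blue" because they share the perceptual contexts "sunset" and "sky". Evaluating such a model against human similarity judgments, as the case study does, would require real Flickr® tag data and an actual set of human ratings.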


