Distributional Semantics meets Embodied Cognition: Flickr® as a database of semantic features

Distributional models such as Latent Semantic Analysis (LSA; Landauer and Dumais 1997) generate semantic spaces from the co-occurrences of words in linguistic contexts. The semantic representations that emerge from these models are based solely on linguistic information and leave aside the information that we retrieve from perceptual experience. The analytical approach proposed here applies the methods of distributional semantics to Flickr®, a corpus of images annotated with metadata (tags) that express a wide range of concepts, including perceptual features triggered by the experiences captured in the photographs. A case study on the domain of colors shows that a distributional analysis based on Flickr® can produce semantic representations for color terms that resemble human similarity judgments more closely than those that emerge from distributional models based solely on linguistic information.

Key words: distributional semantics, grounded cognition, corpus analysis, annotated images.

References

Baroni, M. and Lenci, A. (2010). Distributional Memory: a general framework for corpus-based semantics. Computational Linguistics 36 (4): 673-721.

Barsalou, L.W., Santos, A., Simmons, W.K., and Wilson, C.D. (2008). Language and simulation in conceptual processing. In M. De Vega, A. Glenberg and A. Graesser (eds.), Symbols and embodiment: debates on meaning and cognition. Oxford: University Press. pp. 245-283.

Barsalou, L.W. (2012). The human conceptual system. In M. Spivey, K. McRae, and M. Joanisse (eds.), The Cambridge handbook of psycholinguistics. New York: Cambridge University Press. pp. 239-258.

Bouma, G. (2009). Normalized (Pointwise) Mutual Information in Collocation Extraction. In C. Eckart de Castilho and Stede (eds.), From Form to Meaning: Processing Texts Automatically, Proceedings of the Biennial GSCL Conference 2009. pp. 31-40.

Bruni, E., Tran, G.B., and Baroni, M. (2011). Distributional semantics from text and images. Proceedings of the EMNLP 2011, Geometrical Models for Natural Language Semantics Workshop. pp. 22-32.

Connolly, A.C., Gleitman, L.R., and Thompson-Schill, S.L. (2007). Effect of Congenital Blindness on the Semantic Representation of Some Everyday Concepts. Proceedings of the National Academy of Sciences. pp. 8241-8246.

Feng, Y. and Lapata, M. (2010). Visual information in semantic representations. Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. pp. 91-99.

Firth, J.R. (1957). Papers in Linguistics 1934-1951. London: Oxford University Press.

Fodor, J.A. (1975). The Language Of Thought. New York: Crowell.

Glenberg, A.M. (1997). What memory is for. Behavioral and Brain Sciences 20: 1-55.

Glenberg, A.M., and Robertson, D.A. (2000). Symbol grounding and meaning: A comparison of high-dimensional and embodied theories of meaning. Journal of Memory and Language 43: 379-401.

Harnad, S. (1990). The Symbol Grounding Problem. Physica D 42: 335-346.

Harris, Z. (1954). Distributional structure. Word 10 (2-3): 146-162.

Haugeland, J. (1985). Artificial Intelligence: The Very Idea. Cambridge, MA: MIT Press.

Hunt, R.W.G. (2004). The Reproduction of Colour (6th ed.). Chichester UK: Wiley.

Izmailov, C.A. and Sokolov, E.N. (1992). A semantic space of color names. Psychological Science 3 (2): 105-110.

Landauer, T.K. and Dumais, S.T. (1997). A solution to Plato's problem: the Latent Semantic Analysis theory of the acquisition, induction and representation of knowledge. Psychological Review 104 (2): 211-240.

Louwerse, M.M. and Jeuniaux, P. (2010). The linguistic and embodied nature of conceptual processing. Cognition 114: 96-104.

McRae, K., Cree, G.S., Seidenberg, M.S., and McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, Instruments, and Computers 37: 547-559.

Miller, G.A. and Charles, W.G. (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes 6 (1): 1-28.

Pecher, D. and Zwaan, R. (eds.) (2005). Grounding Cognition: The Role of Perception and Action in Memory, Language, and Thinking. Cambridge: University Press.

Pylyshyn, Z.W. (1984). Computation and cognition: Toward a foundation for cognitive science. Cambridge, MA: MIT Press.

Sahlgren, M. (2006). The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words. Stockholm: University Press.

Steels, L. (2006). Collaborative tagging as distributed cognition. Pragmatics and Cognition 14 (2): 287-292.

Thaler, S., Simperl, E., Siorpaes, K., and Hofer, C. (2011). A survey on games for knowledge acquisition. STI Technical Report.

Vigliocco, G., Meteyard, L., Andrews, M., and Kousta, S. (2009). Toward a theory of semantic representation. Language and Cognition 1 (2): 215-244.

Vinson, D.P. and Vigliocco, G. (2008). Semantic feature production norms for a large set of objects and events. Behavior Research Methods 40 (1): 183-190.

Wu, L.L. and Barsalou, L.W. (2009). Perceptual simulation in conceptual combination: Evidence from property generation. Acta Psychologica 132: 173-189.

Zwaan, R.A. (2004). The immersed experiencer: Toward an embodied theory of language comprehension. In B.H. Ross (ed.), The psychology of learning and motivation. New York: Academic Press.
