Agreement Groups Analysis of Mother-child Discourse

We propose a distributional framework for analysing linguistic corpora. The analysis is based on groups of minimally contrasting utterances, which can be regarded as representing agreement relations. Agreement groups can be related to the notion of ‘frame’ as used in its various senses in the research literature: item-based phrases (Cameron-Faulkner et al. 2003, Stoll et al. 2009), frequent frames (Mintz 2003, Chemla et al. 2009, Wang and Mintz 2010), and flexible frames (St. Clair et al. 2010). Since agreement groups provide a means of representing novel sentences on the basis of sentences already encountered, we tested to what extent they can account for novel utterances in a database. We used the Anne files from the Manchester corpus (Theakston et al. 2001) of the CHILDES database (MacWhinney 2000). We examined to what extent the agreement groups available at a given stage of development can account for the utterances of the immediately following 30-minute session. At each test stage, the agreement groups were extracted from the body of utterances encountered up to that stage. Examining approximately one year of data, we found that at each developmental stage between 19% and 41% of the utterances of the new session were compatible with the agreement groups extracted from the previous sessions. This amounts to between 6% and 10.3% of novel utterances being compatible with some group. The results improved slightly when a "guessing" mechanism was added. Qualitatively, we also found that the formation of groups may support categorisation and the actual emergence of grammatical agreement.

Keywords: agreement, categorisation, group formation, distributional analysis, language acquisition


