Grounded language acquisition through the eyes and ears of a single child

Bibliographic Details
Published in:	Science (American Association for the Advancement of Science) 2024-02, Vol.383 (6682), p.504-511
Main Authors:	Vong, Wai Keen; Wang, Wentao; Orhan, A. Emin; Lake, Brenden M.
Format:	Article
Language:	English
Subjects:	Associative learning; Child; Ear; Eye; Humans; Knowledge; Language; Language Development; Linguistics; Machine learning; Neural Networks, Computer; Questions; Supervised Machine Learning; Words (language)

Description
Summary:	Starting around 6 to 9 months of age, children begin acquiring their first words, linking spoken words to their visual counterparts. How much of this knowledge is learnable from sensory input with relatively generic learning mechanisms, and how much requires stronger inductive biases? Using longitudinal head-mounted camera recordings from one child aged 6 to 25 months, we trained a relatively generic neural network on 61 hours of correlated visual-linguistic data streams, learning feature-based representations and cross-modal associations. Our model acquires many word-referent mappings present in the child's everyday experience, enables zero-shot generalization to new visual referents, and aligns its visual and linguistic conceptual systems. These results show how critical aspects of grounded word meaning are learnable through joint representation and associative learning from one child's input.
ISSN:	0036-8075; 1095-9203
DOI:	10.1126/science.adi1374