One crucial step in the construction of the human representation of the world occurs at the boundary between two basic kinds of stimuli: visual experience and the sounds of language. In the developmental stage when the ability to recognize objects consolidates, and the ability to segment streams of sound into familiar chunks emerges, the mind gradually grasps the idea that utterances are related to the visible entities of the world. The model presented here is an attempt to reproduce this process, in its basic form, by simulating the visual and auditory pathways together with a portion of the prefrontal cortex putatively responsible for more abstract representations of object classes. Simulations have been performed with the model using a set of images of 100 real-world objects seen from many different viewpoints, along with waveforms of the labels of various classes of objects. Categorization processes with and without language are also compared.