Filling-in visual motion with sounds
Introduction
Many objects and events in our everyday environments are perceived concurrently through several sensory systems. It is now widely accepted that inter-sensory correlations can significantly sharpen our perceptual capabilities, for example in audio-visual speech perception in noisy environments (e.g. Massaro, 2004, Sumby and Pollack, 1954) or for difficult-to-perceive phonemes in a second language (Navarra & Soto-Faraco, 2007). In addition, these inter-sensory correlations can be used to construct unitary and coherent scenes from the fragmentary information available across different sensory channels. For example, one can combine transient sounds and fleeting sightings when tracking an animal racing through the undergrowth of a cluttered forest. To accomplish perception under these less than optimal conditions, perceptual processes in the brain often implement a filling-in strategy whereby sensory illusions help to complete missing external information, for example restoring the shape of partly occluded objects (see Komatsu, 2006, for a recent review), completing lines and surfaces in Kanizsa figures (e.g. Kanizsa, 1976), or filling in the visual information missing at the blind spot of the eye (e.g. Ramachandran, 1992). However, while it is widely accepted that human perception is multisensory, there is barely any systematic research addressing the role of cross-modal interactions in perceptual filling-in.
One avenue to start addressing cross-modal filling-in is to capitalize on the fact that temporal accuracy in perception is higher for the auditory than for the visual modality (Kohlrausch, Fassel, & Dau, 2000), and as a consequence the former usually dominates the latter in temporal perception. For example, sound can significantly change the temporal perception of visual events, as in the “auditory driving” effect (e.g. Welch, DuttonHurt, & Warren, 1986), temporal ventriloquism (Morein-Zamir, Soto-Faraco, & Kingstone, 2003), or the freezing effect (e.g. Vroomen & de Gelder, 2000). In fact, outside the laboratory, audio-visual media such as cinema and television have long exploited this auditory influence on vision, as sound has traditionally been used to highlight the temporal structure of rapid visual events. Consider, for example, the hitting sounds in Kung-Fu fighting scenes (Chion, 1994) or Walt Disney’s “Mickey Mousing” technique, whereby motion-picture sounds are tightly synchronized with a character’s movements (Thomas & Johnston, 1981). Interestingly, sound has also been used in cinema to create an illusion of visual action continuity. In a classic example from George Lucas’ film “The Empire Strikes Back” (1980), the visual illusion of a spaceship door sliding open is created using two successive stills, one with the door closed and one with it open, plus a “swapping” sound effect (Chion, 1994).
Recent laboratory studies focusing on cross-modal interactions have found that sound can indeed induce the illusion of seeing a visual event when there is none. For example, Shams, Kamitani, and Shimojo (2000) reported an illusion of multiple visual flashes produced when a single brief visual stimulus was coupled with multiple auditory beeps. In these experiments, participants were asked to count the number of times a flickering white disk had flashed when it was presented together with one or more task-irrelevant brief sounds. The number of flashes reported by observers increased with the number of beeps. In a later ERP study, Shams, Kamitani, Thompson, and Shimojo (2001) reported that the beeps modulated early visual evoked potentials originating from the occipital cortex. Interestingly, the electrophysiological activity corresponding to the illusory flashes was very similar to the activity produced when a flash was physically presented. Later work confirmed that the illusory flash is a perceptual effect with psychophysically assessable characteristics (McCormick & Mamassian, 2008) and showed that human performance in orientation-discrimination tasks did not depend on whether the visual stimuli were real or illusory (Berger, Martelli, & Pelli, 2003).
Here, we addressed the potential effects of sound-induced illusory flashes when embedded into time-sampled object motion. We used motion perception because it has been shown to be subject to strong multisensory interactions (e.g. Kitagawa and Ichihara, 2002, Meyer and Wuerger, 2001, Soto-Faraco et al., 2002, Soto-Faraco et al., 2004; see Soto-Faraco, Kingstone, & Spence, 2003, for a review). In particular, we capitalized on adaptation to motion in depth because it is known to produce consistent motion after-effects (MAE) both unimodally and cross-modally (Kitagawa & Ichihara, 2002). For example, after exposure to looming visual objects the viewer will perceive a steady visual stimulus as if it were receding (Regan & Beverley, 1978). Importantly, as Kitagawa and Ichihara showed, adaptation to continuous visual object motion in depth also results in an auditory changing-loudness after-effect, highlighting the cross-modal nature of motion perception. Indeed, dynamic changes in sound intensity have been shown to be a strong cue to auditory motion in the horizontal plane (Lutfi & Wang, 1999), and this cue is even more important for the perception of auditory motion in depth, where binaural cues do not contribute to localization in the median plane (Blauert, 1997). Therefore, following the methodology of Kitagawa and Ichihara (2002), in the rest of the text we use the changing-loudness after-effect as a correlate of auditory motion and refer to it as the auditory motion after-effect (aMAE).
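The distance-dependence of sound intensity that makes loudness change such a strong depth cue follows the inverse-square law: level rises by about 6 dB for every halving of source distance. A minimal sketch of this relationship is below; the distances are arbitrary illustrations, not stimulus parameters from this study.

```python
import math

def level_change_db(r_ref: float, r: float) -> float:
    """Relative sound pressure level (dB) of a point source at distance r,
    compared with reference distance r_ref, under the inverse-square law."""
    return 20.0 * math.log10(r_ref / r)

# A source approaching from 10 m to 2.5 m (two halvings of distance)
# rises by ~12 dB, i.e. ~6 dB per halving.
print(round(level_change_db(10.0, 2.5), 1))  # → 12.0
```

Such a monotonically rising loudness ramp is the kind of signal an approaching (looming) source produces, which is why a changing-loudness after-effect can serve as a proxy for auditory motion in depth.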
The goal of the present study was to measure the capacity of sounds to reinstate missing visual events. To this aim, we measured the aMAE induced by sampled (discontinuous) visual events and, most critically, the interaction of acoustic and visual adaptors at different sampling rates. In Experiment 1, we compared the aMAE induced by high- and low-rate visual adaptors on their own with that induced by the same visual adaptors combined with high-rate acoustic events.
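To make the notion of adaptor sampling rate concrete, the sketch below generates hypothetical onset schedules for a low-rate flash train and a high-rate beep train and identifies the beeps that have no coincident flash. The rates and duration are illustrative assumptions only, not the actual parameters of Experiment 1.

```python
def onset_times(rate_hz: float, duration_s: float) -> list:
    """Evenly spaced stimulus onsets (seconds) at a given rate, from t = 0."""
    period = 1.0 / rate_hz
    return [round(i * period, 3) for i in range(int(duration_s * rate_hz))]

flashes = onset_times(2.0, 1.0)  # low-rate visual flicker (illustrative rate)
beeps = onset_times(8.0, 1.0)    # high-rate acoustic flutter (illustrative rate)

# Beeps with no coincident flash mark the moments where illusory
# flashes could, in principle, fill in the sampled visual motion.
unmatched = [t for t in beeps if t not in flashes]
print(flashes)         # → [0.0, 0.5]
print(len(unmatched))  # → 6
```

Under this toy schedule, 6 of the 8 beeps fall between flashes, which is the kind of temporal mismatch exploited in the low-rate flicker / high-rate flutter adaptor condition.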
Section snippets
Experiment 1
Our first hypothesis was that the (cross-modal) aMAE would depend on the frequency of the flashes representing the discrete motion of the adapting visual object. Sparsely time-sampled visual motion should produce a weaker aMAE than higher-rate (i.e. perceptually fused) moving stimuli. The second and critical hypothesis was that, if the combination of a slow train of flashes (flicker) with a rapid train of beeps (flutter) leads to the sound-induced illusory flash illusion (Shams, Kamitani, &
Experiment 2
The audio-visually induced aMAEs reported in Experiment 1 show that high-rate acoustic flutter might compensate for the low-rate visual flicker. In Experiment 1, however, we did not assess participants' subjective sensation. In Experiment 2, we combined the aMAE measure with a questionnaire assessing the subjective sensation of smoothness of the visual motion in depth (the adapting stimulus). If our filling-in hypothesis is to hold, then the low-rate visual flicker should
Experiment 3
Results from the two previous experiments reveal that combining low-rate visual flicker with high-rate auditory stimuli significantly increases the aMAE and the subjective ratings of visual smoothness. Yet, the present results do not speak directly to whether the observed effects are specific to motion per se or whether they result merely from the more continuous temporal signal provided by the higher-rate flutter. This is especially relevant in light of the fact that the role of
General discussion
The present findings reveal that discrete audio-visual motion stimuli follow the same cross-modal interaction patterns as continuous stimuli in their capacity to produce an aMAE. More importantly, however, our results provide empirical evidence that sound can fill in sparsely time-sampled visual motion, possibly through the occurrence of illusory visual events in the low-rate flicker/high-rate flutter condition (VlAh, Fig. 2). As was discussed earlier in the introduction,
Acknowledgements
A.V.'s work was supported by the PRESENCCIA project (27731) under the IST programme and by the Swedish Science Council (VR); S.S.-F.'s work was supported by grants from the Ministerio de Educación y Ciencia (SEJ2007-64103/PSIC and the CDS2007-00012 Consolider-Ingenio programme).
References (43)
- No direction-specific bimodal facilitation for audiovisual motion detection. Cognitive Brain Research (2004).
- McCormick & Mamassian. What does the illusory flash look like? Vision Research (2008).
- Multisensory integration of looming signals by rhesus monkeys. Neuron (2004).
- Morein-Zamir, Soto-Faraco, & Kingstone. Auditory capture of vision: Examining temporal ventriloquism. Cognitive Brain Research (2003).
- Motion after-effect after monocular adaptation to filled-in motion at the blind spot. Vision Research (1995).
- Regan & Beverley. Illusory motion in depth: Aftereffect of adaptation to changing size. Vision Research (1978).
- Shams, Kamitani, & Shimojo. Visual illusion induced by sound. Cognitive Brain Research (2002).
- Soto-Faraco, Kingstone, & Spence. Multisensory contributions to the perception of motion. Neuropsychologia (2003).
- Soto-Faraco et al. The ventriloquist in motion: Illusory capture of dynamic information across sensory modalities. Cognitive Brain Research (2002).
- The effect of update rate on the sense of presence within virtual environments. Virtual Reality: The Journal of the Virtual Reality Society (1995).