Acta Psychologica

Volume 129, Issue 2, October 2008, Pages 249-254

Filling-in visual motion with sounds

https://doi.org/10.1016/j.actpsy.2008.08.004

Abstract

Information about the motion of objects can be extracted by multiple sensory modalities, and, as a consequence, object motion perception typically involves the integration of multi-sensory information. In naturalistic settings, the flow of such information is often discontinuous (e.g. a cat racing through the furniture in a cluttered room is partly seen and partly heard). This study addressed audio-visual interactions in the perception of time-sampled object motion by measuring adaptation after-effects. We found significant auditory after-effects following adaptation to unisensory auditory and visual motion in depth, sampled at 12.5 Hz. The visually induced (cross-modal) auditory motion after-effect was eliminated when the visual adaptors flashed at half that rate (6.25 Hz). Remarkably, adding high-rate acoustic flutter (12.5 Hz) to this ineffective, sparsely time-sampled visual adaptor restored the auditory after-effect to a level comparable to that obtained with high-rate bimodal adaptors (flashes and beeps). Our results suggest that this auditory-induced reinstatement of the motion after-effect from the impoverished visual signal resulted from the occurrence of sound-induced illusory flashes. The effect depended both on the directional congruency between modalities and on the rate of the auditory flutter. This auditory filling-in of time-sampled visual motion supports the feasibility of using reduced-frame-rate visual content in multisensory broadcasting and virtual reality applications.

Introduction

Many objects and events in our everyday environments are perceived concurrently through several sensory systems. It is now widely accepted that inter-sensory correlations can significantly sharpen our perceptual capabilities, for example in audio-visual speech perception in noisy environments (e.g. Massaro, 2004; Sumby & Pollack, 1954) or for difficult-to-perceive phonemes in a second language (Navarra & Soto-Faraco, 2007). These inter-sensory correlations can also be used to construct unitary and coherent scenes from the fragmentary information available across different sensory channels; for example, one can combine transient sounds and fleeting sightings when tracking an animal racing through the undergrowth of a cluttered forest. To accomplish perception under such less-than-optimal conditions, perceptual processes in the brain often implement a filling-in strategy whereby sensory illusions help to complete missing external information, for example restoring the shape of partly occluded objects (see Komatsu, 2006, for a recent review), completing lines and surfaces in Kanizsa figures (e.g. Kanizsa, 1976), or filling in the visual information missing in the blind spot of the eye (e.g. Ramachandran, 1992). However, while it is widely accepted that human perception is multisensory, there is barely any systematic research addressing the role of cross-modal interactions in perceptual filling-in.

One avenue for addressing cross-modal filling-in is to capitalize on the fact that temporal accuracy in perception is higher in the auditory than in the visual modality (Kohlrausch, Fassel, & Dau, 2000); as a consequence, the former usually dominates the latter in temporal perception. For example, sound can significantly change the temporal perception of visual events, as in the “auditory driving” effect (e.g. Welch, DuttonHurt, & Warren, 1986), temporal ventriloquism (Morein-Zamir, Soto-Faraco, & Kingstone, 2003), or the freezing effect (e.g. Vroomen & de Gelder, 2000). Outside the laboratory, in audio-visual media such as cinema and television, this auditory influence on vision has been exploited extensively, as sound has traditionally been used to highlight the temporal structure of rapid visual events. Consider, for example, hitting sounds in Kung-Fu fighting scenes (Chion, 1994) or Walt Disney’s “Mickey Mousing” technique, whereby motion-picture sounds are tightly synchronized with a character’s movements (Thomas & Johnston, 1981). Interestingly, sound has also been used in cinema to create an illusion of visual action continuity. In a classic example from George Lucas’ film “The Empire Strikes Back” (1980), the visual illusion of a spaceship door sliding open is created using two successive stills, one with the door closed and one with it open, plus a “swapping” sound effect (Chion, 1994).

Recent laboratory studies focusing on cross-modal interactions have found that sound can indeed induce the illusion of seeing a visual event when there is none. For example, Shams, Kamitani, and Shimojo (2000) reported an illusion of multiple visual flashes produced when a single brief visual stimulus was coupled with multiple auditory beeps. In these experiments, participants were asked to count the number of times a flickering white disk had flashed while one or more task-irrelevant brief sounds were presented. The number of flashes reported by observers increased with the number of beeps. In a later ERP study, Shams, Kamitani, Thompson, and Shimojo (2001) reported that the beeps modulated early visual evoked potentials originating from the occipital cortex. Interestingly, the electrophysiological activity corresponding to the illusory flashes was very similar to the activity produced when a flash was physically presented. Later work confirmed that the illusory flash is a perceptual effect with psychophysically assessable characteristics (McCormick & Mamassian, 2008) and showed that human performance in orientation-discrimination tasks did not depend on whether the visual stimuli were real or illusory (Berger, Martelli, & Pelli, 2003).
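To make the structure of this paradigm concrete, the following minimal sketch (in Python) builds the event schedule for one trial: a single physical flash accompanied by a train of task-irrelevant beeps. It is an illustration only; the beep SOA, flash onset, and function name are our assumptions, not the timings or code used by Shams et al. (2000).

# Illustrative timing schedule for a sound-induced flash illusion trial.
# All numeric values are assumptions chosen for illustration.

def illusory_flash_trial(n_beeps, beep_soa_ms=57.0, flash_onset_ms=0.0):
    """Return (flash_onsets, beep_onsets) in ms for one trial with a
    single physical flash accompanied by n_beeps task-irrelevant beeps."""
    flash_onsets = [flash_onset_ms]                    # only one real flash
    beep_onsets = [i * beep_soa_ms for i in range(n_beeps)]
    return flash_onsets, beep_onsets

if __name__ == "__main__":
    for n in (1, 2, 3):
        flashes, beeps = illusory_flash_trial(n)
        print(f"{n} beep(s): flash at {flashes} ms, beeps at {beeps} ms")
        # Observers typically report extra (illusory) flashes as the number
        # of beeps increases, even though only one flash is ever shown.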

Here, we addressed the potential effects of sound-induced illusory flashes when embedded into time-sampled object motion. We used motion perception because it has been shown to be subject to strong multisensory interactions (e.g. Kitagawa & Ichihara, 2002; Meyer & Wuerger, 2001; Soto-Faraco et al., 2002; Soto-Faraco et al., 2004; see Soto-Faraco, Kingstone, & Spence, 2003, for a review). In particular, we capitalized on adaptation to motion in depth because it is known to produce consistent motion after-effects (MAE) both unimodally and cross-modally (Kitagawa & Ichihara, 2002). For example, after exposure to looming visual objects, the viewer perceives a steady visual stimulus as if it were receding (Regan & Beverley, 1978). Importantly, as shown by Kitagawa and Ichihara, adaptation to continuous visual object motion in depth also results in an auditory changing-loudness after-effect, highlighting the cross-modal nature of motion perception. Indeed, dynamic changes in sound intensity have been shown to be a strong cue to auditory motion in the horizontal plane (Lutfi & Wang, 1999), and this cue is even more important for the perception of auditory motion in depth, where binaural cues do not contribute to localization in the median plane (Blauert, 1997). Therefore, following the methodology of Kitagawa and Ichihara (2002), in the rest of the text we use the changing-loudness after-effect as a correlate of auditory motion and refer to it as the auditory motion after-effect (aMAE).
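The sketch below synthesizes a changing-loudness tone of the kind used to simulate auditory motion in depth: a rising level suggests an approaching source, a falling level a receding one (cf. Kitagawa & Ichihara, 2002). The carrier frequency, duration, and 20 dB intensity range are illustrative assumptions, not the published stimulus parameters.

# Minimal sketch of a "changing-loudness" auditory motion-in-depth stimulus.
# Parameters are assumptions for illustration only.

import math

SAMPLE_RATE = 44_100      # samples per second
DURATION_S = 1.0          # stimulus duration in seconds
CARRIER_HZ = 1000.0       # pure-tone carrier frequency

def changing_loudness_tone(looming, db_range=20.0):
    """Synthesize a tone whose level ramps linearly in dB across the
    stimulus: rising level simulates looming, falling level receding."""
    n = int(SAMPLE_RATE * DURATION_S)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        frac = t / DURATION_S                    # 0 -> 1 across the stimulus
        db = frac * db_range if looming else (1.0 - frac) * db_range
        amp = 10.0 ** ((db - db_range) / 20.0)   # linear amplitude, max 1.0
        samples.append(amp * math.sin(2.0 * math.pi * CARRIER_HZ * t))
    return samples

looming = changing_loudness_tone(looming=True)    # level rises: "approaching"
receding = changing_loudness_tone(looming=False)  # level falls: "receding"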

The goal of the present study was to measure the capacity of sounds to reinstate missing visual events. To this aim, we measured the aMAE induced by sampled (discontinuous) visual events and, most critically, the interaction of acoustic and visual adaptors at different sampling rates. In Experiment 1, we compared the aMAE induced by high- and low-rate visual adaptors on their own with that induced by the same visual adaptors combined with high-rate acoustic events.
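The time-sampling logic behind these adaptor conditions is sketched below: visual flashes at a high (12.5 Hz) or low (6.25 Hz) rate, optionally combined with a 12.5 Hz train of beeps. The condition label VlAh appears later in the paper (see the General discussion, Fig. 2); the other labels, the 2 s adaptor duration, and the resulting event counts are our illustrative assumptions.

# Sketch of time-sampled adaptor schedules at the two rates used in the study.
# Condition names other than VlAh, and the 2 s duration, are assumptions.

def onset_times(rate_hz, duration_s=2.0):
    """Onset times (s) of discrete events sampling a continuous motion signal."""
    period = 1.0 / rate_hz
    return [i * period for i in range(int(duration_s * rate_hz))]

HIGH_RATE, LOW_RATE = 12.5, 6.25   # Hz, as in the study

conditions = {
    "Vh":   {"flashes": onset_times(HIGH_RATE), "beeps": []},
    "Vl":   {"flashes": onset_times(LOW_RATE),  "beeps": []},
    "VlAh": {"flashes": onset_times(LOW_RATE),  "beeps": onset_times(HIGH_RATE)},
}

# In VlAh every second beep falls between two flashes; on the filling-in
# account these unpaired beeps induce illusory flashes, restoring the aMAE.
for name, events in conditions.items():
    print(name, "flashes:", len(events["flashes"]), "beeps:", len(events["beeps"]))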

Section snippets

Experiment 1

Our first hypothesis was that the (cross-modal) aMAE would depend on the frequency of the flashes representing the discrete motion of the adapting visual object: sparsely time-sampled visual motion should produce a weaker aMAE than higher rate (i.e. perceptually fused) moving stimuli. The second, critical hypothesis was that, if the combination of a slow train of flashes (flicker) with a rapid train of beeps (flutter) leads to the sound-induced illusory flash illusion (Shams, Kamitani, & Shimojo, 2000), then adding high-rate flutter to the otherwise ineffective low-rate visual adaptor should restore the aMAE.

Experiment 2

The audio-visually induced aMAEs reported in Experiment 1 show that high-rate acoustic flutter can compensate for low-rate visual flicker. In Experiment 1, however, we did not assess participants’ subjective experience. In Experiment 2, we therefore combined the aMAE measure with a questionnaire assessing the subjective smoothness of the visual motion in depth (the adapting stimulus). If our filling-in hypothesis holds, then the low-rate visual flicker should be perceived as smoother (more continuous) when combined with the high-rate auditory flutter.

Experiment 3

Results from the two previous experiments reveal that combining low-rate visual flicker with high-rate auditory stimuli significantly increases the aMAE and the subjective ratings of visual smoothness. Yet, these results do not speak directly to whether the observed effects are specific to motion per se or instead result merely from the more continuous temporal signal provided by the higher rate flutter. This is especially relevant in light of the fact that the role of …

General discussion

The present findings reveal that discrete audio-visual motion stimuli follow the same cross-modal interaction patterns as continuous stimuli in their capacity to produce an aMAE. More importantly, however, our results provide empirical evidence that sound can fill in sparsely time-sampled visual motion, possibly through the occurrence of illusory visual events in the low-rate flicker / high-rate flutter condition (VlAh, Fig. 2). As discussed in the Introduction, …

Acknowledgements

A.V.’s work was supported by the PRESENCCIA project (27731) under the IST programme and by the Swedish Science Council (VR); S.S.-F.’s work was supported by grants from the Ministerio de Educación y Ciencia (SEJ2007-64103/PSIC and the CSD2007-00012 Consolider-Ingenio programme).

References (43)

  • Berger, T. D., Martelli, M., & Pelli, D. G. (2003). Flicker flutter: Is an illusory event as good as the real thing? Journal of Vision.

  • Blauert, J. (1997). Spatial hearing.

  • Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision.

  • Brown, J. L. Flicker and intermittent stimulation.

  • Chion, M. (1994). Audio-vision: Sound on screen.

  • Cornsweet, T. N. (1962). The staircase-method in psychophysics. American Journal of Psychology.

  • Hong, J., et al. (2006). Influences of attention on auditory aftereffects following purely visual adaptation. Spatial Vision.

  • Jewett, D. L., et al. (2006). Human sensory-evoked responses differ coincident with either “fusion-memory” or “flash-memory”. BMC Neuroscience.

  • Kamitani, Y., et al. (2001). Sound-induced visual “rabbit”. Journal of Vision.

  • Kanizsa, G. (1976). Subjective contours. Scientific American.

  • Kitagawa, N., & Ichihara, S. (2002). Hearing visual motion in depth. Nature.