Research report
No direction-specific bimodal facilitation for audiovisual motion detection

https://doi.org/10.1016/j.cogbrainres.2003.11.011

Abstract

After several decades of unimodal perceptual research, interest is turning increasingly to cross-modal interactions. At a physiological level, the existence of bimodal cells is well documented, and it is known that correlated audiovisual input enhances localisation and orienting behaviours. Audiovisual perceptual interactions have also been demonstrated (e.g., the well-known McGurk effect). The present study explores motion perception and asks whether correlated audiovisual motion signals are better detected than unimodal motions or bimodal motions in opposing directions. Using a dynamic random-dot field with variable motion coherence as a visual stimulus, together with an auditory motion defined by a stereo noise source smoothly translating along a horizontal trajectory, we find that correlated bimodal motion yields only a slight improvement (approximately a square root of two advantage) in detection threshold relative to unimodal detection. The size of this benefit is consistent with a statistical advantage rather than with bimodal facilitation. Moreover, anticorrelated bimodal motion showed the same modest improvement, again speaking against linear summation but consistent with statistical combination of visual and auditory signals. These findings were replicated in peripheral as well as central vision, and with translating visual objects as well as with spatially distributed visual motion. The superadditivity observed neurally (especially in deep-layer superior collicular cells) when weak unimodal signals are combined in bimodal cells does not apply to the detection of linear translational motion.

Introduction

Many objects in the external environment are represented in two or more sensory modalities. Touch and vision are commonly co-activated when objects are taken in hand and inspected. Audition and vision are also frequently activated by the same stimulus event (the sight and sound of speeding cars is a common example). Despite the modularity of our sensory systems, we perceive a unified and coherent world. Indeed, by synthesising complementary information, we enhance the likelihood that our internal perceptions will accurately reflect external realities [7], enabling us to respond more rapidly and appropriately. Two interesting questions arise from this synthesis: how is information combined across modalities, and does the perceptual system capitalise on complementary information about the same stimulus to improve its performance? The experiments we present deal primarily with the latter question. Specifically, we ask whether the ability to detect movement is improved when that movement is represented in both auditory and visual modalities.

The combination of information across senses has been heavily researched at the neurophysiological level [30], with particular focus on the superior colliculus. Its deep layers contain many ‘multisensory’ neurons, that is, neurons that receive unimodal input from more than one sensory modality. Multisensory cells may be bimodal, or even trimodal, with audiovisual bimodal cells a common variety. They are arranged in a topographical representation of external space and have separate but overlapping auditory and visual receptive fields, so that they respond to audiovisual input from a single location [35]. Although they can be driven unimodally, they exhibit a strong non-linear response known as “superadditivity” [12], [15] when driven bimodally by spatiotemporally correlated audiovisual input.
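To make the superadditivity criterion concrete, bimodal enhancement is commonly expressed in the multisensory literature as the percentage increase of the bimodal response over the best unimodal response. A minimal sketch with hypothetical firing rates (illustrative values only, not data from the studies cited above):

```python
def multisensory_enhancement(r_audio, r_visual, r_bimodal):
    """Percentage enhancement of the bimodal response over the best
    unimodal response. Superadditivity additionally holds when the
    bimodal response exceeds the sum of the unimodal responses."""
    best_unimodal = max(r_audio, r_visual)
    return 100.0 * (r_bimodal - best_unimodal) / best_unimodal

# Hypothetical firing rates (spikes/s) for a deep-layer collicular cell:
r_a, r_v, r_av = 4.0, 6.0, 25.0
print(multisensory_enhancement(r_a, r_v, r_av))  # ~317% enhancement
print(r_av > r_a + r_v)                          # True: superadditive
```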

Behavioural and attentional studies have demonstrated that cross-modal interactions do indeed occur. Behaviourally, it is known that bimodal superadditivity improves orienting behaviours such as eye movements [8], [30], [32], [37] and aids stimulus localisation [1], [31]. This likely reflects response enhancement to correlated bimodal stimuli, which creates a salient peak on the collicular topography. Several studies have also confirmed the physiological observation that response enhancement (superadditivity) is maximal when auditory and visual inputs arrive synchronously [12], [16], although for perceptual tasks the temporal window within which auditory and visual stimuli can be phenomenally integrated is rather broad [13], [21]. In contrast, discordant stimuli lead to “response depression” [11], [15] and a corresponding decrease in the efficiency of orienting behaviours [32], [37]. It has also been shown that attending to a particular spatial location for visual stimuli improves task performance for auditory stimuli (and vice versa) in that location [27], [28], suggesting a linked audiovisual topography. This is consistent with the fact that the superior colliculus (containing multimodal cells) is strongly implicated in orienting to salient stimuli, whether overtly with eye movements or covertly with attention [6], [26].

Apart from its role in orienting, the superior colliculus also has strong reciprocal links, via the pulvinar, with the middle temporal (MT) cortical area [29]. MT is specialised for processing visual movement, and activity in this area is strongly correlated with visual motion perception [2], [3]. Outputs from MT project directly to area VIP, where they combine with input from auditory areas to create bimodal cells with strong motion selectivity [5], [9], [14]. Given this evidence for strong audiovisual interactions in sensory processing, both at an early, subcortical level and in higher, motion-specialised cortical areas, we conducted experiments to examine whether sensitivity to bimodally represented movement is improved relative to unimodal baselines when that movement is spatiotemporally concordant in audition and vision. In particular, we focused on thresholds for motion detection, as the neurophysiological evidence suggests that response enhancement should be strongest when the unimodal stimuli are weak [15]. On this principle, unimodal stimuli too weak to be detected alone could conceivably become detectable as part of a correlated bimodal stimulus. We therefore measured motion detection thresholds unimodally for vision and for audition, and then again when the stimuli were presented together as a bimodal motion stimulus. We compared conditions in which the auditory and visual components were either matched in direction (correlated) or opposed (anticorrelated). This was repeated for visual motion in central and in peripheral vision, and for visual stimuli that were either spatially distributed or a spatially localised object. The results show no evidence of a facilitative audiovisual interaction for the detection of linear translations, whether in the central or the peripheral field.
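The benchmark for a purely statistical benefit can be stated explicitly. If the auditory and visual signals provide independent, noisy estimates that are combined in the statistically optimal (maximum-likelihood) fashion, the bimodal variance is the product of the unimodal variances divided by their sum; with matched unimodal reliabilities, and assuming thresholds scale with the noise standard deviation, this predicts exactly the square root of two improvement against which facilitation is judged. A sketch of the calculation, assuming Gaussian noise:

```python
import numpy as np

def combined_threshold(sigma_a, sigma_v):
    """Optimal (maximum-likelihood) combination of two independent
    noisy estimates: sigma_av^2 = sigma_a^2 * sigma_v^2 / (sigma_a^2 + sigma_v^2).
    The bimodal threshold is never worse than the better unimodal one,
    and equals 1/sqrt(2) of it when the two are matched."""
    return np.sqrt((sigma_a**2 * sigma_v**2) / (sigma_a**2 + sigma_v**2))

sigma = 10.0  # matched unimodal thresholds (arbitrary units)
print(combined_threshold(sigma, sigma))          # ~7.07 = 10 / sqrt(2)
print(sigma / combined_threshold(sigma, sigma))  # ~1.414: the sqrt(2) advantage
```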

Section snippets

Stimuli

The auditory stimuli were created digitally at a sampling rate of 65 kHz and played over loudspeakers (Yamaha MSP5), which lay in the same plane as the video monitor, 45 cm from the observer and ±30 cm from the monitor's centre. The sound was produced by low-pass filtering white noise using a 5th-order Butterworth filter with a cut-off frequency of 2 kHz. Auditory movement was created by playing the filtered signal in stereo and varying the magnitude and sign of interaural time differences so …
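A minimal sketch of how such a stimulus could be synthesised with standard signal-processing tools; the sampling rate, filter order and cut-off frequency come from the description above, while the duration and peak interaural delay are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 65000        # sampling rate stated in the Methods (65 kHz)
DUR = 1.0         # stimulus duration in seconds (assumed; not stated above)
MAX_ITD = 500e-6  # peak interaural delay in seconds (assumed for illustration)

# Low-pass filtered white noise: 5th-order Butterworth, 2 kHz cut-off (as stated).
n = int(FS * DUR)
noise = np.random.randn(n)
b, a = butter(5, 2000 / (FS / 2))
mono = lfilter(b, a, noise)

# Smooth left-to-right motion: ramp the interaural time difference from
# -MAX_ITD to +MAX_ITD and apply it to the right channel as a fractional
# delay (linear interpolation on a shifted time axis).
t = np.arange(n) / FS
itd = np.linspace(-MAX_ITD, MAX_ITD, n)
left = mono
right = np.interp(t - itd, t, mono)  # right channel delayed/advanced vs. left

stereo = np.stack([left, right], axis=1)  # play over the two loudspeakers
```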

Visual and auditory motion thresholds

We first measured separately the coherence thresholds for discriminating the direction of motion of a visual stimulus and of an auditory sound source. Both stimuli were “broad-band” and designed to be as similar as possible (see illustration in Fig. 1). The visual stimulus—a field of 200 dots in which a random subset was displaced either leftward or rightward to create a sensation of coherent motion—was chosen because it is known that variations in its motion coherence level elicit responses in visual …
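A sketch of a random-dot stimulus of this kind; the 200-dot field and the coherence logic (a random subset displaced leftward or rightward on each frame) follow the description above, while the field size, dot step, frame count and the replotting rule for noise dots are illustrative assumptions:

```python
import numpy as np

def rdk_frames(n_dots=200, coherence=0.1, n_frames=60, step=2.0,
               field=(640, 480), direction=+1, rng=None):
    """One trial of a random-dot field: on each frame a random subset
    of dots (proportion = coherence) is displaced left or right by
    `step` pixels, and the remaining dots are replotted at random
    positions (one common convention; the paper's exact noise rule is
    not given in the snippet above)."""
    rng = np.random.default_rng() if rng is None else rng
    xy = rng.uniform([0, 0], field, size=(n_dots, 2))
    frames = [xy.copy()]
    for _ in range(n_frames - 1):
        signal = rng.random(n_dots) < coherence  # fresh signal subset each frame
        xy[signal, 0] = (xy[signal, 0] + direction * step) % field[0]
        xy[~signal] = rng.uniform([0, 0], field, size=((~signal).sum(), 2))
        frames.append(xy.copy())
    return frames
```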

Discussion

Taken together, these results show a small, non-directional gain in detection for bimodal motion, but no evidence of a facilitative audiovisual interaction. This holds true both for coherently moving visual objects and for spatially distributed motions, in central and in peripheral vision. These conclusions agree with two recent reports [18], [39]. As in our study, both used spatially distributed random dots to create visual motion that was either weakly visible or …
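The statistical account can also be made explicit as probability summation: if the observer has independent access to the two unimodal signals, the bimodal stimulus is detected whenever either signal is detected, and the predicted benefit is modest and blind to direction, consistent with the equal gains for correlated and anticorrelated motion. A sketch assuming independent channels:

```python
def p_detect_bimodal(p_audio, p_visual):
    """Probability summation over independent channels: the bimodal
    stimulus is detected if either unimodal signal is detected. The
    prediction ignores motion direction, so correlated and
    anticorrelated pairings benefit equally."""
    return 1.0 - (1.0 - p_audio) * (1.0 - p_visual)

print(p_detect_bimodal(0.5, 0.5))  # 0.75: near-threshold signals combine
print(p_detect_bimodal(0.2, 0.2))  # 0.36: weak signals still gain
```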

Acknowledgements

David Alais is supported by a Marie Curie Fellowship from the European Commission.

References (39)

  • K.H. Britten et al., The analysis of visual motion: a comparison of neuronal and psychophysical performance, J. Neurosci. (1992)
  • K.H. Britten et al., A relationship between behavioral choice and the visual responses of neurons in macaque MT, Vis. Neurosci. (1996)
  • S. Carlile, Virtual Auditory Space: Generation and Applications (1996)
  • C.L. Colby et al., Ventral intraparietal area of the macaque: anatomic location and visual response properties, J. Neurophysiol. (1993)
  • R. Desimone et al., Attentional control of visual perception: cortical and subcortical mechanisms, Cold Spring Harbor Symp. (1990)
  • M.O. Ernst et al., Humans integrate visual and haptic information in a statistically optimal fashion, Nature (2002)
  • M.A. Frens et al., Spatial and temporal factors determine audio–visual interactions in human saccadic eye movements, Percept. Psychophys. (1995)
  • J.M. Hillis et al., Combining sensory information: mandatory fusion within but not between the senses, Science (2002)
  • D. Kadunce et al., Mechanisms of within- and cross-modality suppression in superior colliculus, J. Neurophysiol. (1997)