Research report
No direction-specific bimodal facilitation for audiovisual motion detection
Introduction
Many objects in the external environment are represented in two or more sensory modalities. Touch and vision are commonly co-activated when objects are taken in hand and inspected. Audition and vision are also frequently activated by the same stimulus event (the sight and sound of a speeding car is a common example). Despite the modularity of our sensory systems, we perceive a unified and coherent world. Indeed, by synthesising complementary information, we enhance the likelihood that our internal perceptions will accurately reflect external realities [7], enabling us to respond more rapidly and appropriately. Two interesting questions arise from this synthesis: how is information combined across modalities, and does the perceptual system capitalise on complementary information about the same stimulus to improve its performance? The experiments we present deal primarily with the latter question. Specifically, we ask whether the ability to detect movement is improved when that movement is represented in both the auditory and visual modalities.
The combination of information across the senses has been heavily researched at the neurophysiological level [30], with particular focus on the superior colliculus. Its deep layers contain many ‘multisensory’ neurons, that is, neurons that receive unimodal sensory input from more than one source. Multisensory cells may be bimodal or even trimodal, with audiovisual bimodal cells a common variety. These are arranged in a topographical representation of external space and have separate but overlapping auditory and visual receptive fields, so that they respond to audiovisual input from a single location [35]. Although they can be driven unimodally, when driven bimodally by spatiotemporally correlated audiovisual input they exhibit a strong non-linear response known as “superadditivity” [12], [15].
Behavioural and attentional studies have demonstrated that cross-modal interactions do indeed occur. Behaviourally, it is known that bimodal superadditivity improves orienting behaviours such as eye movements [8], [30], [32], [37] and aids stimulus localisation [1], [31]. This likely reflects response enhancement to correlated bimodal stimuli creating a salient peak on the collicular topography. Several studies have also confirmed the physiological observation that response enhancement (superadditivity) is maximal when auditory and visual inputs arrive synchronously [12], [16], although for perceptual tasks the temporal window within which auditory and visual stimuli can be phenomenally integrated is rather broad [13], [21]. In contrast, discordant stimuli lead to “response depression” [11], [15] and a corresponding decrease in the efficiency of orienting behaviours [32], [37]. It has also been shown that attending to a particular spatial location for visual stimuli improves task performance for auditory stimuli (and vice versa) at that location [27], [28], suggesting a linked audiovisual topography. This is consistent with the fact that the superior colliculus (containing multimodal cells) is strongly implicated in orienting to salient stimuli, whether overtly with eye movements or covertly with attention [6], [26].
Apart from its role in orienting, the superior colliculus also has strong reciprocal links, via the pulvinar, with the middle-temporal (MT) cortical area [29]. MT is specialised for processing visual movement, and activity in this area is strongly correlated with visual motion perception [2], [3]. Outputs from MT project directly to area VIP, where they combine with input from auditory areas to create bimodal cells with strong motion selectivity [5], [9], [14]. Based on the evidence for strong audiovisual interactions in sensory processing, both at an early, subcortical level and in higher, motion-specialised cortical areas, we conducted experiments to examine whether sensitivity to bimodally represented movement is improved relative to unimodal baselines when that movement is spatiotemporally concordant in audition and vision. In particular, we focused on thresholds for motion detection, as the neurophysiological evidence suggests that response enhancement should be stronger when the unimodal stimuli are weak [15]. On this principle, unimodal stimuli too weak to be detected alone could conceivably become detectable as part of a correlated bimodal stimulus. We therefore measured motion detection thresholds unimodally for vision and for audition, and then again when the stimuli were presented together as a bimodal motion stimulus. We compared conditions in which the auditory and visual components were either matched in direction (correlated) or opposed (anticorrelated). This was repeated for visual motion in central and in peripheral vision, and for visual stimuli that were either spatially distributed or a spatially localised object. The results show no evidence of a facilitative audiovisual interaction for the detection of linear translations, whether in the central or peripheral field.
Stimuli
The auditory stimuli were created digitally at a sampling rate of 65 kHz and played over loudspeakers (Yamaha MSP5) which lay in the same plane as the video monitor, 45 cm from the observer, and ±30 cm from the monitor's centre. The sound was produced by low-pass filtering white noise using a 5th-order Butterworth filter with a cut-off frequency of 2 kHz. Auditory movement was created by playing the filtered signal in stereo and varying the magnitude and sign of interaural time differences so
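The auditory stimulus generation described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the 65 kHz sampling rate, the 5th-order Butterworth low-pass filter at 2 kHz, and the use of a signed, time-varying interaural time difference (ITD) come from the Methods; the linear ITD ramp, its ±600 µs range, and the fractional-delay interpolation are assumptions made for the sketch.

```python
import numpy as np
from scipy.signal import butter, lfilter

def make_auditory_motion(duration_s=1.0, fs=65000, cutoff_hz=2000,
                         max_itd_s=6e-4, rng=None):
    """Low-pass-filtered white noise played in stereo with a
    time-varying interaural time difference (ITD).

    The ITD ramps linearly from -max_itd_s to +max_itd_s, which for a
    listener facing the loudspeakers sounds like a source moving from
    one side to the other. The ramp and its range are assumptions; the
    Methods specify only that the magnitude and sign of the ITD varied.
    """
    rng = np.random.default_rng(rng)
    n = int(duration_s * fs)
    noise = rng.standard_normal(n)

    # 5th-order Butterworth low-pass at 2 kHz, as in the Methods
    b, a = butter(5, cutoff_hz / (fs / 2), btype="low")
    filtered = lfilter(b, a, noise)

    # Delay the right channel relative to the left by the signed ITD,
    # using linear interpolation for the fractional-sample delay
    t = np.arange(n) / fs
    itd = np.linspace(-max_itd_s, max_itd_s, n)
    left = filtered
    right = np.interp(t - itd, t, filtered)
    return np.stack([left, right], axis=1)  # (n_samples, 2) stereo

stereo = make_auditory_motion(duration_s=0.5, rng=0)
```

In practice the two channels would be written to the left and right loudspeakers; varying the sign of the ITD ramp reverses the perceived direction of auditory motion.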
Visual and auditory motion thresholds
We first measured separately the coherence thresholds for discriminating the direction of motion of a visual and of an auditory sound source. Both stimuli were “broad-band” and designed to be as similar as possible (see illustration in Fig. 1). The visual stimulus—a field of 200 dots in which a random subset was displaced either leftward or rightward to create a sensation of coherent motion—was chosen because it is known that variations in its motion coherence level elicit responses in visual
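The logic of such a random-dot coherence stimulus can be sketched as follows. The 200 dots and the random subset displaced leftward or rightward come from the description above; the normalised coordinates, the wrap-around at the field edges, and the replotting of non-signal dots at random positions are illustrative assumptions, as is the choice of step size.

```python
import numpy as np

def rdk_frame_step(positions, coherence, direction=+1, step=0.02,
                   field=1.0, rng=None):
    """Advance one frame of a random-dot kinematogram.

    A random subset of dots (expected proportion `coherence`) is
    displaced horizontally by `step` in `direction` (+1 rightward,
    -1 leftward); the remaining "noise" dots are replotted at random
    positions. Coordinates are normalised to [0, field) and wrap at
    the edges. Details beyond dot number and coherent displacement
    are assumptions of this sketch.
    """
    rng = np.random.default_rng(rng)
    n = len(positions)
    new = positions.copy()
    signal = rng.random(n) < coherence            # pick signal dots afresh
    new[signal, 0] = (new[signal, 0] + direction * step) % field
    n_noise = int((~signal).sum())
    new[~signal] = rng.random((n_noise, 2)) * field  # replot noise dots
    return new, signal

# 200 dots, 30% coherence, rightward motion (parameters illustrative)
rng = np.random.default_rng(0)
dots = rng.random((200, 2))
dots2, signal = rdk_frame_step(dots, coherence=0.3, direction=+1, rng=1)
```

Lowering `coherence` weakens the motion signal; a threshold is the coherence level at which direction can just be discriminated.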
Discussion
Taken together, these results show a small non-directional gain in detection for bimodal motion, but contain no evidence of a facilitative audiovisual interaction. This holds true both for coherently moving visual objects and for spatially distributed motions, in central and in peripheral vision. These conclusions agree with two recent reports [18], [39]. As in our study, both used spatially distributed random dots to create visual motion that was either weakly visible or
Acknowledgements
David Alais is supported by a Marie Curie Fellowship from the European Commission.
References (39)
A system of multimodal areas in the primate brain. Neuron (2001)
et al., Cross-modal perceptual integration of spatially and temporally disparate auditory and visual stimuli. Brain Res. Cogn. Brain Res. (2003)
et al., What is noise for the motion system? Vis. Res. (1996)
et al., Differences in the processing of short-range apparent motion at small and large displacements. Vis. Res. (1990)
et al., The organization of connections between the pulvinar and visual area MT in the macaque monkey. Brain Res. (1983)
et al., Neurons and behavior: the same rules of multisensory integration apply. Brain Res. (1988)
et al., Signal detection theory in the 2AFC paradigm: attention, channel uncertainty and probability summation. Vis. Res. (2000)
et al., Temporal contrast sensitivity and cortical magnification. Vis. Res. (1982)
et al., Spatiotemporal contrast sensitivity and visual field locus. Vis. Res. (1983)
et al., An auditory–visual space: evidence for its reality. Percept. Psychophys. (1974)
The analysis of visual motion: a comparison of neuronal and psychophysical performance. J. Neurosci.
A relationship between behavioral choice and the visual responses of neurons in macaque MT. Vis. Neurosci.
Virtual Auditory Space: Generation and Applications
Ventral intraparietal area of the macaque: anatomic location and visual response properties. J. Neurophysiol.
Attentional control of visual perception: cortical and subcortical mechanisms. Cold Spring Harbor Symp.
Humans integrate visual and haptic information in a statistically optimal fashion. Nature
Spatial and temporal factors determine audio–visual interactions in human saccadic eye movements. Percept. Psychophys.
Combining sensory information: mandatory fusion within but not between the senses. Science
Mechanisms of within- and cross-modality suppression in superior colliculus. J. Neurophysiol.
2017, Journal of OptometryCitation Excerpt :Additionally, congruence is an essential factor in multisensory motion perception. If both stimuli move in the same direction, detection thresholds can be significantly reduced while incongruent stimuli lead to increased detection thresholds.7,13–15,18,45,49,53,59,60 In some of these studies, clear congruence effects were shown, some studies only demonstrated the clear advantageous effect of multimodal compared to unimodal presentation.