Research report
No direction-specific bimodal facilitation for audiovisual motion detection
Introduction
Many objects in the external environment are represented in two or more sensory modalities. Touch and vision are commonly co-activated when objects are taken in hand and inspected. Audition and vision are also frequently activated by the same stimulus event (the sight and sound of a speeding car is a common example). Despite the modularity of our sensory systems, we perceive a unified and coherent world. Indeed, by synthesising complementary information, we enhance the likelihood that our internal perceptions will accurately reflect external realities [7], enabling us to respond more rapidly and appropriately. Two interesting questions arise from this synthesis: how is information combined across modalities, and does the perceptual system capitalise on complementary information about the same stimulus to improve its performance? The experiments we present deal primarily with the latter question. Specifically, we ask whether the ability to detect movement is improved when that movement is represented in both the auditory and visual modalities.
The combination of information across the senses has been heavily researched at the neurophysiological level [30], with particular focus on the superior colliculus. Its deep layers contain many ‘multisensory’ neurons, that is, neurons that receive unimodal sensory input from more than one source. Multisensory cells may be bimodal or even trimodal, with audiovisual bimodal cells a common variety. These are arranged in a topographical representation of external space and have separate but overlapping auditory and visual receptive fields, so that they respond to audiovisual input from a single location [35]. Although they can be driven unimodally, when driven bimodally by spatiotemporally correlated audiovisual input they exhibit a strong non-linear response known as “superadditivity” [12], [15].
Behavioural and attentional studies have demonstrated that cross-modal interactions do indeed occur. Behaviourally, it is known that bimodal superadditivity improves orienting behaviours such as eye movements [8], [30], [32], [37] and aids stimulus localisation [1], [31]. This likely reflects response enhancement to correlated bimodal stimuli creating a salient peak on the collicular topography. Several studies have also confirmed the physiological observation that response enhancement (superadditivity) is maximal when auditory and visual inputs arrive synchronously [12], [16], although for perceptual tasks the temporal window within which auditory and visual stimuli can be phenomenally integrated is rather broad [13], [21]. In contrast, discordant stimuli lead to “response depression” [11], [15] and a corresponding decrease in the efficiency of orienting behaviours [32], [37]. It has also been shown that attending to a particular spatial location for visual stimuli improves task performance for auditory stimuli (and vice versa) at that location [27], [28], suggesting a linked audiovisual topography. This is consistent with the fact that the superior colliculus (containing multimodal cells) is strongly implicated in orienting to salient stimuli, whether overtly with eye movements or covertly with attention [6], [26].
Apart from its role in orienting, the superior colliculus also has strong reciprocal links, via the pulvinar, with the middle-temporal (MT) cortical area [29]. MT is specialised for processing visual movement, and activity in this area is strongly correlated with visual motion perception [2], [3]. Outputs from MT project directly to area VIP, where they combine with input from auditory areas to create bimodal cells with strong motion selectivity [5], [9], [14]. Based on the evidence for strong audiovisual interactions in sensory processing, both at an early, subcortical level and in higher, motion-specialised cortical areas, we conducted experiments to examine whether sensitivity to bimodally represented movement is improved relative to unimodal baselines when that movement is spatiotemporally concordant in audition and vision. In particular, we focused on thresholds for motion detection, as the neurophysiological evidence suggests that response enhancement should be stronger when the unimodal stimuli are weak [15]. On this principle, unimodal stimuli too weak to be detected alone could conceivably become detectable as part of a correlated bimodal stimulus. We therefore measured motion detection thresholds unimodally for vision and for audition, and then again when the stimuli were presented together as a bimodal motion stimulus. We compared conditions in which the auditory and visual components were either matched in direction (correlated) or opposed (anticorrelated). This was repeated for visual motion in central and in peripheral vision, and for visual stimuli that were either spatially distributed or a spatially localised object. The results show no evidence of a facilitative audiovisual interaction for the detection of linear translations, whether in the central or peripheral field.
Stimuli
The auditory stimuli were created digitally at a sampling rate of 65 kHz and played over loudspeakers (Yamaha MSP5) which lay in the same plane as the video monitor, 45 cm from the observer, and ±30 cm from the monitor's centre. The sound was produced by low-pass filtering white noise using a 5th-order Butterworth filter with a cut-off frequency of 2 kHz. Auditory movement was created by playing the filtered signal in stereo and varying the magnitude and sign of interaural time differences so
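The auditory stimulus generation described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the 65 kHz sampling rate, the 5th-order Butterworth low-pass filter at 2 kHz, and the use of a signed, time-varying interaural time difference (ITD) come from the Methods; the linear ITD ramp, its ±600 µs range, and the fractional-delay interpolation are assumptions made for the sketch.

```python
import numpy as np
from scipy.signal import butter, lfilter

def make_auditory_motion(duration_s=1.0, fs=65000, cutoff_hz=2000,
                         max_itd_s=6e-4, rng=None):
    """Low-pass-filtered white noise played in stereo with a
    time-varying interaural time difference (ITD).

    The ITD ramps linearly from -max_itd_s to +max_itd_s, which for a
    listener facing the loudspeakers sounds like a source moving from
    one side to the other. The ramp and its range are assumptions; the
    Methods specify only that the magnitude and sign of the ITD varied.
    """
    rng = np.random.default_rng(rng)
    n = int(duration_s * fs)
    noise = rng.standard_normal(n)

    # 5th-order Butterworth low-pass at 2 kHz, as in the Methods
    b, a = butter(5, cutoff_hz / (fs / 2), btype="low")
    filtered = lfilter(b, a, noise)

    # Delay the right channel relative to the left by the signed ITD,
    # using linear interpolation for the fractional-sample delay
    t = np.arange(n) / fs
    itd = np.linspace(-max_itd_s, max_itd_s, n)
    left = filtered
    right = np.interp(t - itd, t, filtered)
    return np.stack([left, right], axis=1)  # (n_samples, 2) stereo

stereo = make_auditory_motion(duration_s=0.5, rng=0)
```

In practice the two channels would be written to the left and right loudspeakers; varying the sign of the ITD ramp reverses the perceived direction of auditory motion.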
Visual and auditory motion thresholds
We first measured separately the coherence thresholds for discriminating the direction of motion of a visual and of an auditory sound source. Both stimuli were “broad-band” and designed to be as similar as possible (see illustration in Fig. 1). The visual stimulus—a field of 200 dots in which a random subset was displaced either leftward or rightward to create a sensation of coherent motion—was chosen because it is known that variations in its motion coherence level elicit responses in visual
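The logic of such a random-dot coherence stimulus can be sketched as follows. The 200 dots and the random subset displaced leftward or rightward come from the description above; the normalised coordinates, the wrap-around at the field edges, and the replotting of non-signal dots at random positions are illustrative assumptions, as is the choice of step size.

```python
import numpy as np

def rdk_frame_step(positions, coherence, direction=+1, step=0.02,
                   field=1.0, rng=None):
    """Advance one frame of a random-dot kinematogram.

    A random subset of dots (expected proportion `coherence`) is
    displaced horizontally by `step` in `direction` (+1 rightward,
    -1 leftward); the remaining "noise" dots are replotted at random
    positions. Coordinates are normalised to [0, field) and wrap at
    the edges. Details beyond dot number and coherent displacement
    are assumptions of this sketch.
    """
    rng = np.random.default_rng(rng)
    n = len(positions)
    new = positions.copy()
    signal = rng.random(n) < coherence            # pick signal dots afresh
    new[signal, 0] = (new[signal, 0] + direction * step) % field
    n_noise = int((~signal).sum())
    new[~signal] = rng.random((n_noise, 2)) * field  # replot noise dots
    return new, signal

# 200 dots, 30% coherence, rightward motion (parameters illustrative)
rng = np.random.default_rng(0)
dots = rng.random((200, 2))
dots2, signal = rdk_frame_step(dots, coherence=0.3, direction=+1, rng=1)
```

Lowering `coherence` weakens the motion signal; a threshold is the coherence level at which direction can just be discriminated.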
Discussion
Taken together, these results show a small non-directional gain in detection for bimodal motion, but contain no evidence of a facilitative audiovisual interaction. This holds true both for coherently moving visual objects and for spatially distributed motions, in central and in peripheral vision. These conclusions agree with two recent reports [18], [39]. As in our study, both used spatially distributed random dots to create visual motion that was either weakly visible or
Acknowledgements
David Alais is supported by a Marie Curie Fellowship from the European Commission.
References (39)
A system of multimodal areas in the primate brain. Neuron (2001)
et al., Cross-modal perceptual integration of spatially and temporally disparate auditory and visual stimuli. Brain Res. Cogn. Brain Res. (2003)
et al., What is noise for the motion system? Vis. Res. (1996)
et al., Differences in the processing of short-range apparent motion at small and large displacements. Vis. Res. (1990)
et al., The organization of connections between the pulvinar and visual area MT in the macaque monkey. Brain Res. (1983)
et al., Neurons and behavior: the same rules of multisensory integration apply. Brain Res. (1988)
et al., Signal detection theory in the 2AFC paradigm: attention, channel uncertainty and probability summation. Vis. Res. (2000)
et al., Temporal contrast sensitivity and cortical magnification. Vis. Res. (1982)
et al., Spatiotemporal contrast sensitivity and visual field locus. Vis. Res. (1983)
et al., An auditory–visual space: evidence for its reality. Percept. Psychophys. (1974)
The analysis of visual motion: a comparison of neuronal and psychophysical performance. J. Neurosci.
A relationship between behavioral choice and the visual responses of neurons in macaque MT. Vis. Neurosci.
Virtual Auditory Space: Generation and Applications
Ventral intraparietal area of the macaque: anatomic location and visual response properties. J. Neurophysiol.
Attentional control of visual perception: cortical and subcortical mechanisms. Cold Spring Harbor Symp.
Humans integrate visual and haptic information in a statistically optimal fashion. Nature
Spatial and temporal factors determine audio–visual interactions in human saccadic eye movements. Percept. Psychophys.
Combining sensory information: mandatory fusion within but not between the senses. Science
Mechanisms of within- and cross-modality suppression in superior colliculus. J. Neurophysiol.
2017, Journal of OptometryCitation Excerpt :Additionally, congruence is an essential factor in multisensory motion perception. If both stimuli move in the same direction, detection thresholds can be significantly reduced while incongruent stimuli lead to increased detection thresholds.7,13–15,18,45,49,53,59,60 In some of these studies, clear congruence effects were shown, some studies only demonstrated the clear advantageous effect of multimodal compared to unimodal presentation.