Research report
Multisensory auditory–visual interactions during early sensory processing in humans: a high-density electrical mapping study

https://doi.org/10.1016/S0926-6410(02)00066-6

Abstract

Integration of information from multiple senses is fundamental to perception and cognition, but when and where this is accomplished in the brain is not well understood. This study examined the timing and topography of cortical auditory–visual interactions using high-density event-related potentials (ERPs) during a simple reaction-time (RT) task. Visual and auditory stimuli were presented alone and simultaneously. ERPs elicited by the auditory and visual stimuli when presented alone were summed (‘sum’ ERP) and compared to the ERP elicited when they were presented simultaneously (‘simultaneous’ ERP). Divergence between the ‘simultaneous’ and ‘sum’ ERPs indicated auditory–visual (AV) neural response interactions. There was a surprisingly early right parieto-occipital AV interaction, consistent with the finding of an earlier study [J. Cogn. Neurosci. 11 (1999) 473]. The onset of this effect (46 ms) was essentially simultaneous with the onset of visual cortical processing, as indexed by the onset of the visual C1 component, which is thought to represent the earliest cortical visual evoked potential. The coincident timing of the early AV interaction and the C1 strongly suggests that AV interactions can affect early visual sensory processing. Additional AV interactions were found within the time course of sensory processing (up to 200 ms post-stimulus onset). Taken together, this pattern of AV effects over the scalp suggests both activity unique to multisensory processing and modulation of ‘unisensory’ activity. RTs to the stimuli when presented simultaneously were significantly faster than when they were presented alone. This RT facilitation could not be accounted for by probability summation, as evidenced by violation of the ‘race’ model, providing compelling evidence that auditory–visual neural interactions give rise to this RT effect.

Introduction

Everyday tasks involve the seemingly automatic integration of information from multiple sensory modalities. For instance, driving a car involves the synthesis of visual (seeing the road), auditory (hearing the car engine; a passing car), somatosensory (feeling the steering wheel), and motor (depressing the gas pedal) activity. The combination of inputs from different senses can function to reduce perceptual ambiguity (e.g. Ref. [32]) and enhance stimulus detection (e.g. Ref. [53]). Despite the fundamental role that sensory integration plays in performance and perception, how and when information from separate sensory modalities comes together in the human neocortex is not well understood. The bulk of our knowledge on the mechanisms of multisensory integration in the brain comes from the pioneering research of Stein, Meredith, and co-workers (see Ref. [54] for a review) on multisensory processing in the superior colliculus (of anaesthetized cats), a sub-cortical structure involved in orienting to auditory, visual, and somatosensory stimuli. However, the extent to which the multisensory mechanisms defined in the superior colliculus generalize to cortical processes remains to be fully elucidated. Knowledge of the timing and anatomical distribution of cortical multisensory processing is essential to determining the roles that it plays in information processing.

Generally, it has been assumed that cortical multisensory processing occurs relatively late, following extensive processing of the sensory inputs, and that it occurs in higher-order cortical areas specialized for this purpose (e.g. Ref. [33]). This assumption can be partially attributed to: (1) a bias resulting from the tradition of studying sensory systems in isolation, and (2) animal studies that reveal multisensory convergence in higher-order regions of the parietal (e.g. Refs. [17], [28], [50]), temporal (e.g. Refs. [5], [12], [27]), and frontal lobes (e.g. Refs. [3], [61]), along with a general lack of corresponding studies demonstrating convergence in lower-tier cortical areas. However, recent evidence suggests that multisensory processing occurs during initial sensory transmission, and in cortical areas that are usually held to be unisensory. An investigation by Schroeder et al. [47] in the caudomedial (CM) belt area of the auditory association cortex of awake, behaving macaque monkeys, which receives direct input from primary auditory cortex (A1), showed auditory–somatosensory co-representation. Critically, both the auditory and somatosensory inputs to CM had characteristic feed-forward patterns, with both inputs arriving first in layer 4 at about 12 ms post-stimulus onset, strongly suggesting bottom-up multisensory integration early in the sensory processing hierarchy. Functional imaging studies have suggested multisensory effects in what have classically been considered unisensory cortical areas [7], [31], although the prevailing opinion is that these interactions represent feedback from higher-tier multisensory areas onto the lower-tier unisensory areas. Direct empirical evidence of feedback-mediated multisensory convergence in classical sensory cortex is sparse but supports this possibility (see Ref. [49]). Very recently, two event-related potential (ERP) studies found surprisingly early multisensory effects that, in light of their scalp topographies, appear to indicate early integration of sensory information in cortex traditionally held to be unisensory. In Giard and Peronnet [24], auditory–visual (AV) effects were found to onset at just 40 ms over right parieto-occipital scalp; this is consistent with generators in early visual cortices, although the spatial resolution of ERPs does not allow the contribution of the abutting multisensory areas in posterior parietal cortex or the superior temporal sulcus (STS) to be ruled out. In Foxe et al. [20], auditory–somatosensory effects onset at about 50 ms over central/post-central scalp, consistent with generators in early somatosensory cortex, and at 70 ms over scalp areas consistent with neural activity from posterior auditory areas, in line with the findings of Schroeder et al. [47].

The finding of an AV effect that onsets at 40 ms over parieto-occipital scalp [24] suggests that AV effects can occur at about the same time that initial activation of primary visual cortex (V1) is usually assumed to occur (45–55 ms as represented by the onset of the earliest cortical visual evoked potential, C1: e.g. Refs. [10], [11], [21]). This surprisingly early latency finding suggests a model of auditory–visual interaction in the cortex where auditory input, which reaches the cortex in less than half the time of visual input (9 to 15 ms: [9], [59]), is transmitted from auditory cortices to visual or nearby visually dominant cortical areas, and consequently affects the early sensory processing of visual input.

The purpose of the present study was to advance the understanding of cortical multisensory processing by placing early AV interactions within the temporal and topographical framework of cortical sensory processing of the individual auditory and visual inputs. We first endeavored to determine whether the early AV effect reported in Giard and Peronnet [24] would be elicited using a simple task and basic stimuli. In Giard and Peronnet, a relatively complicated task and stimulus set were employed: on each trial, one of two tones was presented, and/or a permanently placed circle morphed into a horizontal or vertical ellipse. Subjects made forced two-choice classifications of the six randomly occurring stimulus conditions. In contrast, in the present study, single visually presented disks and auditory pure tones were presented either alone or simultaneously, and subjects performed a speeded simple reaction-time task. Elicitation of the effect under these conditions, in conjunction with its elicitation under the very different conditions of Giard and Peronnet [24], would suggest that early cortical multisensory processing of simultaneously onsetting auditory and visual stimuli is present across a variety of stimuli and tasks. We then compared the onset of the earliest AV effect to the onset of the C1 in response to the visual stimulus alone. We expected that the initial cortical response to the visual stimulus would precede any AV effects, reflecting the assumption that cortical unisensory processing of the visual input begins before cortical multisensory processing.

The technique of high-density electrical mapping (from 128 scalp electrodes) was used to establish the spatio-temporal dynamics of auditory–visual multisensory processing in relation to activation across a distributed sensory processing network. To assess multisensory processing, ERPs to the ‘visual alone’ and ‘auditory alone’ stimulus conditions were summed (hereafter referred to as the ‘sum’ ERP) and compared to the ERP to the simultaneously presented auditory and visual stimuli (the ‘simultaneous’ ERP). If the neural responses to the auditory and visual inputs were the same when the stimuli were presented simultaneously as when they were presented alone, then, by the principle of superposition of electrical fields, the ‘simultaneous’ ERP would be equivalent to the ‘sum’ ERP. However, if the neural responses to the simultaneously presented auditory and visual stimuli interacted during processing, the ‘simultaneous’ and ‘sum’ ERPs would diverge. This method of measuring multisensory processing is valid when the neural responses reflect sensory processing unique to each stimulus and do not reflect processes common to all three stimulus types, such as target-related (e.g. the P3) or response-related (e.g. motor cortex) activity. Several forms of interaction effects have been reported from this comparison (e.g. Refs. [20], [24], [38], [40], [55]). Although our primary focus was on the earliest AV interaction, AV interactions up to 200 ms were considered.
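To make the additive comparison concrete, the following is a minimal sketch in Python/NumPy, assuming trial-averaged ERP arrays of shape (channels × timepoints); the function and variable names are illustrative and not part of the study's analysis pipeline:

```python
import numpy as np

# Additive ('sum' vs. 'simultaneous') ERP comparison. By superposition of
# electrical fields, independent processing of the auditory and visual
# inputs predicts: ERP(simultaneous) = ERP(auditory) + ERP(visual).
# Any residual difference wave indexes an AV neural response interaction.

def av_interaction(erp_auditory, erp_visual, erp_simultaneous):
    """Return the AV interaction difference wave (channels x timepoints)."""
    erp_sum = erp_auditory + erp_visual       # the 'sum' ERP
    return erp_simultaneous - erp_sum         # divergence from additivity

# Illustrative usage with simulated data: 128 channels, 500 samples.
rng = np.random.default_rng(0)
erp_a, erp_v, erp_av = (rng.standard_normal((128, 500)) for _ in range(3))
difference_wave = av_interaction(erp_a, erp_v, erp_av)
print(difference_wave.shape)  # (128, 500)
```

Note that the subtraction is interpretable only over the sensory-processing epoch: activity common to all three conditions (e.g. target- or motor-related components) enters the ‘sum’ ERP twice but the ‘simultaneous’ ERP once, so it would not cancel and would masquerade as an interaction.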

We also tested whether multisensory processing was reflected in our behavioral data. Simple reaction-times are generally facilitated when location-concordant stimuli are presented simultaneously, a phenomenon termed the ‘redundant signal effect’ (RSE) (e.g. Refs. [29], [37]). Two classes of models have been proposed to explain this effect: race models and coactivation models. In race models, each stimulus of a pair independently competes for response initiation, and the faster of the two mediates the response on any given trial. On this account, probability summation produces the RSE, since the likelihood that either of two stimuli will yield a fast reaction-time is higher than the likelihood for one stimulus alone. In coactivation models, the neural responses to the simultaneously presented stimuli interact to facilitate response initiation, producing the RSE. We tested whether the RSE exceeded the statistical facilitation predicted by the race model, which would provide evidence that AV neural interactions contribute to the RT facilitation.
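The race model can be assessed with the inequality introduced by Miller (1982; see References): at every latency t, the race model requires P(RT ≤ t | AV) ≤ P(RT ≤ t | A) + P(RT ≤ t | V). A minimal sketch of this test in Python/NumPy, using simulated placeholder RTs rather than the study's data:

```python
import numpy as np

# Race-model (Miller) inequality: the redundant-target RT distribution may
# not exceed the sum of the two unisensory RT distributions at any latency.
# All data below are simulated placeholders, not results from this study.

def ecdf(rts, t_grid):
    """Empirical cumulative RT distribution evaluated on a latency grid."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t_grid, side="right") / rts.size

def race_model_violations(rt_a, rt_v, rt_av, t_grid):
    """Return the latencies at which the race-model bound is violated."""
    bound = np.minimum(ecdf(rt_a, t_grid) + ecdf(rt_v, t_grid), 1.0)
    return t_grid[ecdf(rt_av, t_grid) > bound]

t_grid = np.arange(150, 501)                 # latencies in ms
rng = np.random.default_rng(1)
rt_a = rng.normal(297, 40, 200)              # auditory-alone RTs (placeholder)
rt_v = rng.normal(305, 40, 200)              # visual-alone RTs (placeholder)
rt_av = rng.normal(255, 35, 200)             # simultaneous RTs (placeholder)
print(race_model_violations(rt_a, rt_v, rt_av, t_grid))
```

Violations of the bound, typically concentrated in the fast tail of the RT distribution, are taken as evidence for coactivation rather than a race.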

Section snippets

Subjects

Twelve neurologically normal, paid volunteers participated (mean age 23.8±2.69 years S.D.; five female; 11 right-handed); all reported normal hearing and normal or corrected-to-normal vision. The Institutional Review Board of the Nathan Kline Institute for Psychiatric Research approved the experimental procedures, and each subject provided written informed consent. Data from two additional subjects were excluded, one for excessive blinking and the other for failure to maintain central fixation.

Auditory alone

A

Redundant signal effect

Mean reaction-times in the simultaneous condition (255 ms) were faster than mean reaction-times in either the visual alone or the auditory alone condition (305 and 297 ms, respectively). An RSE was confirmed with planned comparisons of each of the alone conditions against the simultaneous condition (auditory alone vs. simultaneous: t(11)=7.972, P<0.001; visual alone vs. simultaneous: t(11)=11.057, P<0.001).
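For illustration, these planned comparisons are paired t-tests on the per-subject mean RTs (n = 12, hence 11 degrees of freedom). A minimal sketch with SciPy, using placeholder values rather than the actual subject means:

```python
import numpy as np
from scipy import stats

# Planned comparisons of each 'alone' condition against the 'simultaneous'
# condition via paired t-tests on per-subject mean RTs (ms). The arrays
# are placeholders for twelve subjects, not the study's data.
rng = np.random.default_rng(2)
auditory = rng.normal(297, 25, 12)       # per-subject means, auditory alone
visual = rng.normal(305, 25, 12)         # per-subject means, visual alone
simultaneous = rng.normal(255, 25, 12)   # per-subject means, simultaneous

for label, alone in (("auditory", auditory), ("visual", visual)):
    t_val, p_val = stats.ttest_rel(alone, simultaneous)
    print(f"{label} vs. simultaneous: t(11) = {t_val:.3f}, P = {p_val:.4f}")
```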

Test of the race model

There was violation of the race model.

Discussion

The present study examined the spatial and temporal properties of cortical AV interactions to basic stimuli while subjects performed a simple reaction-time task. In the behavioral data, there was a significant reaction-time advantage when the visual and auditory stimuli were presented simultaneously compared to when they were presented alone, the so-called redundant signal effect (RSE). There was substantial violation of the race model. Hence the RSE could not be accounted for by simple probability summation.

Summary and conclusions

Our electrophysiological data are in general agreement with those reported by Giard and Peronnet [24], the only comparable ERP study to date. Both studies revealed AV interactions of the same polarity over right parieto-occipital scalp (∼40–50 ms), over occipito-temporal scalp (∼165 ms), and over fronto-central (in Giard and Peronnet [24]) and central (the present study) scalp (∼180 ms). However, our AV effects were of considerably shorter duration than those reported in Giard and Peronnet.

Acknowledgements

Sincere appreciation to Ms. Beth Higgins and Ms. Deirdre Foxe for excellent technical assistance. Work supported in part by grants from the NIH—NS30029-23 (WR), MH63434 (JJF) and MH61989 (CES).

References (62)

  • J. Miller, Divided attention: evidence for coactivation with redundant signals, Cogn. Psychol. (1982)
  • M.M. Murray et al., Visuo–spatial response interactions in early cortical processing during a simple reaction time task: a high density electrical mapping study, Neuropsychologia (2001)
  • T.W. Picton et al., Human auditory evoked potentials. I. Evaluation of components, Electroencephalogr. Clin. Neurophysiol. (1974)
  • S. Supek et al., Single vs. paired visual stimulation: superposition of early neuromagnetic responses and retinotopy in extrastriate cortex in humans, Brain Res. (1999)
  • H.G. Vaughan et al., The sources of auditory evoked responses recorded from the human scalp, Electroencephalogr. Clin. Neurophysiol. (1970)
  • H.G. Vaughan et al., Topographic analysis of auditory event-related potentials, Prog. Brain Res. (1980)
  • T. Allison et al., Electrophysiological studies of human face perception I: potentials generated in occipitotemporal cortex by face and non-face stimuli, Cereb. Cortex (1999)
  • R.A. Andersen et al., Multimodal representation of space in the posterior parietal cortex and its use in planning movements, Annu. Rev. Neurosci. (1997)
  • R.A. Berman, C.L. Colby, Both auditory and visual attention modulate motion processing in area MT+, Cogn. Brain Res. ...
  • C. Bruce et al., Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque, J. Neurophysiol. (1981)
  • H. Buchner et al., Fast visual evoked potential input into human area V5, Neuroreport (1997)
  • G.A. Calvert et al., Response amplification in sensory-specific cortices during crossmodal binding, Neuroreport (1999)
  • V.P. Clark et al., Identification of early visual evoked potential generators by retinotopic and topographic analyses, Hum. Brain Map. (1995)
  • V.P. Clark et al., Spatial selective attention affects early extrastriate but not striate components of the visual evoked potential, J. Cogn. Neurosci. (1996)
  • G.M. Doniger et al., Activation time-course of ventral visual stream object-recognition areas: high density electrical mapping of perceptual closure processes, J. Cogn. Neurosci. (2000)
  • J. Driver et al., Multisensory perception: beyond modularity and convergence, Curr. Biol. (2000)
  • J.R. Duhamel et al., Congruent representation of visual and somatosensory space in single neurons of monkey ventral intraparietal cortex (VIP)
  • A. Falchier et al., Extensive projections from the primary auditory cortex and polysensory area STP to peripheral area V1 in the macaque, Soc. Neurosci. Abs. (2001)
  • J.J. Foxe et al., Cued shifts of intermodal attention: parieto-occipital ∼10 Hz activity reflects anticipatory state of visual attention mechanisms, Neuroreport (1998)
  • J.J. Foxe et al., Flow of activation from V1 to frontal cortex in humans: a framework for defining ‘early’ visual processing, Exp. Brain Res. (2002)
  • D.H. ffytche et al., The parallel visual motion inputs into areas V1 and V5 of human cerebral cortex, Brain (1995)