Research report
Multisensory auditory–visual interactions during early sensory processing in humans: a high-density electrical mapping study
Introduction
Everyday tasks involve the seemingly automatic integration of information from multiple sensory modalities. For instance, driving a car involves the synthesis of visual (seeing the road), auditory (hearing the car engine; a passing car), somatosensory (feeling the steering wheel), and motor (depressing the gas pedal) activity. The combination of inputs from different senses can function to reduce perceptual ambiguity (e.g. Ref. [32]) and enhance stimulus detection (e.g. Ref. [53]). Despite the fundamental role that sensory integration plays in performance and perception, how and when information from separate sensory modalities comes together in the human neocortex is not well understood. The bulk of our knowledge on the mechanisms of multisensory integration in the brain comes from the pioneering research of Stein, Meredith, and co-workers (see Ref. [54] for a review) on multisensory processing in the superior colliculus (of anaesthetized cats), a sub-cortical structure involved in orienting to auditory, visual, and somatosensory stimuli. However, the extent to which the multisensory mechanisms defined in the superior colliculus generalize to cortical processes remains to be fully elucidated. Knowledge of the timing and anatomical distribution of cortical multisensory processing is essential to determining the roles that it plays in information processing.
Generally, it has been assumed that cortical multisensory processing occurs relatively late, following extensive processing of sensory inputs, and that it occurs in higher order cortical areas specialized for this purpose (e.g. Ref. [33]). This assumption can be partially attributed to: (1) a bias resulting from the tradition of studying sensory systems in isolation, and (2) animal studies that reveal multisensory convergence in higher-order regions of the parietal (e.g. Refs. [17], [28], [50]), temporal (e.g. Refs. [5], [12], [27]), and frontal lobes (e.g. Refs. [3], [61]) along with a general lack of corresponding studies demonstrating convergence in lower-tier cortical areas. However, recent evidence suggests that multisensory processing occurs during initial sensory transmission, and in cortical areas that are usually held to be unisensory. An investigation by Schroeder et al. [47] in the caudomedial (CM) belt area of the auditory association cortex of awake behaving macaque monkeys, which gets direct input from primary auditory cortex (A1), showed auditory–somatosensory co-representation. Critically, both the auditory and somatosensory inputs to CM had characteristic feed-forward patterns, with both inputs arriving first in layer 4 at about 12 ms post stimulus onset, strongly suggesting bottom-up multisensory integration that occurs early in the sensory processing hierarchy. Functional imaging studies have suggested multisensory effects in what have been classically considered unisensory cortical areas [7], [31], although the prevailing opinion is that these interactions represent feedback from higher-tier multisensory onto the lower-tier unisensory areas. Direct empirical evidence of feedback-mediated multisensory convergence in classical sensory cortex is sparse but supports this possibility (see Ref. [49]). 
Very recently, two event-related potential (ERP) studies found surprisingly early multisensory effects that, in light of their scalp topography, appear to indicate the early integration of sensory information in cortex traditionally held to be unisensory. In Giard and Peronnet [24], auditory–visual (AV) effects were found to onset at just 40 ms over right parieto-occipital scalp; this is consistent with generators in early visual cortices, although the spatial resolution of ERPs does not allow the contribution of the abutting multisensory areas in posterior parietal cortex or the superior temporal sulcus (STS) to be ruled out. In Foxe et al. [20], auditory–somatosensory effects onset at about 50 ms over central/post-central scalp, consistent with generators in early somatosensory cortex, and at 70 ms over scalp areas consistent with neural activity from posterior auditory areas, in line with the findings of Schroeder et al. [47].
The finding of an AV effect that onsets at 40 ms over parieto-occipital scalp [24] suggests that AV effects can occur at about the same time that initial activation of primary visual cortex (V1) is usually assumed to occur (45–55 ms as represented by the onset of the earliest cortical visual evoked potential, C1: e.g. Refs. [10], [11], [21]). This surprisingly early latency finding suggests a model of auditory–visual interaction in the cortex where auditory input, which reaches the cortex in less than half the time of visual input (9 to 15 ms: [9], [59]), is transmitted from auditory cortices to visual or nearby visually dominant cortical areas, and consequently affects the early sensory processing of visual input.
The purpose of the present study was to advance the understanding of cortical multisensory processing by placing early AV interactions within the temporal and topographical framework of cortical sensory processing of the individual auditory and visual inputs. We first sought to determine whether the early AV effect reported in Giard and Peronnet [24] would be elicited using a simple task and basic stimuli. Giard and Peronnet employed a relatively complicated task and stimulus set: on each trial, one of two tones was presented, and/or a permanently placed circle morphed into a horizontal or vertical ellipse, and subjects made two-alternative forced-choice classifications of the six randomly occurring stimulus conditions. In contrast, in the present study, single visually presented disks and auditory pure tones were presented either alone or simultaneously, and subjects performed a speeded simple reaction-time task. Elicitation of the effect under these conditions, in conjunction with its elicitation under the very different conditions of Giard and Peronnet [24], would suggest that early cortical multisensory processing of simultaneously onsetting auditory and visual stimuli may be present across a variety of stimuli and tasks. We then compared the onset of the earliest AV effect to the onset of C1 in response to the visual stimulus alone. We expected that the initial cortical response to the visual stimulus would precede any AV effects, reflecting that cortical unisensory processing of the visual input begins prior to cortical multisensory processing.
The technique of high-density electrical mapping (from 128 scalp electrodes) was used to establish the spatio–temporal dynamics of auditory–visual multisensory processing in relation to activation across a distributed sensory processing network. To assess multisensory processing, ERPs to the ‘visual alone’ and ‘auditory alone’ stimulus conditions were summed (hereafter referred to as the ‘sum’ ERP) and compared to the ERP to the simultaneously presented auditory and visual stimuli (the ‘simultaneous’ ERP). If neural responses to the auditory and visual inputs were processed in the same way when they were presented simultaneously as when they were presented alone, then, based on the principle of superposition of electrical fields, the ‘simultaneous’ ERP would be equivalent to the ‘sum’ ERP. However, if the neural responses to the simultaneously presented auditory and visual stimuli interacted during processing, the ‘simultaneous’ and ‘sum’ ERPs would diverge. This method of measuring multisensory processing is valid when neural responses reflect sensory processing unique to the stimulus, and do not reflect processes common to all three stimulus types such as target (e.g. the P3) or response (e.g. motor cortex activity) related neural activity. Several forms of interaction effects have been reported from this comparison (e.g. Refs. [20], [24], [38], [40], [55]). Although our primary focus was on the earliest AV interaction, AV interactions up to 200 ms were considered.
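The superposition logic described above can be sketched in a few lines. This is a minimal illustration, not the study's analysis pipeline: the arrays `erp_aud`, `erp_vis`, and `erp_sim` stand in for trial-averaged ERPs (channels × time points), and the injected `interaction_true` window is entirely made up for demonstration.

```python
import numpy as np

n_channels, n_times = 128, 300  # e.g. 128 electrodes, 300 time samples
rng = np.random.default_rng(0)

# Placeholder trial-averaged ERPs for the 'auditory alone' and 'visual alone' conditions
erp_aud = rng.standard_normal((n_channels, n_times))
erp_vis = rng.standard_normal((n_channels, n_times))

# Simulate a 'simultaneous' ERP carrying an extra interaction in an early time window
interaction_true = np.zeros((n_channels, n_times))
interaction_true[:, 40:80] = 0.5
erp_sim = erp_aud + erp_vis + interaction_true

# By superposition of electrical fields, independent processing predicts that the
# 'simultaneous' ERP equals the 'sum' ERP; any nonzero difference wave indexes an
# auditory-visual interaction.
erp_sum = erp_aud + erp_vis
difference_wave = erp_sim - erp_sum

print(np.allclose(difference_wave, interaction_true))  # True: the injected interaction is recovered
```

In practice the difference wave is computed per subject and tested against zero across subjects at each electrode and time point; the sketch shows only the arithmetic of the comparison.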
We also tested whether multisensory processing was reflected in our behavioral data. Simple reaction-times are generally facilitated when location concordant stimuli are simultaneously presented. This has been called the ‘redundant signal effect’ (RSE) (e.g. Refs. [29], [37]). There are two classes of models to explain this effect: race models and coactivation models. In race models each stimulus of a pair independently competes for response initiation, and the faster of the two mediates the response for any trial. According to this model probability summation produces the RSE, since the likelihood of either of the two stimuli yielding a fast reaction-time is higher than that from one stimulus alone. In coactivation models, the interaction of neural responses to the simultaneously presented stimuli facilitates response initiation and produces the RSE. We tested whether the RSE exceeded the statistical facilitation predicted by the race model, and thereby provided evidence for the contribution of AV neural interactions to RT facilitation.
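The race-model test can be sketched as follows. The race model bounds the cumulative distribution function (CDF) of reaction times in the redundant condition by the sum of the two unisensory CDFs, P(RT_AV ≤ t) ≤ P(RT_A ≤ t) + P(RT_V ≤ t); any time point at which the observed AV CDF exceeds that bound violates the model. The reaction-time samples below (`rt_aud`, `rt_vis`, `rt_av`) are hypothetical, generated only to demonstrate the computation.

```python
import numpy as np

def ecdf(samples, t_grid):
    """Empirical CDF of reaction times, evaluated at each time in t_grid."""
    samples = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(samples, t_grid, side="right") / samples.size

# Hypothetical reaction-time samples (ms) for the three stimulus conditions
rng = np.random.default_rng(1)
rt_aud = rng.normal(297, 30, 500)
rt_vis = rng.normal(305, 30, 500)
rt_av = rng.normal(255, 25, 500)  # redundant (simultaneous) condition

t_grid = np.arange(150, 451)  # evaluate the CDFs from 150 to 450 ms

cdf_av = ecdf(rt_av, t_grid)
race_bound = np.minimum(ecdf(rt_aud, t_grid) + ecdf(rt_vis, t_grid), 1.0)

# Positive values indicate a violation of the race-model inequality
violation = cdf_av - race_bound
print(f"max violation: {violation.max():.3f}")
```

With these simulated distributions the AV condition is faster than probability summation allows, so the violation is positive over the fast tail of the distribution, which is where race-model violations are conventionally assessed.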
Section snippets
Subjects
Twelve neurologically normal, paid volunteers participated (mean age 23.8±2.69 years; five female; 11 right-handed); all reported normal hearing and normal or corrected-to-normal vision. The Institutional Review Board of the Nathan Kline Institute for Psychiatric Research approved the experimental procedures, and each subject provided written informed consent. Data from two additional subjects were excluded, one for excessive blinking and the other for failure to maintain central fixation.
Auditory alone
Redundant signal effect
Mean reaction-time in the simultaneous condition (255 ms) was faster than mean reaction-times in either the visual alone or the auditory alone condition (305 and 297 ms, respectively). An RSE was confirmed with planned comparisons of each alone condition to the simultaneous condition (auditory alone vs. simultaneous: t11=7.972, P<0.001; visual alone vs. simultaneous: t11=11.057, P<0.001).
Test of the race model
There was violation of the race model.
Discussion
The present study examined the spatial and temporal properties of cortical AV interactions to basic stimuli while subjects performed a simple reaction-time task. In the behavioral data there was a significant reaction-time advantage when the visual and auditory stimuli were presented simultaneously compared to when they were presented alone: the so-called redundant signal effect (RSE). There was substantial violation of the race model; hence the RSE could not be accounted for by simple probability summation.
Summary and conclusions
Our electrophysiological data are in general agreement with those reported by Giard and Peronnet [24], the only comparable ERP study to date. Both studies revealed AV interactions of the same polarity over right parieto-occipital scalp (∼40–50 ms), over occipito-temporal scalp (∼165 ms), and over fronto-central (in Giard and Peronnet [24]) and central (the present study) scalp (∼180 ms). However, our AV effects were of considerably shorter duration than those reported in Giard and Peronnet.
Acknowledgements
Sincere appreciation to Ms. Beth Higgins and Ms. Deirdre Foxe for excellent technical assistance. Work supported in part by grants from the NIH—NS30029-23 (WR), MH63434 (JJF) and MH61989 (CES).
References (62)
- et al., Auditory–visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey, Exp. Neurol. (1977)
- et al., Detection of audio–visual integration sites in humans by application of electrophysiological criteria to the BOLD effect, Neuroimage (2001)
- et al., Auditory input to the human cortex during states of drowsiness and surgical anesthesia, Electroencephalogr. Clin. Neurophys. (1971)
- et al., Visual areas in the temporal cortex of the macaque, Brain Res. (1979)
- et al., On the independence of the CNV and the P300 components of the human averaged evoked potential, Electroencephalogr. Clin. Neurophys. (1975)
- et al., Visual perceptual learning in human object based recognition areas: a repetition priming study using high-density electrical mapping, Neuroimage (2001)
- et al., Multisensory auditory–somatosensory interactions in early cortical processing revealed by high density electrical mapping, Cogn. Brain Res. (2000)
- et al., Cross-modality cued attention-dependent suppression of distracter visual input indexed by anticipatory parieto-occipital alpha-band oscillations, Cogn. Brain Res. (2001)
- et al., Distribution of visual and somatic functions in the parietal associative area 7 of the monkey, Brain Res. (1979)
- Speechreading: illusion or window into pattern recognition, Trends Cogn. Sci. (1999)