Cerebral processing of emotional prosody—influence of acoustic parameters and arousal
Introduction
In a multifaceted environment, our sensory systems are confronted with an abundance of information which contrasts with the limited processing capacity of the human brain. To cope with these limitations, stimuli of potential behavioral relevance have to be separated automatically from irrelevant ones to enable filtering of vital information. This separation can either occur by voluntary attention to certain stimulus features or in an involuntary stimulus-driven manner (Desimone and Duncan, 1995). Such automatic processing has been observed in response to novel stimuli (Näätänen, 1990) or emotionally salient stimuli (Vuilleumier and Schwartz, 2001, Vuilleumier, 2005).
In the auditory domain, emotionally salient information can be expressed via modulation of speech melody (emotional prosody). Findings from auditory evoked potentials indicate differential processing of emotionally and neutrally spoken vowels around 400 ms after stimulus onset (Alter et al., 2003). Previous functional magnetic resonance imaging (fMRI) experiments (Grandjean et al., 2005, Ethofer et al., 2006a, Beaucousin et al., 2007) have demonstrated that voice-sensitive regions (Belin et al., 2000) in the associative auditory cortex in the middle part of the superior temporal gyrus (mid STG), adjacent to the superior temporal sulcus, respond more strongly to happy and angry than to neutral intonations. This response pattern was found irrespective of whether subjects were instructed to attend to emotional prosody (Ethofer et al., 2006a, Beaucousin et al., 2007) or to some other feature of the presented stimuli, such as emotional semantics (Ethofer et al., 2006a) or the speakers’ gender (Grandjean et al., 2005). These findings suggest that activity in mid STG regions is driven mainly by bottom-up mechanisms that rely on features of the presented stimuli and is relatively unaffected by top-down processes that focus subjects’ attention on certain stimulus characteristics.
So far, it is unclear which stimulus-bound features mediate these bottom-up mechanisms and whether stronger responses to emotional as compared to neutral prosody can be explained by differences in a single acoustic parameter. To minimize such acoustic confounds, previous studies of the neural correlates of emotional prosody perception employed stimuli matched for basic parameters, such as acoustic energy (Grandjean et al., 2005) or maximum peak intensity (Ethofer et al., 2006a). Such approaches are limited by the fact that emotional information in the voice is transmitted via certain acoustic features (Banse and Scherer, 1996), and matching emotional stimuli to neutral stimuli on all possible acoustic parameters would presumably remove the emotional information conveyed by prosody. Therefore, an alternative approach was employed in the present study: first, the impact of single acoustic parameters or the conjoint effect of a set of acoustic parameters was evaluated in simple or multiple regression analyses, respectively. After removing all variance correlating with the parameters in question, we tested whether the regression residuals still showed stronger responses to emotional than to neutral prosody.
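As a rough illustration of this residual approach, ordinary least squares can be used to remove the variance tied to an acoustic parameter before comparing conditions. The sketch below uses simulated data; all variable names and numerical values are invented for the example and are not taken from the study:

```python
import numpy as np

def regress_out(y, X):
    """Return residuals of y after removing the variance explained by
    the regressors in X (one column per parameter) plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

rng = np.random.default_rng(0)
n = 200
# Simulated single-trial response amplitudes: emotional trials carry an
# effect beyond what mean intensity alone predicts (all values invented).
intensity = rng.normal(70.0, 5.0, n)            # mean intensity (dB)
emotional = rng.integers(0, 2, n).astype(bool)  # emotional vs. neutral trial
bold = 0.05 * intensity + 0.8 * emotional + rng.normal(0.0, 0.5, n)

# Regress out the acoustic parameter, then test whether emotional trials
# still show stronger responses in the residuals.
resid = regress_out(bold, intensity[:, None])
t = welch_t(resid[emotional], resid[~emotional])
```

Evaluating the conjoint effect of several parameters amounts to stacking them as additional columns of `X` before computing the residuals.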
Here, we used event-related fMRI to investigate the neuronal correlates underlying automatic processing of a broad variety of prosodic categories expressing highly arousing emotional information. To this end, we presented words spoken in neutral, happy, erotic, angry, or fearful intonation in a passive-listening paradigm. Based on results of previous neuroimaging experiments (Grandjean et al., 2005, Ethofer et al., 2006a, Beaucousin et al., 2007), we hypothesized that voice-sensitive regions in mid STG show stronger blood oxygen level dependent (BOLD) responses to emotional intonations expressing high emotional arousal than to low arousing neutral intonations. To evaluate whether stronger responsiveness of this region can be explained by the perceived emotional arousal or by basic acoustic parameters, such as mean intensity, variability (standard deviation) of intensity, mean fundamental frequency, variability of fundamental frequency, or stimulus duration, separate simple regression analyses were carried out and the resulting regression residuals of emotional and neutral trials were statistically compared. Furthermore, a multiple regression analysis, including all five acoustic parameters investigated here, was conducted and regression residuals of emotional and neutral trials obtained in this analysis were compared to evaluate the conjoint effect of these parameters on event-related responses of voice-sensitive regions in mid STG.
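For illustration, the five acoustic parameters could be approximated from a sampled waveform as below. This is a simplified sketch (arbitrary frame length, arbitrary pitch bounds, and a naive autocorrelation-based F0 estimate), not the measurement procedure of the study, which would typically rely on a dedicated phonetics tool such as Praat:

```python
import numpy as np

def acoustic_params(x, sr, frame=0.04, fmin=75.0, fmax=500.0):
    """Rough frame-wise estimates of mean and SD of intensity (dB),
    mean and SD of fundamental frequency (Hz), and duration (s)."""
    hop = int(frame * sr)
    db, f0 = [], []
    for i in range(0, len(x) - hop, hop):
        f = x[i:i + hop]
        rms = np.sqrt(np.mean(f ** 2))
        if rms < 1e-6:          # skip silent frames
            continue
        db.append(20 * np.log10(rms))
        # Naive F0: lag of the autocorrelation peak within the pitch range.
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        lo, hi = int(sr / fmax), int(sr / fmin)
        lag = lo + int(np.argmax(ac[lo:hi]))
        f0.append(sr / lag)
    db, f0 = np.array(db), np.array(f0)
    return {"int_mean": db.mean(), "int_sd": db.std(ddof=1),
            "f0_mean": f0.mean(), "f0_sd": f0.std(ddof=1),
            "duration": len(x) / sr}

# Example: a 1-s, 200-Hz tone should yield an f0_mean near 200 Hz.
sr = 16000
t = np.arange(sr) / sr
p = acoustic_params(np.sin(2 * np.pi * 200 * t), sr)
```

The five resulting values per stimulus would then serve as regressors in the simple and multiple regression analyses described above.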
Subjects
Twenty-four right-handed, healthy, native or highly proficient German-speaking participants (12 female, 12 male; mean age 25.1 years) were included in an fMRI study. None of the subjects had previously participated in an fMRI experiment on emotional processing. Right-handedness was assessed using the Edinburgh Inventory (Oldfield, 1971). The study was approved by the Ethics Committee of the University of Tuebingen and conducted in accordance with the Declaration of Helsinki.
Stimuli
126 adjectives and nouns
Results
Comparison of hemodynamic responses to emotional and neutral prosody revealed a cluster with two distinct maxima in the right temporal lobe, which were located within Heschl’s gyrus (MNI coordinates: x = 48; y = −24; z = 3; Z score = 4.06; cytoarchitectonic subdivision TE 1.1 as defined by the probabilistic maps of Morosan et al., 2001 and Rademacher et al., 2001) and in the mid STG (MNI coordinates: x = 63; y = −12; z = 0; Z score = 3.72; k = 99, p < 0.05 corrected; see Figs. 1a–c). Other brain regions
Discussion
The present study was conducted to investigate the neural correlates underlying automatic processing of emotional prosody. In particular, we wanted to test whether our region of interest in right mid STG shows increased responses to a broad variety of emotional intonations. Furthermore, we wanted to clarify whether stronger responses within this brain region can be explained on the basis of emotional arousal, a single acoustic parameter, or the conjoint effect of mean intensity, variability of
Conclusion
Prosody of all four emotional categories investigated in the present study induced stronger responses than neutral prosody in the right mid STG, indicating that this brain region is sensitive to a broad variety of behaviorally relevant information expressed by prosody. Demonstration of this effect in a passive-listening paradigm underlines its stimulus-driven nature and confirms previous suggestions (Ethofer et al., 2006a) that it occurs irrespective of task instructions. Event-related
Acknowledgments
Sarah Wiethoff was supported by a grant from the Studienstiftung des Deutschen Volkes. The study was supported by the Deutsche Forschungsgemeinschaft (Sonderforschungsbereich 550-B10) and by the Junior Science Program of the Heidelberg Academy of Sciences and Humanities.
References (55)
- Alter et al. (2003). Affective encoding in the speech signal and in event-related potentials. Speech Commun.
- et al. (2001). Modeling geometric deformations in EPI time series. NeuroImage.
- et al. (2006). Response preferences for “what” and “where” in human non-primary auditory cortex. NeuroImage.
- Eickhoff et al. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage.
- Ethofer et al. (2006). Cerebral pathways in processing of affective prosody: a dynamic causal modeling study. NeuroImage.
- Friston et al. (2002). Classical and Bayesian inference in neuroimaging: applications. NeuroImage.
- Henson (2003). Neuroimaging studies of priming. Prog. Neurobiol.
- et al. (1998). Intensity coding of auditory stimuli: an fMRI study. Neuropsychologia.
- et al. (2002). Phonetic perception and the temporal cortex. NeuroImage.
- et al. (2003). Overlapping neural regions for processing rapid temporal cues in speech and nonspeech signals. NeuroImage.
- fMRI activation in relation to sound intensity and loudness. NeuroImage.
- Morosan et al. (2001). Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. NeuroImage.
- Perceiving identical sounds as speech or non-speech modulates activity in the left posterior superior temporal sulcus. NeuroImage.
- Oldfield (1971). Assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia.
- Rademacher et al. (2001). Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage.
- Functional neuroanatomy of auditory mismatch processing: an event-related fMRI study of duration-deviant oddballs. NeuroImage.
- Scherer (2003). Vocal communication of emotion: a review of research paradigms. Speech Commun.
- Male and female voices activate distinct regions in the male brain. NeuroImage.
- Tzourio-Mazoyer et al. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage.
- Analysis of the spectral envelope of sounds by the human brain. NeuroImage.
- Human brain mechanisms for the early analysis of voices. NeuroImage.
- Comparison of “silent” clustered and sparse temporal fMRI acquisitions in tonal and speech perception tasks. NeuroImage.
- Zatorre et al. (2002). Structure and function of auditory cortex: music and speech. Trends Cogn. Sci.
- Banse and Scherer (1996). Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol.
- Beaucousin et al. (2007). FMRI study of emotional speech comprehension. Cereb. Cortex.
- Belin et al. (2000). Voice-selective areas in human auditory cortex. Nature.
- Boersma (2001). Praat, a system for doing phonetics by computer. Glot Int.