Cerebral processing of emotional prosody—influence of acoustic parameters and arousal
Introduction
In a multifaceted environment, our sensory systems are confronted with an abundance of information which contrasts with the limited processing capacity of the human brain. To cope with these limitations, stimuli of potential behavioral relevance have to be separated automatically from irrelevant ones to enable filtering of vital information. This separation can either occur by voluntary attention to certain stimulus features or in an involuntary stimulus-driven manner (Desimone and Duncan, 1995). Such automatic processing has been observed in response to novel stimuli (Näätänen, 1990) or emotionally salient stimuli (Vuilleumier and Schwartz, 2001, Vuilleumier, 2005).
In the auditory domain, emotionally salient information can be expressed via modulation of speech melody (emotional prosody). Findings from auditory evoked potentials indicate differential processing of emotionally and neutrally spoken vowels around 400 ms after stimulus onset (Alter et al., 2003). Previous functional magnetic resonance imaging (fMRI) experiments (Grandjean et al., 2005, Ethofer et al., 2006a, Beaucousin et al., 2007) have demonstrated that voice-sensitive regions (Belin et al., 2000) in the associative auditory cortex in the middle part of the superior temporal gyrus (mid STG), adjacent to the superior temporal sulcus, respond more strongly to happy and angry than to neutral intonations. This response pattern was found irrespective of whether subjects were instructed to attend to emotional prosody (Ethofer et al., 2006a, Beaucousin et al., 2007) or to some other feature of the presented stimuli, such as emotional semantics (Ethofer et al., 2006a) or the speakers’ gender (Grandjean et al., 2005). These findings suggest that activity in mid STG regions is driven mainly by bottom-up mechanisms that rely on features of the presented stimuli and is relatively unaffected by top-down processes that focus subjects’ attention on certain stimulus characteristics.
So far, it is unclear which stimulus-bound features mediate these bottom-up mechanisms and whether stronger responses to emotional as compared to neutral prosody can be explained by differences in a single acoustic parameter. To minimize such acoustic confounds, previous studies of the neural correlates of emotional prosody perception employed stimuli matched for basic parameters, such as acoustic energy (Grandjean et al., 2005) or maximum peak intensity (Ethofer et al., 2006a). Such approaches are limited by the fact that emotional information in the voice is transmitted via certain acoustic features (Banse and Scherer, 1996), and matching emotional stimuli to neutral stimuli on all possible acoustic parameters would presumably remove the emotional information conveyed by prosody. Therefore, an alternative approach was employed in the present study: first, the impact of single acoustic parameters or the conjoint effect of a set of acoustic parameters was evaluated in simple or multiple regression analyses, respectively. After removing all variance correlating with the parameters in question, we tested whether the regression residuals still showed stronger responses to emotional than to neutral prosody.
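As a rough illustration of this residual approach, ordinary least squares can be used to remove the variance tied to an acoustic parameter before comparing conditions. The sketch below uses simulated data; all variable names and numerical values are invented for the example and are not taken from the study:

```python
import numpy as np

def regress_out(y, X):
    """Return residuals of y after removing the variance explained by
    the regressors in X (one column per parameter) plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

rng = np.random.default_rng(0)
n = 200
# Simulated single-trial response amplitudes: emotional trials carry an
# effect beyond what mean intensity alone predicts (all values invented).
intensity = rng.normal(70.0, 5.0, n)            # mean intensity (dB)
emotional = rng.integers(0, 2, n).astype(bool)  # emotional vs. neutral trial
bold = 0.05 * intensity + 0.8 * emotional + rng.normal(0.0, 0.5, n)

# Regress out the acoustic parameter, then test whether emotional trials
# still show stronger responses in the residuals.
resid = regress_out(bold, intensity[:, None])
t = welch_t(resid[emotional], resid[~emotional])
```

Evaluating the conjoint effect of several parameters amounts to stacking them as additional columns of `X` before computing the residuals.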
Here, we used event-related fMRI to investigate the neuronal correlates underlying automatic processing of a broad variety of prosodic categories expressing highly arousing emotional information. To this end, we presented words spoken in neutral, happy, erotic, angry, or fearful intonation in a passive-listening paradigm. Based on results of previous neuroimaging experiments (Grandjean et al., 2005, Ethofer et al., 2006a, Beaucousin et al., 2007), we hypothesized that voice-sensitive regions in mid STG show stronger blood oxygen level dependent (BOLD) responses to emotional intonations expressing high emotional arousal than to low arousing neutral intonations. To evaluate whether stronger responsiveness of this region can be explained by the perceived emotional arousal or by basic acoustic parameters, such as mean intensity, variability (standard deviation) of intensity, mean fundamental frequency, variability of fundamental frequency, or stimulus duration, separate simple regression analyses were carried out and the resulting regression residuals of emotional and neutral trials were statistically compared. Furthermore, a multiple regression analysis, including all five acoustic parameters investigated here, was conducted and regression residuals of emotional and neutral trials obtained in this analysis were compared to evaluate the conjoint effect of these parameters on event-related responses of voice-sensitive regions in mid STG.
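For illustration, the five acoustic parameters could be approximated from a sampled waveform as below. This is a simplified sketch (arbitrary frame length, arbitrary pitch bounds, and a naive autocorrelation-based F0 estimate), not the measurement procedure of the study, which would typically rely on a dedicated phonetics tool such as Praat:

```python
import numpy as np

def acoustic_params(x, sr, frame=0.04, fmin=75.0, fmax=500.0):
    """Rough frame-wise estimates of mean and SD of intensity (dB),
    mean and SD of fundamental frequency (Hz), and duration (s)."""
    hop = int(frame * sr)
    db, f0 = [], []
    for i in range(0, len(x) - hop, hop):
        f = x[i:i + hop]
        rms = np.sqrt(np.mean(f ** 2))
        if rms < 1e-6:          # skip silent frames
            continue
        db.append(20 * np.log10(rms))
        # Naive F0: lag of the autocorrelation peak within the pitch range.
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        lo, hi = int(sr / fmax), int(sr / fmin)
        lag = lo + int(np.argmax(ac[lo:hi]))
        f0.append(sr / lag)
    db, f0 = np.array(db), np.array(f0)
    return {"int_mean": db.mean(), "int_sd": db.std(ddof=1),
            "f0_mean": f0.mean(), "f0_sd": f0.std(ddof=1),
            "duration": len(x) / sr}

# Example: a 1-s, 200-Hz tone should yield an f0_mean near 200 Hz.
sr = 16000
t = np.arange(sr) / sr
p = acoustic_params(np.sin(2 * np.pi * 200 * t), sr)
```

The five resulting values per stimulus would then serve as regressors in the simple and multiple regression analyses described above.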
Subjects
Twenty-four right-handed, healthy, native or highly proficient German-speaking participants (12 female, 12 male; mean age 25.1 years) were included in an fMRI study. None of the subjects had previously participated in an fMRI experiment on emotional processing. Right-handedness was assessed using the Edinburgh Inventory (Oldfield, 1971). The study was approved by the Ethics Committee of the University of Tuebingen and conducted in accordance with the Declaration of Helsinki.
Stimuli
126 adjectives and nouns
Results
Comparison of hemodynamic responses to emotional and neutral prosody revealed a cluster with two distinct maxima in the right temporal lobe, which were located within Heschl’s gyrus (MNI coordinates: x = 48; y = −24; z = 3; Z score = 4.06; cytoarchitectonic subdivision TE 1.1 as defined by the probabilistic maps of Morosan et al., 2001 and Rademacher et al., 2001) and in the mid STG (MNI coordinates: x = 63; y = −12; z = 0; Z score = 3.72; k = 99, p < 0.05 corrected; see Figs. 1a–c). Other brain regions
Discussion
The present study was conducted to investigate the neural correlates underlying automatic processing of emotional prosody. In particular, we wanted to test whether our region of interest in right mid STG shows increased responses to a broad variety of emotional intonations. Furthermore, we wanted to clarify whether stronger responses within this brain region can be explained on the basis of emotional arousal, a single acoustic parameter, or the conjoint effect of mean intensity, variability of
Conclusion
Prosody of all four emotional categories investigated in the present study induced stronger responses than neutral prosody in the right mid STG, indicating that this brain region is sensitive to a broad variety of behaviorally relevant information expressed by prosody. Demonstration of this effect in a passive-listening paradigm underlines its stimulus-driven nature and confirms previous suggestions (Ethofer et al., 2006a) that it occurs irrespective of task instructions. Event-related
Acknowledgments
Sarah Wiethoff was supported by a grant from the Studienstiftung des Deutschen Volkes. The study was supported by the Deutsche Forschungsgemeinschaft (Sonderforschungsbereich 550-B10) and by the Junior Science Program of the Heidelberg Academy of Sciences and Humanities.
References (55)
- Alter et al. (2003). Affective encoding in the speech signal and in event-related potentials. Speech Commun.
- et al. (2001). Modeling geometric deformations in EPI time series. NeuroImage.
- et al. (2006). Response preferences for “what” and “where” in human non-primary auditory cortex. NeuroImage.
- Eickhoff et al. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage.
- Ethofer et al. (2006). Cerebral pathways in processing of affective prosody: a dynamic causal modeling study. NeuroImage.
- Friston et al. (2002). Classical and Bayesian inference in neuroimaging: applications. NeuroImage.
- Henson (2003). Neuroimaging studies of priming. Prog. Neurobiol.
- et al. (1998). Intensity coding of auditory stimuli: an fMRI study. Neuropsychologia.
- et al. (2002). Phonetic perception and the temporal cortex. NeuroImage.
- et al. (2003). Overlapping neural regions for processing rapid temporal cues in speech and nonspeech signals. NeuroImage.
- fMRI activation in relation to sound intensity and loudness. NeuroImage.
- Morosan et al. (2001). Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. NeuroImage.
- Perceiving identical sounds as speech or non-speech modulates activity in the left posterior superior temporal sulcus. NeuroImage.
- Oldfield (1971). Assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia.
- Rademacher et al. (2001). Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage.
- Functional neuroanatomy of auditory mismatch processing: an event-related fMRI study of duration-deviant oddballs. NeuroImage.
- Scherer (2003). Vocal communication of emotion: a review of research paradigms. Speech Commun.
- Male and female voices activate distinct regions in the male brain. NeuroImage.
- Tzourio-Mazoyer et al. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage.
- Analysis of the spectral envelope of sounds by the human brain. NeuroImage.
- Human brain mechanisms for the early analysis of voices. NeuroImage.
- Comparison of “silent” clustered and sparse temporal fMRI acquisitions in tonal and speech perception tasks. NeuroImage.
- Zatorre et al. (2002). Structure and function of auditory cortex: music and speech. Trends Cogn. Sci.
- Banse and Scherer (1996). Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol.
- Beaucousin et al. (2007). FMRI study of emotional speech comprehension. Cereb. Cortex.
- Belin et al. (2000). Voice-selective areas in human auditory cortex. Nature.
- Boersma (2001). Praat, a system for doing phonetics by computer. Glot Int.