NeuroImage, Volume 39, Issue 2, 15 January 2008, Pages 885-893

Cerebral processing of emotional prosody—influence of acoustic parameters and arousal

https://doi.org/10.1016/j.neuroimage.2007.09.028

Abstract

The human brain has a preference for processing emotionally salient stimuli. In the auditory modality, emotional prosody can induce such involuntary biasing of processing resources. To investigate the neural correlates underlying automatic processing of emotional information in the voice, words spoken in neutral, happy, erotic, angry, and fearful prosody were presented in a passive-listening functional magnetic resonance imaging (fMRI) experiment. Hemodynamic responses in right mid superior temporal gyrus (STG) were significantly stronger for all emotional than for neutral intonations. To disentangle the contributions of basic acoustic features and emotional arousal to this activation, the relation between event-related responses and these parameters was evaluated by means of regression analyses. A significant linear dependency between hemodynamic responses of right mid STG and mean intensity, mean fundamental frequency, variability of fundamental frequency, duration, and arousal of the stimuli was observed. While none of the acoustic parameters alone explained the stronger responses of right mid STG to emotional relative to neutral prosody, this stronger responsiveness was abolished by correcting either for arousal or for the conjoint effect of the acoustic parameters. In conclusion, our results demonstrate that right mid STG is sensitive to various emotions conveyed by prosody, an effect driven by a combination of acoustic features that express the emotional arousal in the speaker’s voice.

Introduction

In a multifaceted environment, our sensory systems are confronted with an abundance of information that contrasts with the limited processing capacity of the human brain. To cope with these limitations, stimuli of potential behavioral relevance have to be separated automatically from irrelevant ones to enable filtering of vital information. This separation can occur either through voluntary attention to certain stimulus features or in an involuntary, stimulus-driven manner (Desimone and Duncan, 1995). Such automatic processing has been observed in response to novel stimuli (Näätänen, 1990) or emotionally salient stimuli (Vuilleumier and Schwartz, 2001, Vuilleumier, 2005).

In the auditory domain, emotionally salient information can be expressed via modulation of speech melody (emotional prosody). Findings obtained from auditory evoked potentials indicate differential processing of emotionally and neutrally spoken vowels around 400 ms after stimulus onset (Alter et al., 2003). Previous functional magnetic resonance imaging (fMRI) experiments (Grandjean et al., 2005, Ethofer et al., 2006a, Beaucousin et al., 2007) have demonstrated that voice-sensitive regions (Belin et al., 2000) in the associative auditory cortex in the middle part of the superior temporal gyrus (mid STG), adjacent to the superior temporal sulcus, respond more strongly to happy and angry than to neutral intonations. This response pattern was found independently of whether the subjects were instructed to attend to emotional prosody (Ethofer et al., 2006a, Beaucousin et al., 2007) or to some other feature of the presented stimuli, such as emotional semantics (Ethofer et al., 2006a) or the speakers’ gender (Grandjean et al., 2005). These findings suggest that activity in mid STG regions is mainly driven by bottom-up mechanisms that rely on features of the presented stimuli and is relatively unaffected by top-down processes that focus the attention of the subjects on certain stimulus characteristics.

So far, it is unclear which stimulus-bound features mediate these bottom-up mechanisms and whether stronger responses to emotional as compared to neutral prosody can be explained by differences in a single acoustic parameter. To minimize such confounds, previous studies investigating the neural correlates underlying perception of emotional prosody employed stimuli matched for basic parameters, such as acoustic energy (Grandjean et al., 2005) or maximum peak intensity (Ethofer et al., 2006a). Obviously, such approaches are limited by the fact that emotional information in the voice is transmitted via certain acoustic features (Banse and Scherer, 1996), and matching emotional stimuli to neutral stimuli for all possible acoustic parameters would presumably remove the emotional information conveyed by prosody. Therefore, an alternative approach was employed in the present study: first, the impact of single acoustic parameters or the conjoint effect of a set of acoustic parameters was evaluated in simple or multiple regression analyses, respectively. After removing all the variance correlating with the parameters in question, we tested whether the respective regression residuals still showed stronger responsiveness to emotional than to neutral prosody.
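To make the logic of this residual approach concrete, the following is a minimal sketch in Python. It assumes trial-wise response estimates from the region of interest; it is an illustrative reconstruction with simulated data, not the authors’ analysis code, and all variable names and numbers are hypothetical.

    import numpy as np
    from scipy import stats

    def residualize(y, predictors):
        # Regress y on an intercept plus the given predictor columns and
        # return the residuals, i.e., the variance not explained by them.
        X = np.column_stack([np.ones(len(y))] + list(predictors))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta

    rng = np.random.default_rng(1)
    n_trials = 126                                     # one response estimate per stimulus
    mean_intensity = rng.normal(65.0, 5.0, n_trials)   # simulated mean intensity (dB)
    is_emotional = rng.random(n_trials) < 0.8          # 4 of the 5 categories are emotional
    bold = (0.04 * mean_intensity + 0.3 * is_emotional # simulated response amplitudes
            + rng.normal(0.0, 0.5, n_trials))

    # Simple regression on one acoustic parameter, then a test of whether
    # the residuals still differ between emotional and neutral trials.
    resid = residualize(bold, [mean_intensity])
    t, p = stats.ttest_ind(resid[is_emotional], resid[~is_emotional])
    print(f"emotional vs. neutral after removing intensity: t = {t:.2f}, p = {p:.4f}")

The conjoint effect described above corresponds to passing all of the acoustic parameter arrays to residualize at once, i.e., a multiple instead of a simple regression.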

Here, we used event-related fMRI to investigate the neuronal correlates underlying automatic processing of a broad variety of prosodic categories expressing highly arousing emotional information. To this end, we presented words spoken in neutral, happy, erotic, angry, or fearful intonation in a passive-listening paradigm. Based on results of previous neuroimaging experiments (Grandjean et al., 2005, Ethofer et al., 2006a, Beaucousin et al., 2007), we hypothesized that voice-sensitive regions in mid STG show stronger blood oxygen level dependent (BOLD) responses to emotional intonations expressing high emotional arousal than to low arousing neutral intonations. To evaluate whether stronger responsiveness of this region can be explained by the perceived emotional arousal or by basic acoustic parameters, such as mean intensity, variability (standard deviation) of intensity, mean fundamental frequency, variability of fundamental frequency, or stimulus duration, separate simple regression analyses were carried out and the resulting regression residuals of emotional and neutral trials were statistically compared. Furthermore, a multiple regression analysis, including all five acoustic parameters investigated here, was conducted and regression residuals of emotional and neutral trials obtained in this analysis were compared to evaluate the conjoint effect of these parameters on event-related responses of voice-sensitive regions in mid STG.
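For illustration, the five acoustic parameters listed above could be measured along the following lines. Praat (Boersma, 2001) appears in the reference list as the standard tool for such measurements; the sketch below uses the praat-parselmouth Python bindings to Praat with default analysis settings. That choice of tool and settings is our assumption rather than the authors’ documented procedure, and 'word.wav' is a placeholder file name.

    import numpy as np
    import parselmouth  # praat-parselmouth: Python bindings to Praat

    def acoustic_parameters(path):
        snd = parselmouth.Sound(path)
        f0 = snd.to_pitch().selected_array['frequency']  # F0 track in Hz
        f0 = f0[f0 > 0]                                  # unvoiced frames are coded as 0
        db = snd.to_intensity().values.flatten()         # intensity contour in dB
        return {
            'mean_intensity': float(np.mean(db)),  # plain frame average, not Praat's energy-weighted mean
            'sd_intensity':   float(np.std(db)),
            'mean_f0':        float(np.mean(f0)),
            'sd_f0':          float(np.std(f0)),
            'duration':       snd.duration,        # stimulus duration in seconds
        }

    print(acoustic_parameters('word.wav'))  # placeholder path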


Subjects

Twenty-four right-handed, healthy participants (12 female, 12 male; mean age 25.1 years), all native or highly proficient speakers of German, were included in an fMRI study. None of the subjects had previously participated in an fMRI experiment on emotional processing. Right-handedness was assessed using the Edinburgh Inventory (Oldfield, 1971). The study was approved by the Ethical Committee of the University of Tuebingen and conducted according to the Declaration of Helsinki.

Stimuli

126 adjectives and nouns

Results

Comparison of hemodynamic responses to emotional and neutral prosody revealed a cluster with two distinct maxima in the right temporal lobe, which were located within Heschl’s gyrus (MNI coordinates: x = 48, y = −24, z = 3; Z score = 4.06; cytoarchitectonic subdivision TE 1.1 as defined by probabilistic maps obtained from Morosan et al., 2001 and Rademacher et al., 2001) and in the mid STG (MNI coordinates: x = 63, y = −12, z = 0; Z score = 3.72, k = 99, p < 0.05 corrected, see Figs. 1a–c). Other brain regions

Discussion

The present study was conducted to investigate the neural correlates underlying automatic processing of emotional prosody. In particular, we wanted to test whether our region of interest in right mid STG shows increased responses to a broad variety of emotional intonations. Furthermore, we wanted to clarify whether stronger responses within this brain region can be explained on the basis of emotional arousal, a single acoustic parameter, or the conjoint effect of mean intensity, variability of

Conclusion

Prosody of all four emotional categories investigated in the present study induced stronger responses than neutral prosody in right mid STG, indicating that this brain region is sensitive to a broad variety of behaviorally relevant information expressed by prosody. Demonstration of this effect in a passive-listening paradigm underlines its stimulus-driven nature and confirms previous suggestions (Ethofer et al., 2006a) that this effect occurs irrespective of task instructions. Event-related

Acknowledgments

Sarah Wiethoff was supported by a grant from the Studienstiftung des deutschen Volkes. The study was supported by the Deutsche Forschungsgemeinschaft (Sonderforschungsbereich 550-B10) and by the Junior Science Program of the Heidelberg Academy of Sciences and Humanities.

References (55)

  • R. Banse et al., Acoustic profiles in vocal emotion expression, J. Pers. Soc. Psychol. (1996)
  • V. Beaucousin et al., FMRI study of emotional speech comprehension, Cereb. Cortex (2007)
  • P. Belin et al., Voice-selective areas in human auditory cortex, Nature (2000)
  • P. Boersma, Praat, a system for doing phonetics by computer, Glot Int. (2001)
  • D.R.M. Langers et al., fMRI activation in relation to sound intensity and loudness, NeuroImage (2007)
  • P. Morosan et al., Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system, NeuroImage (2001)
  • R. Möttönen et al., Perceiving identical sounds as speech or non-speech modulates activity in the left posterior superior temporal sulcus, NeuroImage (2006)
  • R.C. Oldfield, The assessment and analysis of handedness: the Edinburgh inventory, Neuropsychologia (1971)
  • J. Rademacher et al., Probabilistic mapping and volume measurement of human primary auditory cortex, NeuroImage (2001)
  • U. Schall et al., Functional neuroanatomy of auditory mismatch processing: an event-related fMRI study of duration-deviant oddballs, NeuroImage (2003)
  • K.R. Scherer, Vocal communication of emotion: a review of research paradigms, Speech Commun. (2003)
  • D.S. Sokhi et al., Male and female voices activate distinct regions in the male brain, NeuroImage (2005)
  • N. Tzourio-Mazoyer et al., Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain, NeuroImage (2002)
  • J.D. Warren et al., Analysis of the spectral envelope of sounds by the human brain, NeuroImage (2005)
  • J.D. Warren et al., Human brain mechanisms for the early analysis of voices, NeuroImage (2006)
  • T. Zaehle et al., Comparison of “silent” clustered and sparse temporal fMRI acquisitions in tonal and speech perception tasks, NeuroImage (2007)
  • R.J. Zatorre et al., Structure and function of auditory cortex: music and speech, Trends Cogn. Sci. (2002)