Abstract

Music and speech are complex sound streams with hierarchical rules of temporal organization that become elaborated over time. Here, we use functional magnetic resonance imaging to measure brain activity patterns in 20 right-handed nonmusicians as they listened to natural and temporally reordered musical and speech stimuli matched for familiarity, emotion, and valence. Heart rate variability and mean respiration rates were simultaneously measured and were found not to differ between musical and speech stimuli. Although the same manipulation of temporal structure elicited brain activation level differences of similar magnitude for both music and speech stimuli, multivariate classification analysis revealed distinct spatial patterns of brain responses in the 2 domains. Distributed neuronal populations that included the inferior frontal cortex, the posterior and anterior superior and middle temporal gyri, and the auditory brainstem classified temporal structure manipulations in music and speech with significant levels of accuracy. While agreeing with previous findings that music and speech processing share neural substrates, this work shows that temporal structure in the 2 domains is encoded differently, highlighting a fundamental dissimilarity in how the same neural resources are deployed.

Introduction

Music and speech are human cultural universals (Brown 1991) that manipulate acoustically complex sounds. Because of the ecological and behavioral significance of music and speech in human culture and evolution (Brown et al. 2006; Conard et al. 2009), there is great interest in understanding the extent to which the neural resources deployed for processing music and speech are distinctive or shared (Patel 2003, 2008).

The most substantial of the proposed links between music and language relates to syntax—the rules governing how musical or linguistic elements can be combined and expressed over time (Lerdahl and Jackendoff 1983). Here, we use the term “syntax” as employed in previous brain imaging studies of music (Maess et al. 2001; Levitin and Menon 2003, 2005; Koelsch 2005). In this context, syntax refers to temporal ordering of musical elements within a larger, hierarchical system. That is, the syntax of a musical sequence refers to the specific order in which notes appear, analogous to such structure in language. As in language, the order of elements influences meaning or semantics but is not its sole determinant.

One influential hypothesis—the “shared syntactic integration resource hypothesis” (SSIRH; [Patel 2003])—proposes that syntactic processing for language and music shares a common set of neural resources instantiated in prefrontal cortex (PFC). Indirect support of SSIRH has been provided by studies implicating “language” areas of the inferior frontal cortex (IFC) in the processing of tonal and harmonic irregularities (Maess et al. 2001; Koelsch et al. 2002; Janata 2005) and coherent temporal structure in naturalistic musical stimuli (Levitin and Menon 2003). Functional brain imaging studies have implicated distinct subregions of the IFC in speech, with dorsal–posterior regions (pars opercularis and pars triangularis, Brodmann Area [BA] 44 and 45) implicated in both phonological and syntactic processing and ventral–anterior regions (pars orbitalis, BA 47) implicated in syntactic and semantic processing (Bookheimer 2002; Grodzinsky and Friederici 2006). Anterior regions of superior temporal cortex have also been implicated in the processing of structural elements of both music and language (Koelsch 2005; Callan et al. 2006). Since most brain imaging studies have used either music or speech stimuli, differential involvement of these neural structures in music and speech processing is at present unclear.

A key goal of our study was to directly test the SSIRH and examine whether distinct or shared neural resources are deployed for processing of syntactic structure in music and speech. Given that the ordering of elements in music and speech represents a fundamental aspect of syntax in these domains, our approach was to examine the neural correlates of temporal structure processing using naturalistic, well-matched music and speech stimuli in a within-subjects design. Functional magnetic resonance imaging (fMRI) was used to quantify blood oxygen level–dependent activity patterns in 20 participants while they listened to musical and speech excerpts matched for emotional content, arousal, and familiarity. Importantly, each individual stimulus had a temporally reordered counterpart in which brief (∼350 ms) segments of the music and speech stimuli were rearranged within the musical or speech passage; this served as an essential control that preserved many acoustic features but disrupted the overall temporal structure, including the rhythmic properties, of the signal (Fig. 1). Analyses employed both univariate and multivariate pattern analysis (MPA) techniques. These 2 fMRI analysis techniques provide complementary information regarding the neural substrates underlying cognitive processes (Schwarzlose et al. 2008): univariate methods were used to examine whether particular brain regions show greater magnitude of activation for manipulations to speech or music structure; multivariate methods were used to investigate whether spatial patterns of fMRI activity are sensitive to manipulations to music and speech structure. A novel methodological aspect is the use of a support vector machine (SVM)-based algorithm, along with a multisubject cross-validation procedure, for a robust comparison of decoded neural responses with temporal structure in music and speech.

Figure 1.

Music and speech stimuli. Examples of normal and reordered speech (left) and music (right) stimuli. The top and middle panels include an oscillogram of the waveform (top) and a sound spectrogram (bottom). Frequency spectra of the normal and reordered stimuli are plotted at the bottom of each side.

Materials and Methods

Participants

Participants were 20 right-handed Stanford University undergraduate and graduate students with no psychiatric or neurological disorders, as assessed by self-report and the SCL-90-R (Derogatis 1992); adolescent norms were used because they are appropriate for nonpatient college students, as suggested in a previous study (Todd et al. 1997). All participants were native English speakers and nonmusicians. Following previously used criteria (Morrison et al. 2003), we define nonmusicians as those who have had 2 years or less of participation in an instrumental or choral group and less than 1 year of private musical lessons. The participants received $50 in compensation for participation. The Stanford University School of Medicine Human Subjects committee approved the study, and informed consent was obtained from all participants.

Stimuli

Music stimuli consisted of 3 familiar and 3 unfamiliar symphonic excerpts composed during the classical or romantic period, and speech stimuli were familiar and unfamiliar speeches (e.g., Martin Luther King, President Roosevelt) selected from a compilation of famous speeches of the 20th century (Various 1991; stimuli are listed in Supplementary Table 1). All music and speech stimuli were digitized at a 22,050 Hz sampling rate with 16-bit resolution. In a pilot study, a separate group of participants was used to select music and speech samples that were matched for emotional content, attention, memory, subjective interest, level of arousal, and familiarity.

Stimulus Selection

Fifteen undergraduate students who did not participate in the fMRI study used a scale of –4 to 4 to rate the 12 musical excerpts and 24 speech excerpts on 10 different dimensions. These participants were compensated $10 for their time.

The first goal was to obtain a set of 12 speech stimuli that were well matched to the music samples. For each emotion, the ratings for all the music and speech stimuli, across all subjects, were pooled in computing the mean and standard deviation used to normalize responses for that emotion. We analyzed the correlations between semantically related pairs of variables and found several high correlations among them: for example, ratings of “dissonant” and “happy” were highly correlated (r = −0.75), indicating that these scales were measuring the same underlying concept. Therefore, we eliminated some redundant categories from further analysis (dissonant/consonant was correlated with angry/peaceful, r = 0.84, and with happy/sad, r = −0.75; tense/relaxed was correlated with angry/peaceful, r = 0.58; annoying/unannoying was correlated with boring/interesting, r = 0.67). We then selected the 12 speeches that most closely matched each of the individual pieces of music on standardized values of the ratings. Correlations between the ratings for the retained speeches and music were all significant (range: r = 0.85, P < 0.04 to r = 0.98, P < 0.001), and independent 2-sample t-tests for the mean values of each yielded no significant difference between the ratings of any of the pairs. Importantly, there were no significant differences between speech and music samples for any emotion when ratings for all music samples were directly compared with speech samples (Supplementary Table 2). Following this, we sought to narrow the sample to 6 speech and 6 music excerpts (3 familiar and 3 unfamiliar of each) to keep the scan session to a manageable length. To do this, we performed a least-squares analysis, identifying those pairs of music and speeches that had the smallest difference between them and thus were most easily comparable. For this analysis, we used the 6 remaining scales (with the exception of familiarity) and calculated the total squared difference between all pairs of familiar and all pairs of unfamiliar music and speeches. We selected the 6 (3 familiar and 3 unfamiliar) music–speech pairs with the least difference between them to be our stimuli (range of total squared difference: 6.8–71.7; range of 6 selected: 6.8–13.6).
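As an informal illustration of the normalization and pairing logic described above, the sketch below z-scores hypothetical ratings per scale using means and standard deviations pooled over all stimuli, and then pairs each music excerpt with the speech excerpt that minimizes the total squared difference across scales. The rating values, the greedy pairing strategy, and the scale count are assumptions for illustration only, not the analysis code used in the study.

# Hypothetical ratings: rows are stimuli, columns are emotion scales; values
# here stand in for the pooled participant means used in the study.
import numpy as np

rng = np.random.default_rng(1)
music = rng.normal(size=(12, 6))          # 12 music excerpts x 6 retained scales
speech = rng.normal(size=(12, 6))         # 12 candidate speeches x 6 scales

# z-score each scale using the mean and SD pooled over all music and speech stimuli.
pooled = np.vstack([music, speech])
z = (pooled - pooled.mean(axis=0)) / pooled.std(axis=0)
z_music, z_speech = z[:12], z[12:]

# Total squared difference between every music-speech pair, then a greedy pairing
# of each music excerpt with its closest still-available speech.
cost = ((z_music[:, None, :] - z_speech[None, :, :]) ** 2).sum(axis=-1)
pairs = []
available = set(range(12))
for m in np.argsort(cost.min(axis=1)):            # best-matched music excerpts first
    s = min(available, key=lambda j: cost[m, j])
    pairs.append((m, s, cost[m, s]))
    available.remove(s)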

Rationale for Stimulus Manipulation

All music and speech stimuli were “scrambled” as a means of altering the rich temporal structure inherent in these signals. Scrambling in this context refers to rearranging brief (<350 ms) segments of music and speech stimuli while controlling for a number of acoustical variables (please see “Stimulus Creation” below for details). The 350 ms maximum segment length was determined empirically: this length preserved lower level phonetic segments and short words in speech and individual notes in music but disrupted meaningful clusters of words in speech and the continuity of short segments of melody and rhythmic figures in music. Additionally, to minimize the possibility that listeners would hear a pulse or “tactus” in the scrambled versions, we used windows of variable size. We acknowledge that music and speech have inherently different acoustical characteristics and that the ideal time window for scrambling the stimuli is currently unknown. Nevertheless, the value of 350 ms was arrived at after extensive evaluation and is well suited to reordering the elements of music and speech while leaving key elements intact.

Stimulus Creation

The scrambling technique used here was based on previously used methods (Levitin and Menon 2003; Koelsch 2005) but included more refined stimulus controls than were present in those studies to better ensure the exact acoustic comparability of the stimuli. Specifically, temporal structure manipulations in the current study removed brief “gaps” and loud–soft “transitions” in the reordered stimuli that were audible in these previous studies. Each music and speech excerpt was 22–30 s in length. To create stimuli for the experimental conditions, each file was processed as follows using the SIGNAL Digital Signal Processing Language (Engineering Design). The original digitized file had its DC level set to zero, after which the envelope contour was extracted (absolute value smoothed with a 20 ms window and peak normalized to 1). A copy of the envelope was gated at 0.1 of peak threshold to identify “low-amplitude” time intervals, another copy was gated at 0.2 of peak amplitude to identify “high-amplitude” time intervals, and the remaining time intervals were classified as “midamplitude.” The lengths of each type of interval were extracted and stored sequentially; any interval longer than 350 ms was divided into pieces of 350-ms length plus a piece of an appropriate size <350 ms for the remainder. Each interval in the resulting sequence was then assigned an integer according to its position in the sequence. A pseudorandom reordering of these integers was produced subject to 3 constraints: 2 segments that had previously occurred together were not permitted to do so again, the distribution of transitions between segments of different loudness had to be preserved, and the distribution of transitions between segments of different length also had to be preserved in the new ordering. Reordered stimuli were constructed by taking each piece from the original sequence, applying a 5-ms cosine envelope to its edges, and pasting it into its appropriate position in the new sequence as determined by a random number sequence. The speech samples were low-pass filtered at 2400 Hz to remove extraneous high frequencies. To increase the similarities between the original and reordered excerpts, the segments identified in the original versions had 5-ms cosine envelopes applied to their edges in exactly the same way as the reordered versions, thus creating microgaps in any notes held longer than 350 ms.
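The following Python sketch illustrates the general logic of the scrambling procedure under stated simplifications: segments are defined by the 0.1/0.2 envelope gates and a 350-ms maximum length, segment edges receive 5-ms cosine ramps, and the order is simply permuted. The constraints on loudness-transition and length-transition distributions, the DC-level correction, and the low-pass filtering of speech are omitted; this is not the SIGNAL code used by the authors.

import numpy as np

def scramble(signal, fs, max_len_s=0.350, ramp_s=0.005, seed=0):
    signal = np.asarray(signal, dtype=float)
    # Envelope: absolute value smoothed with a 20-ms moving average, peak normalized to 1.
    win = max(1, int(0.020 * fs))
    env = np.convolve(np.abs(signal), np.ones(win) / win, mode="same")
    env /= env.max()
    # Label samples as low (<0.1 of peak), high (>0.2 of peak), or mid amplitude,
    # and place interval boundaries wherever the label changes.
    label = np.where(env < 0.1, 0, np.where(env > 0.2, 2, 1))
    bounds = [0] + list(np.flatnonzero(np.diff(label)) + 1) + [len(signal)]
    # Cut any interval longer than 350 ms into 350-ms pieces plus a shorter remainder.
    max_len = max(1, int(max_len_s * fs))
    segments = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        for s in range(start, stop, max_len):
            segments.append(signal[s:min(s + max_len, stop)].copy())
    # Pseudorandom reordering of segment positions (ordering constraints omitted here).
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(segments))
    # 5-ms raised-cosine ramps on both edges of every segment, then concatenate.
    ramp = max(1, int(ramp_s * fs))
    window = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    out = []
    for idx in order:
        seg = segments[idx]
        n = min(ramp, len(seg))
        seg[:n] *= window[:n]
        seg[-n:] *= window[:n][::-1]
        out.append(seg)
    return np.concatenate(out)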

fMRI Task

Music and speech stimuli were presented in 2 separate runs, each lasting about 7 min; the order of runs was randomized across participants. Each run consisted of 12 blocks of alternating original and reordered excerpts, each lasting 23–28 s. The block order and the order of the individual excerpts were counterbalanced across participants. Participants were instructed to press a button on an MRI-compatible button box whenever a sound excerpt ended. Response times were measured from the beginning of the experiment and from the beginning of each excerpt. The button box malfunctioned in 8 of the scans and recorded no data; because the main purpose of the button press was to ensure that participants were paying attention, and these scans were not statistically different from the other scans, we retained them. All participants reported listening attentively to the music and speech stimuli. Music and speech stimuli were presented to participants in the scanner using E-Prime v1.0 (Psychology Software Tools, 2002). Participants wore custom-built headphones designed to reduce the background scanner noise to approximately 70 dBA (Menon and Levitin 2005).

Postscan Assessments

Immediately following the scan, participants filled out a form to indicate which of the 2 conditions, music or speech, was best described by each of the following 12 semantic descriptors: Calm, Familiar, Unpleasant, Happy, Tense, Interesting, Dissonant, Sad, Annoying, Angry, Moving, and Boring. The data were characterized using one binomial test per descriptor (with a criterion of P < 0.05) to indicate whether a term was applied more to one stimulus category than the other. Because participants showed a slight tendency to choose “speech” more often than “music” (55% of the time), the binomial test used P = 0.55 and q = 0.45.
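A minimal sketch of such a per-descriptor binomial test is shown below, with the baseline probability set to 0.55 to reflect the overall bias toward choosing “speech”; the counts are hypothetical and not taken from the study.

# Per-descriptor binomial test sketch using scipy; counts are hypothetical.
from scipy.stats import binomtest

n_participants = 19          # hypothetical number of respondents
n_chose_speech = 15          # hypothetical count for one descriptor
result = binomtest(n_chose_speech, n=n_participants, p=0.55)
print(f"P = {result.pvalue:.3f}")   # descriptor applied unevenly if P < 0.05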

fMRI Data Acquisition

Images were acquired on a 3 T GE Signa scanner using a standard GE whole-head coil (software Lx 8.3). A custom-built head holder was used to prevent head movement during the scan. Twenty-eight axial slices (4.0-mm thick, 1.0-mm skip) parallel to the AC/PC line and covering the whole brain were imaged with a temporal resolution of 2 s using a T2*-weighted gradient-echo spiral in–out pulse sequence (time repetition [TR] = 2000 ms, time echo [TE] = 30 ms, flip angle = 80°). The field of view was 200 × 200 mm, and the matrix size was 64 × 64, providing an in-plane spatial resolution of 3.125 mm. To reduce blurring and signal loss arising from field inhomogeneities, an automated high-order shimming method based on spiral in–out acquisitions was used before acquiring functional MRI scans (Kim et al. 2000).

To aid in localization of the functional data, a high-resolution T1-weighted spoiled GRASS gradient recalled inversion-recovery 3D MRI sequence was used with the following parameters: TR = 35 ms; TE = 6.0 ms; flip angle = 45°; 24 cm field of view; 124 slices in coronal plane; 256 × 192 matrix; 2 averages, acquired resolution = 1.5 × 0.9 × 1.1 mm. The images were reconstructed as a 124 × 256 × 256 matrix with a 1.5 × 0.9 × 0.9-mm spatial resolution. Structural and functional images were acquired in the same scan session.

fMRI Data Analysis

Preprocessing

The first 2 volumes were not analyzed to allow for signal equilibration. A linear shim correction was applied separately for each slice during reconstruction using a magnetic field map acquired automatically by the pulse sequence at the beginning of the scan (Glover and Lai 1998). Functional MRI data were then analyzed using SPM5 analysis software (http://www.fil.ion.ucl.ac.uk/spm). Images were realigned to correct for motion, corrected for errors in slice timing, spatially transformed to standard stereotaxic space (based on the Montreal Neurological Institute [MNI] coordinate system), resampled every 2 mm using sinc interpolation, and smoothed with a 6-mm full-width at half-maximum Gaussian kernel to decrease spatial noise prior to statistical analysis. Translational movement in millimeters (x, y, z) and rotational motion in degrees (pitch, roll, and yaw) were calculated based on the SPM5 parameters for motion correction of the functional images in each participant. No participant had movement greater than 3-mm translation or 3 degrees of rotation; therefore, none were excluded from further analysis.

Quality Control

As a means of assessing the validity of individual participants’ fMRI data, we performed an initial analysis that identified images with poor image quality or artifacts. To this end, we calculated the standard deviation of each participant’s images (VBM toolboxes: http://dbm.neuro.uni-jena.de/vbm/) under the assumption that a large standard deviation may indicate the presence of artifacts in the image. The squared distance to the mean was calculated for each image. This analysis revealed one outlier among the 20 participants: this participant was >6 standard deviations from the mean on a number of images and was therefore removed from all subsequent statistical analyses.
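The sketch below illustrates this kind of volume-level outlier screen: each volume’s squared distance from the mean volume is computed and scans lying many standard deviations away are flagged. The array layout and the exact criterion are assumptions for illustration; the study performed this step with the VBM toolbox.

import numpy as np

def flag_outlier_volumes(data, n_sd=6.0):
    """data: 4D array of shape (x, y, z, time); returns indices of suspect volumes."""
    mean_vol = data.mean(axis=-1)
    # Squared distance of each volume from the mean volume.
    dist = ((data - mean_vol[..., None]) ** 2).sum(axis=(0, 1, 2))
    z = (dist - dist.mean()) / dist.std()
    return np.flatnonzero(z > n_sd)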

Univariate Statistical Analysis

Task-related brain activation was identified using a general linear model and the theory of Gaussian random fields as implemented in SPM5. Individual subject analyses were first performed by modeling task-related conditions as well as 6 movement parameters from the realignment procedure mentioned above. Brain activity related to the 4 task conditions (music, reordered music, speech, reordered speech) was modeled using boxcar functions convolved with a canonical hemodynamic response function and a temporal dispersion derivative to account for voxel-wise latency differences in hemodynamic response. Low-frequency drifts at each voxel were removed using a high-pass filter (0.5 cycles/min), and serial correlations were accounted for by modeling the fMRI time series as a first-degree autoregressive process (Poline et al. 1997). Voxel-wise t-statistics maps for each condition were generated for each participant using the general linear model, along with the respective contrast images. Group-level activation was determined using individual subject contrast images and a second-level analysis of variance (ANOVA). The 2 main contrasts of interest were (music–reordered music) and (speech–reordered speech). Significant clusters of activation were determined using a voxel-wise statistical height threshold of P < 0.01, with family-wise error (FWE) corrections for multiple spatial comparisons at the cluster level (P < 0.05).
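For readers unfamiliar with this modeling step, the sketch below shows how a single condition’s boxcar regressor can be convolved with a canonical double-gamma HRF; the block onsets, duration, and run length are hypothetical, and the actual analysis was performed in SPM5, not with this code.

import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled at the TR (standard SPM-style parameters)."""
    t = np.arange(0, duration, tr)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return hrf / hrf.sum()

tr = 2.0                              # seconds (TR from the acquisition above)
n_scans = 210                         # ~7-min run
onsets = np.arange(0, 420, 70)        # hypothetical block onsets for one condition (s)
duration = 25.0                       # blocks of ~23-28 s

boxcar = np.zeros(n_scans)
for onset in onsets:
    boxcar[int(onset / tr):int((onset + duration) / tr)] = 1.0

# Condition regressor: boxcar convolved with the canonical HRF, trimmed to run length.
regressor = np.convolve(boxcar, canonical_hrf(tr))[:n_scans]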

Activation foci were superimposed on high-resolution T1-weighted images. Their locations were interpreted using known functional neuroanatomical landmarks (Duvernoy 1995; Duvernoy et al. 1999) as has been done in our previous studies (e.g., Menon and Levitin 2005). Anatomical localizations were cross-validated with the atlas of Mai et al. (2004).

MPA

A multivariate statistical pattern recognition-based method was used to find brain regions that discriminated between temporal structure changes in music and speech (Kriegeskorte et al. 2006; Haynes et al. 2007; Ryali et al. 2010), utilizing a nonlinear classifier based on SVM algorithms with radial basis function (RBF) kernels (Muller et al. 2001). Briefly, at each voxel vi, a 3 × 3 × 3 neighborhood centered at vi was defined, and the spatial pattern of voxels in this block was represented by a 27-dimensional vector. SVM classification was performed using LIBSVM software (www.csie.ntu.edu.tw/∼cjlin/libsvm). For the nonlinear SVM classifier, 2 parameters, C (regularization) and α (parameter of the RBF kernel), had to be specified at each searchlight position. We estimated optimal values of C and α and the generalizability of the classifier at each searchlight position by using a combination of grid search and cross-validation procedures. In earlier approaches (Haynes et al. 2007), a linear SVM was used, and the free parameter, C, was arbitrarily set. In the current work, however, we optimized the free parameters (C and α) based on the data, thereby designing an optimal classifier. In the M-fold cross-validation procedure, the data were randomly divided into M folds; M − 1 folds were used for training the classifier, and the remaining fold was used for testing. This procedure was repeated M times, with a different fold left out for testing each time. We estimated the class labels of the test data in each fold and computed the average classification accuracy across folds, termed here the cross-validation accuracy (CVA). The optimal parameters were found by grid searching the parameter space and selecting the pair of values (C, α) at which the M-fold CVA was maximal. To search a wide range of values, we varied C and α from 0.125 to 32 in multiplicative steps of 2 (0.125, 0.25, 0.5, … , 16, 32). Here, we used a leave-one-out cross-validation procedure where M = N (N being the number of data samples in each condition/class). The resulting 3D map of CVA at every voxel was used to detect brain regions that discriminated between the individual subjects’ t-score maps for the 2 experimental conditions: (music–reordered music) and (speech–reordered speech). Under the null hypothesis that there is no difference between the 2 conditions, the CVAs were assumed to follow the binomial distribution Bi(N, P), with N equal to the total number of participants in the 2 groups and P equal to 0.5 (under the null hypothesis, the probability of each group is equal; [Pereira et al. 2009]). The CVAs were then converted to P values using the binomial distribution.
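The following sketch illustrates the classification scheme for a single searchlight position using scikit-learn’s SVC rather than LIBSVM: an RBF-kernel SVM whose C and kernel parameter (called α in the text, gamma in scikit-learn) are chosen by a grid search over powers of 2 with leave-one-out cross-validation, and whose resulting accuracy is converted to a P value under the chance binomial Bi(N, 0.5). It approximates the procedure described above and is not the authors’ implementation.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from scipy.stats import binom

def searchlight_cva(patterns, labels):
    """patterns: (2N, 27) array of searchlight t-scores; labels: 0/1 condition per map."""
    grid = {"C": 2.0 ** np.arange(-3, 6),       # 0.125, 0.25, ..., 16, 32
            "gamma": 2.0 ** np.arange(-3, 6)}   # RBF parameter (alpha in the text)
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=LeaveOneOut())
    search.fit(patterns, labels)
    return search.best_score_                   # leave-one-out cross-validation accuracy

def cva_to_p(accuracy, n_samples):
    """One-sided P value of an accuracy under the chance binomial Bi(n_samples, 0.5)."""
    n_correct = int(round(accuracy * n_samples))
    return binom.sf(n_correct - 1, n_samples, 0.5)   # P(X >= n_correct)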

Interpretation of Multivariate Pattern Analysis

The results from the multivariate analysis are interpreted in a fundamentally different manner from traditional univariate results. Univariate results show which voxels in the brain have a greater magnitude of activation for one stimulus condition (or contrast) relative to another. Multivariate results show which voxels in the brain are able to discriminate between 2 stimulus conditions or contrasts based on the pattern of fMRI activity measured across a predetermined number of voxels (a 3 × 3 × 3 volume of voxels in the current study). It is critical to note that, unlike the univariate method, MPA does not provide information about which voxels “prefer” a given stimulus condition relative to a second condition. Our multivariate analyses identify the location of voxels that consistently demonstrate a fundamentally different spatial pattern of activity for one stimulus condition relative to another (Haynes and Rees 2006; Kriegeskorte et al. 2006; Schwarzlose et al. 2008; Pereira et al. 2009).

Anatomical ROIs

We used the Harvard–Oxford probabilistic structural atlas (Smith et al. 2004) to determine classification accuracies within specific cortical regions of interest (ROIs). A probability threshold of 25% was used to define each anatomical ROI. We recognize that the precise boundaries of IFC regions BA 44, 45, and 47 are currently unknown. To address this issue, we compared the Harvard–Oxford probabilistic structural atlas with the Probabilistic Cytoarchitectonic Maps (Eickhoff et al. 2005) and the AAL atlas (Tzourio-Mazoyer et al. 2002) for BAs 44 and 45 and found that while there are some differences in these atlases, the core regions of these brain structures show significant overlap.

For subcortical structures, we used auditory brainstem ROIs based on a previous structural MRI study (Muhlau et al. 2006). Based on the peaks reported by Muhlau et al. (2006), we used spheres with a radius of 5 mm centered at ±10, –38, –45 (MNI coordinates) for the cochlear nuclei ROIs, ±13, –35, –41 for the superior olivary complex ROIs, and ±6, –33, –11 for the inferior colliculus ROIs. A sphere with a radius of 8 mm centered at ±17, –24, –2 was used for the medial geniculate ROI.
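A generic way to construct such spherical ROI masks from MNI peak coordinates is sketched below using nibabel; the template file name is hypothetical and the approach is not tied to the atlas tools used in the study.

import numpy as np
import nibabel as nib

def sphere_mask(ref_img, center_mm, radius_mm):
    """Boolean mask of voxels within radius_mm of an MNI (mm) coordinate."""
    shape, affine = ref_img.shape[:3], ref_img.affine
    ijk = np.indices(shape).reshape(3, -1).T                # voxel indices (n, 3)
    xyz = nib.affines.apply_affine(affine, ijk)             # to mm coordinates
    dist = np.linalg.norm(xyz - np.asarray(center_mm), axis=1)
    return (dist <= radius_mm).reshape(shape)

# Example: 5-mm sphere for the right cochlear nucleus ROI (coordinates from the text).
# ref = nib.load("mni_template.nii.gz")                     # hypothetical reference image
# cochlear_right = sphere_mask(ref, (10, -38, -45), 5)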

Post hoc ROI Analysis

The aim of this analysis was to determine whether voxels that showed superthreshold classification in the MPA during temporal structure processing in music and speech also differed in activation levels. This post hoc analysis was performed using the same 11 bilateral frontal and temporal cortical ROIs noted above. A brain mask was first created consisting of voxels that had >63% classification accuracy from the MPA. This mask was then merged using the logical “AND” operator with each of the 11 bilateral frontal and temporal anatomical ROIs (Smith et al. 2004). Within these voxels, ANOVAs were used to compare mean activation levels during temporal structure processing in music and speech. ROI analyses were conducted using the MarsBaR toolbox (http://marsbar.sourceforge.net).
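The mask intersection itself amounts to a voxel-wise logical AND, as in the brief sketch below; both arrays are assumed to be resampled to the same space.

import numpy as np

def roi_intersection(cva_map, anat_roi, threshold=0.63):
    """cva_map: classification-accuracy volume; anat_roi: anatomical ROI mask."""
    return (cva_map > threshold) & anat_roi.astype(bool)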

Physiological Data Acquisition and Analysis

Acquisition

Peripheral vascular physiological data were acquired using a photoplethysmograph attached to the participant’s left index finger. Pulse data were acquired as a sequence of triggers in time at the zero crossings of the pulse waveform. Respiration data were acquired using the scanner’s pneumatic belt placed on the participant’s abdomen. Respiration and cardiac rates were recorded using a data logger (PowerLab, AD Instruments, Inc.) connected to the scanner’s monitoring system and sampled at 40 Hz.

Preprocessing and Artifact Removal

Interbeat intervals in the pulse data were calculated as the intervals between successive triggers; each interbeat interval was taken to represent the value at the midpoint of that interval. The disadvantage of this description is that the interbeat intervals are represented at nonuniform points in time. To overcome this, the intervals were resampled to a uniform rate of 2 Hz using cubic spline interpolation prior to analysis. Artifacts occur in the beat-to-beat interval data due to skipped or extra beats. Artifacts were detected by comparing the beat-to-beat interval values with the median of their predecessors and successors in a time window. Set comparison thresholds were used to eliminate unusually small (caused by extra beats) and unusually large (caused by skipped beats) intervals. Artifact removal was performed prior to interpolation and resampling. Data for each participant were further normalized to zero mean and unit variance to facilitate comparisons across participants.
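A compact sketch of this preprocessing pipeline is given below; the rejection thresholds relative to the local median are hypothetical, while the midpoint timing, cubic-spline resampling to 2 Hz, and z-normalization follow the description above.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import medfilt

def preprocess_ibi(trigger_times, fs_out=2.0, low=0.6, high=1.6):
    ibi = np.diff(trigger_times)                     # interbeat intervals (s)
    t_mid = trigger_times[:-1] + ibi / 2             # midpoint of each interval
    # Reject intervals far from the local median (extra or skipped beats);
    # the 0.6/1.6 bounds are illustrative thresholds.
    local_med = medfilt(ibi, kernel_size=5)
    keep = (ibi > low * local_med) & (ibi < high * local_med)
    ibi, t_mid = ibi[keep], t_mid[keep]
    # Resample to a uniform 2 Hz grid with cubic-spline interpolation.
    t_uniform = np.arange(t_mid[0], t_mid[-1], 1.0 / fs_out)
    ibi_uniform = CubicSpline(t_mid, ibi)(t_uniform)
    # Normalize to zero mean and unit variance for cross-participant comparison.
    return t_uniform, (ibi_uniform - ibi_uniform.mean()) / ibi_uniform.std()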

Analysis

Heart rate variability (HRV) in a time window was calculated as the variance of the interbeat interval within that time window (Critchley et al. 2003). A physiological observation window was defined by the length of each stimulus epoch. HRV and mean breaths per minute in the observation windows were combined (pooled) across stimuli in each experimental condition (music, reordered music, speech, reordered speech) and across participants. HRV and breaths per minute were compared between conditions using paired t-tests.
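The corresponding analysis step can be sketched as follows: HRV is computed as the variance of the resampled interbeat interval within each stimulus epoch, and condition means are compared with a paired t-test; the epoch windows and condition arrays below are placeholders.

import numpy as np
from scipy.stats import ttest_rel

def hrv_per_epoch(t, ibi, epochs):
    """epochs: list of (start_s, stop_s) windows; returns one variance per epoch."""
    return np.array([ibi[(t >= a) & (t < b)].var() for a, b in epochs])

# hrv_music, hrv_speech: per-participant mean HRV for each condition (hypothetical arrays)
# t_stat, p_val = ttest_rel(hrv_music, hrv_speech)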

Results

Physiological and Behavioral Analyses

Participants exhibited increases in HRV and respiration rate in each of the experimental conditions (speech, music, and their reordered counterparts) compared with the baseline (rest), but we found no mean differences in these variables between conditions (Fig. 2), validating that the stimuli were well matched for arousal and emotional reactivity in study participants.

Figure 2.

Equivalence of physiological measures by experimental condition. (A) Mean breaths per minute for each stimulus type. (B) HRV for each stimulus type as indexed by the mean of individual participants’ standard deviations over the course of the experiment. There were no significant differences within or across stimulus types.

Activation and Deactivation during Music and Speech Processing

The goal of this analysis was to 1) verify that our temporal and frontal lobe ROIs were strongly activated by music and speech and 2) identify brain areas that showed task-induced deactivation (greater activation during the reordered than the ordered conditions). As expected, normal and reordered music and speech activated broad regions of the frontal and temporal lobes bilaterally, including primary, nonprimary, and association areas of auditory cortex, IFC regions including Broca’s area (BA 44 and 45) and the pars orbitalis region (BA 47), as well as subcortical structures, including the thalamus, brainstem, and cerebellum (Fig. 3). Within the temporal lobe, the left superior and middle temporal gyri showed the most extensive activation. In the frontal lobe, Broca’s area (BA 44 and 45) showed the most extensive activations.

Figure 3.

Activation to music and speech. Surface rendering and axial slice (Z = −2) of cortical regions activated by music and speech stimuli show strong responses in the IFC and the superior and middle temporal gyri. The contrast used to generate this figure was (speech + reordered speech + music + reordered music) – rest. This image was thresholded using a voxel-wise statistical height threshold of (P < 0.01), with FWE corrections for multiple spatial comparisons at the cluster level (P < 0.05). Functional images are superimposed on a standard brain from a single normal subject (MRIcroN: ch2bet.nii.gz).

We also observed significant deactivation in the posterior cingulate cortex (BA 7, 31), the ventromedial PFC (BA 10, 11, 24, 32), and the visual cortex (BA 18, 19, 37), as shown in Supplementary Figure 1. This pattern is consistent with task-general deactivations reported previously (Greicius et al. 2003). Because such task-general processes are not germane to the goals of our study, these large deactivated clusters were excluded from further analysis by constructing a mask based on stimulus-related activation. We identified brain regions that showed greater activation across all 4 conditions (normal and reordered music and speech) compared with “rest” using a liberal height (P < 0.05) and cluster-extent threshold (P < 0.05) and binarized the resulting image to create a mask. This mask image was used in subsequent univariate and multivariate analyses.

Structure Processing in Music Versus Speech—Univariate Analysis

Next, we turned to the main goal of our study, which was to compare temporal structure processing in music versus speech. For this purpose, we compared fMRI responses during (music–reordered music) with (speech–reordered speech) using a voxel-wise analysis. fMRI signal levels were not significantly different for temporal structure processing between musical and speech stimuli (P < 0.01, FWE corrected), and this remained true even at a more liberal height threshold (P < 0.05) with extent thresholds based on false discovery rate (P < 0.05) or cluster-extent (P < 0.05) corrections. These results suggest that, for this set of regions, processing the same temporal structure differences in music and speech evokes similar levels of fMRI signal change.

Structure Processing in Music Versus Speech—MPA

We performed MPA to examine whether localized patterns of fMRI activity could accurately distinguish between brain activity in the (music–reordered music) and (speech–reordered speech) conditions. As noted above, to facilitate interpretation of our findings, this analysis was restricted to brain regions that showed significant activation during the 4 stimulus conditions, contrasted with rest. This included a wide expanse of temporal and frontal cortices that showed significant activation for the music and speech stimuli (Fig. 3). While these regions were identified using group-level activation across the 4 stimulus conditions, the activity patterns discriminated by MPA within this mask consist of both activating and deactivating voxels from individual subjects, and both contribute to the classification results.

MPA analyses yielded “classification maps” in which the classification accuracy is computed for a 3 × 3 × 3 volume centered at each voxel. A classification accuracy threshold of 63%, representing accuracy that is significantly greater than random performance at the P < 0.05 level, was selected for thresholding these maps. As noted below, classification accuracies in many brain regions far exceeded this threshold.
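As a rough illustration of how such a threshold can be derived, the sketch below finds the smallest accuracy whose one-sided binomial P value against chance (P = 0.5) falls below 0.05 for a given number of samples; the sample count shown is illustrative and not necessarily the exact N used here.

from scipy.stats import binom

def accuracy_threshold(n_samples, alpha=0.05):
    """Smallest accuracy significantly above chance for n_samples binary decisions."""
    for k in range(n_samples + 1):
        if binom.sf(k - 1, n_samples, 0.5) < alpha:    # P(X >= k) under chance
            return k / n_samples

# e.g. accuracy_threshold(38) returns the minimum above-chance accuracy for 38 maps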

Several key cortical, subcortical, and cerebellar regions were highly sensitive to differences between the same structural manipulations in music and speech. High classification accuracies (>75%; P < 0.001) were observed in the left IFC pars opercularis (BA 44), right IFC pars triangularis (BA 45), and bilateral IFC pars orbitalis (BA 47; Fig. 4). Several regions within the temporal lobes bilaterally also showed high classification accuracies, including the anterior and posterior superior temporal gyrus (STG) and middle temporal gyrus (MTG) (BA 22 and 21), the temporal pole, and regions of the superior temporal plane including Heschl’s gyrus (HG) (BA 41), the planum temporale (PT), and the planum polare (PP) (BA 22; Fig. 5). Across the entire brain, the highest classification accuracies were detected in the temporal lobe, with accuracies >90% (P < 0.001) in left-hemisphere pSTG and right-hemisphere aSTG and pMTG (Fig. 5). Table 1 shows the classification accuracy in each cortical ROI.

Table 1

Descriptive statistics from multivariate pattern analysis

Cortical structure (Harvard–Oxford map) | Percent of voxels > threshold | Mean classification accuracy (%) | Maximum classification accuracy (%) | Maximum Z score
Left BA 44 | 40.6 | 61.7 | 86.8 | 4.99
Left BA 45 | 22.3 | 58.2 | 73.7 | 3.15
Left BA 47 | 16.1 | 57.0 | 76.3 | 3.50
Left Heschl's gyrus | 55.6 | 62.8 | 78.9 | 3.85
Left anterior MTG | 98.5 | 77.6 | 89.5 | 5.40
Left posterior MTG | 81.6 | 68.8 | 86.8 | 4.99
Left planum polare | 51.7 | 62.9 | 81.6 | 4.22
Left anterior STG | 92.9 | 73.9 | 89.5 | 5.40
Left posterior STG | 80.3 | 69.5 | 92.1 | 5.83
Left temporal pole | 36.9 | 59.8 | 78.9 | 3.85
Left planum temporale | 52.7 | 62.9 | 89.5 | 5.40
Right BA 44 | 22.9 | 57.9 | 76.3 | 3.50
Right BA 45 | 45.8 | 62.1 | 84.2 | 4.60
Right BA 47 | 35.1 | 59.5 | 76.3 | 3.50
Right Heschl's gyrus | 28.0 | 58.8 | 73.7 | 3.15
Right anterior MTG | 57.1 | 63.9 | 78.9 | 3.85
Right posterior MTG | 65.2 | 66.3 | 92.1 | 5.83
Right planum polare | 34.6 | 59.6 | 76.3 | 3.50
Right anterior STG | 52.1 | 63.4 | 92.1 | 5.83
Right posterior STG | 51.0 | 62.8 | 89.5 | 5.40
Right temporal pole | 15.7 | 56.3 | 76.3 | 3.50
Right planum temporale | 55.1 | 63.3 | 84.2 | 4.60
Figure 4.

MPA of temporal structure in music and speech. (A, B) Classification maps for temporal structure in music and speech superimposed on a standard brain from a single normal subject. (C) Color-coded location of IFC ROIs. (D) Maximum classification accuracies in BAs 44 (yellow), 45 (brown), and 47 (cyan). The crosshair indicates the voxel with maximum classification accuracy.

Figure 5.

MPA of temporal structure in music and speech. (A–C) Classification maps for temporal structure in music and speech superimposed on a standard brain from a single normal subject. (D) Maximum classification accuracies for PT (pink), HG (cyan), and PP (orange) in the superior temporal plane. (E) Color-coded location of temporal lobe ROIs. (F) Maximum classification accuracies for pSTG (yellow), pMTG (red), aSTG (white), aMTG (blue), and tPole (green) in the middle and superior temporal gyri as well as the temporal pole. a, anterior; p, posterior; tPole, temporal pole.

Subcortical nuclei were also sensitive to differences between normal and reordered stimuli in music and speech (Fig. 6, left and center). The anatomical locations of these nuclei were specified using ROIs based on a prior structural MRI study (Muhlau et al. 2006). Brainstem auditory nuclei, including the bilateral cochlear nucleus, left superior olive, and right inferior colliculus and medial geniculate nucleus, also showed classification values that exceeded the 63% threshold. Other regions that were sensitive to the temporal structure manipulation were the bilateral amygdalae, hippocampi, the putamen and caudate nuclei of the dorsal striatum, and the left cerebellum.

Figure 6.

MPA of temporal structure in music and speech. Classification maps for brainstem regions (A) cochlear nucleus (cyan) and (B) inferior colliculus (green) superimposed on a standard brain from a single normal subject (MRIcroN: ch2.nii.gz).

Structure Processing in Music Versus Speech—Signal Levels in ROIs with High Classification Rates

A remaining question is whether the voxels sensitive to music and speech temporal structure manipulations identified in the classification analysis arise from local differences in mean response magnitude. To address this question, we examined activity levels in 11 frontal and temporal cortical ROIs that showed superthreshold classification rates. We performed a conventional ROI analysis comparing signal changes in the music and speech structure conditions. We found that mean response magnitude was statistically indistinguishable for music and speech temporal structure manipulations within all frontal and temporal lobe ROIs (range of P values: 0.11 through 0.99 for all ROIs; Fig. 7).

Figure 7.

ROI signal change analysis. Percentage signal change in ROIs for music structure (blue) and speech structure (red) conditions. ROIs were constructed using superthreshold voxels from the classification analysis in 11 frontal and temporal cortical regions bilaterally. There were no significant differences in signal change to temporal structure manipulations in music and speech. TP, temporal pole.

Discussion

Music and speech stimuli and their temporally reordered counterparts were presented to 20 participants to examine brain activation in response to the same manipulations of temporal structure. Important strengths of the current study that differentiate it from its predecessors include the use of the same stimulus manipulation in music and speech, a within-subjects design, and tight controls for arousal and emotional content. The principal result both supports and extends the SSIRH (Patel 2003). The same temporal manipulation in music and speech produced fMRI signal changes of the same magnitude in prefrontal and temporal cortices of both cerebral hemispheres in the same group of participants. However, MPA revealed significant differences in the fine-grained pattern of fMRI signal responses, indicating differences in dynamic temporal structure processing in the 2 domains. In particular, the same temporal structure manipulation in music and speech was found to be differentially processed by a highly distributed network that includes the IFC, anterior and posterior temporal cortex, and the auditory brainstem bilaterally. The existence of decodable fine-scale pattern differences in fMRI signals suggests that the 2 domains share similar anatomical resources but that the resources are accessed and used differently within each domain.

IFC Involvement in Processing Temporally Manipulated Music and Speech Stimuli

Previous studies have shown that subregions of the IFC are sensitive to semantic and syntactic analysis in music and speech. Semantic analysis of word and sentence stimuli has revealed activation in left BA 47 (Dapretto and Bookheimer 1999; Roskies et al. 2001; Wagner et al. 2001; Binder et al. 2009) and left BA 45 (Newman et al. 2001; Wagner et al. 2001), while the analysis of language-based syntax has typically revealed activation of left BA 44 (Dapretto and Bookheimer 1999; Ni et al. 2000; Friederici et al. 2006; Makuuchi et al. 2009). In the music domain, BA 44 has also been implicated in syntactic processing. For example, magnetoencephalography (Maess et al. 2001) and fMRI (Koelsch et al. 2002) studies have shown increased cortical activity localized to Broca’s Area (BA 44) and its right-hemisphere homolog in response to chord sequences ending with “out-of-key” chords relative to “in-key” chords. A prior study has shown that the anterior and ventral aspects of the IFC within the pars orbitalis (BA 47) are sensitive to temporal structure variation in music (Levitin and Menon 2003, 2005). The present study differs from all previous studies in its use of an identical, well-controlled structural manipulation of music and speech stimuli to examine differences in fine-scale patterns of fMRI activity in the same set of participants.

The IFC distinguished between the same temporal structure manipulation in music and speech with classification accuracies between 70% and 85%. Importantly, all 3 subdivisions of the IFC—BA 44, 45, and 47—were equally able to differentiate the same manipulation in the 2 domains (Fig. 4). Furthermore, both the left and right IFC were sensitive to temporal structure, although the relative classification rates varied considerably across the 3 subdivisions and 2 hemispheres. The inferior frontal sulcus was also sensitive to temporal structure, consistent with a recent study that showed sensitivity of the inferior frontal sulcus to hierarchically structured sentence processing in natural language stimuli (Makuuchi et al. 2009).

These results extend the SSIRH by showing that both left and right hemisphere IFC are involved in decoding temporal structure and that there is differential sensitivity to temporal structure among the constituent structures of the IFC. Although classification rates were high in both Broca’s area and its right-hemisphere homolog (BA 44 and 45), these regions showed differential sensitivity with higher classification rates in the left, as compared with the right, BA 44, and higher classification rates in the right, compared with the left, BA 45. Additional experimental manipulations will be needed to further delineate and better understand the relative contributions of various left and right hemisphere subregions of the IFC for processing of fine- and coarse-grained temporal structure.

Modular Versus Distributed Neural Substrates for Temporal Structure Processing and Syntactic Integration

In addition to the IFC, responses in several temporal lobe regions also distinguished between the same structural manipulation in music and speech. Classification accuracies greater than 85% were observed bilaterally in the anterior and posterior divisions of the STG and pMTG as well as the left PT and aMTG. Slightly lower accuracies (∼75%) were found in the temporal pole and PP in addition to HG. Again, it is noteworthy that fMRI signal strengths to the 2 acoustic stimuli were statistically similar in all regions of temporal lobe.

A common interpretation of prior findings has been that the processing of music and speech syntax is a modular phenomenon, with either IFC or anterior temporal regions underlying different processes (Caplan et al. 1998; Dapretto and Bookheimer 1999; Grodzinsky 2000; Ni et al. 2000; Maess et al. 2001; Martin 2003; Humphries et al. 2005). It is important to note, however, that the many studies that have arrived at this conclusion have often used dissimilar experimental manipulations, including different cognitive paradigms and stimulus types. We hypothesize that a common bilateral and distributed network including cortical, subcortical, brainstem, and cerebellar structures underlies the decoding of temporal structure (including syntax) in music and speech. This network is incompletely revealed when only the amplitude of fMRI signal changes is examined (Freeman et al. 2009). When the magnitude of fMRI signal change is the sole measure in studies of temporal structure processing, the (usually cortical) structures that are subsequently identified may primarily reflect large differences in the stimulus types and cognitive paradigms used to elicit brain responses. Consistent with this view, both anatomical and intrinsic functional connectivity analyses have provided evidence for strong coupling between the IFC, pSTS/STG, and anterior temporal cortex (Anwander et al. 2007; Frey et al. 2008; Friederici 2009; Petrides and Pandya 2009; Xiang et al. 2010). A compelling question for future research is how this connectivity differentially influences structure processing in music and speech.

“Low-Level” Auditory Regions and Temporal Structure Processing of Music and Speech

Auditory brainstem regions, including the inferior colliculus, superior olive, and cochlear nucleus, were among the brain areas that showed superthreshold levels of classification accuracies between normal and temporally reordered stimuli in this study. Historically, the brainstem has primarily been associated with only fine-grained temporal structure processing (Frisina 2001), but there is growing evidence to suggest that brainstem nuclei are sensitive to temporal structure over longer time scales underlying auditory perception (King et al. 2002; Wible et al. 2004; Banai et al. 2005, 2009; Krishnan et al. 2005; Russo et al. 2005; Johnson et al. 2007, 2008; Musacchia et al. 2007; Wong et al. 2007; Song et al. 2008). One possible interpretation of these brainstem findings is that they reflect corticofugal modulation of the incoming sensory stimulus by higher level auditory regions. The mammalian auditory system has robust top-down projections from the cortex which converge on the auditory brainstem (Webster 1992), and neurophysiological studies have shown that “top-down” information refines acoustic feature representation in the brainstem (Polley et al. 2006; Luo et al. 2008; Nahum et al. 2008; Song et al. 2008). Whether the auditory brainstem responses found in the present study arise from top-down corticofugal modulation or from intrinsic processing within specific nuclei that were not spatially resolved by the fMRI parameters employed here requires further investigation.

Broader Implications for the Study of Temporal Structure and Syntactic Processing in Music and Speech

A hallmark of communication in humans—through music or spoken language—is the meaningful temporal ordering of components in the auditory signal. Although natural languages differ considerably in the strictness of such ordering, there is no language (including visually signed languages) or musical system (other than 12 tone or “quasi-random” styles of 20th century experimental European music) that arranges components without ordering rules. The present study demonstrates the effectiveness of carefully controlled reordering paradigms for studying temporal structure in both music and speech, in addition to the more commonly used “oddball” or expectancy violation paradigms. The present study has focused on perturbations that disrupt sequential temporal order at approximately 350 ms segment lengths. An interesting question for future research is how the temporal granularity of these perturbations influences brain responses to music and speech.

In addition to disrupting the temporal ordering of events, the acoustical manipulations performed here also altered the rhythmic properties of the music and speech stimuli. In speech, the rhythmic pattern of syllables is thought to provide a critical temporal feature for speech understanding (Drullman et al. 1994; Shannon et al. 1995), and in music, rhythm is regarded as a primary building block of musical structure (Lerdahl and Jackendoff 1983; Dowling and Harwood 1986; Levitin 2002; Large 2008): rhythmic patterns set up expectations in the mind of the listener, which contribute to the temporal structure of phrases and entire compositions (Bernstein 1976; Huron 2006). Extant literature suggests that there is considerable overlap in the brain regions that track rhythmic elements in music and speech, although this question has never been directly tested. Both music and speech rhythm processing are thought to engage auditory cortical regions (Grahn and Brett 2007; Abrams et al. 2008, 2009; Chen et al. 2008; Geiser et al. 2008; Grahn and Rowe 2009), IFC (Schubotz et al. 2000; Snyder and Large 2005; Grahn and Brett 2007; Chen et al. 2008; Geiser et al. 2008; Fujioka et al. 2009; Grahn and Rowe 2009), supplementary motor and premotor areas (Schubotz et al. 2000; Grahn and Brett 2007; Chen et al. 2008; Geiser et al. 2008; Grahn and Rowe 2009), and the insula and basal ganglia (Grahn and Brett 2007; Geiser et al. 2008). The cerebellum is thought to play a fundamental role in the processing of musical rhythm (Grahn and Brett 2007; Chen et al. 2008; Grahn and Rowe 2009), and a recent article proposes a prominent role for the cerebellum in the processing of speech rhythm (Kotz and Schwartze 2010). Many of the brain structures associated with music and speech rhythm processing—notably auditory cortex, IFC, the insula, and the cerebellum—were also identified in the MPA in our study, which may reflect differential processing of rhythmic properties between music and speech.

Comparisons between music and language are necessarily imperfect because music lacks external referents and is considered to be primarily self-referential (Meyer 1956; Culicover 2005), while language generally has specific referents. The present study examined temporal structure by comparing brain responses with the same manipulations of temporal structure in music and speech. The granularity of temporal reordering attempted to control for semantic processing at the word level, but long-range semantic integration remains an issue, since there are structures in the human brain that respond to differences in speech intelligibility (Scott et al. 2000; Leff et al. 2008; Okada et al. 2010), and these do not have an obvious musical counterpart. Differences in intelligibility and meaning across stimulus classes are unavoidable in studies directly comparing naturalistic music and speech processing, and more experimental work will be necessary to fully comprehend the extent to which such issues may directly or indirectly contribute to the processing differences uncovered here.

Funding

National Institutes of Health (National Research Service Award fellowship to D.A.A.); National Science Foundation (BCS0449927 to V.M. and D.J.L.); Natural Sciences and Engineering Research Council of Canada (223210 to D.J.L., 298612 to E.B.); Canada Foundation for Innovation (9908 to E.B.).

We thank Jason Hom for assistance with data acquisition and Kaustubh Supekar for help with analysis software. Conflict of Interest: None declared.

References

Abrams DA, Nicol T, Zecker S, Kraus N. 2008. Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J Neurosci. 28:3958-3965.
Abrams DA, Nicol T, Zecker S, Kraus N. 2009. Abnormal cortical processing of the syllable rate of speech in poor readers. J Neurosci. 29:7686-7693.
Anwander A, Tittgemeyer M, von Cramon DY, Friederici AD, Knosche TR. 2007. Connectivity-based parcellation of Broca's area. Cereb Cortex. 17:816-825.
Banai K, Hornickel J, Skoe E, Nicol T, Zecker S, Kraus N. 2009. Reading and subcortical auditory function. Cereb Cortex. 19:2699-2707.
Banai K, Nicol T, Zecker SG, Kraus N. 2005. Brainstem timing: implications for cortical processing and literacy. J Neurosci. 25:9850-9857.
Bernstein L. 1976. The unanswered question: six talks at Harvard (Charles Eliot Norton lectures). Cambridge (MA): Harvard University Press. p. 53-115.
Binder JR, Desai RH, Graves WW, Conant LL. 2009. Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cereb Cortex. 19:2767-2796.
Bookheimer SY. 2002. Functional MRI of language: new approaches to understanding the cortical organization of semantic processing. Annu Rev Neurosci. 25:151-188.
Brown D. 1991. Human universals. New York: McGraw-Hill.
Brown S, Martinez MJ, Parsons LM. 2006. Music and language side by side in the brain: a PET study of the generation of melodies and sentences. Eur J Neurosci. 23:2791-2803.
Callan DE, Tsytsarev V, Hanakawa T, Callan AM, Katsuhara M, Fukuyama H, Turner R. 2006. Song and speech: brain regions involved with perception and covert production. Neuroimage. 31:1327-1342.
Caplan D, Alpert N, Waters G. 1998. Effects of syntactic structure and propositional number on patterns of regional cerebral blood flow. J Cogn Neurosci. 10:541-552.
Chen JL, Penhune VB, Zatorre RJ. 2008. Listening to musical rhythms recruits motor regions of the brain. Cereb Cortex. 18:2844-2854.
Conard NJ, Malina M, Munzel SC. 2009. New flutes document the earliest musical tradition in southwestern Germany. Nature. 460:737-740.
Critchley HD, Mathias CJ, Josephs O, O'Doherty J, Zanini S, Dewar B-K, Cipolotti L, Shallice T, Dolan RJ. 2003. Human cingulate cortex and autonomic control: converging neuroimaging and clinical evidence. Brain. 126:2139-2152.
Culicover PW. 2005. Linguistics, cognitive science, and all that jazz. Linguist Rev. 22:227-248.
Dapretto M, Bookheimer SY. 1999. Form and content: dissociating syntax and semantics in sentence comprehension. Neuron. 24:427-432.
Derogatis LR. 1992. SCL-90-R: administration, scoring, and procedures manual-II. Baltimore (MD): Clinical Psychometric Research.
Dowling WJ, Harwood DL. 1986. Music cognition. Orlando (FL): Academic Press.
Drullman R, Festen JM, Plomp R. 1994. Effect of temporal envelope smearing on speech reception. J Acoust Soc Am. 95:1053-1064.
Duvernoy HM. 1995. The human brain stem and cerebellum: surface, structure, vascularization, and three-dimensional sectional anatomy with MRI. New York: Springer-Verlag.
Duvernoy HM, Bourgouin P, Cabanis EA, Cattin F. 1999. The human brain: functional anatomy, vascularization and serial sections with MRI. New York: Springer.
Eickhoff SB, Stephan KE, Mohlberg H, Grefkes C, Fink GR, Amunts K, Zilles K. 2005. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage. 25:1325-1335.
Freeman WJ, Ahlfors SP, Menon V. 2009. Combining fMRI with EEG and MEG in order to relate patterns of brain activity to cognition. Int J Psychophysiol. 73:43-52.
Frey S, Campbell JS, Pike GB, Petrides M. 2008. Dissociating the human language pathways with high angular resolution diffusion fiber tractography. J Neurosci. 28:11435-11444.
Friederici AD. 2009. Pathways to language: fiber tracts in the human brain. Trends Cogn Sci. 13:175-181.
Friederici AD, Bahlmann J, Heim S, Schubotz RI, Anwander A. 2006. The brain differentiates human and non-human grammars: functional localization and structural connectivity. Proc Natl Acad Sci U S A. 103:2458-2463.
Frisina RD. 2001. Subcortical neural coding mechanisms for auditory temporal processing. Hear Res. 158:1-27.
Fujioka T, Trainor LJ, Large EW, Ross B. 2009. Beta and gamma rhythms in human auditory cortex during musical beat processing. Ann N Y Acad Sci. 1169:89-92.
Geiser E, Zaehle T, Jancke L, Meyer M. 2008. The neural correlate of speech rhythm as evidenced by metrical speech processing. J Cogn Neurosci. 20:541-552.
Glover GH, Lai S. 1998. Self-navigated spiral fMRI: interleaved versus single-shot. Magn Reson Med. 39:361-368.
Grahn JA, Brett M. 2007. Rhythm and beat perception in motor areas of the brain. J Cogn Neurosci. 19:893-906.
Grahn JA, Rowe JB. 2009. Feeling the beat: premotor and striatal interactions in musicians and nonmusicians during beat perception. J Neurosci. 29:7540-7548.
Greicius MD, Krasnow B, Reiss AL, Menon V. 2003. Functional connectivity in the resting brain: a network analysis of the default mode hypothesis. Proc Natl Acad Sci U S A. 100:253-258.
Grodzinsky Y. 2000. The neurology of syntax: language use without Broca's area. Behav Brain Sci. 23:1-21.
Grodzinsky Y, Friederici AD. 2006. Neuroimaging of syntax and syntactic processing. Curr Opin Neurobiol. 16:240-246.
Haynes JD, Rees G. 2006. Decoding mental states from brain activity in humans. Nat Rev Neurosci. 7:523-534.
Haynes JD, Sakai K, Rees G, Gilbert S, Frith C, Passingham RE. 2007. Reading hidden intentions in the human brain. Curr Biol. 17:323-328.
Humphries C, Love T, Swinney D, Hickok G. 2005. Response of anterior temporal cortex to syntactic and prosodic manipulations during sentence processing. Hum Brain Mapp. 26:128-138.
Huron D. 2006. Sweet anticipation: music and the psychology of expectation. Cambridge (MA): MIT Press.
Janata P. 2005. Brain networks that track musical structure. Ann N Y Acad Sci. 1060:111-124.
Johnson KL, Nicol T, Zecker SG, Kraus N. 2008. Developmental plasticity in the human auditory brainstem. J Neurosci. 28:4000-4007.
Johnson KL, Nicol TG, Zecker SG, Kraus N. 2007. Auditory brainstem correlates of perceptual timing deficits. J Cogn Neurosci. 19:376-385.
Kim SH, Adalsteinsson E, Glover GH, Spielman S. 2000. SVD regularization algorithm for improved high-order shimming. Proceedings of the 8th Annual Meeting of ISMRM; Denver.
King C, Warrier CM, Hayes E, Kraus N. 2002. Deficits in auditory brainstem pathway encoding of speech sounds in children with learning problems. Neurosci Lett. 319:111-115.
Koelsch S. 2005. Neural substrates of processing syntax and semantics in music. Curr Opin Neurobiol. 15:207-212.
Koelsch S, Gunter TC, von Cramon DY, Zysset S, Lohmann G, Friederici AD. 2002. Bach speaks: a cortical "language-network" serves the processing of music. Neuroimage. 17:956-966.
Kotz SA, Schwartze M. 2010. Cortical speech processing unplugged: a timely subcortico-cortical framework. Trends Cogn Sci. 14:392-399.
Kriegeskorte N, Goebel R, Bandettini P. 2006. Information-based functional brain mapping. Proc Natl Acad Sci U S A. 103:3863-3868.
Krishnan A, Xu Y, Gandour J, Cariani P. 2005. Encoding of pitch in the human brainstem is sensitive to language experience. Brain Res Cogn Brain Res. 25:161-168.
Large EW. 2008. Resonating to musical rhythm: theory and experiment. In: Grondin S, editor. The psychology of time. Bingley (UK): Emerald. p. 189-231.
Leff AP, Schofield TM, Stephan KE, Crinion JT, Friston KJ, Price CJ. 2008. The cortical dynamics of intelligible speech. J Neurosci. 28:13209-13215.
Lerdahl F, Jackendoff R. 1983. A generative theory of tonal music. Cambridge (MA): MIT Press.
Levitin DJ. 2002. Memory for musical attributes. In: Levitin DJ, editor. Foundations of cognitive psychology. Cambridge (MA): MIT Press. p. 295-310.
Levitin DJ, Menon V. 2003. Musical structure is processed in "language" areas of the brain: a possible role for Brodmann area 47 in temporal coherence. Neuroimage. 20:2142-2152.
Levitin DJ, Menon V. 2005. The neural locus of temporal structure and expectancies in music: evidence from functional neuroimaging at 3 Tesla. Music Percept. 22:563-575.
Luo F, Wang Q, Kashani A, Yan J. 2008. Corticofugal modulation of initial sound processing in the brain. J Neurosci. 28:11615-11621.
Maess B, Koelsch S, Gunter TC, Friederici AD. 2001. Musical syntax is processed in Broca's area: an MEG study. Nat Neurosci. 4:540-545.
Mai JK, Assheur J, Paxinos G. 2004. Atlas of the human brain. Amsterdam: Elsevier.
Makuuchi M, Bahlmann J, Anwander A, Friederici AD. 2009. Segregating the core computational faculty of human language from working memory. Proc Natl Acad Sci U S A. 106:8362-8367.
Martin RC. 2003. Language processing: functional organization and neuroanatomical basis. Annu Rev Psychol. 54:55-89.
Menon V, Levitin DJ. 2005. The rewards of music listening: response and physiological connectivity of the mesolimbic system. Neuroimage. 28:175-184.
Meyer L. 1956. Emotion and meaning in music. Chicago (IL): University of Chicago Press.
Morrison SJ, Demorest SM, Aylward EH, Cramer SC, Maravilla KR. 2003. fMRI investigation of cross-cultural music comprehension. Neuroimage. 20:378-384.
Muhlau M, Rauschecker JP, Oestreicher E, Gaser C, Rottinger M, Wohlschlager AM, Simon F, Etgen T, Conrad B, Sander D. 2006. Structural brain changes in tinnitus. Cereb Cortex. 16:1283-1288.
Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B. 2001. An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw. 12:181-201.
Musacchia G, Sams M, Skoe E, Kraus N. 2007. Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc Natl Acad Sci U S A. 104:15894-15898.
Nahum M, Nelken I, Ahissar M. 2008. Low-level information and high-level perception: the case of speech in noise. PLoS Biol. 6:e126.
Newman AJ, Pancheva R, Ozawa K, Neville HJ, Ullman MT. 2001. An event-related fMRI study of syntactic and semantic violations. J Psycholinguist Res. 30:339-364.
Ni W, Constable RT, Mencl WE, Pugh KR, Fulbright RK, Shaywitz SE, Shaywitz BA, Gore JC, Shankweiler D. 2000. An event-related neuroimaging study distinguishing form and content in sentence processing. J Cogn Neurosci. 12:120-133.
Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT, Hickok G. 2010. Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. Cereb Cortex. 20:2486-2495.
Patel AD. 2003. Language, music, syntax and the brain. Nat Neurosci. 6:674-681.
Patel AD. 2008. Music, language, and the brain. Oxford: Oxford University Press.
Pereira F, Mitchell T, Botvinick M. 2009. Machine learning classifiers and fMRI: a tutorial overview. Neuroimage. 45:S199-S209.
Petrides M, Pandya DN. 2009. Distinct parietal and temporal pathways to the homologues of Broca's area in the monkey. PLoS Biol. 7:e1000170.
Poline J-B, Worsley KJ, Evans AC, Friston KJ. 1997. Combining spatial extent and peak intensity to test for activations in functional imaging. Neuroimage. 5:83-96.
Polley DB, Steinberg EE, Merzenich MM. 2006. Perceptual learning directs auditory cortical map reorganization through top-down influences. J Neurosci. 26:4970-4982.
Roskies AL, Fiez JA, Balota DA, Raichle ME, Petersen SE. 2001. Task-dependent modulation of regions in the left inferior frontal cortex during semantic processing. J Cogn Neurosci. 13:829-843.
Russo NM, Nicol TG, Zecker SG, Hayes EA, Kraus N. 2005. Auditory training improves neural timing in the human brainstem. Behav Brain Res. 156:95-103.
Ryali S, Supekar K, Abrams DA, Menon V. 2010. Sparse logistic regression for whole-brain classification of fMRI data. Neuroimage. 51:752-764.
Schubotz RI, Friederici AD, von Cramon DY. 2000. Time perception and motor timing: a common cortical and subcortical basis revealed by fMRI. Neuroimage. 11:1-12.
Schwarzlose RF, Swisher JD, Dang S, Kanwisher N. 2008. The distribution of category and location information across object-selective regions in human visual cortex. Proc Natl Acad Sci U S A. 105:4447-4452.
Scott SK, Blank CC, Rosen S, Wise RJ. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain. 123(Pt 12):2400-2406.
Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M. 1995. Speech recognition with primarily temporal cues. Science. 270:303-304.
Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, et al. 2004. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage. 23(Suppl 1):S208-S219.
Snyder JS, Large EW. 2005. Gamma-band activity reflects the metric structure of rhythmic tone sequences. Brain Res Cogn Brain Res. 24:117-126.
Song JH, Skoe E, Wong PC, Kraus N. 2008. Plasticity in the adult human auditory brainstem following short-term linguistic training. J Cogn Neurosci. 20:1892-1902.
Todd DM, Deane FP, McKenna PA. 1997. Appropriateness of SCL-90-R adolescent and adult norms for outpatient and nonpatient college students. J Couns Psychol. 44:294-301.
Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M. 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 15:273-289.
Various. 1991. Great speeches of the 20th century. Los Angeles (CA): Rhino Records.
Wagner AD, Paré-Blagoev EJ, Clark J, Poldrack RA. 2001. Recovering meaning: left prefrontal cortex guides controlled semantic retrieval. Neuron. 31:329-338.
Webster DB, Popper AN, Fay RR. 1992. An overview of the mammalian auditory pathways with an emphasis on humans. In: The mammalian auditory pathway: neuroanatomy. New York: Springer-Verlag. p. 1-22.
Wible B, Nicol T, Kraus N. 2004. Atypical brainstem representation of onset and formant structure of speech sounds in children with language-based learning problems. Biol Psychol. 67:299-317.
Wong PC, Skoe E, Russo NM, Dees T, Kraus N. 2007. Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat Neurosci. 10:420-422.
Xiang HD, Fonteijn HM, Norris DG, Hagoort P. 2010. Topographical functional connectivity pattern in the perisylvian language networks. Cereb Cortex. 20:549-560.