Jokes often take the form of a setup question, followed after a short delay by a punch line: for instance, “What do you get when you cross a parrot with a centipede?” A brief suspenseful pause allows the audience to imagine the issue of this unlikely match. The punch line “a walkie-talkie” is surprising but also fits in a clever and unexpected way. Prominent models commonly postulate that the punch line is appreciated as funny in two stages of integration: In the first stage, the punch line is perceived as incongruous with the setup, and in the second, further consideration establishes a deeper coherence with the preceding context (Brownell, Michel, Powelson, & Gardner, 1983; Ramachandran, 1998; Suls, 1972; Veatch, 1998; Wyer & Collins, 1992). Thus, amusement may result when a “twist” in the meaning is successfully incorporated into the preceding context—that is, when the initial incongruity is resolved by reinterpreting or “frame shifting” (Coulson, 2001).

Despite the important role of humor in modulating social dynamics (Black, 1984; Kane, Suls, & Tedeschi, 1977), there have been relatively few investigations into its neural basis. Lesion evidence suggests that patients with right hemisphere (RH) lesions, particularly in right prefrontal areas, are impaired in appreciating humor (Shammi & Stuss, 1999). When faced with a choice of captions, such patients tend to prefer non sequitur endings over funny ones (Wapner, Hamby, & Gardner, 1981). While their ability to recognize a joke form by detecting a surprise element is intact, the RH patients are impaired at integrating the punch line with the preceding sentence (Bihrle, Brownell, Powelson, & Gardner, 1986; Brownell et al., 1983; Winner, Brownell, Happe, Blum, & Pincus, 1998). In contrast, left hemisphere (LH) lesion patients are impaired in detecting incongruity but are able to establish coherence with the preceding storyline (Bihrle et al., 1986). Furthermore, lesion studies suggest that the RH plays an important role in higher-order language functions (Sidtis, 2006). Even though RH-damaged patients have largely intact language faculties and are able to follow the story narrative well (Rehak et al., 1992), they are impaired in making inferences (Beeman, 1993; Brownell, Potter, Bihrle, & Gardner, 1986) and interpreting nonliteral language (Kaplan, Brownell, Jacobs, & Gardner, 1990) and metaphors (Brownell, Simpson, Bihrle, Potter, & Gardner, 1990) (but see Giora, Zaidel, Soroker, Batori, & Kasher, 2000). Thus, although not all studies have found clear laterality differences (Gardner, Ling, Flamm, & Silverman, 1975), lesion evidence has provided the neural framework for two-stage models of humor comprehension: The LH contributes to incongruity detection, whereas the RH is essential for its resolution and for humor appreciation.

These hypotheses have been tested using fMRI in response to a variety of humorous materials. Goel and Dolan (2001) reported activations in bilateral temporal areas to semantic jokes and left frontotemporal areas to puns, while pleasurable affect was associated with activity in the medioventral prefrontal region, indicating its involvement in the emotional aspect of joke appreciation. Ozawa et al. (2000) also reported left frontotemporal activity to spoken sentences that were rated as funny. Studies using funny cartoons and comedy videos reported activity in the temporal and prefrontal regions (Bartolo, Benuzzi, Nocetti, Baraldi, & Nichelli, 2006; Mobbs, Greicius, Abdel-Azim, Menon, & Reiss, 2003; Moran, Wig, Adams, Janata, & Kelley, 2004; Wild et al., 2006), with amygdala contributions to the emotional aspects of mirth. Although most studies have reported prefrontal and temporal activations, their laterality and spatial configuration are inconsistent. Furthermore, due to its poor temporal resolution, the BOLD signal may not be able to resolve the temporal stages of joke processing that underlie initially understanding a punch line, detecting an ambiguity or “twist” in the meaning, and establishing coherence with the preceding context.

Event-related potentials (ERP) have excellent temporal resolution and have been used extensively in studies of language functions (Kutas & Federmeier, 2000; Van Petten & Luka, 2006). A negative deflection peaking at ~400 ms (N400) is evoked by potentially meaningful material and has been interpreted as an attempt to access and integrate a semantic representation into a current context (Brown & Hagoort, 1993; Friederici, 1997; Halgren, 1990; Holcomb, 1993; Kutas & Federmeier, 2000; Van Petten & Luka, 2006). The N400 is sensitive to lexical, semantic, and mnemonic aspects in the context of single words, sentences, and discourse, and it reflects semantic fit with general world knowledge (Chwilla & Kolk, 2005; Friederici, 2004; Hagoort & van Berkum, 2007; Kuperberg, 2007). N400-like activity, as measured with ERPs and magnetoencephalography (MEG), is readily elicited by incongruity with the preceding context (Halgren et al., 2002; Helenius, Salmelin, Service, & Connolly, 1998; Kutas & Van Petten, 1994; Lau, Phillips, & Poeppel, 2008; Marinkovic, 2004; Osterhout & Holcomb, 1995). In the context of joke processing, the N400 could potentially reflect the stage of detecting a “twist,” or an incongruity of the punch line with the preceding context. A late positivity often follows the N400 with sentence stimuli. This positivity is termed the P600, and it is observed to syntactic anomalies (Friederici, Hahne, & Saddy, 2002; Hagoort, Brown, & Groothusen, 1993) and to semantic ambiguities that need to be resolved; it is sensitive to the sentence plausibility, structure, and discourse complexity (Friederici et al., 2002; Kaan & Swaab, 2003; Kuperberg, Sitnikova, Caplan, & Holcomb, 2003; Osterhout, Holcomb, & Swinney, 1994). 
Possibly reflecting different functional components, the P600 has been proposed to index activity of the combinatorial stream of language processing that is sensitive to both morphosyntactic and semantic anomalies (Kuperberg, 2007), but it could also reveal engagement of a conflict-monitoring process within the executive system (Kolk & Chwilla, 2007; Vissers, Chwilla, & Kolk, 2006). In the context of humor comprehension, the P600 might reflect resolving the incongruity of the punch line and establishing coherence with the preceding context.

Coulson and colleagues measured ERPs to jokes in a series of studies. They observed N400 effects that were dependent on the level of sentence constraint and levels of joke comprehension (Coulson & Kutas, 2001), as well as on handedness and gender (Coulson & Lovett, 2004). A subsequent positivity observed 500–900 ms after stimulus onset was larger to jokes overall, but its distribution and laterality varied as a function of gender and handedness (Coulson & Lovett, 2004) and level of joke comprehension (Coulson & Kutas, 2001). When the ERPs were time-locked to single probe words that were related to jokes and not to control sentences, a smaller N400 was observed to joke-relevant probes, whereas a subsequent anterior positivity 700–900 ms after stimulus onset was larger to unrelated probes (Coulson & Wu, 2005).

Taken together, the fMRI studies suggest that joke comprehension is subserved by a distributed neural system encompassing temporal and prefrontal regions, whereas the ERP studies indicate that detection of ambiguity and establishment of coherence may engage both the N400 and P600-like processes. However, the underlying neural substrate and the timing of these stages remain unclear. To examine where humor-specific brain activations occur and when the neural components involved are engaged, a method with good temporal resolution and reasonable spatial estimates is needed. To this end, we employed the anatomically constrained magnetoencephalography method (aMEG), which reflects neural activity with millisecond precision. This methodology combines whole-head high-density MEG and a distributed source-modeling approach with high-resolution structural MRI and cortical reconstruction in an effort to estimate the anatomical distribution of the underlying neural networks in a time-sensitive manner (Dale et al., 2000; Dale & Sereno, 1993).

The main goal of this study was to describe the spatiotemporal characteristics of joke-specific processing in healthy subjects. In particular, we wished to examine the timing and the estimated neural substrate of the incongruity detection and resolution/coherence stages, as proposed by two-stage models of humor processing (Brownell et al., 1983; Ramachandran, 1998; Suls, 1972; Veatch, 1998; Wyer & Collins, 1992). We hypothesized that the ambiguity of the funny punch lines would be detected during the lexical–semantic stage and would evoke a larger magnetic equivalent of the N400 (N400m). We specifically manipulated this stage by using incongruous, nonsensical responses to setup questions that are known to evoke a large N400m (Halgren et al., 2002; Helenius et al., 1998; Kutas & Van Petten, 1994). Setup questions followed by logical, congruous responses that were neither funny nor surprising served as a baseline control condition. Because humor appreciation relies on resolving ambiguities and establishing coherence with the preceding joke setup question, funny punch lines were used to probe the resolution/coherence stage. We hypothesized that the activity subsequent to N400m, potentially analogous to the P600, would be largest to funny punch lines as their ambiguous meaning was resolved and integrated with the preceding setup question. Thus, the “funny” condition manipulated the elements of surprise/ambiguity and coherence. Sentences with congruous endings were not surprising but were coherent with the preceding context (the “congruous” condition). Incongruous endings were surprising but were difficult to integrate meaningfully with the preceding stem (the “incongruous” condition). 
We were especially interested in contributions from the prefrontal and temporal regions because of their critical involvement in semantic processing, as indicated by N400m (Halgren et al., 2002; Helenius et al., 1998; Marinkovic, 2004), and because of the importance of right prefrontal areas for humor appreciation (Brownell et al., 1983; Shammi & Stuss, 1999), ambiguity resolution, and the integration of remote associations into a coherent representation (Bookheimer, 2002).

Method

Subjects

Whole-head MEG recordings and structural MRI scans were obtained from 10 healthy, right-handed male subjects between 22 and 30 years of age (M ± SEM = 25.1 ± 0.7). No subjects had any neurological or other impairments, and all were medication-free at the time of testing. The subjects signed statements of consent and were reimbursed for their participation in the study.

Task

Subjects viewed question sentences followed by a punch line on a screen in front of them. They rated each set as either “funny” or “not funny” after a response cue following the punch line. Subjects were asked to use the index and middle fingers of one hand to indicate their responses. Response hands were switched in the middle of the experiment. Four subjects started with their right hand and 6 started with their left hand, in a random order. The sentence words were individually presented on a black background at a fast pace (240-ms word duration, 390 ms onset to onset). A punch line (240-ms duration) followed 2.1 s after the last word of a question. A response cue was presented 1.83 s after the onset of the punch line, and the cue indicating the start of the next sentence commenced 2.4 s later. This timing was intended to emulate the pattern of presenting jokes in everyday life; when telling a joke, it is common to ask a question, wait for a moment, and then give the punch line. The pause induces anticipation and enhances emphasis.
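The trial timing described above can be laid out as a simple schedule. The following is a minimal sketch, not the presentation software used in the study: the function name `trial_timeline` and the 8-word question are hypothetical, and the 2.1-s gap is assumed to be counted from the onset of the last question word, because the text does not specify onset or offset.

```python
def trial_timeline(n_words, soa_ms=390, word_ms=240,
                   punch_gap_ms=2100, cue_delay_ms=1830, next_delay_ms=2400):
    """Return event onsets (in ms) for one trial, relative to the first word.

    Assumption: the 2.1-s gap before the punch line is measured from the
    onset of the last question word (the text leaves this open).
    """
    if n_words < 1:
        raise ValueError("a question needs at least one word")
    # Words appear every 390 ms (onset to onset), each shown for 240 ms.
    word_onsets = [i * soa_ms for i in range(n_words)]
    punch_onset = word_onsets[-1] + punch_gap_ms
    # The response cue follows the punch line onset by 1.83 s.
    response_cue = punch_onset + cue_delay_ms
    # The cue for the next sentence follows 2.4 s after the response cue.
    next_trial_cue = response_cue + next_delay_ms
    return {
        "word_onsets": word_onsets,
        "blank_between_words": soa_ms - word_ms,
        "punch_onset": punch_onset,
        "response_cue": response_cue,
        "next_trial_cue": next_trial_cue,
    }


timeline = trial_timeline(n_words=8)  # hypothetical 8-word setup question
print(timeline["punch_onset"], timeline["response_cue"], timeline["next_trial_cue"])
```

Under these assumptions, the analysis windows reported later (350–550 ms and 700–1,150 ms after punch line onset) both end well before the response cue at 1,830 ms.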

Three types of sentences, 80 of each, were presented randomly. In an effort to use naturalistic stimuli, a large number of jokes were gathered from published collections, as well as from internet sources. The jokes were selected for the funny condition if they appeared funny but were not offensive along the ethnic or gender dimensions, and if they ended with short punch lines—for example, “What is the soft, gooshy stuff between King Kong’s toes? Slow natives.” Most of the punch lines were puns—that is, they were phonologically related to the preceding stem (e.g., “How do you become an executioner? Just axe.”) The congruous sentences were obtained by replacing the punch lines of the existing jokes with endings that were not funny or surprising and that provided a logical answer to the question (e.g., “What did the artist say when he was convicted of murder? I’m innocent!”) The incongruous sentences were similarly obtained by modifying a punch line of an existing joke to make it surprising and nonsensical—for example, “What’s the best way to pass a math test? Cloudy.” Stimuli included in each condition (80 sentences in each) were selected from a larger set based on congruency ratings obtained from five independent raters. The punch lines were equated across conditions for the number of words, including articles (M ± SD: funny, 1.8 ± 0.6; cong., 1.8 ± 0.65; inc., 1.7 ± 0.68); for the number of syllables (funny, 2.6 ± 0.9; cong., 2.6 ± 0.9; inc., 3.1 ± 1.0); and for the number of letters (funny, 9.1 ± 2.6; cong., 7.7 ± 2.3; inc., 9.0 ± 2.1). The three ending types did not differ in frequency of usage whenever it was possible to ascertain this from published norms (Kucera & Francis, 1967), and their means ± standard deviations were as follows: funny, 112.6 ± 321; congruous, 95.9 ± 206; incongruous, 84.8 ± 124. 
Seven additional individuals rated the punch lines on 1–5 scales for “goodness of fit” or plausibility (i.e., whether the ending made sense in the context) and for “surprise/expectancy” (i.e., whether it was expected in the context). Both funny and congruous endings were judged to make sense in the context (average ratings of 4.7 ± 0.14 and 4.8 ± 0.1, respectively). In contrast, the incongruous endings did not make sense (rated 1.1 ± 0.07). Funny endings were unexpected (4.4 ± 0.35), and so were the incongruous endings (4.8 ± 0.2), whereas the congruous endings were quite expected (1.6 ± 0.5). Statistical comparisons of the ratings indicated unique, nonoverlapping patterns of plausibility and expectancy for each of the three sentence lists. More specifically, funny punch lines were rated as making much more sense than the incongruous endings, F(1, 6) = 2782.9, p < .0001, but they did not differ in plausibility from the congruous endings, F(1, 6) = 2.8, n.s. Similarly, funny punch lines were rated much higher on the “surprise” dimension than the congruous endings, F(1, 6) = 229.1, p < .0001, but in this respect differed only marginally from the incongruous endings, F(1, 6) = 5.7, p < .1. The entire sentence list is included as a supplemental Appendix.

Data acquisition and analysis

Whole-head MEG signals were recorded from 204 channels (102 pairs of planar gradiometers) with a Neuromag Vectorview instrument (Elekta Neuromag) in a magnetically and electrically shielded room. The data were recorded continuously with a 600-Hz sampling rate and 0.1- to 200-Hz filtering. Condition-based averages were constructed from trials that were free of eye blinks or other artifacts and that were rated correctly as funny or not funny; the averages were bandpass filtered at 0.5–20 Hz. Averaged waveforms from one subject are shown in Fig. 1. An EEG signal referenced to the nose was acquired simultaneously from nonmagnetic scalp electrodes (Fz, Cz, Pz) embedded in an electrode cap (Electro-Cap International, Inc.). The electrooculogram was recorded between electrodes placed below the outer canthus of the right eye and just above the nasion. The electrode impedance was kept below 5 kΩ. Scalp ERPs were measured for each subject and each electrode within time windows from 350–500 ms and 700–1,150 ms and were submitted to a within-subjects ANOVA (Woodward, Bonett, & Brecht, 1990) with the factors Electrode and Condition.

Fig. 1

Average MEG waveforms obtained with planar gradiometers from 1 subject. Responses to funny, congruous, and incongruous endings are superimposed. Note a larger N400m to incongruous endings in the left temporal sensors and sensitivity to humor during a later stage in the right frontal sensors

In addition, high-resolution T1-weighted MRI structural images were obtained from each subject with a 1.5-T Picker Eclipse scanner (Marconi Medical, Cleveland OH). Each individual’s cortical surface was reconstructed from these structural images using automatic gray/white segmentation, tessellation, and inflation of the folded surface tessellation patterns (Dale, Fischl, & Sereno, 1999; Fischl, Sereno, & Dale, 1999). For each individual’s data, the inflated cortical surface was subsampled to ~2,500 dipole locations per hemisphere and served as a solution space to constrain a noise-normalized minimum norm inverse solution. The resulting approach, termed anatomically constrained MEG or aMEG, assumes that the synaptic potentials giving rise to the summated MEG lie in the cortical gray matter of each subject, so each individual’s reconstructed cortical surface serves as a custom-tailored anatomical constraint for the inverse solution.

Precise co-registration of the MEG signal with structural MRI images was made possible by digitizing the main fiducial points (such as the nasion and preauricular points), the position of the magnetic coils attached to the skull, and a large array of random points covering the scalp with the Polhemus 3Space Isotrak II system. After calculating the forward solution with a boundary element model (Oostendorp & van Oosterom, 1991), dipole strength power was estimated at each cortical location every 5 ms using a linear estimation minimum norm approach (Dale & Sereno, 1993; Hämäläinen & Ilmoniemi, 1994). No a priori assumptions were made regarding the local dipole orientation, so the local current dipole power equals the sum of squared dipole component strengths. Noise estimates obtained from the 300-ms-long prestimulus time window were used to normalize the signal estimates. Dividing the dipole strength power estimated at each cortical location by the noise power has the effect of transforming power maps into statistical significance maps, as well as making the point-spread function relatively uniform across the cortical surface (Dale et al., 2000; Liu, Dale, & Belliveau, 2002). The resulting “brain movies” are a series of frames of dynamic statistical parametric maps (dSPMs), similar to the statistical maps used to analyze fMRI or PET data, with the exception that here the activity is estimated at each time point. These noise-normalized average estimates of the local current dipole power for each location match the F distribution under the null hypothesis. Thus, the dSPM maps are displays of the statistical tests of the null hypothesis that, at a particular latency and location, there is no difference in the activity evoked by the condition and the baseline period. 
Inverse solutions for MEG signal from all of the subjects were averaged using cortical surface alignment (Fischl, Sereno, Tootell, & Dale, 1999), in a manner analogous to the “fixed effects” in fMRI analyses (Dale et al., 2000; Friston, Holmes, & Worsley, 1999; Marinkovic et al., 2003). Figure 2 presents group-average dSPMs of the overall estimated activity patterns for each stimulus condition at selected time latencies. To view the activity estimated to lie within sulci, the estimated cortical activity is displayed on “inflated” views of the cortical surface. Thresholds are noted within the figure.
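The noise normalization described above can be illustrated with a deliberately simplified sketch for a single cortical location. It assumes that the noise power is approximated by the mean dipole power over the prestimulus samples at that same location; the procedure in the cited methods papers (Dale et al., 2000) may derive the noise estimate differently. The function names and the toy values are hypothetical.

```python
def dipole_power(components):
    """Sum of squared dipole component strengths (x, y, z) at one time point."""
    return sum(c * c for c in components)


def noise_normalized_power(timecourse, n_baseline):
    """Divide dipole power at each time point by the baseline noise power.

    timecourse: list of (x, y, z) component estimates for one location,
                ordered in time, with the prestimulus samples first.
    n_baseline: number of prestimulus samples (e.g., 60 samples for a
                300-ms window sampled every 5 ms).
    Assumption: noise power is the mean power over the baseline samples.
    """
    if not 0 < n_baseline <= len(timecourse):
        raise ValueError("n_baseline must be between 1 and len(timecourse)")
    power = [dipole_power(sample) for sample in timecourse]
    noise = sum(power[:n_baseline]) / n_baseline
    if noise == 0:
        raise ValueError("baseline power is zero; cannot normalize")
    return [p / noise for p in power]


# Toy example: 60 baseline samples of unit power, then one evoked sample.
baseline = [(1.0, 0.0, 0.0)] * 60
evoked = [(2.0, 2.0, 1.0)]
normalized = noise_normalized_power(baseline + evoked, n_baseline=60)
print(normalized[-1])
```

Because no orientation is assumed, the power at each location pools all three components before the ratio to noise is taken.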

Fig. 2

Group average (N = 10) dynamic statistical parametric maps (dSPMs) of the overall activity estimated at successive latency windows to funny, congruous, and incongruous endings. All three types are initially processed by the same left-lateralized ventral processing stream that has been observed in other studies of word processing. Left-lateralized temporo-prefrontal areas contribute to the initial lexical–semantic access embodied by the N400m process, which is largest to incongruous endings (blue arrow). Subsequently, the left (green arrows) and right (red arrow) prefrontal areas are additionally engaged after ~700 ms, especially to funny punch lines as the “twist” is integrated with the preceding context

Source localization from extracranial EEG and MEG measures is inherently uncertain (Dale et al., 2000). This uncertainty is influenced by a variety of factors, including the spread of the signal from the generator to the sensors, limited sampling of the field, and overlap of activity from different sources at each sensor, all within the context of the assumptions required to arrive at an inverse solution. These assumptions include the locations where generation is allowed, whether the generators are focal or distributed, the synchrony allowed between generators in different locations, and the quantity assumed to be minimized by the brain (e.g., power vs. current). Liu et al. (2002) estimated localization uncertainty using the same inverse approach and forward model that is employed in this study (Dale et al., 2000). In their approach, uncertainty was quantified using metrics of “crosstalk” (the amount of activity incorrectly localized into a specified location from other locations) and “point-spread” (the mislocalization of activity from that specified location to other locations in the brain). Their average crosstalk was 17%, largely due to the measurement of the very deep or radial (gyral crown) sources. Whereas the standard minimum norm approach tends to erroneously ascribe focal and deep sources to extended superficial estimates (Dale & Sereno, 1993; Hämäläinen & Ilmoniemi, 1994), the noise normalization approach has a strongly beneficial effect on the point-spread function, resulting in a good spatial uniformity across the surface, averaging around 20 mm (Dale et al., 2000; Liu et al., 2002) on the cortical surface. These calculations were based on 122 MEG sensor locations and on single-subject solutions. Since the estimated localization accuracy improves with increased numbers of sensors and multiple subjects, it is reasonable to assume that the spatial resolution obtained in the present study is somewhat better.

Snapshots of dSPMs (“brain movies”) represent reliability estimates of the overall activity pattern at a particular time point for all three conditions, as shown in Fig. 2. The snapshots presented in Fig. 3 illustrate differential activity by presenting subtraction estimates of activity evoked by funny versus congruous (upper row) and funny versus incongruous conditions (lower row). An alternative insight into the temporal dynamics of the activity estimated for each cortical location is offered by plotting the estimated noise-normalized dipole strength across all time points for selected regions of interest (ROIs), as shown in Fig. 4. These waveforms represent estimated time courses of the dipole strength moments in the cortical source space. To further explore the statistical significance of the comparisons between conditions that would permit generalization of the effects to the population, analogous to random-effect analysis in fMRI (Friston et al., 1999), ROIs were chosen for the relevant areas on the cortical surface. The ROIs were based on the overall group average estimated activity. The same set of group-based ROIs was used for all subjects in a manner blind to their individual activations by translating them across all surfaces with an automatic spherical morphing procedure (Fischl, Sereno, Tootell, & Dale, 1999). Estimated noise-normalized dipole strength values were calculated for each subject across all conditions and averaged across the cortical points comprised in each ROI and across the time points in each time window. As shown in Fig. 4, the ROIs primarily encompassed the activity estimated to temporal and prefrontal areas bilaterally, in addition to time courses of the activity estimated to the visual sensory area. More specifically, the temporal ROIs included LH and RH anteroventral temporal regions (LHAT and RHAT, respectively). 
LHAT included the inferior portion of the left temporal pole and the anterior part of the inferior temporal cortex, whereas RHAT included the right temporal pole. The left ventrolateral prefrontal (LPF) ROI encompassed the anterior portion of the left inferior frontal gyrus, pars triangularis. The left frontomedial area (LFM) was centered on the posterior portion of the rostral anterior cingulate cortex. The right prefrontal (RPF) region encompassed the posterior portion of the right rostral middle frontal cortex. Within-subjects ANOVA (Woodward, Bonett, & Brecht, 1990) was used to statistically compare differences across conditions for each ROI in the activity integrated within two time windows with respect to punch line onset: 350–550 ms, approximating the “lexical–semantic analysis” stage, and 700–1,150 ms, encompassing the “ambiguity detection and global integration” stage. This approach tests differences between conditions while controlling for intersubject variability in a traditional statistical sense. It is quite conservative, since it does not allow for idiosyncrasies in terms of either spatial distribution or latency between subjects.
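The reduction of each subject's estimates to one value per ROI and time window, as described above, can be sketched as follows. This is an illustrative sketch only: the data layout (a dictionary of per-vertex time courses), the function name `roi_window_mean`, and the inclusive treatment of the window endpoints are assumptions rather than details given in the text.

```python
def roi_window_mean(estimates, times_ms, roi_vertices, window_ms):
    """Average estimates over the vertices of an ROI and a latency window.

    estimates:    dict mapping vertex id -> list of noise-normalized dipole
                  strength values, one per time point
    times_ms:     list of latencies (ms from punch line onset), same length
    roi_vertices: vertex ids belonging to the ROI
    window_ms:    (start, end) latencies; endpoints are treated as
                  inclusive, which is an assumption
    """
    start, end = window_ms
    idx = [i for i, t in enumerate(times_ms) if start <= t <= end]
    if not idx or not roi_vertices:
        raise ValueError("empty ROI or empty time window")
    total = sum(estimates[v][i] for v in roi_vertices for i in idx)
    return total / (len(roi_vertices) * len(idx))


# Toy example with estimates every 5 ms and a hypothetical two-vertex ROI.
times = list(range(0, 1200, 5))
toy = {0: [1.0] * len(times), 1: [3.0] * len(times)}
early = roi_window_mean(toy, times, [0, 1], (350, 550))
late = roi_window_mean(toy, times, [0, 1], (700, 1150))
print(early, late)
```

Applying this to every subject, condition, ROI, and window would yield the table of values that enters the within-subjects ANOVA.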

Fig. 3

Group average differential activity to funny punch lines, as compared with congruous (upper row) and incongruous (lower row) endings. Activity was estimated in each individual to signals obtained by subtracting the activity to congruous and incongruous endings from that evoked by funny punch lines. These activity estimates were then averaged across the individuals after aligning their cortical surfaces (Fischl, Sereno, Tootell, & Dale, 1999). During the initial lexical–semantic stage, the funny punch lines evoked the smallest N400m on the left (blue arrow). Subsequently, they engage contributions from distributed prefrontal areas during ambiguity detection and resolution of the intended meaning of the joke. Since they cannot be meaningfully integrated in the context, the incongruous endings evoke the weakest activity in the left prefrontal area (green arrow), suggesting this area’s involvement in evaluating meaning plausibility. The right prefrontal region (red arrows) is sensitive to “funniness” as it searches for alternative meanings and derives global coherence with the context

Fig. 4

Group average time courses of the estimated noise-normalized dipole strengths to funny, congruous, and incongruous conditions. Gray shading marks the significant effects during the lexical–semantic stage (350–550 ms) and the ambiguity detection and resolution stage (700–1,150 ms). Bar graphs denote the statistical significance of pairwise comparisons of the three conditions, as follows: * denotes p < .05, % denotes p < .1, and – denotes nonsignificant comparisons. LFM, left frontomedial cortex; LPF, left ventrolateral prefrontal cortex; LHAT, left hemisphere anteroventral temporal lobe; OCC, occipital lobe; RHAT, right hemisphere anteroventral temporal lobe; RPF, right dorsolateral prefrontal cortex

Results

Behavioral results

Overall, the subjects’ ratings of “funniness” were consistent with the a priori sentence categorizations. On average, 82.5% (mean) ± 2.2% (SEM) of the joke sentences were rated as funny, and most congruous, 89.8% ± 2.3%, and incongruous, 97.9% ± 0.6%, sentences were rated as not funny. Thus, the funny sentences were rated as significantly more funny than not, F(1, 9) = 220.4, p < .0001. Conversely, both the congruous, F(1, 9) = 293.3, p < .0001, and incongruous, F(1, 9) = 6,568.2, p < .0001, sentences were consistently rated as “not funny” more often than “funny.” As expected, speed of reactions did not differ among the conditions, since they were measured from the onset of the response cue rather than from the punch lines, F(2, 18) = 0.85, p < .5. Means ± SEMs were as follows: 837.4 ± 112 ms for the funny sentences rated as funny, 788.2 ± 90 ms for congruous sentences rated as not funny, and 765.4 ± 108 ms for incongruous sentences rated as not funny.

Spatiotemporal aMEG estimates

Comparisons of the processing dynamics in the form of “brain movies” suggested that punch lines are processed through the same spatiotemporal cortical stream as other words (Marinkovic, 2004). Following an early response in the visual cortex, the activity spreads anteriorly via the ventral visual stream. It is left-lateralized at 170 ms in the anteroventral occipital area, possibly subserving word form encoding (Dhond, Buckner, Dale, Marinkovic, & Halgren, 2001; McCandliss, Cohen, & Dehaene, 2003; Tarkiainen, Cornelissen, & Salmelin, 2002). As shown in Fig. 2, by 250 ms activation encompasses the anterior left temporal area, and it subsequently reaches left inferior prefrontal regions. The left anterior temporal and prefrontal processing reflected in the N400m embodies the initial semantic analysis and is greatest for incongruous endings. This is an expected finding, confirming the well-known effect of a larger N400 to words that do not fit with the context (Kutas & Federmeier, 2000; Van Petten & Luka, 2006). We hypothesized that funny punch lines would also evoke a larger N400m than the congruous endings, presuming that their incongruity would be detected at this stage. Contrary to our hypothesis, however, funny punch lines evoked the smallest N400m in LHAT as illustrated in Figs. 2, 3, and 4. The scalp N400 is equally small to the congruous endings and funny punch lines (Fig. 5) but is predictably largest to the incongruous endings. This suggests that the reduced N400m to funny punch lines may be due to the “surface congruity” with the preceding setup question at this initial semantic and contextual processing stage, as discussed further below. Subsequent contributions from distributed areas including left frontotemporal and, especially, right prefrontal regions are needed to resolve the “deep incongruity” and derive the intended funny meaning. 
This late global stage takes place after ~700 ms, as the “twist” in the funny ending is incorporated into the preceding context through higher-order cognitive and affective integration, resulting in a sense of amusement. It seems to encompass ambiguity detection and may rely on the conflict-monitoring functions of the executive system (Kolk & Chwilla, 2007). It is also an interpretative process, reflected in a continued combinatorial analysis of alternative meanings in order to achieve and consolidate coherence with the preceding context (Kuperberg, 2007).

Fig. 5

Grand average ERPs recorded from Fz, Cz, and Pz and averaged across all subjects. Incongruous endings evoke the largest N400, whereas the funny punch lines evoke a protracted P600-like positivity, reflecting interpretive–integrative processing during the ambiguity detection and global integration stage. Negative is plotted upward

ANOVAs of the activity time course estimates for the ROIs and the two latency windows are presented next. Estimated noise-normalized dipole strength values were averaged across all the cortical vertices comprised in each ROI and averaged across the time points in each time window for each subject. These values were entered into a within-subjects ANOVA (Woodward et al., 1990) and compared across the three levels of the Ending Type factor, including funny, congruous, and incongruous endings. The main point of interest was differential activity estimates to funny relative to the other two, not-funny endings. A more complete insight into pairwise comparisons among all three conditions for the relevant time windows is presented in Fig. 4 in the form of bar graphs. Since the three-way comparisons were planned by virtue of the hypotheses and the design, the reported p values were not corrected for multiple comparisons. Nevertheless, the reported results need to be considered with prudence.
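To make the degrees of freedom in the following sections concrete, the sketch below computes a standard univariate one-way repeated-measures F ratio. It is offered as an illustration under that assumption and may differ from the procedure in Woodward et al. (1990); the function names and the toy data are hypothetical. With 10 subjects and three ending types it gives F(2, 18) for the main effect, and restricting it to two conditions gives the F(1, 9) pairwise comparisons.

```python
def repeated_measures_f(data):
    """One-way repeated-measures ANOVA (univariate approach).

    data: list of rows, one per subject, each with one value per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects and >= 2 conditions per subject")
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Error term: variability left after removing condition and subject effects.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    if ss_error <= 0:
        raise ValueError("no residual variability; F is undefined")
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


def pairwise_f(data, a, b):
    """Planned comparison of conditions a and b (equals the squared paired t)."""
    return repeated_measures_f([[row[a], row[b]] for row in data])


# Toy data: 3 subjects x 3 conditions (not values from the study).
toy = [[1, 2, 3], [2, 4, 3], [3, 3, 6]]
print(repeated_measures_f(toy))
print(pairwise_f(toy, 0, 2))
```

The subject term is removed from the error variance, which is what allows the test to control for intersubject variability in overall response strength.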

MEG: “Lexical–semantic stage,” 350–550 ms

The main effect of condition was significant in the left, F(2, 18) = 10.0, p < .005, and the right, F(2, 18) = 4.4, p < .05, anterior temporal areas; see Fig. 4. The incongruous punch lines evoked the strongest activity in LHAT when compared with both the congruous, F(1, 9) = 6.8, p < .05, and funny, F(1, 9) = 12.3, p < .01, endings. In RHAT, the incongruous punch lines tended to evoke the strongest activity, as compared with the average of congruous and funny punch lines, F(1, 9) = 4.5, p < .06. However, contrary to our hypothesis, the funny punch lines evoked the weakest activity in LHAT within this latency window, as compared with both congruous, F(1, 9) = 7.7, p < .05, and incongruous, F(1, 9) = 12.3, p < .01, punch lines. None of the effects were significant for the prefrontal ROIs at this latency.
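The single-degree-of-freedom comparisons reported here, such as one ending type against another or against the average of the other two, can be illustrated as planned contrasts on per-subject scores. The sketch below assumes that each contrast is tested against its own error term, in which case F(1, n − 1) equals the squared one-sample t statistic of the contrast scores; whether the study used this or a pooled error term is not stated, and the function name `contrast_f` and the column order are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def contrast_f(scores, weights):
    """Within-subjects planned contrast.

    scores: list of rows, one per subject, one value per condition
    weights: contrast weights summing to zero, one per condition,
             e.g. (-0.5, -0.5, 1) for the third condition against the
             average of the first two, or (1, 0, -1) for a pairwise test
    Returns (F, 1, n - 1), where F is the squared t of the contrast scores.
    """
    values = [sum(w * x for w, x in zip(weights, row)) for row in scores]
    n = len(values)
    t = mean(values) / (stdev(values) / sqrt(n))
    return t * t, 1, n - 1
```

For ten subjects this yields the (1, 9) degrees of freedom used throughout the Results.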

MEG: “Ambiguity detection–global integration,” 700–1,150 ms

The main effect of condition observed in the right dorsolateral prefrontal area (RPF), F(2, 18) = 3.7, p < .05, was due to the significantly stronger activity to the funny, as compared to both congruous, F(1, 9) = 6.5, p < .05, and incongruous, F(1, 9) = 5.2, p < .05, endings, indicating unique sensitivity of the RPF to funny punch lines. The LHAT was also slightly sensitive to the funniness of the jokes, at least in some subjects, since the funny punch lines evoked a marginally stronger activity than the incongruous ones, F(1, 9) = 3.8, p < .1. In contrast, the LPF appeared responsive to semantic incongruity. Incongruous punch lines evoked weaker activity than the average of the other two conditions, F(1, 9) = 7.1, p < .05, possibly reflecting the fact that these endings could not be integrated with the context. Whereas the activity to incongruous endings was weaker than the averaged estimates to funny and congruous endings, pairwise comparisons with each individual ending did not reach significance. The incongruous estimates were marginally weaker than the funny ones, F(1, 9) = 3.8, p < .1, but they were not significantly weaker than the congruous endings, F(1, 9) = 2.5, p < .15. The main effect observed in the LFM, F(2, 18) = 3.7, p < .05, resulted from a significant difference between the funny and congruous endings, F(1, 9) = 41.6, p < .001. The funny and incongruous endings did not differ significantly, suggesting that this area may contribute to detection of semantic ambiguity and incongruity with the preceding context.

Scalp ERPs

Figure 5 depicts grand average waveforms obtained in all three conditions and averaged across all subjects. A large negativity peaking at ~400 ms (N400) was larger to incongruous trials, as compared with the funny, F(1, 9) = 25.4, p < .001, and congruous, F(1, 9) = 26.6, p < .001, conditions. Funny and congruous trials did not differ, F(1, 9) = 0.8, p < .4. The main effect of electrode site, F(2, 18) = 4.4, p < .05, for this time window resulted from a larger negativity over the frontal, as compared with the posterior, scalp. A large, protracted positivity (here termed the P600m) was greater to funny punch lines in comparison with congruous endings, F(1, 9) = 31.5, p < .001, or incongruous endings, F(1, 9) = 15.5, p < .01, whereas the congruous and incongruous endings did not differ, F(1, 9) = 0.03, p > .5. A main effect of electrode site, F(2, 18) = 12.3, p < .01, was due to a larger positivity over the posterior scalp. In agreement with other evidence, this P600m may reflect parallel contributions of a conflict-monitoring process and the combinatorial analysis stream (Kolk & Chwilla, 2007; Kuperberg, 2007) engaged in establishing coherence with the preceding context.

Discussion

The results of the present study concur with prior neuroimaging and lesion evidence suggesting that humor appreciation relies on a distributed neural circuit encompassing temporal and prefrontal regions in both hemispheres (Bartolo et al., 2006; Brownell et al., 1983; Goel & Dolan, 2001; Mobbs et al., 2003; Moran et al., 2004; Ozawa et al., 2000; Suls, 1972; Veatch, 1998; Wild et al., 2006). Furthermore, our results indicate that these areas contribute to humor processing in distinct spatiotemporal stages: (1) Surface-level semantic analysis is embodied by the N400m and subserved by bilateral anterior temporal and left inferior prefrontal regions. (2) Interpretive–integrative processing comprises (a) detection of ambiguity or conflict between the dominant semantic representations of the punch line and the context and (b) global coherence integration, which is reflected in semantic, phonological, metaphorical, and other supralinguistic integration of the punch line with the preceding sentence. Our data suggest that the right dorsolateral prefrontal, anterior frontomedial, left ventrolateral prefrontal, and left anterior temporal cortices contribute to post-N400m stages (2a) and (2b), which appear to overlap in time. Simultaneously recorded scalp ERPs confirm that a P600-like protracted positivity is elicited by funny punch lines during this time period, possibly reflecting interpretive–integrative processing (Kuperberg, 2007).

As we follow words in a sentence, we continually construct predictions based on contextual and mnemonic aspects of our real-world knowledge (Federmeier & Kutas, 1999b; Hagoort & Brown, 2000; Marinkovic, 2004). However, the success of jokes often derives from setting up a strange or unrealistic context that precludes our capacity to create specific predictions (e.g., “What fish will make you an offer you can’t refuse?”). The subsequent punch line (“The Codfather”) may be semantically primed by the categorical nature of the fish–cod association. The most active locations during the lexical–semantic stage (N400m) are the anterior temporal lobes. This stage performs an initial analysis based on the associative relationships between the setup question and the punch line. However, when continued ambiguity indicates that this stage is not sufficient to understand the joke, distributed prefrontal and temporal areas are then recruited to interpret and integrate the “twist” in the meaning. The joke is appreciated only if the famous quote uttered by Don Corleone in the movie The Godfather is recalled: “I’m gonna make him an offer he can’t refuse.” The “superficial congruity” (fish–cod) is reflected in an attenuated N400m, whereas the phonological link (cod–God), coupled with prior knowledge of this phrase, leads to a resolution and appreciation of the joke. Indeed, setup questions commonly prime the dominant meaning of the punch line but contain cues to the intended meaning as well (e.g., “What do you call a crazy spaceman? An astronut.”). Consistent with its general role in conflict detection (Bush, Luu, & Posner, 2000), anterior frontomedial cortex may be involved in monitoring for potential ambiguity between the initially derived meaning of the punch line and the preceding setup question (Kolk & Chwilla, 2007). 
At the same time, the left, and especially the right, lateral prefrontal areas may be engaged in resolving the ambiguity by considering possible alternative meanings and reintegrating the punch line with the contextual constraints imposed by the setup question. Thus, our results support the multistage model of joke comprehension originally based on RH lesion studies (Brownell et al., 1983; Shammi & Stuss, 1999; Suls, 1972; Veatch, 1998). The present evidence has refined our understanding of the spatiotemporal aspects of humor processing by providing insight into time windows and anatomical underpinnings of the initial lexical–semantic analysis and subsequent interpretive–integrative processing during the “ambiguity detection and global integration” stage.

Lexical–semantic association stage: N400m

The strongest activity at this latency was evoked by the incongruous endings in both left and right anterior temporal regions (Figs. 2 and 4), replicating a well-known sensitivity of the N400 and its magnetic equivalent to semantic incongruity (Halgren et al., 2002; Kutas & Hillyard, 1980; Van Petten & Luka, 2006). Indeed, the most prominent generators of the N400 have been found in the anteromedioventral regions of the anterior temporal lobe with intracranial recordings (Halgren, Baudena, Heit, Clarke, & Marinkovic, 1994; Halgren et al., 2006; McCarthy, Nobre, Bentin, & Spencer, 1995; Nobre & McCarthy, 1995; Smith, Stapleton, & Halgren, 1986). Distributed MEG solutions concur with these estimates (Dhond et al., 2001; Halgren et al., 2002; Marinkovic et al., 2003).

The scalp-measured N400 was also largest to incongruous endings, but it did not differ between the congruous endings and funny punch lines (Fig. 5). The MEG data provided a finer, regionally specific differentiation in the left anteroventral temporal area. Whereas the congruous endings evoked intermediate activity, the funny punch lines elicited the weakest activity estimates in the LHAT area (Fig. 4). The observed left-lateralized specificity aligns with suggestions that LH is sensitive to a predictive strategy and RH to a plausibility strategy (Kutas & Federmeier, 2000). Even though joke punch lines are unpredictable by design, joke stems share semantic or phonological features with the punch line, priming a dominant, but often incorrect, meaning of a word, resulting in a reduced N400m. This may be related to the “semantic illusion” effect, wherein words that are seemingly coherent but are not factually correct or congruous within the context evoke a smaller N400 (Nieuwland & Van Berkum, 2005). A commonly given example is the “Moses illusion” (Erickson & Mattson, 1981): “How many animals of each sort did Moses put on the ark?” In an effort to answer the question correctly, subjects readily focus on the animals and miss the fact that an erroneous name was given, since it was actually Noah who carried out the animal kingdom rescue mission according to scriptures. This effect is stronger if the foils are semantically associated with the correct targets (Erickson & Mattson, 1981; Nieuwland & Van Berkum, 2005), if they are preceded by a coherent sentence or part of discourse (Nieuwland & Van Berkum, 2005), and if they are delivered in the form of a question rather than a statement (Buttner, 2007). The resulting N400 attenuation is thought to arise from the superficial analysis of the meaning at this stage in an effort to get a gist (Kuperberg, 2007; Nieuwland & Van Berkum, 2005). 
Indeed, it has been shown that comprehension relies on a “good enough” representation rather than on the precise analysis of each word (Christianson, Hollingworth, Halliwell, & Ferreira, 2001; Ferreira, Bailey, & Ferraro, 2002; Sturt, Sanford, Stewart, & Dawydiak, 2004). Many jokes share these characteristics of temporarily “tricking” the recipient by introducing ambiguity and obscuring the intended meaning. Their funniness often depends on how successfully they exploit phonological similarities, in the case of homophonic puns (e.g., “What fruit is never lonely? Pears”), or meaning alternatives, in the case of homographic puns. For instance, the setup question “During a thunderstorm at a concert, who is most likely to be struck by lightning?” primes the dominant meaning of the punch line “the conductor,” resulting in an attenuated N400m. However, its alternative meaning (i.e., conductor of electricity) is needed to establish coherent understanding of the punch line. In this context, ambiguity results from aptly employing words with multiple interpretations. Its resolution and the resulting sense of funniness depend on finding and integrating the correct meaning during the continued analysis subsequent to N400m. Given that most of the jokes used in this study were puns, it is possible that the present results apply to that joke category and not to semantic jokes (Coulson & Kutas, 2001; Coulson & Lovett, 2004). Indeed, the left frontotemporal activity pattern observed during N400m in the present study roughly corresponds to the results of an fMRI study (Goel & Dolan, 2001) in which left inferior frontal and left posterior inferior temporal areas were activated by puns. The strongest N400m effect observed in our study was graded activity in the anterior temporal lobe, which is consistent with the intracranial EEG studies and MEG studies using distributed models cited above. 
Furthermore, semantic dementia observed after atrophy of this area confirms its essential contributions to semantic processing (Hodges, Patterson, Oxbury, & Funnell, 1992; Mummery et al., 1999). In contrast, no anterior temporal activation was reported by Goel and Dolan, possibly because magnetic susceptibility artifacts, which arise from changes in local field gradients, cause regionally specific signal loss and distortions in these areas (Devlin et al., 2002; Jezzard & Clare, 1999).

To assess priming influences on the N400m attenuation observed here, we examined the associative semantic closeness between the sentence stems and punch lines across conditions. In the funny category, 28.8% of the punch lines contained either a synonym (e.g., “What do you call a pig that knows karate? Pork chop.”) or direct repetition (e.g., “What did the monster eat after the dentist pulled its tooth? The dentist.”). In contrast, only 8.8% of the congruous endings contained these relationships (e.g., “What might you title a funny book about dogs? Doggy jokes.”). The two conditions contained roughly equal numbers of endings with close semantic association (e.g., category exemplar relationship): 46% in the funny and 50% in the congruous category. Since priming exerts powerful effects on the N400 (Bentin, McCarthy, & Wood, 1985; Marinkovic et al., 2003; Rugg & Doyle, 1994), it is likely that the initial lexical–semantic associations to the funny punch lines were more easily accomplished due to more strongly primed associations, resulting in the attenuated N400m. This “surface congruity” detected by the lexical semantic analysis stage may actually serve the purpose of temporarily foiling or tricking the system and may be essential to joke processing, particularly as it pertains to puns. It is possible that the sequence of a temporary semantic illusion stage followed by detection of “deep incongruity” constitutes the “twist” whose detection and successful reinterpretation (or “frame shifting”; Coulson, 2001) actually renders the sense of amusement. A distributed network including bilateral prefrontal and temporal areas is engaged in providing a wider pool of remote associations and alternative meanings during the integration stage.

Ambiguity detection: global coherence integration

Studies suggest that the verification of plausibility and the processing of ambiguities take place subsequent to N400, possibly reflecting attempts to revise the interpretation of the target word in the context (Kuperberg, 2007). A late slow positivity (P600 effect) has been observed with syntactic ambiguities and violations (Friederici et al., 2002; Kaan, Harris, Gibson, & Holcomb, 2000; Osterhout et al., 1994; Service, Helenius, Maury, & Salmelin, 2007). This positivity is also sensitive to discourse complexity (Kaan & Swaab, 2003), plausible but low-cloze-probability endings (Federmeier, Wlotko, De Ochoa-Dewald, & Kutas, 2007; Moreno, Federmeier, & Kutas, 2002), semantic anomaly (Kolk, Chwilla, van Herten, & Oor, 2003; Kuperberg et al., 2003; van Herten, Kolk, & Chwilla, 2005), and animacy violations (Kuperberg, Kreher, Sitnikova, Caplan, & Holcomb, 2007). These P600 effects cover a rather wide range of syntactic and semantic anomalies and may reflect different underlying generator configurations, but they all seem to be evoked by an ambiguity that needs to be resolved in order to integrate the meaning of a sentence. According to some accounts, these ambiguities reside in the language domain within the interactive and combinatorial stream of syntactic and semantic constraints (Kemmerer, Weber-Fox, Price, Zdanczyk, & Way, 2007; Kuperberg, 2007). Other researchers have proposed that the P600 reflects conflict monitoring that relies on contributions from the executive control system in the brain (Kolk & Chwilla, 2007; van Herten, Chwilla, & Kolk, 2006). fMRI studies suggest that the anterior frontomedial cortex is activated during conflict monitoring and error detection in executive tasks (Botvinick, Braver, Barch, Carter, & Cohen, 2001; Bush et al., 2000), but also in language tasks as a function of text coherence (Ferstl & von Cramon, 2002) and reference or syntactic ambiguity (Nieuwland, Petersson, & Van Berkum, 2007). 
Using MEG, anterior midline activity has been observed to syntactic–semantic mismatches starting at ~600 ms (Pylkkanen, Martin, McElree, & Smart, 2009). In broad agreement with these studies, our results suggest that the anterior frontomedial area may contribute to the detection of implausible or ambiguous interpretations, necessitating additional processing to arrive at a coherent understanding. At the same time, the left ventrolateral prefrontal area may be involved in semantic processing of meaning plausibility, in line with other evidence (Wagner, Paré-Blagoev, Clark, & Poldrack, 2001). In the present study, aMEG time courses of the estimated late activity are slow-rising and ramp-like, in contrast to the phasic deflections that are characteristic of N400m. This may be due to the temporal overlap of the activity elicited by different jokes. In other words, it is difficult to precisely time-lock the moment of “getting” a joke. For each subject, the meaning of individual jokes is integrated at different latencies, resulting in an overlap between the stages of ambiguity detection and global coherence integration. Further smearing is a result of group-level averaging of activity across all the subjects during the analysis. Thus, based on the present results, it is not possible to resolve these two processes—if, indeed, they are dissociable. Future studies may be able to investigate and potentially resolve these stages by using different paradigms and careful manipulations.
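The smearing argument above can be illustrated with a toy simulation. The sketch below averages identical phasic responses whose latencies vary from trial to trial and compares the result with a response at a fixed latency. The Gaussian response shape, the 50-ms width, and the latencies between 700 and 1,100 ms are arbitrary assumptions chosen for illustration; they are not estimates from the present data.

```python
from math import exp


def gaussian_response(times, latency, width=50.0):
    """Phasic unit-amplitude response centered at `latency` (ms)."""
    return [exp(-0.5 * ((t - latency) / width) ** 2) for t in times]


def average_waveforms(waveforms):
    """Point-by-point average across trials (or subjects)."""
    n = len(waveforms)
    return [sum(samples) / n for samples in zip(*waveforms)]


def duration_above(wave, step, fraction=0.5):
    """Time (ms) that the waveform spends at or above `fraction` of its own peak."""
    threshold = fraction * max(wave)
    return step * sum(1 for v in wave if v >= threshold)


step = 10  # sampling interval in ms (hypothetical)
times = list(range(0, 1500, step))

# Hypothetical moments of "getting" the joke, differing across trials
latencies = [700, 800, 900, 1000, 1100]

# Response with a fixed latency versus the average of jittered responses
locked = gaussian_response(times, 900)
jittered = average_waveforms([gaussian_response(times, lat) for lat in latencies])

print("peak:", max(locked), "vs", round(max(jittered), 3))
print("duration above half peak (ms):",
      duration_above(locked, step), "vs", duration_above(jittered, step))
```

Under these assumptions, the averaged waveform has a lower peak and a much longer duration than any single response, which is the kind of slow, protracted time course that latency variability would be expected to produce.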

The RPF was most sensitive to the funniness of the punch lines, along with the left anterior temporal area. In an effort to “get” the joke, the left anterior temporal area may access semantic memory representations, while the right prefrontal region may contribute divergent, alternative word meanings that are based on weak semantic associations (Booth et al., 2002; Grindrod & Baum, 2005; Kiefer, Weisbrod, Kern, Maier, & Spitzer, 1998), contextual demands, and lexical ambiguity (Federmeier & Kutas, 1999a; Mason & Just, 2007). Recent neuroimaging evidence, however, is more equivocal on the question of the laterality of nonliteral language, because some studies have confirmed RH dominance (Mashal, Faust, Hendler, & Jung-Beeman, 2007) and others LH dominance (Lee & Dapretto, 2006; Mashal, Faust, Hendler, & Jung-Beeman, 2009; Rapp, Leube, Erb, Grodd, & Kircher, 2007) in processing metaphors. By providing insight into spatiotemporal stages of joke processing that rely on contributions of both hemispheres, our study provides a more nuanced depiction of a processing stream as it unfolds in time and engages a distributed network during integration of ambiguous meaning. In that sense, our results substantiate suggestions that the neural dynamics underlying nonliteral language result from a complex interaction of the factors influencing the integration of meaning and inference-making, such as relative distance of semantic associations, ambiguity, difficulty, novelty, contextual constraint, memory, and funniness (Coulson & Van Petten, 2007; Giora, 2007; Kuperberg, Lakshmanan, Caplan, & Holcomb, 2006).

Conclusions and limitations of the study

Several factors should be considered that limit the generalizability of this study’s findings. Since the task was to rate sentences for “funniness,” the emphasis was placed on making a decision as to whether a punch line was funny given its preceding context. One third of the sentences concluded with funny punch lines, so it is worthwhile to consider if this somewhat “oddball” probability could have affected the observed late effects by superimposing a classical P3b. Using the same inverse methodology used in the present study, we examined the potentials and fields evoked by rare attended target changes in tone sequences (Halgren, Sherfey, Irimia, Dale, & Marinkovic, in press). After balancing the rare and frequent tones with respect to pitch and habituation status, aMEG estimated the localization of activity during the scalp P3b mainly to the temporal lobe. This finding was confirmed with cortical lead-field analysis. Another MEG study, using visual categorization tasks more similar to the present study, estimated P3b generators to the central midline (Mecklinger et al., 1998). Other MEG studies of the P3b in classical oddball tasks, reviewed in Halgren et al. (in press), have estimated sources to various locations, but not to the anterior superior lateral prefrontal cortex. Intracranial recordings have identified an extended network associated with the P3b, including hippocampus, ventromedial temporal cortex, the superior temporal sulcus, and a superior parietal region (reviewed by Halgren, 2008). The superior temporal and parietal sites are most consistent with lesion (reviewed by Soltani & Knight, 2000) and hemodynamic (reviewed by Linden, 2005) studies of oddball paradigms. Thus, sources active during the P3b in classical oddball paradigms are clearly distinct from the late anterolateral prefrontal source evoked by jokes in the present study. Similarly, the sources active during the P3b are also distinct from those during the N400. 
For example, the N400 is more prominent in the LH, whereas the P3b is more prominent in the RH (Halgren et al., in press; Smith et al., 1986). Although spatiotemporal overlap between the end of the N400 and the beginning of the P3b could thus affect the scalp N400 measures, it is not likely to affect the more focal N400m measures.

Unlike previous studies, in which the humorous nature of presented sentences was not relevant to the task (Coulson & Lovett, 2004; Coulson & Williams, 2005), subjects in our study were requested to rate each sentence/punch line as either “funny” or “not funny.” This may have imposed bias on the observed late activity, since ERP language studies reported that a larger P600 is elicited when subjects are asked to make plausibility judgments, as compared with reading for comprehension (Kolk et al., 2003). On the other hand, it is quite possible that a sense of amusement can be experienced more fully if one actually focuses on the humorous aspects of the presented material, due to the emotional dimension of jokes. A related issue concerns a task-induced overlap between the functions of understanding a funny punch line and deciding whether it is funny, since these processes are not the same. These questions will need to be addressed in a future experiment that would include a balanced number of funny versus not-funny punch lines, with response requirements that are orthogonal to the predesigned funniness status, and with better controlled setup versus punch line matching. Parallel manipulations of explicit judgment versus reading for comprehension could explore effects of the task and decision making on the observed activity estimates across both genders.

In sum, despite these caveats in interpretation, the temporal precision of the aMEG approach employed in our study offers insight into the functional anatomy and the temporal sequence of the multistage joke comprehension process as it unfolds in time. Punch lines are initially processed by the same left-lateralized ventral visual stream as other words. During the subsequent stage of lexical–semantic integration, the smallest N400m is elicited by funny punch lines in the anterior left temporal area, indicating facilitated initial semantic retrieval. The attenuated N400m may result from the primed “surface congruity” with the preceding stem. However, this stage is not sufficient to appreciate the joke, and continued analysis engages distributed prefrontal areas bilaterally in order to understand the joke’s intended alternative meaning and integrate it with the preceding context. Amusement may depend on submitting to “trickery” initially, followed by detecting and resolving the ambiguity in a clever way. A “twist” or ambiguity contained in the funny punch lines may be detected by anterior frontomedial areas. The left prefrontal area may contribute to semantic processing of the meaning plausibility, whereas the right area may search semantic memory for alternative meanings in order to “get” the joke. Coherent integration of the intended meaning and a sense of amusement may emerge from the dynamic interaction of these regions with special contributions from the right prefrontal region.