Brain Research
Volume 1377, 4 March 2011, Pages 78-83

Research Report
How vision is shaped by language comprehension — Top-down feedback based on low-spatial frequencies

https://doi.org/10.1016/j.brainres.2010.12.063

Abstract

Effects of language comprehension on visual processing have been extensively studied within the embodied-language framework. However, it is unknown whether these effects are caused by passive repetition suppression in visual processing areas, or whether they depend on active feedback from prefrontal regions based on partial input. Based on a model of top-down feedback during visual recognition, we predicted diminished effects when low spatial frequencies were removed from the targets. We compared low-pass and high-pass filtered pictures in a sentence–picture-verification task. Target pictures matched or mismatched the implied shape of an object mentioned in a preceding sentence, or were unrelated to the sentences. As predicted, there was a large match advantage when the targets contained low spatial frequencies, but no effect of linguistic context when these frequencies were filtered out. The proposed top-down feedback model is superior to repetition suppression in explaining the current results, as well as earlier findings on the lateralization of this effect and on peculiar color-match effects. We discuss these findings in the context of recent general proposals of prediction and top-down feedback.

Research Highlights

► Sentence–picture match effects show that linguistic context modulates visual processing.
► The top-down feedback model accounts for these effects.
► This model predicts that the effects are driven by low spatial frequencies.
► Removing low-spatial-frequency information diminishes the effect.
► Top-down feedback is superior to repetition suppression in explaining match effects.

Introduction

With language, “we can shape events in each other's brains with exquisite precision” (Pinker, 1994). How this is achieved has received much interest since the advent of the embodied cognition perspective (Glenberg and Kaschak, 2002). According to the embodied- or grounded-cognition framework, conceptual knowledge is grounded in perceptual and motor states (Barsalou, 2008). For language, this means that listeners represent mentioned scenes by activating perceptual and motor simulations of these scenes. These simulations rely on neural resources similar to those that would be activated during actual perception of the described scene, and hence act as perceptual symbols (Barsalou, 1999). While perceptual symbol systems were introduced as an alternative to abstract representations, recent proposals take a more balanced view and treat them as complementary representational systems (Dove, 2009, Mahon and Caramazza, 2008). Such a dual view makes the question of what brings about these effects even more important. Our goal in this study was to investigate the mechanisms that give rise to very specific context effects. Can these effects be explained by passive repetition suppression, or are more complex processes involved?

A plethora of behavioral and neurocognitive studies has demonstrated that language comprehenders activate contextually relevant perceptual or motor features (for a review see Barsalou, 2008). Neurocognitive studies have explored the activation of sensory and motor areas during language comprehension. For example, fMRI studies have investigated how verbs that describe actions executed with different effectors are processed (Buccino et al., 2005, Hauk et al., 2004). Similarly, areas coding the sensory modalities and motor plans described in a story are activated during story comprehension (Speer et al., 2009). Generally, these neurocognitive studies have focused on rather coarse differentiations between broad semantic categories, while behavioral work has focused on fine-grained differences between stimuli.

The sentence–picture-verification paradigm has shown that very fine-grained perceptual features are activated during language comprehension, and this finding has been widely replicated (Brunye, 2009, Dijkstra et al., 2004, Madden and Therriault, 2009, Noordzij et al., 2005, Stanfield and Zwaan, 2001, Zwaan et al., 2002). In this paradigm, participants have to verify whether pictured targets were mentioned in a preceding sentence or not. Critically, the relation between sentences and pictures is manipulated: pictured objects can either match or mismatch task-irrelevant perceptual features implied by the sentence (e.g., their orientation, shape, or direction of motion). Reaction times are typically faster when these implied features match the visually presented targets (i.e., the match effect; Stanfield and Zwaan, 2001).

Neurocognitive studies on the neural basis of mental imagery provide converging evidence that sensory cortices are activated not only during mental-imagery tasks (Kosslyn et al., 2001), but also by tasks that do not require conscious imagery, such as the sentence–picture-verification task (Carpenter et al., 1999). While these studies suggest an involvement of perceptual-processing regions, the precise mechanisms are far from clear. The most parsimonious mechanism for the match effect assumes that simulating an object has the same effect as actually seeing it. Specifically, simulations can evoke effects comparable to passive repetition suppression, which is frequently observed when visually presented objects are repeatedly accessed (Grill-Spector et al., 2006, Henson, 2003). According to this explanation, targets that match the orientation implied by the description have a processing advantage, because language comprehension engages the same neurons that code that orientation during picture perception. If the context implies the same orientation as shown in the subsequent picture, the overlap between the simulation and the actual object is larger than in a mismatching condition (Stanfield and Zwaan, 2001).

A recent MEG study with the sentence–picture-verification paradigm found the opposite pattern. Pictures that matched the implicitly mentioned shape evoked larger M1 amplitudes than mismatching targets, where reduced activations in perceptual areas would have been expected under repetition suppression (Hirschfeld et al., 2011). Furthermore, unrelated pictures evoked a large N400-like response in the left hemisphere that was similar for matching and mismatching targets. A recent EEG study revealed systematic differences between the effects of visual adaptation and mental imagery on visual processing (Ganis and Schendan, 2008). While adaptation decreased the amplitude of the N170 component, imagery enhanced it. The authors surmise that amplifying effects of linguistic contexts or imagery result from top-down feedback rather than passive repetition suppression (Ganis and Schendan, 2008, Hirschfeld et al., 2011). The crucial difference between this view and previous explanations, which also entail top-down influences, is that only the former assumes reentrant top-down feedback connections.

A highly detailed model for such a reentrant feedback mechanism has been proposed by Moshe Bar (2004). According to this model, visual object recognition involves a very fast initial analysis of low-spatial-frequency information in a visual scene. This initial analysis relies on the dorsal magnocellular pathway to the orbitofrontal cortex (Kveraga et al., 2007), which generates initial guesses about possible objects. These guesses are integrated with the bottom-up stream of the slower ventral visual pathway via top-down projections from the orbitofrontal cortex to the fusiform gyrus. Thus, context might not affect primary sensory processing regions alone, but might also induce changes in reentrant orbitofrontal connections that make use of low-spatial-frequency information, which is rapidly extracted from the target.

This top-down feedback mechanism can also account for hemispheric differences in the sentence–picture-verification task (Lincoln et al., 2008). Patients with damage to the left hemisphere exhibit context effects similar to those of controls, but patients with damage to the right hemisphere show no effect. Assuming that the right hemisphere is better tuned to low spatial frequencies (Grabowska and Nowicka, 1996), a feedback mechanism based on low spatial frequencies would predict such differences (Lincoln et al., 2008).

The aim of the present study was to test whether the context effect in the sentence–picture-verification task depends on a feedback mechanism that is driven by low spatial frequencies. We used a sentence–picture-verification task and manipulated the targets by high-pass or low-pass filtering, a manipulation known to differentially engage top-down feedback (Bar et al., 2006). The rationale was as follows: if context effects are triggered by feedback connections involving the magnocellular pathway (Kveraga et al., 2007), removing low-spatial-frequency information with a high-pass filter should diminish the match effect.
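To make the spatial-frequency manipulation concrete, here is a minimal Python sketch of Gaussian low-pass and high-pass filtering of a target image. The cutoff (sigma) and the use of scipy are illustrative assumptions; the filter parameters actually used in the experiment are not given in this excerpt.

```python
# Minimal sketch of the spatial-frequency manipulation (assumed parameters):
# a Gaussian low-pass filter keeps coarse shape information, and subtracting
# the low-pass image from the original yields the high-pass version with low
# spatial frequencies removed. The sigma below is an illustrative placeholder,
# not the cutoff used in the experiment.
import numpy as np
from scipy.ndimage import gaussian_filter

def filter_target(image: np.ndarray, sigma: float = 8.0):
    """Return (low_pass, high_pass) versions of a grayscale image."""
    img = image.astype(float)
    low_pass = gaussian_filter(img, sigma=sigma)
    high_pass = img - low_pass
    return low_pass, high_pass

# Example: filter a 400 x 400 placeholder image.
low, high = filter_target(np.random.rand(400, 400))
```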

Participants read sentences that implied a specific shape of a mentioned object, followed by a pictured object. They had to decide whether the target object was mentioned in the sentence or not. Two picture variants, A and B, showed the object in different shapes (e.g., a sitting or a flying duck). Each picture was paired with three sentences: sentence A implied the shape depicted in picture A (e.g., “the ranger saw a duck in the lake”), sentence B implied the shape depicted in picture B (e.g., “the ranger saw a duck in the air”), and an unrelated sentence did not mention the depicted object at all (e.g., “the ranger buttered his bread”). Previous research used the interaction between picture variant and sentence as an index of the match effect (Stanfield and Zwaan, 2001, Zwaan et al., 2002). The unrelated condition was added to test whether responses are modulated in the match condition, the mismatch condition, or both.
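As an illustration of this design, the sketch below pairs one object's two picture variants with its three sentences and classifies each pairing as match, mismatch, or unrelated. The data structure and file names are hypothetical; only the example sentences come from the text.

```python
# Hypothetical data structure for one item; the condition logic mirrors the
# design described above (match, mismatch, unrelated).
item = {
    "object": "duck",
    "pictures": {"A": "duck_sitting.png", "B": "duck_flying.png"},  # assumed file names
    "sentences": {
        "A": "The ranger saw a duck in the lake.",   # implies the shape in picture A
        "B": "The ranger saw a duck in the air.",    # implies the shape in picture B
        "unrelated": "The ranger buttered his bread.",
    },
}

def condition(sentence_key: str, picture_key: str) -> str:
    """Classify a sentence-picture pairing."""
    if sentence_key == "unrelated":
        return "unrelated"
    return "match" if sentence_key == picture_key else "mismatch"

pairings = [(s, p, condition(s, p))
            for s in item["sentences"] for p in item["pictures"]]
```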

Data were analyzed using mixed-effects models with crossed random effects for subjects and items (Baayen et al., 2008, Jaeger, 2008). These analyses replace the traditional separate subject and item analyses, as they allow both to be modeled simultaneously. Binomial mixed-effects models are also more accurate for error data than analyses of transformed error rates (Jaeger, 2008). Our analysis proceeded in two steps. First, only trials affording yes-responses were compared, leaving out the unrelated sentences. A model with sentence (A vs. B), picture (variant A vs. variant B), filter condition (high-pass vs. low-pass), and their interactions was fitted to the latency and error data. A repetition effect (first to sixth block) was included to capture longitudinal effects of familiarization with the material. Next, the latency data were modeled by collapsing the factors sentence and picture into a single factor, condition (match, mismatch, unrelated), which was entered using treatment coding.
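The analysis itself was presumably run in R with lme4 (cf. Baayen et al., 2008; Jaeger, 2008); as a rough Python analogue, the sketch below fits crossed random intercepts for subjects and items using the variance-components formulation in statsmodels. Column names, the data file, the reference level for treatment coding, and the exact fixed-effects structure are assumptions, not the authors' code.

```python
# Rough analogue of the latency analysis: crossed random intercepts for
# subjects and items, fixed effects for condition (treatment-coded),
# filter condition, and block. All column names are assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("verification_trials.csv")  # hypothetical data file
df["log_rt"] = np.log(df["rt"])              # log-transformed latencies
df["all"] = 1                                # single group; random effects go in vc_formula

vc = {"subject": "0 + C(subject)", "item": "0 + C(item)"}
latency_model = sm.MixedLM.from_formula(
    "log_rt ~ C(condition, Treatment('match')) * C(filter) + block",
    groups="all", vc_formula=vc, re_formula="0", data=df,
)
print(latency_model.fit().summary())
```

A logit mixed model for the error data could be sketched along the same lines (e.g., with statsmodels' BinomialBayesMixedGLM), but it is not reproduced here.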

Section snippets

Results

Fillers and warm-up items were removed from the data. For the reaction-time analysis, incorrect responses (7.12%) and extreme outliers (trials slower than the mean reaction time plus 4 SD; 1.11%) were excluded. Reaction times were log-transformed to achieve normality (Baayen et al., 2008).
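The exclusion and transformation steps can be sketched as follows (column names are assumptions; the text does not say whether the outlier cutoff was computed overall or per participant, so the overall mean is used here).

```python
# Drop incorrect responses, trim responses slower than mean + 4 SD, and
# log-transform the remaining latencies (column names are assumptions).
import numpy as np
import pandas as pd

def trim_latencies(df: pd.DataFrame) -> pd.DataFrame:
    correct = df[df["correct"] == 1].copy()                   # remove errors
    cutoff = correct["rt"].mean() + 4 * correct["rt"].std()   # mean + 4 SD
    trimmed = correct[correct["rt"] <= cutoff].copy()         # remove extreme outliers
    trimmed["log_rt"] = np.log(trimmed["rt"])                 # normalize latencies
    return trimmed
```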

Analysis of the errors showed no speed–accuracy tradeoff. Instead, reaction times were faster for correct than for incorrect responses (638 ms vs. 818 ms; beta = 0.02; t = 8.93; pMCMC < 0.001). The

Discussion

The aim of this study was to test the neural mechanism that underlies the match effect in sentence–picture verification (Stanfield and Zwaan, 2001, Zwaan et al., 2002). Specifically, we investigated the impact of high- and low-spatial-frequency information contained in the target on the context effects. Our results show a match effect in latencies and errors that was modulated by the spatial-frequency information in the targets. While there was a strong match effect for low-pass-filtered

Participants

Sixty-two students from the University of Münster participated in this experiment, 31 in each filter condition.

Materials

A total of 108 pictures of 54 objects served as targets. Images were taken from a large collection of object photographs (Hemera Photo Objects). Objects were photographed without background. All images had a resolution of 75 pixels/inch and were scaled to fill a 400 × 400 pixel square that subtended approximately 9° × 9° of visual angle in the experiment. Target pictures were filtered using either a high-pass filter to remove
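For reference, the geometry behind the reported stimulus size can be sketched as follows; the viewing distance is an assumption chosen so that the 400-pixel square subtends roughly the reported 9°, since no distance is given in this excerpt.

```python
# 400 px at 75 px/inch is about 13.5 cm; at an assumed viewing distance of
# roughly 86 cm such a square subtends about 9 degrees of visual angle.
import math

def visual_angle_deg(size_px: float, px_per_inch: float, distance_cm: float) -> float:
    size_cm = size_px / px_per_inch * 2.54
    return 2 * math.degrees(math.atan((size_cm / 2) / distance_cm))

print(round(visual_angle_deg(400, 75, distance_cm=86.0), 2))  # ~9.0
```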

Acknowledgment

We thank Thomas Hösker for help in conducting the experiment.

References (38)

• O. Hauk et al. Somatotopic representation of action words in human motor and premotor cortex. Neuron (2004)
• R. Henson. Neuroimaging studies of priming. Progress in Neurobiology (2003)
• G. Hirschfeld et al. Effects of language comprehension on visual processing — MEG dissociates early perceptual and late N400 effects. Brain and Language (2011)
• T.F. Jaeger. Categorical data analysis: away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language (2008)
• A.E. Lincoln et al. Hemispheric asymmetries in the perceptual representation of words. Brain Research (2008)
• B.Z. Mahon et al. A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology - Paris (2008)
• M.L. Noordzij et al. Strategic and automatic components in the processing of linguistic spatial relations. Acta Psychologica (2005)
• R. Abdel Rahman et al. Seeing what we know and understand: how knowledge shapes perception. Psychonomic Bulletin & Review (2008)
• V. Aginsky et al. How are different properties of a scene encoded in visual memory? Visual Cognition (2000)