NeuroImage

Volume 24, Issue 3, 1 February 2005, Pages 862-873
An fMRI study of reward-related probability learning

https://doi.org/10.1016/j.neuroimage.2004.10.002

Abstract

The human striatum has been implicated in processing reward-related information. More recently, activity in the striatum, particularly the caudate nucleus, has been observed when a contingency between behavior and reward exists, suggesting a role for the caudate in reinforcement-based learning. Using a gambling paradigm, in which affective feedback (reward and punishment) followed simple, random guesses on a trial-by-trial basis, we sought to investigate the role of the caudate nucleus as reward-related learning progressed. Participants were instructed to guess whether the value of a presented card was higher or lower than 5. They were told that five different cues would be presented prior to making a guess, and that each cue indicated the probability that the card would be high or low. The goal was to learn the contingencies and maximize the reward attained. Accuracy, as measured by participants' choices, improved throughout the experiment for cues that strongly predicted reward, while no change was observed for unpredictable cues. Event-related fMRI revealed that activity in the caudate nucleus was more robust during the early phases of learning, irrespective of contingencies, suggesting involvement of this region during the initial stages of trial-and-error learning. Further, the reward feedback signal in the caudate nucleus for well-learned cues decreased as learning progressed, suggesting an evolving adaptation of reward feedback expectancy as a behavior–outcome contingency becomes more predictable.

Introduction

Learning what choice is best comes with experience. In order to maximize rewards, an organism will strive to make better choices based on trial and error. Thus, it is imperative that brain mechanisms exist to support early learning of contingencies that will lead to a rewarding outcome. The goal of the present study is to investigate how the human brain behaves during a reward-learning paradigm, specifically the acquisition and progression of reward learning. One structure that has been implicated in processing of reward-related information is the striatum, the input unit of the basal ganglia, and specifically the caudate nucleus, part of the dorsal striatum. The striatum is a component of multiple cortico-striatal loops that are modulated by dopaminergic neurons in the midbrain, which have been shown to increase firing to unexpected rewards and conditioned stimuli that predict a reward. Due to its heterogeneity in terms of function and connectivity, the striatum is in a prime position to integrate cognitive and motivational information and influence goal-directed behavior. The human striatum is therefore a possible key structure in the acquisition of contingencies that lead to a reward.

Previous research has suggested a role for the striatum in processing reward-related information across species. Significant increases in dopamine release in the striatum, for example, have been observed during cocaine self-administration in rats (Di Chiara and Imperato, 1988; Ito et al., 2002). Neurons in the monkey striatum have been shown to respond to the anticipation (Apicella et al., 1992; Kawagoe et al., 1998) and delivery (Apicella et al., 1991; Hikosaka et al., 1989) of rewards. In accordance with animal studies, brain imaging studies of the human striatum have observed activity during the processing of both primary and secondary rewards (Aharon et al., 2001; Berns et al., 2001; Breiter et al., 2001; Delgado et al., 2000; Delgado et al., 2003; Elliott et al., 2004; Kirsch et al., 2003; Knutson et al., 2000; Knutson et al., 2001a; O'Doherty et al., 2002; O'Doherty et al., 2004; Pagnoni et al., 2002). The striatum's response to the anticipation and delivery of rewards and punishments suggests that it may be a key structure in affective learning. Indeed, as argued by Schultz et al. (2003), learning can be viewed as a change in outcome predictions, and the acquisition of discriminatory responses to different stimuli may reflect the learning of appropriate behavioral actions.

Although the striatum responds to anticipation and delivery of rewards, the caudate nucleus, a component of the dorsal portion of the striatum, does not seem to respond to the reward per se. Rather, it seems to be more vigorously recruited when an outcome is contingent on an action (Tricomi et al., 2004), suggesting a larger role for reinforcement-based processing, where predictions and feedback help adjust behavior. The plasticity of the striatum allows for such rapid reinforcement of actions, as shown by dynamic and efficacious synaptic changes in the rat throughout learning of a procedural task (Jog et al., 1999) and during self-stimulation (Reynolds et al., 2001; Wickens et al., 2003). Thus, the caudate nucleus' unique role in reward processing may be to contribute to the brain's ability to learn through reinforcement.

The caudate nucleus is one of the main regions affected in degenerative disorders such as Parkinson's and Huntington's disease. In accordance with the idea that the caudate is important during feedback-based learning, patients with Parkinson's disease are slower during initial learning of an associative learning task (Myers et al., 2003), as compared to control subjects, and show deficits during a feedback-based learning task, as opposed to intact learning during a nonfeedback version of the same paradigm (Shohamy et al., 2004). Similarly, patients with Huntington's disease have poor performance on a trial and error incidental learning task, a type of learning thought to be dependent on the integrity of the caudate nucleus (Brown et al., 2001).

The striatum, particularly the caudate nucleus, is therefore a structure involved in processing reward-related information and various aspects of learning. Research suggests that the caudate may be an essential component of a brain circuit that allows us to improve our choices through trial-and-error learning. However, it is unclear whether this observed pattern of results extends from cognitive to more affective learning, where feedback properties are both informative and incentive-laden (representing possible gains or losses). The goal of this experiment was to investigate the role of the human striatum during reward-related contingency learning.

The present study investigated how activity in the caudate nucleus is modulated as reward learning progresses, specifically looking at the early stages of learning, when associations between action and outcome are being formed, and at later stages, when well-learned stimulus–response associations are performed. We used a gambling paradigm where wins and losses were determined on the basis of guessing, but learning of stimulus–response contingencies could influence future performance. Participants were instructed that different cues, presented prior to making a guess, predicted what type of choice was more likely to lead to a reward. The introduction of a learning component ensured that participants had a chance to maximize their rewards based on actual performance, allowing us to investigate how activity in the striatum, particularly the caudate nucleus, is modulated as learning of affective contingencies progresses.

Participants

Seventeen right-handed volunteers participated in this study (9 male, 8 female; age: M = 23.29, SD = 3.31). Participants responded to posted advertisements, and all gave informed consent.

Procedure

The paradigm involved a series of 120 interleaved trials, divided into 10 runs of 12 trials each. Participants were instructed that they would see a card and were asked to guess whether the value of the card was higher or lower than the number 5. Each individual trial represented one specific

Behavioral results

Analysis was conducted on all 17 participants to investigate behavioral effects of gender, trial order and overall monetary gain. Each participant's monetary score was calculated at the end of the session and took into account correct ($1.00 gain), incorrect (−$0.50 loss) and missed trials (−$1.00 loss). Participants' scores ranged from $34.50 to $78.00 (M = 58.15, SD = 12.28). Using the monetary score for each participant, we then looked at any effects of trial order (version 1 and version 2) and
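
The scoring rule stated above (+$1.00 for a correct guess, −$0.50 for an incorrect guess, −$1.00 for a missed trial) can be illustrated with a short Python sketch. This is not the authors' analysis code: the function name `monetary_score`, the outcome labels, and the example trial breakdown are assumptions introduced here purely for illustration.

```python
# Payoffs per trial outcome, taken from the scoring rule in the text.
PAYOFF = {"correct": 1.00, "incorrect": -0.50, "missed": -1.00}


def monetary_score(outcomes):
    """Sum the payoff over a sequence of trial outcome labels.

    `outcomes` is a hypothetical representation: one label per trial,
    each being "correct", "incorrect", or "missed".
    """
    return sum(PAYOFF[outcome] for outcome in outcomes)


# Hypothetical session of 120 trials (not a real participant's data):
# 80 correct, 30 incorrect, 10 missed.
session = ["correct"] * 80 + ["incorrect"] * 30 + ["missed"] * 10
print(monetary_score(session))  # 80*1.00 - 30*0.50 - 10*1.00 = 55.0
```

Under this rule, a missed trial costs twice as much as an incorrect guess, so responding on every trial is always better than not responding.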

Discussion

The goal of this study was to investigate how the human brain processes learning of reward contingencies. Specifically, we investigated brain regions thought to be important during the acquisition of reward associations and their modulation as learning progresses. By using a gambling paradigm (where rewards were attained on the basis of guessing) that contained probabilistic cues (which informed the participant as to which choice or guess was more likely to lead to a reward), we were

Acknowledgments

This work was supported by NIMH 62104 to EAP and Center for Brain Imaging, NYU. The authors wish to acknowledge Kate Fissell, Brett Sedgewick and Ben Holmes for technical assistance, Susan Ravizza and Elizabeth Tricomi for informative discussion and constructive criticism. The authors would also like to acknowledge the support of the Beatrice and Samuel A. Seaver Foundation.

References (64)

  • J.P. O'Doherty et al. Neural responses during anticipation of a primary taste reward. Neuron (2002)
  • J.P. O'Doherty et al. Temporal difference models and reward-related learning in the human brain. Neuron (2003)
  • M.G. Packard et al. Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects expression of place and response learning. Neurobiol. Learn. Mem. (1996)
  • W. Schultz et al. Changes in behavior-related neuronal activity in the striatum during learning. Trends Neurosci. (2003)
  • E.M. Tricomi et al. Modulation of caudate activity by action contingency. Neuron (2004)
  • N.M. White et al. Multiple parallel memory systems in the brain of the rat. Neurobiol. Learn. Mem. (2002)
  • J.R. Wickens et al. Neural mechanisms of reward-related motor learning. Curr. Opin. Neurobiol. (2003)
  • C.F. Zink et al. Human striatal responses to monetary reward depend on saliency. Neuron (2004)
  • P. Apicella et al. Responses to reward in monkey dorsal and ventral striatum. Exp. Brain Res. (1991)
  • P. Apicella et al. Neuronal activity in monkey striatum related to the expectation of predictable environmental events. J. Neurophysiol. (1992)
  • A.R. Aron et al. Human midbrain sensitivity to cognitive feedback and uncertainty during classification learning. J. Neurophysiol. (2004)
  • G.S. Berns et al. Predictability modulates human brain response to reward. J. Neurosci. (2001)
  • R.G. Brown et al. Dissociation between intentional and incidental sequence learning in Huntington's disease. Brain (2001)
  • R.M. Carelli et al. Loss of lever press-related firing of rat striatal forelimb neurons after repeated sessions in a lever pressing task. J. Neurosci. (1997)
  • M.R. Delgado et al. Tracking the hemodynamic responses to reward and punishment in the striatum. J. Neurophysiol. (2000)
  • M.R. Delgado et al. Dorsal striatum responses to reward and punishment: effects of valence and magnitude manipulations. Cognit. Affective Behav. Neurosci. (2003)
  • M.R. Delgado et al. Motivation-dependent responses in the human caudate nucleus. Cereb. Cortex (2004)
  • G. Di Chiara et al. Drugs abused by humans preferentially increase synaptic dopamine concentrations in the mesolimbic system of freely moving rats. Proc. Natl. Acad. Sci. U. S. A. (1988)
  • R. Elliott et al. Dissociable neural responses in human reward systems. J. Neurosci. (2000)
  • C.D. Fiorillo et al. Discrete coding of reward probability and uncertainty by dopamine neurons. Science (2003)
  • S.D. Forman et al. Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): use of a cluster-size threshold. Magn. Reson. Med. (1995)
  • M.A. Gluck et al. How do people solve the “weather prediction” task?: individual variability in strategies for probabilistic category learning. Learn. Mem. (2002)