
HYPOTHESIS AND THEORY ARTICLE

published: 27 November 2012


BEHAVIORAL NEUROSCIENCE doi: 10.3389/fnbeh.2012.00079

Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies
Mehdi Khamassi 1,2* and Mark D. Humphries 3,4
1 Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie, Paris, France
2 Centre National de la Recherche Scientifique, UMR7222, Paris, France
3 Département d'Etudes Cognitives, Group for Neural Theory, Ecole Normale Supérieure, Paris, France
4 Faculty of Life Sciences, University of Manchester, Manchester, UK

Edited by:
Matthijs Van Der Meer, University of Waterloo, Canada

Reviewed by:
A. David Redish, University of Minnesota, USA
Aaron Bornstein, New York University, USA
Hisham Atallah, Massachusetts Institute of Technology, USA

*Correspondence:
Mehdi Khamassi, UPMC ISIR UMR 7222, Case courrier 173, 4 place Jussieu, 75005 Paris, France.
e-mail: mehdi.khamassi@isir.upmc.fr

Behavior in spatial navigation is often organized into map-based (place-driven) vs. map-free (cue-driven) strategies; behavior in operant conditioning research is often organized into goal-directed vs. habitual strategies. Here we attempt to unify the two. We review one powerful theory for distinct forms of learning during instrumental conditioning, namely model-based (maintaining a representation of the world) and model-free (reacting to immediate stimuli) learning algorithms. We extend these lines of argument to propose an alternative taxonomy for spatial navigation, showing how various previously identified strategies can be distinguished as model-based or model-free depending on the usage of information and not on the type of information (e.g., cue vs. place). We argue that identifying model-free learning with dorsolateral striatum and model-based learning with dorsomedial striatum could reconcile numerous conflicting results in the spatial navigation literature. From this perspective, we further propose that the ventral striatum plays key roles in the model-building process. We propose that the core of the ventral striatum is positioned to learn the probability of action selection for every transition between states of the world. We further review suggestions that the ventral striatal core and shell are positioned to act as critics contributing to the computation of a reward prediction error for model-free and model-based systems, respectively.
Keywords: reinforcement learning, habit, stimulus-response, action-outcome, nucleus accumbens

1. INTRODUCTION
A vast morass of neuroscience data addresses the problem of how voluntary behavior is underpinned by the anatomical and physiological substrates of the forebrain. Principles or frameworks to organize these data are essential. A consensus is growing around the potentially useful organizing principle that we can make a division of the forebrain striatum into three domains on both anatomical (Joel and Weiner, 1994, 2000; Voorn et al., 2004) and functional (Yin and Knowlton, 2006; Yin et al., 2008; Bornstein and Daw, 2011; Ito and Doya, 2011; van der Meer et al., 2012) grounds. From this striatal eye-view we can make sense of the wider cortical, hippocampal, amygdala, and basal ganglia networks in which they sit, and the role of these networks in different forms of voluntary behavior. Both the spatial navigation and instrumental conditioning literatures have adopted this perspective, recognizing the functional division of striatum into dorso-lateral (DLS), dorso-medial (DMS), and ventral striatum (VS) 1, belonging to different parallel cortico-basal ganglia loops (Alexander et al., 1990; Middleton and Strick, 2000), with each striatal domain having established functional roles within those broader behavioral distinctions. How do these functional distinctions map between the two literatures? And what might we learn by comparing the two?

1 We use VS throughout, rather than nucleus accumbens, to emphasize the contiguous nature of the striatum through its dorsolateral to ventro-medial extent (Voorn et al., 2004; Humphries and Prescott, 2010).

While some links have been drawn between the approaches of the two literatures (Redish, 1999; Yin et al., 2004, 2008; Khamassi, 2007), their primary theories for the strategies underpinning behavior are, we suggest, orthogonal: the conditioning literature distinguishes goal-directed and habitual behavior in a task, whereas the navigation literature distinguishes place and response strategies for solving a task. However, there is mounting evidence that the place/response distinction is unable to account for the effects of lesions on navigation behavior. Our main hypothesis is that strategies for navigation, similar to strategies for instrumental conditioning (Daw et al., 2005), can be reconciled as either model-free or model-based (we define these terms below). At root, the key distinction is that it is the use of information in building a representation of the world, rather than the type of information about the world, that defines the different computational processes and their substrates in the striatum. We argue that explicitly identifying the DLS as a central substrate for model-free learning and expression, and the DMS as a central substrate for model-based learning and expression (Yin and Knowlton, 2006; Thorn et al., 2010; Bornstein and Daw, 2011; van der Meer et al., 2012) can help reconcile numerous conflicting results in the spatial navigation literature.

Frontiers in Behavioral Neuroscience www.frontiersin.org November 2012 | Volume 6 | Article 79 | 1


Khamassi and Humphries Model-free/model-based navigation strategies

With this hypothesis in hand, we can see how work on spatial navigation gives us a second hypothesis, useful to understanding instrumental conditioning. We propose that the VS is a central substrate, in collaboration with the hippocampus, for a collection of functions that we informally term the "model-builder." On the one hand, the core of the VS acts as the locus of actions necessary to build a model; on the other hand, the shell of the VS acts to evaluate predicted and achieved outcomes in the model. These are clearly not the only roles of the multi-faceted VS (Humphries and Prescott, 2010); nonetheless, they may prove a further useful organizing principle.
With this sketch in mind, we address first the different forms of behavioral strategies that have separately been identified in the spatial navigation and instrumental conditioning literatures. We take a striatal-centric view here as an organizing principle, not as a claim that striatal domains are exclusive substrates for different forms of learning and navigation. Each striatal domain is one locus in a broader basal ganglia network that computes its output using information gathered by the striatum (Houk and Wise, 1995; Mink, 1996; Redgrave et al., 1999; Humphries et al., 2006; Leblois et al., 2006; Girard et al., 2008); and each network is in turn one locus in a broader basal ganglia-thalamo-cortical loop. Nonetheless, the striatum's consistent intrinsic microcircuit across the dorsolateral to ventro-medial axis (Bolam et al., 2006), its integration of cortical, thalamic, hippocampal, and amygdala input, and its position as the primary target of the midbrain dopaminergic system, make it a natural vantage point from which to attempt to unify the disparate strands of navigation and conditioning.

2. STRATEGY DISTINCTIONS IN SPATIAL NAVIGATION
2.1. TAXONOMY OF SPATIAL NAVIGATION FORMS
Evidence for different navigation strategies in the rat comes from behavioral studies showing that rats are able to rely on different information to localize themselves in the environment and to reach a certain location in space (Krech, 1932; Reynolds et al., 1957; O'Keefe and Nadel, 1978). Existing classifications of navigation strategies (O'Keefe and Nadel, 1978; Gallistel, 1990; Trullier et al., 1997; Redish, 1999; Franz and Mallot, 2000; Arleo and Rondi-Reig, 2007) point out a series of criteria, some of them overlapping, to differentiate navigation strategies: the type of information required (sensory, proprioceptive, internal), the reference frame (egocentric vs. allocentric), the type of memory at stake (procedural vs. declarative memory) and the time necessary to acquire each strategy (place-based strategies generally being more rapidly acquired than cue-guided strategies; Honzik, 1936; O'Keefe and Nadel, 1978; Packard and McGaugh, 1992, 1996; Redish, 1999). Moreover, it has been observed that in normal animals, a shift from a place strategy to a response strategy occurs in the course of training (Packard, 1999). This has led to the proposition of a strong distinction between two main categories of strategies:

- Response strategies, where a reactive behavior results from learning direct sensory-motor associations (like heading toward a visual cue or making an egocentric turn at the crossroads of a maze). This category includes target-approaching, guidance, cue-guided, and praxic 2 navigation, and can be further elaborated in the form of a sequence or chaining of Stimulus-Response (S-R) associations when new cues result from the previous displacement (O'Keefe and Nadel, 1978; Trullier et al., 1997; Arleo and Rondi-Reig, 2007).
- Place strategies, which rely on a spatial localization process, and can imply a topological or metric map of the environment (Tolman, 1948), the term map being defined by Gallistel (1990) as "a record in the central nervous system of macroscopic geometric relations among surfaces in the environment used to plan movements through the environment."

2 praxic normally refers to internally-generated sequences of movement independent of position information.

2.2. SUBSTRATES IN THE STRIATUM
This strong strategy distinction has been mapped onto a strong distinction in underlying neural systems. It has been found that lesions of the hippocampal system impair place strategies while sparing response strategies (Morris, 1981; Packard et al., 1989; Devan and White, 1999). In contrast, lesions of the DLS produce the opposite effect: impairing or reducing the expression of response strategies while sparing place strategies (Potegal, 1972; Devan and White, 1999; Adams et al., 2001; Packard and Knowlton, 2002; Martel et al., 2007). Thus, it is common to speak of place and response strategies as being, respectively, hippocampus-dependent and hippocampus-independent (White and McDonald, 2002).
Some theories propose that the hippocampus-dependent system expresses its output via the VS (Redish and Touretzky, 1997; Albertin et al., 2000; Arleo and Gerstner, 2000; Johnson and Redish, 2007; Penner and Mizumori, 2012). Other studies have also highlighted a role for the DMS in the hippocampus-dependent system (Whishaw et al., 1987; Devan and White, 1999; Yin and Knowlton, 2004), by finding that lesions of the DMS promote response strategies, implying the loss of place strategies. The behavioral strategies are often equated directly with learning systems: that is, separate systems that learn a particular cue-guided and/or place-guided set of strategies for a given environment. However, the simple mapping of VS-DMS vs. DLS onto place vs. response strategies is not consistent with mounting evidence from lesion studies.

2.3. KNOWN PROBLEMS WITH TAXONOMY AND SUBSTRATES
Response strategies are not solely dependent on the DLS. Chang and Gold (2004) reported that DLS-lesioned rats were only unable to express a response strategy on a T-maze in the absence of extra-maze cues; in cue-rich conditions the DLS-lesioned rats did not differ from controls in their ratio of using response or place strategies. Both Yin and Knowlton (2004) and De Leonibus et al. (2011) also found no significant decrease in the use of response strategies by DLS-lesioned rats running a T-maze. Moreover, Botreau and Gisquet-Verrier (2010) not only replicated this result but also ran a second separate cohort of DLS-lesioned rats to confirm it; further, they showed that the DLS-lesioned rats using a response strategy were really doing so: they continued to use that strategy to solve a new task on the T-maze.


We conclude that the response learning system (including cue-guided and praxic strategies) cannot be simply associated with the DLS.
Place strategies are not solely dependent on the DMS. When learning to navigate to a hidden platform in the Morris water maze, rats with DMS lesions were able to learn the platform's location just as well as controls or DLS-lesioned rats, as indicated by their similar escape latencies (Whishaw et al., 1987; Devan and White, 1999); consistent impairment, shown by a lack of improvement over trials, only occurred if the fornix-fimbria 3 was cut (Devan and White, 1999). Botreau and Gisquet-Verrier (2010) reported that DMS-lesioned rats did not differ from controls or DLS-lesioned rats in their ratio of using response and place strategies in a probe test in the water-maze. We conclude that the place learning system cannot be simply associated with the DMS.

3 This fiber pathway brings hippocampal information to the VS, but is also the source of brainstem inputs to the hippocampus, so cutting it may disrupt either transmission of place information by hippocampus or the encoding of place in hippocampus.

The precise role of VS in particular navigation strategies is even less clear (see Humphries and Prescott, 2010; Penner and Mizumori, 2012 for recent reviews). VS lesions impair place-based learning (Sutherland and Rodriguez, 1989; Ploeger et al., 1994; Setlow and McGaugh, 1998; Albertin et al., 2000). For instance, lesions of the medial shell of the VS impair the rat in learning and recalling the location of sites associated with larger rewards (Albertin et al., 2000). However, more recent studies reveal that VS function may not be restricted to place strategies. For instance, De Leonibus et al. (2005) report that VS lesions impair the acquisition of both allocentric and egocentric strategies in a task requiring the detection of a spatial change in the configuration of four objects placed in an arena.
The clean distinction between rapidly learnt place strategies and slowly learnt response strategies is also problematic. Several authors have reported rapidly learned response strategies (Pych et al., 2005; see Willingham (1998) and Hartley and Burgess (2005) for reviews including rodent data). Conversely, while place strategies have most of the time been found highly flexible and more rapidly acquired than response strategies (Packard and McGaugh, 1996), after extensive training place strategies can also become inflexible and persist in leading animals toward the previous goal location after a reversal, as if not relying on a cognitive map (Hannesson and Skelton, 1998; see also rat behavioral data in a Y-maze described in Khamassi, 2007).
These data suggest that the simple distinction between place vs. response strategies might be too broad to explain the different roles of VS-DMS vs. DLS in navigation. Several authors have highlighted that this classification of navigation strategies lends too much importance to the type of information involved (i.e., place vs. cue) and thus to the spatial localization process (Trullier et al., 1997; Sutherland and Hamilton, 2004). We suggest that considering the type of learning involved (and measurable in terms of behavioral flexibility) might better account for the specific involvement of VS, DMS, or DLS in navigation. To see this, let us first consider the taxonomy of learning in instrumental conditioning.

3. STRATEGY DISTINCTIONS IN INSTRUMENTAL CONDITIONING
3.1. GOAL-DIRECTED BEHAVIORS vs. HABITS
A long line of conditioning research has elaborated two operationally defined forms of instrumental behavior in the rat: goal-directed, in which the animal is able to modify its behavior in response to changes in outcome, and habitual, in which the animal does not respond to changes in outcome (it perseveres with its previous action, hence "habit") (Dickinson, 1985; Yin et al., 2008). This definition is operational because it can only be safely defined in retrospect, i.e., after extinction. Experimenters typically use a test in extinction to discriminate between these two behavioral modes after a reward devaluation or change in contingency between behavior and reward. If during this extinction test the animal quickly stops producing the now irrelevant conditioned response (e.g., pressing a lever) it is said to be goal-directed; if the animal persists it is said to be habitual (Balleine and Dickinson, 1998). The inference is then drawn that goal-directed animals have access to action-outcome contingencies to guide behavioral choice, and that changes in outcome consequently change action choice, whereas habitual animals make behavioral choices based on S-R pairings (Dickinson, 1985).

3.2. SUBSTRATE EVIDENCE FOR DMS GOAL-DIRECTED AND DLS HABITUAL ROLES IN LEARNING
During the course of a conditioning task animals' behavior progressively shifts from expressing awareness of action-outcome contingencies to expressing habits. In particular, after extensive training or overtraining animals' behavior is most often habitual (Yin et al., 2004). It turns out that this natural progressive shift can be perturbed by lesions of different parts of the striatum, pointing to a possible double-dissociation between DLS and DMS: the former being required for acquisition and maintenance of habits, and the latter being required for learning and expression of goal-directed behaviors (Balleine, 2005; Yin and Knowlton, 2006; Yin et al., 2008).
There is a strong consensus that the dorsolateral striatum is necessary for habitual behavior: either lesions of the DLS (Yin et al., 2004) or disruption of dopamine signaling within it (Faure et al., 2005) prevent habit formation in extinction. Animals with such lesions thus appear to maintain goal-directed behavior throughout a task. Correspondingly, there is a re-organization of DLS single-neuron activity during habit formation (Barnes et al., 2005; Tang et al., 2007; Kimchi et al., 2009). Consequently, the dorsolateral striatum has been proposed as central to the learning of habits (Yin and Knowlton, 2006; Yin et al., 2008).
There is a strong consensus that the dorsomedial striatum is necessary for goal-directed behavior: lesions of the DMS (Yin et al., 2005b), or blockade of NMDA receptors within it (Yin et al., 2005a), putatively preventing synaptic plasticity, prevent sensitivity to devaluation or contingency changes in extinction. Animals with such lesions thus appear to exhibit habitual behavior from the outset. Correspondingly, there is a re-organization of DMS single-neuron activity after changes in action-outcome


contingencies (Kimchi and Laubach, 2009; Kimchi et al., 2009). Consequently, the dorsomedial striatum has been proposed as central to goal-directed learning (Yin and Knowlton, 2006; Yin et al., 2008).
A caveat is that the anterior part of DMS (aDMS) may escape from this functional scheme. To our knowledge, only the posterior DMS (pDMS) has been clearly shown as involved in the acquisition of goal-directed behaviors (Yin et al., 2005b) and in place-based navigation (Yin and Knowlton, 2004). Lesions of aDMS do not affect either of these processes. They even increase the number of rats classified as place-responders both during initial and late phases of learning (Yin and Knowlton, 2004), and seem to increase the sensitivity to contingency degradation (compared to sham-lesioned rats) (Yin et al., 2005b). Ragozzino and Choi (2004) showed that inactivating aDMS does not affect learning of a T-maze task or acquisition of a place strategy; but inactivation during reversal learning did affect performance, thus suggesting that aDMS is involved in switching between strategies, not in learning per se. Contrary to these data, Moussa et al. (2011) showed that a rat's impairment in learning an alternating-arm T-maze task correlated with volume of DMS damage, not with the location of the lesion. Nonetheless, it remains possible that the aDMS is not part of the goal-directed or habitual systems.

3.3. THE VENTRAL STRIATUM IN CONDITIONING
While dorsal parts of the striatum are important for the expression of learned S-R contingencies, their acquisition may require intact VS (Atallah et al., 2007). The VS is indeed located at a crossroads between limbic and motor structures which places it in a privileged position to integrate reward, motivation, and action (Mogenson et al., 1980; Groenewegen et al., 1996). In the instrumental conditioning literature, the VS is also considered particularly important for Pavlovian influences over voluntary behavior (Balleine and Killcross, 1994; Dayan and Balleine, 2002; Yin et al., 2008; van der Meer and Redish, 2011). It has been attributed roles as both a locus of Pavlovian conditioning (learning to associate outcomes to different stimuli or states) and the locus of Pavlovian-instrumental transfer (the use of those learnt stimulus-outcome associations to motivate the learning and expression of instrumental actions in the presence of those stimuli) (Yin et al., 2008). Further, while the functional subdivision of VS into core and shell might be oversimplified (Heimer et al., 1997; Ikemoto, 2002; Voorn et al., 2004; Humphries and Prescott, 2010), it may account for distinct influences of reward values on habitual performance and goal-directed behavior, respectively. For instance, Corbit and Balleine (2011) found that shell lesions impair outcome-specific [putatively goal-directed as noted by Bornstein and Daw (2011)] Pavlovian-instrumental transfer while core lesions impair general (putatively habitual) Pavlovian-instrumental transfer.
These data suggest that the differences in the learning process controlling the progressive influence of rewards on actions may determine the functional roles of striatal domains in various behavioral strategies: DLS being involved in learning and expression of habitual behaviors; DMS being involved in learning and expression of goal-directed behaviors; VS controlling the influence of reward values on these two processes during learning. Computational work has brought great advances in formalizing the differences between these learning processes.

3.4. MODEL-BASED vs. MODEL-FREE LEARNING PROCESSES
Machine-learning research into formal algorithms for reinforcement learning has developed a basic distinction between two forms of such algorithms. Common to both is the idea that we can represent the world as a set of states S, that the agent could take one of a set of actions A in each state (including no action at all), and that the outcome of taking action a in state s is the next state s′ and a possible reward r (Sutton and Barto, 1998). Distinguishing the two is whether or not the dependencies in the world representation are explicitly modeled (Figure 1).
In the model-free forms of algorithm, each state has associated with it a distribution of the values of each possible action, learnt iteratively using a prediction error to minimize the difference between the values of actions in consecutive states. This set includes most well-known forms of reinforcement learning algorithms, including Temporal Difference (TD) learning, Actor-Critic, and Q-Learning. Each state thus has an associated distribution of cached action-values Q(s, a) over all available actions. The action to execute is then simply chosen based on this cached value distribution. Such behavior is called reactive in that it is state-driven (e.g., stimulus-driven) and does not rely on the inference of possible outcomes of the action.
In the model-based forms of algorithm, direct use is made of the state information about the world. With each state s is still associated a reward r, each action is still assigned a value Q(s, a), and action selection is based on those values. However, model-based algorithms explicitly store the state transitions after each action: they can then simulate off-line the consequence of action choices on transitions between states before choosing the next action appropriately (Sutton and Barto, 1998; Johnson and Redish, 2005). Thus in this case the agent will infer possible future outcomes of its decisions before acting. In simple decision-making tasks in which each action leads to a different state, such a process is naturally captured by a branching decision tree (Figure 1); in more natural situations states may be re-visited during ongoing behavior, and thus the transitions between states may have periodic structure. Sophisticated model-based algorithms explicitly compute a separate transition matrix T(s′, a, s) for the probability of ending up in each next state s′, given the current state s and each possible action choice a in A (Daw et al., 2005, 2011; Gläscher et al., 2010).
Daw et al. (2005) proposed the formal mapping that goal-directed behavior results from model-based learning and that habitual behavior results from model-free learning 4. They further proposed that both learning systems operate in parallel, with

4 They used a model-based algorithm that explicitly computed the transition matrix. It seems feasible that simpler model-based algorithms, without explicit computation of the transition matrix, could also equally account for the sensitivity to devaluation and contingency changes in goal-directed learning, as their repeated internal simulation after such outcome manipulations would result in more rapid changes in overt behavior. To our knowledge, no one has examined the possibility. Intriguingly, Johnson and Redish (2005) showed that such an internal-simulation model, emulating hippocampal


FIGURE 1 | Model-based and model-free learning and controllers. Model-based and model-free controllers represent the world as a set of states S1 ... Sm and actions A1 ... AN within those states. They learn the values of each action in a given state, here indicated by the thickness of each circle, based on available rewards R. What distinguishes them is their representation of the links between those states. A model-based controller (centre) also represents the transitions between states and the action(s) that cause the transition (indicated by the multiple arrows). For a known current state, specified by current sensory information, the model can be traversed to find the likely outcome of simulated actions in each state; one such trajectory is given by the orange arrows. Each trajectory can then be used to update the predicted value of each action. Finally, after a number of trajectories through the model, an overt action is selected based on their updated values in the current state. A model-free controller (right) vastly reduces the representational and computational demands by essentially externalizing the world-model. Sensory information specifies the state at time t1; an action is chosen based on its current value. Updated sensory information resulting from that action then specifies the state at time t2. Learning is then based on the prediction error between expected and resulting values of the action taken at t1. A model-free controller can also be trained by a model-based controller, and thus represent an abstraction of that model. Irrespective of whether model-free or model-based, a common set of information needs to be learnt to construct and use the controller (left): to specify the set of current relevant states in the world; to learn actions available within them and the transitions those actions cause; and to learn the reward function, i.e., which state(s) contain reward(s).
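
The contrast drawn in Figure 1 can be illustrated with a minimal Python sketch. This is not the algorithm of Daw et al. (2005): the three-state world, the `NEXT` table, the parameter values, and the function names `q_learning` and `plan` are all hypothetical choices made for illustration, and transitions are deterministic here, whereas a transition matrix T(s′, a, s) would hold probabilities. The model-free learner caches Q(s, a) from experienced prediction errors; the model-based planner recomputes Q(s, a) from stored transitions and rewards.

```python
import random

GAMMA = 0.9                      # discount factor (illustrative value)
STATES, ACTIONS, GOAL = (0, 1, 2), (0, 1), 2

# Hypothetical deterministic world: NEXT[s][a] is the state reached from s.
# Action 0 moves toward the goal; action 1 returns to the start state.
NEXT = {0: {0: 1, 1: 0}, 1: {0: 2, 1: 0}}


def q_learning(reward, episodes=500, alpha=0.1, seed=0):
    """Model-free: cache Q(s, a) using only experienced prediction errors."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(episodes):
        s = 0
        for _ in range(20):                      # cap on episode length
            a = rng.choice(ACTIONS)              # purely random exploration
            s_next = NEXT[s][a]
            future = 0.0 if s_next == GOAL else GAMMA * max(Q[s_next].values())
            # Prediction error: r + gamma * max_a' Q(s', a') - Q(s, a)
            delta = reward[s_next] + future - Q[s][a]
            Q[s][a] += alpha * delta
            if s_next == GOAL:
                break
            s = s_next
    return Q


def plan(reward, sweeps=50):
    """Model-based: recompute Q(s, a) by sweeping over stored transitions."""
    Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(sweeps):
        V = {s: max(Q[s].values()) for s in STATES}   # V[GOAL] stays 0
        for s in NEXT:
            for a in ACTIONS:
                s_next = NEXT[s][a]
                Q[s][a] = reward[s_next] + GAMMA * V[s_next]
    return Q


reward = [0.0, 0.0, 1.0]         # reward available in the goal state
devalued = [0.0, 0.0, 0.0]       # same world after outcome devaluation

Q_mf = q_learning(reward)        # trained before devaluation, then left as is
Q_mb = plan(reward)
Q_mb_devalued = plan(devalued)   # the planner only needs the new reward

print("model-free, cached after devaluation:", Q_mf[0])
print("model-based, before devaluation:     ", Q_mb[0])
print("model-based, after devaluation:      ", Q_mb_devalued[0])
```

In this toy setting the planner's values for the start state fall to zero as soon as it is given the devalued reward, whereas the cached model-free values remain near their trained levels until further experience updates them. This is the qualitative signature that the devaluation test in extinction is designed to detect.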

the system chosen for current behavioral control based on having the least uncertainty in its prediction of the outcome. Using stylized examples of simple conditioning tasks, they showed how this mapping can explain the sensitivity to devaluation and contingency degradation in extinction early in training when the model-based controller is dominant, and how that sensitivity is lost when the model-free controller becomes dominant with over-training. The underlying explanation is that the model-based controller directly represents action-outcome contingencies, and is thus able to quickly propagate changes in reward through the world-model; by contrast, the model-free controller, while able to reduce the uncertainty in its predictions with over-training, requires further extensive training for the change in reward to propagate through the independent state-action representations. This formal mapping onto computational substrates has proven a very useful and fruitful guide to the understanding of these operationally-defined forms of behavior and their inferred learning systems (Ito and Doya, 2011; Bornstein and Daw, 2011; van der Meer et al., 2012).
This computational mapping is also assumed to follow the same substrate mapping (Daw et al., 2005; Bornstein and Daw, 2011; Ito and Doya, 2011). Thus, as DLS is central to the habit-learning system, so, by extension, it is considered central to the model-free learning system in instrumental conditioning (Daw et al., 2005). Similarly, as DMS is central to the goal-directed system, it is thus natural to propose that DMS is central to the model-based learning system in instrumental conditioning (Bornstein and Daw, 2011).

4 (continued) replay of previous trajectories through a maze, could indeed reduce the onset of habit-like stereotypy in the paths taken through the maze.

4. UNIFICATION: NAVIGATION STRATEGIES ARE MODEL-FREE OR MODEL-BASED
Superficially, the model-free/model-based dichotomy strongly resembles the dichotomous taxonomy defined in the spatial navigation literature between flexible map-based place strategies and automatic map-free response strategies. However, the two approaches are orthogonal: one is defined by information use in a world representation (model-free/based), the other by information type (place/cue).
Our hypothesis is that we may similarly distinguish model-free and model-based navigation strategies by their use of information (Figure 2), no matter if the state is represented by a spatial location or a visual stimulus. Within these two top-level strategies, we may further differentiate strategies defined by their reference frame and modality of processed stimuli:


- egocentric reference frame, relying on idiothetic (praxic), or allothetic (cue-guided) stimuli;
- allocentric reference frame, relying on idiothetic and/or allothetic stimuli (places).

(A)                      Action selection process
                         Inflexible, slow to acquire    Flexible, rapidly learned
                         (S-R associations)             (cognitive graph)
Strategy     Place                                      Place strategies
dimension    Cue         Response strategies

(B)                      Action selection process
                         Inflexible, slow to acquire    Flexible, rapidly learned
                         (model-free)                   (model-based)
Strategy     Place       Place strategies (PRTR)        Place strategies (map-based)
dimension    Cue         Response strategies (habitual) Response strategies (goal-directed)

FIGURE 2 | New taxonomy of navigation strategies based on model-based/model-free reinforcement learning. (A) Previous taxonomies highlight the distinction between flexible rapidly acquired map-based strategies and inflexible slowly acquired S-R strategies. (B) New taxonomy highlighting model-free and model-based place strategies as well as model-free and model-based response strategies. PRTR, place-recognition triggered response strategies as classified by Trullier et al. (1997).

such a strategy produces inflexible behavior because it needs to re-learn sequences of place-response associations in case of a change in goal location. This type of learning was prominent in early models of hippocampus-dependent navigation (Burgess et al., 1994; Brown and Sharp, 1995; Arleo and Gerstner, 2000; Foster et al., 2000).
Following the same DLS vs. DMS double-dissociation logic as was used for goal-directed and habitual learning then, if DMS is the substrate for place strategies, lesions of the DMS should impair place strategies and lesions of the DLS should not affect them. However, there is evidence against this dissociation and indirect evidence in favor of a place strategy supported by DLS. Lesions of the DMS slow but do not prevent the learning of a hidden platform in a water maze, which putatively requires a place-based strategy (Devan and White, 1999). More compelling, Botreau and Gisquet-Verrier (2010) tested control, DLS-lesioned, and DMS-lesioned rats learning a hidden platform water maze task; after learning, a probe trial was used where the rats were started in a different location for the first time: they found that rats were divided into the same ratio of place and response groups on the probe trial irrespective of whether they were control, DLS-lesioned, or DMS-lesioned rats. Recently, Jacobson et al. (2012) tested rats on an alternating strategy plus-maze, which required the use of either a response-based or place-based strategy on each trial as signaled by an extra-maze cue: they found that post-training DLS lesions impaired use of both the response and place strategies. Thus, there is evidence that intact DLS is important for using place strategies.
4.2. DMS AND (MODEL-BASED) RESPONSE STRATEGIES
Our hypothesis thus naturally extends to proposals for the striatal The proposal of a model-based response strategy is just the claim
substrates of model-free and model-based strategies in naviga- that we can conceive of states in a spatial navigation task as
tion: that the DLS is central to the model-free navigation system being defined by the position of intra- or extra-maze cues rel-
and DMS is central to the model-based navigation system. ative to the animal. In such a model, different states would not
This combined conceptual (model-free vs. model-based) and necessarily correspond to different spatial position. Rather, we
substrate (DLS vs DMS) hypothesis raises four implications that can conceive of an example task where distinct states s1 and s2
each explain some troubling or inconsistent data for the place vs. correspond to the same spatial location and differ on whether
response dichotomy in navigation. First, that we can conceive of a a light is turned on or off. Then a model-based system can
model-free strategy based on place information alone supported learn the transitions between these states and search the model
by the DLS. Second, that, correspondingly, we can conceive of a to proceed with action selectione.g., reward may be delivered
model-based response strategy based on cues alone supported only when the light is on. Thus, whereas others have explic-
by the DMS. Third, that, following the model-based/model-free itly identified a response strategye.g., a strategy guided by the
mapping in conditioning (Daw et al., 2005), model-based and lightwith habitual behavior (e.g., Yin and Knowlton, 2004), we
model-free control of navigation could be distinguished behav- are proposing that the two are orthogonal.
iorally by whether or not the animal reacts to changes in the Again we may follow the same double-dissociation logic: if
value or contingencies of rewards, and by lesions to the DLS and DLS is the sole substrate for response strategies, then lesions of
DMS. Fourth, that both place and cue information should be the DLS should impair response strategies and lesions of the DMS
available to both the model-based and model-free navigation sys- should not affect them. There is evidence against this dissoci-
tems, and thus should be detectable within both the DMS and ation, and in favor of DMS involvement in response-strategies.
DLS. We consider each of these in turn, then discuss the key As noted in section 2.3, lesions of the DLS do not impair the
role of the hippocampal formation as the likely source of state use of response strategies on probe trials, suggesting that intact
information. DMS is sufficient to support the use of response strategies (Chang
and Gold, 2004; Yin and Knowlton, 2004; Botreau and Gisquet-
4.1. DLS AND (MODEL-FREE) PLACE STRATEGIES Verrier, 2010; De Leonibus et al., 2011). Chang and Gold (2004)
Model-free navigation strategies based on place information further reported that the DLS lesions only effectively impaired the
alone have been called Place-Recognition Triggered Response use of response strategies when there were no extra-maze cues.
(PRTR) strategies by Trullier et al. (1997) who emphasized that This suggests that model-based (and putatively DMS-based) use


of cues was sufficient to maintain a response strategy in the cue-rich conditions; but that a model-free (and putatively DLS-based) praxic response strategy was necessary in the cue-deficient conditions (that is, in the absence of sufficient cues, learning a sequence of turns was required).
Moussa et al. (2011) tested the effects of DLS and DMS lesions on the ability of rats to learn a return-arm T-maze in which the rats were required to alternate their choice of visited arm (left or right) to obtain reward, but were free to run at their own pace. The task is a seemingly simple response strategy but requires a minimal model to achieve rewards above chance level. At the choice point of the T-maze, a model-free learning system would assign equal value to turning left or turning right as both would be rewarded on (approximately) half the visits. To do better, a minimal model would be needed to at least link the previous choice of arm to the current choice, chaining at least two (state, action) pairs in a loop, which corresponds to a model-based process. Moussa et al. (2011) found that DMS lesions, and not DLS lesions, impaired learning of this task irrespective of the amount of training. Their data thus suggest a model-based response strategy role for DMS.

4.3. VALUE-SENSITIVITY IN NAVIGATION AND ITS ALTERATION BY DMS BUT NOT DLS LESIONS
If the prediction of Daw et al. (2005) is correct, then model-based and model-free control of action can be distinguished behaviorally by whether or not the animal reacts to changes in the value or contingencies of rewards. Thus, under our hypothesis, such sensitivity to value or contingency changes in spatial navigation should be reflected in both place and response strategies if using a model-based controller and in neither place nor response strategy if using a model-free controller. Similar to the goal-directed to habitual transfer observed in instrumental conditioning (Yin and Knowlton, 2006), we might expect that this outcome sensitivity would disappear with over-training on a sufficiently deterministic task, reflecting the transfer from a model-based to a model-free controller for navigation. Also similarly, our hypothesis is that this transfer is from the DMS to the DLS-based systems; so lesions to those systems should differentially affect how changes in value subsequently change behavior.
Whereas above we reviewed evidence in favor of their breaking the place vs. response dichotomy, here we consider evidence more directly in favor of the association of DMS with a model-based system and DLS with a model-free system. De Leonibus et al. (2011) recently provided intriguing evidence from devaluation in favor of both (1) the existence of model-based and model-free response strategies and (2) their dissociable modulation by DMS and DLS lesions. Further, Moussa et al. (2011) provided evidence from extinction during navigation for both. We consider these studies in turn.
Figures 3A,B outline the dual-solution plus-maze task and experimental design of De Leonibus et al. (2011). Key to the design was separately training "early" and "late" groups of rats for, respectively, 26 and 61 days before the first probe trial, which established the strategy they were using to locate the reward (Figure 3B). Both early and late groups preferentially used the response strategy on the first probe trial (Figures 3C,F), replicating earlier results (Devan and White, 1999; Yin and Knowlton, 2004). However, the response strategy sub-groups for both early and late were then split, with approximately half receiving a devaluation regime for the food reward in the maze. On the subsequent second probe trial, only the early group showed awareness of the devaluation, through a significant drop in their use of a response strategy (Figure 3D). There was no change in the use of response strategy by the devalued late group (Figure 3G). Thus, while both early and late groups of rats preferentially used a response strategy, only the early group modified use of that strategy after change in the value of reward, evidence of a distinction between a model-based and model-free form of response strategy.
De Leonibus et al. (2011) then separately tested the effects of pre-training sham and DMS lesions on a new early group, and of pre-training sham and DLS lesions on a new late group. They found that the DMS lesion prevented the devaluation from changing the proportion of early group rats using a response strategy (Figure 3E). This is consistent with the loss of DMS preventing value updates from propagating through the model-based system. Conversely, they found that the DLS lesion now permitted the devaluation to change the proportion of late group rats using a response strategy (Figure 3H). This is consistent with the loss of DLS preventing transfer to the model-free system, and subsequently value updates continued to propagate through the model-based system. Together, these results support the double dissociation of DMS as part of a model-based and DLS as part of a model-free system for navigation.
Moussa et al. (2011) found results consistent with this picture from rats tested in extinction on a navigation task. As noted above, they tested rats on an alternating arm T-maze task, thus requiring rats to maintain a memory of the previously visited arm. As the rats ran at their own pace, Moussa et al. (2011) were unusually also able to test the effects of extinction on navigation tasks by leaving the arms unbaited in the final 10-min session. They found that control rats did decrease their laps of the maze over the 10-min period, so that extinction effects were detectable. Moreover, though DLS lesions had no effect on learning the task, they did lead to significantly faster extinction of maze running. These data are thus consistent with lesions of DLS removing the putative model-free navigation substrate, thus leaving intact the putative model-based substrate in DMS that was subsequently faster to respond to the outcome devaluation.

4.4. PLACE AND CUE INFORMATION IS AVAILABLE TO BOTH MODEL-BASED AND MODEL-FREE SYSTEMS
If the DLS and DMS are indeed, respectively, substrates for model-free and model-based navigation systems, and not the response and place systems, then cue- and place-based correlates of movement should appear in the activity of both.
DLS activity is consistent with the development of cue-based correlates of movement. Jog et al. (1999) showed that developing DLS activity over the course of a T-maze task stabilized to just the start and end positions in the maze once the rats had reached operationally habitual behavior. van der Meer et al. (2010) showed that decoding of position information from dorsal striatal activity consistently improved over experience, and that its


activity peaked only at choice points in the maze, consistent with a slow learning model-free system that learnt to associate differentiable intra-maze states with actions (Graybiel, 1998; Yin and Knowlton, 2006). DLS activity is also selectively correlated with position: Schmitzer-Torbert and Redish (2008) found that dorsolateral striatal electrophysiological activity correlated with place when the task required knowledge of spatial relationships, but showed no correlation when the task was non-spatial.

FIGURE 3 | Evidence for model-based and model-free navigation in data reported by De Leonibus et al. (2011). (A) Dual-solution plus-maze task used by De Leonibus et al. (2011). On training trials, rats always start from the same arm (south) and have to learn the location of the reward in a consistently baited arm (e.g., east). After training, a probe trial starting in the opposite arm is used to ascertain the rat's strategy for locating the reward (a food pellet): a response strategy based on direction of turn, or a place strategy based on location of reward with respect to extra-maze cues. (B) The experimental design of De Leonibus et al. (2011). Rats were in two broad categories, designated "early" and "late" with respect to the first probe trial (day 27 or day 62). All response rats from that trial were taken forward to the second stage, and split approximately evenly into devaluation and control (value) groups. Both groups had free access to food pellet reward for 15 min immediately after training for each of five days; the devaluation group received an injection of LiCl immediately afterwards, the control group received a saline injection. The devaluation group developed a taste aversion to the pellets, but no reduction in completed trials (De Leonibus et al., 2011). (C-E) Data from the early group; (F-H) data from the late group. (C) Proportion of early group rats using each strategy on first probe trial. (D) From the second probe trial, the proportion of rats continuing to use a response strategy after devaluation compared to controls. (E) From the second probe trial, the proportion of rats continuing to use a response strategy after devaluation and pre-training DMS lesion, compared to controls for both. (F) Proportion of late group rats using each strategy on first probe trial. (G) From the second probe trial, the proportion of late group rats continuing to use a response strategy after devaluation, compared to controls. (H) From the second probe trial, the proportion of rats continuing to use a response strategy after devaluation and pre-training DLS lesion, compared to controls for both. An asterisk indicates a significant difference of at least p < 0.05; see De Leonibus et al. (2011) for details.

DMS is clearly in receipt of place information in that activity is correlated with actions or rewards in particular locations, but not correlated with the location alone (Wiener, 1993; Berke et al., 2009). Furthermore, lesions of posterior DMS prevent execution of place-based strategies (Yin and Knowlton, 2004) as does loss of dopamine from that region (Lex et al., 2011). Its input from the prefrontal cortex (PFC), particularly medial PFC which receives considerable direct input from the CA1 place cells, is one of the most likely sources of place information; there is clear evidence that medial PFC supports place representation [e.g., Hok et al. (2005)]. Nonetheless, there is also evidence for DMS receipt of cue information. Devan and White (1999) reported that asymmetric lesions (unilateral hippocampus and contralateral DMS) produced mild retardation of acquisition of both cue-based and place-based learning. Correspondingly, recording studies report that the largest changes in DMS neural activity occur in the


middle stages of learning during cue-guided (both with auditory and tactile cues) navigation (Thorn et al., 2010).

4.5. HIPPOCAMPAL INPUT TO MODEL-BASED AND MODEL-FREE SYSTEMS
For spatial navigation the primary candidate for generating the states and the relationship between them is the hippocampal formation. Although hippocampus has been largely associated with spatial encoding (O'Keefe and Nadel, 1978), it could be more broadly involved in learning (and planning in) a model or graph of possible transitions between states, no matter if these states are spatial or not (van der Meer et al., 2012). Consistent with this, hippocampal place cells are also sensitive to non-spatial information (e.g., the presence of a certain object or the color of the walls), this non-spatial information modulating or re-mapping the place representation (Wiener et al., 1989; Redish, 1999). Similarly, hippocampal place cells re-map on maze tasks following a change of context, such as the change of rewarded arm in a plus-maze (Smith and Mizumori, 2006). Thus, within our proposal, the role of the hippocampus would be both to supply spatial information to a model-free system and to contribute to a model-based system by building the model (in interaction with the VS, as argued later) and planning actions within this model. This view is similar to ideas that the hippocampus provides contextual information to some aspects of learning such as contextual fear conditioning (Rudy, 2009) and spatial planning information to other aspects of learning (Banquet et al., 2005; Hasselmo, 2005; Doll et al., 2010; Martinet et al., 2011). It is also similar to points made by Redish and Touretzky (1998) that one can both store sequences and do location-recall in hippocampal attractor networks without interfering with each other (see also Redish, 1999).
Consequently, lesions of the hippocampus should affect both model-free and model-based systems through loss of spatial information, but transient interference with its activity should affect only the model-based system through loss of the use of the model. Figure 4 illustrates how our proposition may account for the recent results obtained by Jadhav et al. (2012). In this study, rats experienced a W-track spatial alternation task: they alternated between inbound trials where they had to go to the center starting from either the left or the right arm and outbound trials where they had to go from the central arm to the arm (left or right) that they did not visit on the previous trial (Figure 4A). Outbound trials present a higher degree of difficulty in that they require linking past experience (the previously experienced side of the maze) with current location in order to make an appropriate decision. Strikingly, lesion of the hippocampus impaired both inbound and outbound learning (Kim and Frank, 2009) while disruption of awake hippocampal replay only impaired outbound learning (Jadhav et al., 2012).

FIGURE 4 | Model-based/model-free framework applied to a spatial alternation task requiring both inbound and outbound learning. (A) W-shaped maze experienced by rats, adapted from Kim and Frank (2009). Hippocampal lesions impair both inbound and outbound learning (Kim and Frank, 2009) while disruption of awake hippocampal replay only impairs outbound learning (Jadhav et al., 2012). (B) A model-free system associating places with actions can learn inbound trials but would face high uncertainty during outbound trials. (C) A model-based system associating previous transitions with actions can associate past experience with current location and is thus able to learn both inbound and outbound trials.

We show in Figure 4B (resp. C) how a model-free (resp. model-based) system dependent on hippocampal input could explain the results. A model-free system learning the association between a spatial state (i.e., left arm, right arm, or central arm)


and an action would be able to learn inbound trials but not outbound trials. This is because the center state is half of the time followed by rewarded trials on the left and half of the time followed by rewarded trials on the right, thus producing a situation with high uncertainty. In contrast, a model-based system learning to associate previous state transitions with actions can solve both inbound and outbound trials (Figure 4C). Thus, within our proposal, hippocampal lesions impair both inbound and outbound learning because they suppress spatial information required by both place-based model-free and model-based systems. By contrast, disruption of hippocampal awake replay would impair only the model-based system, potentially by blocking the storage of transitions in the model (Gupta et al., 2010), leaving the model-free system still able to learn inbound trials.

5. VENTRAL STRIATUM: MODEL BUILDER?
What, then, might be the role of the VS in model-free and model-based navigation? Ventral striatal recordings and lesion studies have provided strong evidence for an evaluative role, either as part of the critic contributing to the calculation of the reward prediction error (O'Doherty et al., 2004; Khamassi et al., 2008), or as the locus for general Pavlovian-instrumental transfer where rewarded stimuli act to motivate future action (Corbit et al., 2001; Yin et al., 2008; Corbit and Balleine, 2011). The actor/critic architecture is a variant of the model-free reinforcement algorithms, which conceptually splits the value learning and action selection components (Sutton and Barto, 1998): the critic learns the value of every state, and uses those values to compute the reward prediction error after each state transition s to s', given any reward obtained; the prediction error is used by the actor to change the probability of selecting each action in state s, thus reflecting the outcome. The existing evidence that dorsal striatum supports action selection while the VS supports stimulus-outcome association has led to proposals that they respectively subserve the actor and critic roles (Joel et al., 2002; O'Doherty et al., 2004; Khamassi et al., 2005, 2008; Daw et al., 2011; van der Meer and Redish, 2011). The primary candidate for transmitting the reward prediction error is the phasic activity of the midbrain dopamine neurons (Schultz et al., 1997; Bayer and Glimcher, 2005; Cohen et al., 2012); further strengthening the proposed identification of the VS with the critic is that it is the major source of inputs to the dopamine neurons (Watabe-Uchida et al., 2012) that in turn project to the dorsal striatum (Maurin et al., 1999; Haber et al., 2000) (see Figure 6).
We sketch an account here that finesses this view, extending previous proposals (Yin et al., 2008; Bornstein and Daw, 2011) for separately considering the core and shell. We first argue that in addition to being useful for the critic in model-free processes, reward information encoded by the VS also contributes to model-based processes such as the building of a reward function. Second, from the perspective of navigation tasks, we find evidence that the core of the VS is a key locus for learning the correct sequences of actions in a task. A useful consequence of considering this proposed model-based/model-free dichotomy in both conditioning and navigation is that, whereas the core of the VS is often ascribed a purely evaluative role in the conditioning literature (Yin and Knowlton, 2006; Yin et al., 2008; Bornstein and Daw, 2011), the literature on core involvement in navigation clearly points to a major role in the direct control of locomotion. For the shell of the VS, we discuss further the suggestion that it is a key locus of the critic that signals the reward prediction error for the model-based system (Bornstein and Daw, 2011; see footnote 5); we also discuss the possibility that it acts as a critic that signals a state prediction error in the predicted and actual state transitions. As these functions of the core and shell are essential for correct assemblage of the "model" of the world, we informally label the VS as part of the "model-builder."

5.1. VENTRAL STRIATUM AS SUBSTRATE FOR BUILDING THE REWARD FUNCTION
In the machine learning literature, one of the requirements for model-based algorithms is to build the so-called "reward function" which relates states to rewards [see Figure 1; (Sutton and Barto, 1998)]. In spatial tasks, this consists of memorizing the places in which reward is found. This is crucial information for deliberative decision-making where inference of future outcomes within the estimated world model (e.g., the tree-search process) requires reaching a terminal state where a reward can be found. The reward function is also important for off-line simulations within the world model to consolidate trajectories leading to reward; see for instance the Dyna-Q algorithm (Sutton and Barto, 1998). Indeed, such mental simulations should be informed when the agent has virtually reached a state containing a reward, although the agent is not necessarily physically experiencing such reward.
Interestingly, sequences of hippocampal place cell activations that occur while an animal is running a track in search of reward are known to be replayed during subsequent sleep (Euston et al., 2007) or during awake resting periods (Foster and Wilson, 2006; Gupta et al., 2010). These replay events have been hypothesized to participate in the consolidation of relevant behavioral sequences that lead to reward. Of particular interest for this review are recent reports of off-line synchronous replay between ventral striatal and hippocampal activity (Lansink et al., 2009). Lansink et al. (2009) found pairs of hippocampus-VS neurons that were reactivated during awake fast forward replay preferentially if: the hippocampal cell coded for space, the ventral striatal cell coded for reward, and the hippocampal cell was activated slightly before the ventral striatal cell during the task. The reactivation occurred 10 times faster than the sequence of activity during the task execution, possibly complying with physiologically plausible eligibility timing. The ventral striatal cells were predominantly in the core, but also included the shell. By illustrating possible neural mechanisms for the off-line consolidation of place-reward associations, these results provide striking examples of activity that could underlie the building of the reward function, which relates states to rewards.

Footnote 5. This relates to the notion, in the machine learning literature, that some model-based algorithms such as Dyna-Q can update their state-action values through a reward prediction error (RPE), although other model-based algorithms based on so-called "value iteration" processes do not rely on a RPE: they instead propagate value information from each state to other proximal states (Sutton and Barto, 1998).
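
The role of a stored reward function in off-line simulation can be illustrated with a minimal Dyna-Q-style sketch. The linear track, the parameter values, and the names `run_dyna_q`, `step`, and `update` are illustrative assumptions rather than anything taken from the studies reviewed here, and the sketch is not intended as a model of the cited data. A tabular agent stores each experienced transition together with its reward (a toy transition model and reward function) and, after each run, replays stored transitions to update its state-action values through a reward prediction error.

```python
import random


def run_dyna_q(n_states=6, n_episodes=1, n_replay=0,
               alpha=0.5, gamma=0.9, seed=0):
    """Tabular Dyna-Q on a hypothetical linear track.

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Reward is delivered only on entering the last state (the goal).
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in (0, 1)}
    # (state, action) -> (next state, reward): a toy transition model
    # and reward function, filled in from experience.
    model = {}

    def step(s, a):
        s_next = min(goal, s + 1) if a == 1 else max(0, s - 1)
        return s_next, (1.0 if s_next == goal else 0.0)

    def update(s, a, r, s_next):
        # The goal is terminal, so no future value is added there.
        future = 0.0 if s_next == goal else max(q[(s_next, 0)], q[(s_next, 1)])
        rpe = r + gamma * future - q[(s, a)]  # reward prediction error
        q[(s, a)] += alpha * rpe

    for _ in range(n_episodes):
        s = 0
        while s != goal:
            a = rng.choice((0, 1))        # random exploration, for simplicity
            s_next, r = step(s, a)
            update(s, a, r, s_next)       # learning from real experience
            model[(s, a)] = (s_next, r)   # model building
            s = s_next
        # Off-line replay: re-use stored transitions without moving.
        for _ in range(n_replay):
            ps, pa = rng.choice(sorted(model))
            ps_next, pr = model[(ps, pa)]
            update(ps, pa, pr, ps_next)
    return q


q_without_replay = run_dyna_q(n_replay=0)
q_with_replay = run_dyna_q(n_replay=200)
print(q_without_replay[(0, 1)], q_with_replay[(0, 1)])
```

Under these assumptions, a single rewarded run without replay changes only the value of the final step before the reward, whereas replaying stored transitions lets that value propagate back toward the start of the track without further physical experience of the reward.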


Of course, it is plausible that such replay events could at the separate populations of core neurons that either project to the
same time be used to update value estimations and action proba- dopaminergic neurons of the midbrain or project to the other
bilities in the model-free system, consistent with the hypothesized structures of the basal ganglia could, respectively, fulfill the eval-
critic role of part of the VS (O'Doherty et al., 2004; Khamassi uative and motor control roles (Humphries and Prescott, 2010).
et al., 2008; Bornstein and Daw, 2011). But if the ventral stri- Here we focus on how the latter role may fit into a putative model-
atal part engaged during these replay events was only dedicated to based/model-free separation of navigation based on the dorsal
model-free reinforcement learning, all ventral striatal cells encod- striatum.
ing reward predictions in any location (not only in the reward It has long been known that core application of NMDA,
location) should be reactivated in correspondence with the hip- AMPA, or dopamine agonists, or of drugs of abuse
pocampal cells coding for their associated states, which is not (amphetamine, cocaine), induces hyperlocomotion in rats,
the case here. These results thus emphasize that the VS's eval- and that intact output of the core through the basal ganglia is
uative role and its involvement in encoding reward information necessary for this hyperlocomotion to occur (Pennartz et al.,
may also contribute to model-based processes. In support of this 1994; Humphries and Prescott, 2010). The phasic activity of indi-
view, McDannald et al. (2011) recently showed in rats experi- vidual core neurons also correlates with the onset of locomotion
encing an unblocking procedure that VS not only incorporates during self-administration of cocaine (Peoples et al., 1998).
information about reward value but also about specific features During behavioral tasks, the activity of individual neurons in
of the expected outcomes. Along with the orbitofrontal cortex, VS the core correlates with the direction of upcoming movement,
was indeed found to be required for learning driven by changes in irrespective of the properties of the cue used to prompt that
reward identity, information only relevant for model-based pro- movement (Setlow et al., 2003; Taha et al., 2007). Moreover,
cesses but not for model-free ones which only work with value when rats navigate a maze, the activity of core neurons correlates
information. with the direction of movement in specific locations (Shibata
Now where does the information which is replayed off-line et al., 2001; Mulder et al., 2004). Together, these data suggest that
between VS and hippocampus come from? One possibility is the core not only directly controls movement, but also receives
that relevant place-reward associations experienced during task spatial information on which to base that control.
performance are tagged in order to be preferentially replayed In addition, the core is necessary for correctly learning
during subsequent sleep or awake resting periods. In support of sequences of motor behaviors. Blocking NMDA receptors in
this proposition, van der Meer and Redish (2010)'s synchronous core, which putatively prevents synaptic plasticity, degrades
recordings of VS and hippocampus in a T-maze disentangled pos- performance on many spatial tasks: rats cannot learn paths to
sible mechanisms underlying the binding of hippocampal place rewards (Kelley, 1999), learn spatial sequences (in this case, of
representations and ventral striatal reward information during lever presses) to achieve reward (Bauter et al., 2003), or locate a
task performance. They found a ventral striatal phase precession hidden platform in a Morris water maze when encoded by dis-
relative to the hippocampal theta rhythm. This phase precession tal cues alone (Sargolini et al., 2003). Lesioning hippocampal
was found in ventral striatal ramp neurons preferentially receiving afferents to VS by cutting the fornix/fimbria pathway results in
input from those hippocampal neurons that were active lead- numerous spatial navigation problems. Whishaw and colleagues
ing up to reward sites. This phenomenon was accompanied by have shown that rats with such lesions have intact place responses,
increased theta coherence between VS and the hippocampus, pos- but great difficulty in constructing paths to them (Whishaw et al.,
sibly underlying the storage of relevant place-reward associations 1995; Gorny et al., 2002). In a Morris water maze, lesioned rats
that should be tagged for subsequent consolidation. can swim to a pre-lesion submerged platform location, but not
to a new one (Whishaw et al., 1995); in open-field exploration,
5.2. VENTRAL STRIATAL CORE AS SUBSTRATE FOR BUILDING THE lesioned rats do not show path integration trips to their homebase
ACTION MODEL (Gorny et al., 2002). Data from these studies has to be interpreted
Yin et al. (2008) proposed that one of the cores primary functions with care, but are consistent with the NMDA blockade studies.
is to learn stimulus-outcome associations that drive preparatory Together these data point to a key role for ventral striatal core in
behavior such as approach. Bornstein and Daw (2011) proposed linking together sequential episodes of behavior.
in turn that, as preparatory behavior is value-agnostic, this is con- So what is the motor control part of the core doing within
sistent with the core playing the role of the critic in a model-free the model-based/model-free framework? A general proposition
controller: that it either computes directly or conveys the values is that the core is the route via which hippocampal sequencing of
of current and reached state to midbrain dopamine neurons (Joel states reaches the motor system, a finessing of the long-recognized
et al., 2002), which in turn signal the reward prediction error to position of the core at the limbic-motor interface (Mogenson
targets in the striatum and PFC (Schultz et al., 1997; Dayan and et al., 1980). We sketch a proposal here that its specific compu-
Niv, 2008). This proposal naturally extends to the core playing the tational role is to learn and represent the probability of action
role of model-free critic in navigation as well as conditioning. selection within the transition model of the model-based system.
However, it is equally clear that the core has a role in direct
control of motor behavior, and may even serve as an action selec- 5.2.1. Actions in the transition model
tion substrate separate from the dorsal striatum (see Pennartz Consider the transition model T(s , a, s), giving the probabil-
et al., 1994; Nicola, 2007; Humphries and Prescott, 2010 for ity of arriving in state s given action a and current state s;
reviews). These dual roles for the core are not in conflict: the which we can also write p(s |a, s). The model has two uses: for

Frontiers in Behavioral Neuroscience www.frontiersin.org November 2012 | Volume 6 | Article 79 | 11


Khamassi and Humphries Model-free/model-based navigation strategies

off-line learning, it is used to sample trajectories through the world model, and update the values of each state accordingly (Sutton and Barto, 1998; Johnson and Redish, 2005); for on-line action selection, it can be queried for the probability that each action will lead to the desired transition from state s to s'. To achieve this dual use it might be advantageous to decompose the transition model p(s'|a, s) using Bayes' theorem into representations of the state transitions and of the probability of action selection:

$$p(s' \mid a, s) = p(s' \mid s)\,\frac{p(a \mid s', s)}{p(a \mid s)},$$

where we assume that current state s is known. The first term p(s'|s) is then just the probability model for state transitions; the second term is just the probability p(a|s', s) that each action will cause that transition, normalized by the probability p(a|s) of ever taking that action in state s. Consequently, off-line learning is a product of the two terms, whereas on-line action selection can be based on the second term only.

Such a decomposition in turn suggests a decomposition into neural substrates. The hippocampal formation has long been proposed to represent potential state transitions (Poucet et al., 2004), and so is a natural candidate for representing p(s'|s) in the simultaneous activity of current (s) and adjacent (s') place cells. Alternatively, neural network modeling of hippocampal formation functions in spatial navigation has even suggested that the directional-specificity of many place fields could be interpreted not as place cells but rather as transition cells, representing the possible transitions between the current and next states in the environment (Gaussier et al., 2002). In this account, each cell is a candidate for directly encoding p(s'|s).

The ventral striatal core is then a potential substrate for representing the transition-conditioned probability of action selection p(a|s', s). A plausible network implementation is that hippocampal outputs representing s and s' converge on neuron groups in the core, whose consequent activity is then proportional to p(a|s', s). Learning this action component p(a|s', s) of the transition model is then equivalent to changes in the synaptic weights linking the two state representations in hippocampus to the neuron group in the core. Over all known state transitions from the current state s, the activity in the core then encodes a probability distribution over potential actions; the selection of action based on this distribution is then done by the core's corresponding basal ganglia circuit (see Redgrave et al., 1999; Nicola, 2007; Humphries and Prescott, 2010; Humphries et al., 2012 for detailed models of this process).

This decomposition into substrates suggests that core neurons should thus show activity correlated with both off-line model search and on-line action selection. The latter we have already discussed: core activity is correlated with specific actions; in particular, the studies of Shibata et al. (2001) and Mulder et al. (2004) showing a set of core neurons with motor-related activity only in specific places within a maze (such as an arm), and then only when the rats move in a particular direction in that place (e.g., toward the arm end), are consistent with the encoding of action probability conditioned on a transition between states. This substrate decomposition also suggests that hippocampal formation and the core should be synchronized throughout free exploration, as continually changing states represented in hippocampus should have a corresponding recruitment of changing action selection probabilities in the core; just such an exploration-specific synchronization in local-field potentials between hippocampus and the core has been reported by Gruber et al. (2009). More electrophysiological studies will be required to confirm this hypothesis and precisely identify the underlying mechanisms.

Recent neurophysiological studies also support the existence of neural activity consistent with off-line model use for decision-making in the core. In a multiple T-maze, van der Meer and Redish (2009) found that neurons in the core which fired at either reward site also fired at the maze's decision point, just where hippocampal activity correlates of forward planning have been previously found (Johnson and Redish, 2007). Such activity at decision points occurred before reward was actually experienced, and thus before error correction. This activity appeared only during initial stages and disappeared after additional training producing behavioral automation. Such activity could thus reflect a search process related to the early use of model-based processes for decision-making by providing signals for the evaluation of internally generated possible transitions considered during navigation (van der Meer and Redish, 2009).

5.3. VENTRAL STRIATAL SHELL AS CRITIC(S) IN THE MODEL-BUILDER: ONE SYSTEM AMONGST MANY
More than any other region of the striatum, the ventral striatal shell is a complex intermingling of multiple separate systems (Humphries and Prescott, 2010), which may include control of approach and aversive behaviors (Reynolds and Berridge, 2003), hedonic information, outcome evaluation, memory consolidation, and appetitive control (Kelley, 1999). Consequently, we cannot meaningfully speak of a role for the shell; not least because, as we noted in Humphries and Prescott (2010), the lateral and medial shell are themselves easily distinguished entities in terms of their afferent and efferent structures; we will return to this distinction below.

Yin et al. (2008) proposed that the shell's primary function is to learn stimulus-outcome associations that drive consummatory behavior. Bornstein and Daw (2011) argued that this role in consummatory behavior requires a sensitivity to the values of the outcome, and thus makes the shell a natural candidate for subserving a role equivalent to the critic for the model-based system. While strictly speaking the actor/critic algorithm is a model-free system, the model-based system still may rely on the computation of a prediction error to update the values of each state (van der Meer and Redish, 2011), whether during off-line model search or on-line update after each performed action. Recently, Daw et al. (2011) tested human subjects on a multi-stage decision task that separated model-based and model-free prediction errors, and found that the model-based prediction error correlated with the fMRI BOLD signal in VS.

Against this idea, earlier work has shown that the shell appears not to be required for knowledge of the contingency between instrumental actions and their outcomes: lesioning the shell does


not stop devaluation or contingency changes from changing behavioral choice (Balleine and Killcross, 1994; Corbit et al., 2001). Consequently, the shell could appear not to be necessary for establishing goal-directed learning or, by extension, model-based learning.

However, a closer reading of the lesion studies allows us to refine that conclusion. In shell lesion studies, only the medial shell is targeted (see, for example, Figure 1 of Corbit et al., 2001); this is not a flaw in experimental design but a limitation imposed by anatomy, as attempts to lesion the lateral shell would undoubtedly also damage the overlying lateral core (Ikemoto, 2002). Consequently, the lateral shell remains intact, and is thus a prime candidate for a model-based critic that leaves the animal sensitive to outcome devaluation and contingency changes.

Moreover, as we detailed in Humphries and Prescott (2010), lateral and medial shell are separable entities: medial shell receives extensive input from hippocampal field CA1 and subiculum, while lateral shell receives scant hippocampal input; and both have separate direct and indirect pathways through the basal ganglia to separate populations of midbrain dopaminergic neurons (Figure 5A). As we show in Figures 5B,C, the dual pathways are a plausible candidate for computing a prediction error based on comparing the forebrain inputs to the two pathways; consequently both medial and lateral shell could support different critic roles (Humphries and Prescott, 2010).

This leaves the question of the role of the medial shell, if it is indeed in a position to compute a prediction error. In Humphries and Prescott (2010) we proposed the idea that the projections from hippocampal formation and PFC to the direct and indirect pathways could, respectively, represent the expected and achieved state after a transition. Consequently, the medial shell would be in a position to compute a state prediction error that adjusts the transition probability p(s'|s) based on model predictions, rather than on simply counting the occurrences of each transition.

Lesioning the medial shell would then be predicted to show subtle deficits in tasks that require building a world model: in sufficiently simple tasks, the mere construction of the links between a limited number of states, whose values are correctly learnt, may be sufficient to solve the task and respond to subsequent changes in the value of those states. Consequently, the intact sensitivity to devaluation by medial shell-lesioned rats (Balleine and Killcross, 1994; Corbit et al., 2001) suggests that these were sufficiently simple tasks. That task complexity is a factor is suggested by the data of Albertin et al. (2000). They trained rats on a plus-maze on which a currently lit arm-end contained reward in the form of water drops; each day the rats experienced a new sequence of lit arms, and each day one of the arms was chosen to contain six drops and the others contained one drop. A probe trial was then run in which every arm was lit, allowing the rat to choose which arm to visit. Albertin et al. (2000) found that lesioning the medial shell prevented rats from correctly remembering which maze arm contained the high value reward on a probe trial, but did not impair their ability to learn to visit the lit arm in the sequence during training. Such a task plausibly requires each day building anew a world model and querying it on the probe trial to recall which available state-transition contained the high reward on that day. If damage to the medial shell prevented correct learning of the transition model, then this would selectively impair querying of the model, while leaving intact the

FIGURE 5 | Dual pathways from shell to ventral tegmental area (VTA) potentially support prediction error computation. (A) The medial and lateral shell both support a dual pathway circuit that converges on dopaminergic neurons in the VTA: a direct pathway originating from a population of D1 receptor expressing striatal projection neurons, and an indirect pathway originating from a mixed population of D1 and D2 receptor expressing striatal projection neurons [see (Humphries and Prescott, 2010) for review]. This arrangement is consistent with the shell's role as a critic: the pathways support the computation of a prediction error between the prediction transmitted by the direct pathway and the actual outcome transmitted by the indirect pathway (PPn, pedunculopontine nucleus; VP, ventral pallidum). (B) Simulation of neural population activity showing how a greater outcome (indirect pathway) than predicted (direct pathway) drives a phasic increase in VTA activity, signaling a positive prediction error. (C) Simulation of neural population activity showing how a lower outcome (indirect pathway) than predicted (direct pathway) drives a phasic dip in VTA activity, signaling a negative prediction error. Simulation details given in Humphries and Prescott (2010).
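
The comparison described in the caption of Figure 5 can be illustrated with a toy calculation. The sketch below is not the population simulation of Humphries and Prescott (2010); it assumes, purely for illustration, that VTA activity is a baseline rate shifted by the difference between the outcome signal (indirect pathway) and the prediction signal (direct pathway), and clipped at zero. The function name `vta_response`, the `baseline` parameter and all numerical values are hypothetical.

```python
def vta_response(predicted, outcome, baseline=1.0):
    """Toy model of the dual-pathway comparison (illustrative assumption only).

    predicted: signal carried by the direct pathway (the prediction)
    outcome:   signal carried by the indirect pathway (the actual outcome)
    baseline:  arbitrary tonic VTA activity level (hypothetical value)

    Returns (prediction_error, vta_rate).
    """
    prediction_error = outcome - predicted
    # Activity rises above baseline for a positive error and dips below it
    # for a negative error; a firing rate cannot fall below zero.
    vta_rate = max(0.0, baseline + prediction_error)
    return prediction_error, vta_rate


if __name__ == "__main__":
    # Outcome greater than predicted: phasic increase (as in panel B).
    print(vta_response(predicted=0.2, outcome=0.8))
    # Outcome lower than predicted: phasic dip (as in panel C).
    print(vta_response(predicted=0.8, outcome=0.2))
    # Outcome exactly as predicted: activity stays at baseline.
    print(vta_response(predicted=0.5, outcome=0.5))
```

Only the sign and relative size of the deviation from baseline carry meaning here; the linear form and the clipping are modeling conveniences rather than claims about the circuit.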


ability to do simple light-reward association in the model-free system.

Gläscher et al. (2010) searched for correlates of a state prediction error in the fMRI BOLD signal recorded from humans learning a decision-tree of stimulus choices in the absence of reward, which was subsequently used as the basis for a rewarded task. Encouragingly, subjects' behavior during the learning stage was well-fit by a reinforcement learning model incorporating a state prediction error; moreover, the BOLD signal in lateral PFC and intra-parietal sulcus correlated with the state prediction error in the model. The equivalent regions in rat are known afferents of the shell (Uylings et al., 2003; Humphries and Prescott, 2010). However, they reported that the ventral striatal BOLD signal correlated only with the fitted model-free reward prediction error during the rewarded task stage, and not the state prediction error. It is not clear, though, whether something computed by a set of neurons as small as the proposed sub-set in medial shell could be resolved by the voxel-size used, a problem compounded by the conservative multiple-comparison corrections used in searching for BOLD signal correlates.

6. CONCLUSIONS
In this paper, we have proposed a functional distinction between parts of the striatum by bridging data about their respective involvement in behavioral adaptation taken from both the spatial navigation literature and the instrumental conditioning literature. To do so, we have first formally mapped taxonomies of behavioral strategies from the two literatures to highlight that navigation strategies could be relevantly categorized as either model-based or model-free. At root, the key distinction is that it is the use of information in building a world representation, rather than the type of information (i.e., place vs. cue), that defines the different computational processes at stake and their substrates in the striatum. Within this framework, we explicitly identified the role for dorsolateral striatum in learning and expression of model-free strategies, the role of dorsomedial striatum in learning and expression of model-based strategies, and the role of model-builder for the VS, most probably in conjunction with the hippocampus (Lansink et al., 2009; van der Meer et al., 2010; Bornstein and Daw, 2012). Our scheme is summarized in Figure 6.

FIGURE 6 | Striatal-domain substrates of model-free and model-based controllers. The proposed organization of navigation strategies and potential control of learning across the three striatal domains. The identification of the shell and core as critics for the model-based and model-free controllers in dorsal striatum partly rests on the spiral of striatal-dopamine-striatal projections (Maurin et al., 1999; Haber et al., 2000; Haber, 2003), originating in the shell of the VS (the spiral is indicated by the thicker lines) and on the permissive role dopamine plays in plasticity at cortico-striatal synapses (Reynolds et al., 2001; Shen et al., 2008). There are also closed loop links between dopamine cell populations and each striatal region. Abbreviations: Mb, model-based; Mf, model-free; PPn, pedunculopontine nucleus; SNc, substantia nigra pars compacta; VP, ventral pallidum; VTA, ventral tegmental area. Note that the inhibitory and excitatory labels refer to the dominant neurotransmitter of the connection, not the effect that connection may have on the target nucleus as a whole (e.g., basolateral amygdala input to VS neurons can suppress other excitatory inputs despite using glutamate, which is an excitatory neurotransmitter).
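
The model-builder role summarized in Figure 6 rests on the decomposition of the transition model introduced in Section 5.2.1, p(s'|a, s) = p(s'|s) p(a|s', s) / p(a|s). As a numerical check on that identity, the following sketch builds a small example for one fixed current state s. All probabilities, state labels and action labels are invented for illustration, and the helper names (`action_marginal`, `transition_model`, `best_action`) are hypothetical; the comments assigning terms to hippocampus and core simply restate the proposal in the text.

```python
# The current state s is fixed and assumed known; every number is made up.
next_states = ["s1", "s2", "s3"]
actions = ["left", "right"]

# p(s'|s): state-transition term (hippocampal candidate in the text).
p_next = {"s1": 0.5, "s2": 0.3, "s3": 0.2}

# p(a|s',s): action term conditioned on the s -> s' transition
# (ventral striatal core candidate in the text).
# Each inner dictionary sums to 1 over actions.
p_action_given_transition = {
    "s1": {"left": 0.9, "right": 0.1},
    "s2": {"left": 0.2, "right": 0.8},
    "s3": {"left": 0.5, "right": 0.5},
}


def action_marginal(a):
    """p(a|s) = sum over s' of p(a|s',s) * p(s'|s)."""
    return sum(p_action_given_transition[sp][a] * p_next[sp] for sp in next_states)


def transition_model(sp, a):
    """p(s'|a,s) = p(s'|s) * p(a|s',s) / p(a|s): product of both terms."""
    return p_next[sp] * p_action_given_transition[sp][a] / action_marginal(a)


def best_action(sp):
    """On-line selection from the second term only: argmax_a p(a|s',s) / p(a|s)."""
    return max(actions, key=lambda a: p_action_given_transition[sp][a] / action_marginal(a))


if __name__ == "__main__":
    for a in actions:
        row = {sp: round(transition_model(sp, a), 3) for sp in next_states}
        print(a, row, "sum =", round(sum(row.values()), 3))
    for sp in next_states:
        print("desired", sp, "-> choose", best_action(sp))
```

Because p(a|s) is obtained by marginalizing the action term over next states, each reconstructed distribution p(s'|a, s) sums to one over s'. Ranking actions by the second term alone gives the same ordering as ranking by the full product for a given desired s', since p(s'|s) does not depend on the action.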


The hypothesis that two decision-making systems (i.e., model-based and model-free) are processed in parallel in DMS and DLS while VS is important for the acquisition of the model seems to explain well the results of Atallah et al. (2007). In a forced-choice task in a Y-maze requiring rats to learn the association between two odors and two actions (go left or right), they found that transient inactivation of DLS (footnote 6) did not prevent a covert learning process which became visible as soon as the DLS was released. Although this task is typically interpreted as a habit learning task (van der Meer et al., 2012), the absence of over-training in the animals (60 trials performed in total) suggests that model-based learning in the DMS was still playing an important role at this stage and was unaffected by DLS inactivation. Moreover, Atallah et al. (2007) found that inactivation of VS mostly impaired acquisition and only partially affected performance, consistent with the proposed role of VS in building the model used by the model-based system.

Footnote 6. Although the injection site was referred to as the central part of the dorsal striatum by the authors (see Supplementary Figures 3 and 4 of their original paper), the great majority of injections were located outside the dorsal striatal region receiving projections from the prelimbic cortex [see Figure 3 in Voorn et al. (2004)], and thus outside the zone called dorsomedial striatum and related to goal-directed behaviors and model-based learning [see Figure 1 in Yin et al. (2008) and Figure 1 in Bornstein and Daw (2011)]. Thus, the injections seem to have mostly reached the dorsolateral striatum related to model-free habit learning.

6.1. COMPUTATIONS BY THE STRIATUM
Our proposed division of function between different parts of the striatum preserves the classical hypothesis that striatal territories all contribute to behavioral regulation but mainly differ in function because of their different afferents (Alexander et al., 1990; Joel and Weiner, 1994; Middleton and Strick, 2000); a common division of cortical afferents among the striatal territories is illustrated in Figure 6. Throughout its dorso-lateral to ventro-medial extent, the striatum has a consistent micro-circuit dominated by GABAergic projection neurons controlled by at least three classes of interneurons (Tepper et al., 2004; Bolam et al., 2006; Humphries and Prescott, 2010). Such a consistent micro-architecture points to common operational principles for how striatum computes with its afferent inputs. Moreover, the cortex-basal ganglia-thalamus-cortex anatomical loop involving the ventral striatal core respects the same organization principles as loops involving the dorsal striatum: thus DLS, DMS, and VS core are all involved in complete basal ganglia circuits composed of direct and indirect pathways (Humphries and Prescott, 2010). Since numerous computational studies have shown that this basal ganglia circuitry is efficient for performing a selection process (Houk and Wise, 1995; Mink, 1996; Redgrave et al., 1999; Humphries et al., 2006; Leblois et al., 2006; Girard et al., 2008), it has been proposed that loops involving different striatal territories could perform different levels of selection influencing behavior. One such scheme envisions a hierarchy running from coarse-grained selection of overall goal or strategy to achieve a goal, through actions toward a goal, to fine-grained movement parameters of each action (Redgrave et al., 1999; Ito and Doya, 2011).

The model-based/model-free dichotomy would respect such a general principle of common selection operation: that striatal territories receiving state transition information (i.e., p(s'|s) corresponding to the probability of transition from state s to state s', no matter if these states are spatial or determined by a perceptual cue) would be involved in model-based action selection while striatal territories receiving simple state information (i.e., p(s), no matter if state s represents a spatial position or the perception of a stimulus) would be involved in model-free action selection. As we discussed throughout the text, in contrast to DLS, VS and DMS receive direct projections from the hippocampal system as well as medial PFC which place them in a good situation to process hippocampal state transition information (Gaussier et al., 2002; Poucet et al., 2004) and hence to participate in the model-based action selection. Correspondingly, the dominant projections of sensorimotor cortices to DLS may thus convey current state information, whether originating from the periphery or from higher cortical areas (Haber, 2003), and hence the DLS participates in model-free action selection.

6.2. OPEN QUESTIONS
The account here provides concrete proposals for the dorsolateral and dorsomedial striatum's role in spatial navigation, while introducing new but comparatively speculative ideas about the VS's roles in the model-free and model-based systems. As such, our account is of course incomplete; so let us conclude with the primary open questions:

We have drawn a distinction between place/response strategies and model-based/model-free use of those strategies. To the best of our knowledge, we lack good evidence for the existence of a model-free place strategy.

The observations of a place-to-response strategy shift with over-training (Dickinson, 1980; Packard and McGaugh, 1996; Pearce et al., 1998; Chang and Gold, 2003) underpinned the existing idea that a response strategy is by nature habitual. Our hypothesis postulates that the central mechanism underlying all these observed behavioral shifts is a shift from model-based to model-free rather than from place-based to either cue-guided or praxic behaviors; but why then is the shift often (but not always: Yin and Knowlton, 2004; Botreau and Gisquet-Verrier, 2010) from model-based place to model-free response?

What is anterior DMS doing? Ragozzino and Choi (2004) proposed a role for it in strategy selection, as lesions caused a selective deficit in reversal learning, but not in initial acquisition. Alternatively, perhaps DMS is divided into sub-territories differentially involved in place, cue, and praxic model-based systems.

Lesion data on the core provide conflicting accounts of its roles. For example, the results of Corbit et al. (2001) disagree with evaluation: for why, if the core forms part of the transition model, does lesioning it not then prevent outcome devaluation from affecting behavior? By contrast, McDannald et al. (2011) found that lesions of core affected responding to both changes in outcome value and changes in outcome identity, emphasizing its involvement in model-based learning.


From our account, it is not surprising that conflicting data arise if core lesions interfere with both evaluative and action selection systems; however, it is not clear what task designs would be sufficient to tease apart the selective effects of core lesions on its evaluative and action selection roles.

Do the striatal domains underpin a common computation? Our focus has been on the algorithmic-level distinctions between behavioral strategies, and the striatal substrates within the neural systems implementing those algorithms. As noted throughout, this computation may be action selection: the resolution of competing inputs at the striatal level into one (or a few) selected signals at the output of the basal ganglia. Based on our proposals here, we may speculate that these selections are based on different representations of the world.

ACKNOWLEDGMENTS
This work was supported by L'Agence Nationale de la Recherche: ANR-11-BSV4-006 LU2 (Learning Under Uncertainty) project (Mehdi Khamassi), and ANR-2010-BLAN-0217-04 NEUROBOT project (Mark D. Humphries); by the HABOT project of the Ville de Paris Emergence(s) program (Mehdi Khamassi); by an MRC Senior non-Clinical Fellowship (Mark D. Humphries); and by the European Community FP6 IST 027819 ICEA (Integrating Cognition Emotion and Autonomy) Project (Mark D. Humphries and Mehdi Khamassi).

REFERENCES
Adams, S., Kesner, R. P., and Ragozzino, M. E. (2001). Role of the medial and lateral caudate-putamen in mediating an auditory conditional response association. Neurobiol. Learn. Mem. 76, 106-116.
Albertin, S. V., Mulder, A. B., Tabuchi, E., Zugaro, M. B., and Wiener, S. I. (2000). Lesions of the medial shell of the nucleus accumbens impair rats in finding larger rewards, but spare reward-seeking behavior. Behav. Brain Res. 117, 173-183.
Alexander, G. E., Crutcher, M. D., and DeLong, M. R. (1990). Basal ganglia-thalamocortical circuits: parallel substrates for motor, oculomotor, prefrontal and limbic functions. Prog. Brain Res. 85, 119-146.
Arleo, A., and Gerstner, W. (2000). Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity. Biol. Cybern. 83, 287-299.
Arleo, A., and Rondi-Reig, L. (2007). Multimodal sensory integration and concurrent navigation strategies for spatial cognition in real and artificial organisms. J. Int. Neurosci. 6, 327-366.
Atallah, H. E., Lopez-Paniagua, D., Rudy, J. W., and O'Reilly, R. C. (2007). Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat. Neurosci. 10, 126-131.
Balleine, B. (2005). Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol. Behav. 86, 717-730.
Balleine, B., and Dickinson, A. (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407-419.
Balleine, B., and Killcross, S. (1994). Effects of ibotenic acid lesions of the nucleus accumbens on instrumental action. Behav. Brain Res. 65, 181-193.
Banquet, J. P., Gaussier, P., Quoy, M., Revel, A., and Burnod, Y. (2005). A hierarchy of associations in hippocampo-cortical systems: cognitive maps and navigation strategies. Neural Comput. 17, 1339-1384.
Barnes, T. D., Kubota, Y., Hu, D., Jin, D. Z., and Graybiel, A. M. (2005). Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature 437, 1158-1161.
Bauter, M. R., Brockel, B. J., Pankevich, D. E., Virgolini, M. B., and Cory-Slechta, D. A. (2003). Glutamate and dopamine in nucleus accumbens core and shell: sequence learning versus performance. Neurotoxicology 24, 227-243.
Bayer, H. M., and Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129-141.
Berke, J. D., Breck, J. T., and Eichenbaum, H. (2009). Striatal versus hippocampal representations during win-stay maze performance. J. Neurophysiol. 101, 1575-1587.
Bolam, J. P., Bergman, H., Graybiel, A. M., Kimura, M., Plenz, D., Seung, H. S., et al. (2006). "Microcircuits in the striatum," in Microcircuits: The Interface Between Neurons and Global Brain Function, eds S. Grillner and A. M. Graybiel (Cambridge, MA: MIT Press), 165-190.
Bornstein, A. M., and Daw, N. D. (2011). Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Curr. Opin. Neurobiol. 21, 374-380.
Bornstein, A. M., and Daw, N. D. (2012). Dissociating hippocampal and striatal contributions to sequential prediction learning. Eur. J. Neurosci. 35, 1011-1023.
Botreau, F., and Gisquet-Verrier, P. (2010). Re-thinking the role of the dorsal striatum in egocentric/response strategy. Front. Behav. Neurosci. 4:7. doi: 10.3389/neuro.08.007.2010
Brown, L., and Sharp, F. (1995). Metabolic mapping of rat striatum: somatotopic organization of sensorimotor activity. Brain Res. 686, 207-222.
Burgess, N., Recce, M., and O'Keefe, J. (1994). A model of hippocampal function. Neural Netw. 7, 1065-1081.
Chang, Q., and Gold, P. E. (2003). Switching memory systems during learning: changes in patterns of brain acetylcholine release in the hippocampus and striatum in rats. J. Neurosci. 23, 3001-3005.
Chang, Q., and Gold, P. E. (2004). Inactivation of dorsolateral striatum impairs acquisition of response learning in cue-deficient, but not cue-available, conditions. Behav. Neurosci. 118, 383-388.
Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B., and Uchida, N. (2012). Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85-88.
Corbit, L. H., and Balleine, B. W. (2011). The general and outcome-specific forms of pavlovian-instrumental transfer are differentially mediated by the nucleus accumbens core and shell. J. Neurosci. 31, 11786-11794.
Corbit, L. H., Muir, J. L., and Balleine, B. W. (2001). The role of the nucleus accumbens in instrumental conditioning: evidence of a functional dissociation between accumbens core and shell. J. Neurosci. 21, 3251-3260.
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., and Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron 69, 1204-1215.
Daw, N. D., Niv, Y., and Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704-1711.
Dayan, P., and Balleine, B. (2002). Reward, motivation, and reinforcement learning. Neuron 36, 285-298.
Dayan, P., and Niv, Y. (2008). Reinforcement learning: the good, the bad and the ugly. Curr. Opin. Neurobiol. 18, 185-196.
De Leonibus, E., Costantini, V. J. A., Massaro, A., Mandolesi, G., Vanni, V., Luvisetto, S., et al. (2011). Cognitive and neural determinants of response strategy in the dual-solution plus-maze task. Learn. Mem. 18, 241-244.
De Leonibus, E., Oliverio, A., and Mele, A. (2005). A study on the role of the dorsal striatum and the nucleus accumbens in allocentric and egocentric spatial memory consolidation. Learn. Mem. 12, 491-503.
Devan, B. D., and White, N. M. (1999). Parallel information processing in the dorsal striatum: relation to hippocampal function. J. Neurosci. 19, 2789-2798.
Dickinson, A. (1980). Contemporary Animal Learning Theory. Cambridge, UK: Cambridge University Press.
Dickinson, A. (1985). Actions and habits: the development of behavioural autonomy. Philos. Trans. R. Soc. B Biol. Sci. 308, 67-78.
Dollé, L., Sheynikhovich, D., Girard, B., Chavarriaga, R., and Guillot, A. (2010). Path planning versus cue responding: a bio-inspired model of switching between navigation strategies. Biol. Cybern. 103, 299-317.
Euston, D. R., Tatsuno, M., and McNaughton, B. L. (2007).

Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science 318, 1147–1150.
Faure, A., Haberland, U., Condé, F., and Massioui, N. E. (2005). Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J. Neurosci. 25, 2771–2780.
Foster, D., Morris, R., and Dayan, P. (2000). Models of hippocampally dependent navigation using the temporal difference learning rule. Hippocampus 10, 1–16.
Foster, D. J., and Wilson, M. A. (2006). Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature 440, 680–683.
Franz, M. O., and Mallot, H. A. (2000). Biomimetic robot navigation. Rob. Auton. Syst. 30, 133–153.
Gallistel, C. R. (1990). The Organization of Learning. Cambridge, MA: MIT Press.
Gaussier, P., Revel, A., Banquet, J. P., and Babeau, V. (2002). From view cells and place cells to cognitive map learning: processing stages of the hippocampal system. Biol. Cybern. 86, 15–28.
Girard, B., Tabareau, N., Pham, Q. C., Berthoz, A., and Slotine, J. J. (2008). Where neuroscience and dynamic system theory meet autonomous robotics: a contracting basal ganglia model for action selection. Neural Netw. 21, 628–641.
Gläscher, J., Daw, N., Dayan, P., and O'Doherty, J. P. (2010). States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585–595.
Gorny, J. H., Gorny, B., Wallace, D. G., and Whishaw, I. Q. (2002). Fimbria-fornix lesions disrupt the dead reckoning (homing) component of exploratory behavior in mice. Learn. Mem. 9, 387–394.
Graybiel, A. M. (1998). The basal ganglia and chunking of action repertoires. Neurobiol. Learn. Mem. 70, 119–136.
Groenewegen, H. J., Wright, C. I., and Beijer, A. V. (1996). The nucleus accumbens: gateway for limbic structures to reach the motor system? Prog. Brain Res. 107, 485–511.
Gruber, A. J., Hussain, R. J., and O'Donnell, P. (2009). The nucleus accumbens: a switchboard for goal-directed behaviors. PLoS ONE 4:e5062. doi: 10.1371/journal.pone.0005062
Gupta, A. S., van der Meer, M. A., Touretzky, D. S., and Redish, A. D. (2010). Hippocampal replay is not a simple function of experience. Neuron 65, 695–705.
Haber, S. N. (2003). The primate basal ganglia: parallel and integrative networks. J. Chem. Neuroanat. 26, 317–330.
Haber, S. N., Fudge, J. L., and McFarland, N. R. (2000). Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J. Neurosci. 20, 2369–2382.
Hannesson, D. K., and Skelton, R. W. (1998). Recovery of spatial performance in the Morris water maze following bilateral transection of the fimbria/fornix in rats. Behav. Brain Res. 90, 35–56.
Hartley, T., and Burgess, N. (2005). Complementary memory systems: competition, cooperation and compensation. Trends Neurosci. 28, 169–170.
Hasselmo, M. (2005). A model of prefrontal cortical mechanisms for goal-directed behavior. J. Cogn. Neurosci. 17, 1115–1129.
Heimer, L., Alheid, G. F., de Olmos, J. S., Groenewegen, H., Haber, S., E., Harlan, R. E., et al. (1997). The accumbens: beyond the core-shell dichotomy. J. Neuropsychiatry Clin. Neurosci. 9, 354–381.
Hok, V., Save, E., Lenck-Santini, P. P., and Poucet, B. (2005). Coding for spatial goals in the prelimbic/infralimbic area of the rat frontal cortex. PNAS 102, 4602–4607.
Honzik, C. H. (1936). The sensory basis of maze learning in rats. Comp. Psychol. Monogr. 13, 1–13.
Houk, J. C., and Wise, S. P. (1995). Distributed modular architectures linking basal ganglia, cerebellum, and cerebral cortex: their role in planning and controlling action. Cereb. Cortex 5, 95–110.
Humphries, M. D., Khamassi, M., and Gurney, K. (2012). Dopaminergic control of the exploration-exploitation trade-off via the basal ganglia. Front. Neurosci. 6:9. doi: 10.3389/fnins.2012.00009
Humphries, M. D., and Prescott, T. J. (2010). The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog. Neurobiol. 90, 385–417.
Humphries, M. D., Stewart, R. D., and Gurney, K. N. (2006). A physiologically plausible model of action selection and oscillatory activity in the basal ganglia. J. Neurosci. 26, 12921–12942.
Ikemoto, S. (2002). Ventral striatal anatomy of locomotor activity induced by cocaine, (d)-amphetamine, dopamine and d1/d2 agonists. Neuroscience 113, 939–955.
Ito, M., and Doya, K. (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Curr. Opin. Neurobiol. 21, 368–373.
Jacobson, T. K., Gruenbaum, B. F., and Markus, E. J. (2012). Extensive training and hippocampus or striatum lesions: effect on place and response strategies. Physiol. Behav. 105, 645–652.
Jadhav, S. P., Kemere, C., German, P. W., and Frank, L. M. (2012). Awake hippocampal sharp-wave ripples support spatial memory. Science 336, 1454–1458.
Joel, D., Niv, Y., and Ruppin, E. (2002). Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Netw. 15, 535–547.
Joel, D., and Weiner, I. (1994). The organization of the basal ganglia-thalamocortical circuits: open interconnected rather than closed segregated. Neuroscience 63, 363–379.
Joel, D., and Weiner, I. (2000). The connections of the dopaminergic system with the striatum in rats and primates: an analysis with respect to the functional and compartmental organization of the striatum. Neuroscience 96, 451–474.
Jog, M. S., Kubota, Y., Connolly, C. I., Hillegaart, V., and Graybiel, A. M. (1999). Building neural representations of habits. Science 286, 1745–1749.
Johnson, A., and Redish, A. D. (2005). Hippocampal replay contributes to within session learning in a temporal difference reinforcement learning model. Neural Netw. 18, 1163–1171.
Johnson, A., and Redish, A. D. (2007). Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. J. Neurosci. 27, 12176–12189.
Kelley, A. E. (1999). Neural integrative activities of nucleus accumbens subregions in relation to learning and motivation. Psychobiology 27, 198–213.
Khamassi, M. (2007). Complementary Roles of the Rat Prefrontal Cortex and Striatum in Reward-based Learning and Shifting Navigation Strategies. PhD thesis, Université Pierre et Marie Curie.
Khamassi, M., Lachèze, L., Girard, B., Berthoz, A., and Guillot, A. (2005). Actor-critic models of reinforcement learning in the basal ganglia: from natural to artificial rats. Adapt. Behav. 13, 131–148.
Khamassi, M., Mulder, A. B., Tabuchi, E., Douchamps, V., and Wiener, S. I. (2008). Anticipatory reward signals in ventral striatal neurons of behaving rats. Eur. J. Neurosci. 28, 1849–1866.
Kim, S. M., and Frank, L. M. (2009). Hippocampal lesions impair rapid learning of a continuous spatial alternation task. PLoS ONE 4:e5494. doi: 10.1371/journal.pone.0005494
Kimchi, E. Y., and Laubach, M. (2009). Dynamic encoding of action selection by the medial striatum. J. Neurosci. 29, 3148–3159.
Kimchi, E. Y., Torregrossa, M. M., Taylor, J. R., and Laubach, M. (2009). Neuronal correlates of instrumental learning in the dorsal striatum. J. Neurophysiol. 102, 475–489.
Krech, D. (1932). The genesis of hypotheses in rats. Publ. Psychol. 6, 45–64.
Lansink, C. S., Goltstein, P. M., Lankelma, J. V., McNaughton, B. L., and Pennartz, C. M. A. (2009). Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol. 7:e1000173. doi: 10.1371/journal.pbio.1000173
Leblois, A., Boraud, T., Meissner, W., Bergman, H., and Hansel, D. (2006). Competition between feedback loops underlies normal and pathological dynamics in the basal ganglia. J. Neurosci. 26, 3567–3583.
Lex, B., Sommer, S., and Hauber, W. (2011). The role of dopamine in the dorsomedial striatum in place and response learning. Neuroscience 172, 212–218.
Martel, G., Blanchard, J., Mons, N., Gastambide, F., Micheau, J., and Guillou, J. (2007). Dynamic interplays between memory systems depend on practice: the hippocampus is not always the first to provide solution. Neuroscience 150, 743–753.
Martinet, L.-E., Sheynikhovich, D., Benchenane, K., and Arleo, A. (2011). Spatial learning and action planning in a prefrontal cortical network model. PLoS Comput. Biol. 7:e1002045. doi: 10.1371/journal.pcbi.1002045
Maurin, Y., Banrezes, B., Menetrey, A., Mailly, P., and Deniau, J. M. (1999). Three-dimensional distribution of nigrostriatal neurons in the rat: relation to the topography of striatonigral projections. Neuroscience 91, 891–909.
McDannald, M. A., Lucantonio, F., Burke, K. A., Niv, Y., and Schoenbaum, G. (2011). Ventral striatum and orbitofrontal cortex are both required for model-based,
but not model-free, reinforcement learning. J. Neurosci. 31, 2700–2705.
Middleton, F. A., and Strick, P. L. (2000). Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res. Brain Res. Rev. 31, 236–250.
Mink, J. W. (1996). The basal ganglia: focused selection and inhibition of competing motor programs. Prog. Neurobiol. 50, 381–425.
Mogenson, G. J., Jones, D. L., and Yim, C. Y. (1980). From motivation to action: functional interface between the limbic system and the motor system. Prog. Neurobiol. 14, 69–97.
Morris, R. G. M. (1981). Spatial localization does not require the presence of local cues. Learn. Motiv. 12, 239–260.
Moussa, R., Poucet, B., Amalric, M., and Sargolini, F. (2011). Contributions of dorsal striatal subregions to spatial alternation behavior. Learn. Mem. 18, 444–451.
Mulder, A. B., Tabuchi, E., and Wiener, S. I. (2004). Neurons in hippocampal afferent zones of rat striatum parse routes into multi-pace segments during maze navigation. Eur. J. Neurosci. 19, 1923–1932.
Nicola, S. M. (2007). The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology (Berl.) 191, 521–550.
O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., and Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454.
O'Keefe, J., and Nadel, L. (1978). The Hippocampus as a Cognitive Map. Oxford, UK: Oxford University Press.
Packard, M. (1999). Glutamate infused posttraining into the hippocampus or caudate-putamen differentially strengthens place and response learning. PNAS 96, 12881–12886.
Packard, M., and McGaugh, J. (1992). Double dissociation of fornix and caudate nucleus lesions on acquisition of two water maze tasks: further evidence for multiple memory systems. Behav. Neurosci. 106, 439–446.
Packard, M., and McGaugh, J. (1996). Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects the expression of place and response learning. Neurobiol. Learn. Mem. 65, 65–72.
Packard, M. G., Hirsh, R., and White, N. M. (1989). Differential effects of fornix and caudate nucleus lesions on two radial maze tasks: evidence for multiple memory systems. J. Neurosci. 9, 1465–1472.
Packard, M. G., and Knowlton, B. J. (2002). Learning and memory functions of the basal ganglia. Annu. Rev. Neurosci. 25, 563–593.
Pearce, J. M., Roberts, A. D., and Good, M. (1998). Hippocampal lesions disrupt navigation based on cognitive maps but not heading vectors. Nature 396, 75–77.
Pennartz, C. M., Groenewegen, H. J., and da Silva, F. H. L. (1994). The nucleus accumbens as a complex of functionally distinct neuronal ensembles: an integration of behavioural, electrophysiological and anatomical data. Prog. Neurobiol. 42, 719–761.
Penner, M. R., and Mizumori, S. J. Y. (2012). Neural systems analysis of decision making during goal-directed navigation. Prog. Neurobiol. 96, 96–135.
Peoples, L. L., Gee, F., Bibi, R., and West, M. O. (1998). Phasic firing time locked to cocaine self-infusion and locomotion: dissociable firing patterns of single nucleus accumbens neurons in the rat. J. Neurosci. 18, 7588–7598.
Ploeger, G. E., Spruijt, B. M., and Cools, A. R. (1994). Spatial localization in the Morris water maze in rats: acquisition is affected by intra-accumbens injections of the dopaminergic antagonist haloperidol. Behav. Neurosci. 108, 927–934.
Potegal, M. (1972). The caudate nucleus egocentric localization system. Acta Neurobiol. Exp. 32, 479–494.
Poucet, B., Lenck-Santini, P. P., Hok, V., Save, E., Banquet, J. P., Gaussier, P., et al. (2004). Spatial navigation and hippocampal place cell firing: the problem of goal encoding. Rev. Neurosci. 15, 89–107.
Pych, J. C., Chang, Q., Colon-Rivera, C., and Gold, P. E. (2005). Acetylcholine release in hippocampus and striatum during testing on a rewarded spontaneous alternation task. Neurobiol. Learn. Mem. 84, 93–101.
Ragozzino, M. E., and Choi, D. (2004). Dynamic changes in acetylcholine output in the medial striatum during place reversal learning. Learn. Mem. 11, 70–77.
Redgrave, P., Prescott, T. J., and Gurney, K. (1999). The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89, 1009–1023.
Redish, A. D. (1999). Beyond the Cognitive Map: From Place Cells to Episodic Memory. Cambridge, MA: MIT Press.
Redish, A. D., and Touretzky, D. S. (1997). Cognitive maps beyond the hippocampus. Hippocampus 7, 15–35.
Redish, A. D., and Touretzky, D. S. (1998). The role of the hippocampus in solving the Morris water maze. Neural Comput. 10, 73–111.
Restle, F. (1957). Discrimination of cues in mazes: a resolution of the place-vs.-response question. Psychol. Rev. 64, 217–228.
Reynolds, J. N., Hyland, B. I., and Wickens, J. R. (2001). A cellular mechanism of reward-related learning. Nature 413, 67–70.
Reynolds, S. M., and Berridge, K. C. (2003). Glutamate motivational ensembles in nucleus accumbens: rostrocaudal shell gradients of fear and feeding. Eur. J. Neurosci. 17, 2187–2200.
Rudy, J. W. (2009). Context representations, context functions, and the parahippocampal-hippocampal system. Learn. Mem. 16, 573–585.
Sargolini, F., Florian, C., Oliverio, A., Mele, A., and Roullet, P. (2003). Differential involvement of NMDA and AMPA receptors within the nucleus accumbens in consolidation of information necessary for place navigation and guidance strategy of mice. Learn. Mem. 10, 285–292.
Schmitzer-Torbert, N. C., and Redish, A. D. (2008). Task-dependent encoding of space and events by striatal neurons is dependent on neural subtype. Neuroscience 153, 349–360.
Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science 275, 1593–1599.
Setlow, B., and McGaugh, J. (1998). Sulpiride infused into the nucleus accumbens posttraining impairs memory of spatial water maze training. Behav. Neurosci. 112, 603–610.
Setlow, B., Schoenbaum, G., and Gallagher, M. (2003). Neural encoding in ventral striatum during olfactory discrimination learning. Neuron 38, 625–636.
Shen, W., Flajolet, M., Greengard, P., and Surmeier, D. J. (2008). Dichotomous dopaminergic control of striatal synaptic plasticity. Science 321, 848–851.
Shibata, R., Mulder, A. B., Trullier, O., and Wiener, S. I. (2001). Position sensitivity in phasically discharging nucleus accumbens neurons of rats alternating between tasks requiring complementary types of spatial cues. Neuroscience 108, 391–411.
Smith, D. M., and Mizumori, S. J. Y. (2006). Hippocampal place cells, context, and episodic memory. Hippocampus 16, 716–729.
Sutherland, R. J., and Hamilton, D. A. (2004). Rodent spatial navigation: at the crossroads of cognition and movement. Neurosci. Biobehav. Rev. 28, 687–697.
Sutherland, R. J., and Rodriguez, A. J. (1989). The role of the fornix/fimbria and some related subcortical structures in place learning and memory. Behav. Brain Res. 32, 265–277.
Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
Taha, S. A., Nicola, S. M., and Fields, H. L. (2007). Cue-evoked encoding of movement planning and execution in the rat nucleus accumbens. J. Physiol. 584, 801–818.
Tang, C., Pawlak, A. P., Prokopenko, V., and West, M. O. (2007). Changes in activity of the striatum during formation of a motor habit. Eur. J. Neurosci. 25, 1212–1227.
Tepper, J. M., Koos, T., and Wilson, C. J. (2004). GABAergic microcircuits in the neostriatum. Trends Neurosci. 27, 662–669.
Thorn, C. A., Atallah, H., Howe, M., and Graybiel, A. M. (2010). Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron 66, 781–795.
Tolman, E. C. (1948). Cognitive maps in rats and men. Psychol. Rev. 55, 189–208.
Trullier, O., Wiener, S., Berthoz, A., and Meyer, J.-A. (1997). Biologically-based artificial navigation systems: review and prospects. Prog. Neurobiol. 51, 483–544.
Uylings, H. B. M., Groenewegen, H. J., and Kolb, B. (2003). Do rats have a prefrontal cortex? Behav. Brain Res. 146, 3–17.
van der Meer, M. A. A., Johnson, A., Schmitzer-Torbert, N. C., and Redish, A. D. (2010). Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron 67, 25–32.
van der Meer, M. A. A., Kurth-Nelson, Z., and Redish, A. D. (2012). Information processing in decision-making systems. Neuroscientist 18, 342–359.
van der Meer, M. A. A., and Redish, A. D. (2009). Covert expectation-of-reward in rat ventral striatum at decision points. Front. Integr. Neurosci. 3:1. doi: 10.3389/neuro.07.001.2009
van der Meer, M. A. A., and Redish, A. D. (2010). Theta phase precession in rat ventral striatum links place and reward information. J. Neurosci. 31, 2843–2854.
van der Meer, M. A. A., and Redish, A. D. (2011). Ventral striatum: a critical look at models of learning and evaluation. Curr. Opin. Neurobiol. 21, 387–392.
Voorn, P., Vanderschuren, L. J., Groenewegen, H. J., Robbins, T. W., and Pennartz, C. M. (2004). Putting a spin on the dorsal-ventral divide of the striatum. Trends Neurosci. 27, 468–474.
Watabe-Uchida, M., Zhu, L., Ogawa, S. K., Vamanrao, A., and Uchida, N. (2012). Whole-brain mapping of direct inputs to midbrain dopamine neurons. Neuron 74, 858–873.
Whishaw, I. Q., Cassel, J. C., and Jarrard, L. E. (1995). Rats with fimbria-fornix lesions display a place response in a swimming pool: a dissociation between getting there and knowing where. J. Neurosci. 15, 5779–5788.
Whishaw, I. Q., Mittleman, G., Bunch, S. T., and Dunnett, S. B. (1987). Impairments in the acquisition, retention and selection of spatial navigation strategies after medial caudate-putamen lesions in rats. Behav. Brain Res. 24, 125–138.
White, N. M., and McDonald, R. J. (2002). Multiple parallel memory systems in the brain of the rat. Neurobiol. Learn. Mem. 77, 125–184.
Wiener, S. I. (1993). Spatial and behavioral correlates of striatal neurons in rats performing a self-initiated navigation task. J. Neurosci. 13, 3802–3817.
Wiener, S. I., Paul, C. A., and Eichenbaum, H. (1989). Spatial and behavioral correlates of hippocampal neuronal activity. J. Neurosci. 9, 2737–2763.
Willingham, D. B. (1998). What differentiates declarative and procedural memories: reply to Cohen, Poldrack, and Eichenbaum (1997). Memory 6, 689–699.
Yin, H. H., and Knowlton, B. J. (2004). Contributions of striatal subregions to place and response learning. Learn. Mem. 11, 459–463.
Yin, H. H., and Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–476.
Yin, H. H., Knowlton, B. J., and Balleine, B. W. (2004). Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19, 181–189.
Yin, H. H., Knowlton, B. J., and Balleine, B. W. (2005a). Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur. J. Neurosci. 22, 505–512.
Yin, H. H., Ostlund, S. B., Knowlton, B. J., and Balleine, B. W. (2005b). The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523.
Yin, H. H., Ostlund, S. B., and Balleine, B. W. (2008). Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur. J. Neurosci. 28, 1437–1448.

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 15 May 2012; accepted: 29 October 2012; published online: 27 November 2012.

Citation: Khamassi M and Humphries MD (2012) Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Front. Behav. Neurosci. 6:79. doi: 10.3389/fnbeh.2012.00079

Copyright © 2012 Khamassi and Humphries. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
