Change, regularity, and value in the evolution of animal learning
I present a simple model that considers how three factors—change, regularity, and value—influence the evolution of animal learning. Change and regularity are considered by introducing two terms that measure environmental persistence. One term, "between-generation persistence," defines the extent to which states in the parental generation predict states in the offspring generation; the other term, "within-generation persistence," defines the extent to which today predicts tomorrow within an individual's lifetime. Within-generation persistence is shown to be the more important of these two terms. When there is some change, increasing the within-generation persistence promotes the evolution of learning, and the between-generation persistence term has no effect. However, when the environment is almost completely fixed, then increasing change, either within or between generations, promotes the evolution of learning. This occurs because (1) the change required to promote the evolution of learning can occur either within or between generations, even though (2) the regularity required to promote the evolution of learning must come within an animal's lifetime. The region of absolute fixity, in which learning does not generally evolve, is relatively small. The results for value, or payoffs, suggest that learning is most useful when all the alternatives to learning yield about the same payoff. [Behav Ecol 1991;2:77-89]

D. W. Stephens
School of Biological Sciences,
University of Nebraska at Lincoln,
Lincoln, NE 68588, USA
Many students of animal learning have
claimed that learning is an adaptation
to environmental change (Gray, 1981; Johnston, 1982; Johnston and Turvey, 1980;
Mackintosh, 1983; Plotkin and Odling-Smee,
1979; Shettleworth, 1984). For example,
Johnston (1982) states that learning's "primary selective benefit" is to allow adaptation
to environmental change. This claim follows
most clearly from what I call the "absolute
fixity argument": if we suppose that some costs
are associated with learning, then it follows
that an absolutely fixed environment should
be met with a genetically fixed pattern of behavior, rather than with learned behavior.
A smaller, but significant, number of authors have stressed the importance of environmental predictability in the evolution of
learning. Staddon (1983), for example, writes:
"The evolutionary function of 'reinforced
learning' is to detect regularities . . ." Like the
argument for the importance of change, the
argument for regularity follows most clearly
from an extreme example that I call the "absolute unpredictability argument." In an absolutely unpredictable environment, where
today's state bears no predictable relationship
to tomorrow's, then there is literally nothing
to learn. It seems reasonable to suppose that
today must tell you something about tomorrow for learning to be worthwhile.
Two virtual opposites, change and regularity, are both credited with being the selective
force in the evolution of learning! Moreover,
viewed from the two extremes of absolute fixity and absolute unpredictability, both claims
have merit. The simplest resolution of this
paradox is to claim that learning is an adaptation to intermediate levels of predictability
(Johnston and Turvey, 1980; Slobodkin and
Rapoport, 1974). In this paper I propose a
different solution that takes account of two
distinct types of environmental predictability.
Specifically, this paper argues that the pattern
of predictability in relation to an animal's life
history determines the evolutionary value of
learning.
What is predictability?
Consider an animal population with discrete,
nonoverlapping generations, and suppose that
the environment's overall predictability is determined by two components: within-generation predictability, the extent to which today's state predicts tomorrow's within an individual's lifetime; and between-generation predictability, the extent to which the state (or states) in the parental generation predicts the state in the offspring generation. Within-generation predictability may be different from between-generation predictability for several reasons. For example, if offspring disperse a great distance, this would probably lower between-generation predictability but have little or no effect on within-generation predictability. Indeed, any factor that separates generations in space or time could have a similar effect.

Received 2 September 1990
Revised 2 November 1990
Accepted 5 November 1990
1045-2249/91/$2.00
© 1991 International Society for Behavioral Ecology

Using these two predictability terms one can combine the absolute fixity and absolute unpredictability arguments in a novel way. Table 1 shows this idea: experience should be ignored in the first column because this coincides with the absolute unpredictability case outlined above, and experience should also be ignored in the "highest-highest" cell because this coincides with the absolute fixity case discussed above. It makes sense to learn those things that change between generations and are regular within generations.

Table 1
Hypothetical effects of within- and between-generation predictability on learning

Between-generation      Within-generation predictability
predictability          Lowest                 Highest
Lowest                  Ignore experience      Learn
Highest                 Ignore experience      Ignore experience

At first glance it seems that the logic of Table 1 might resolve the contradictory effects of change and regularity on the evolution of learning. For example, an advocate of the view that learning is an adaptation to regularity might assume that between-generation predictability is low and compare the two cells in the top row of Table 1 to conclude that predictability promotes learning. Another student of learning might assume that within-generation predictability is high and compare the two cells in the second column of Table 1 to conclude that environmental unpredictability promotes learning. This paper presents a model that extends the logic of Table 1 by allowing both predictability terms to take values from a continuous range. The model presented here is the simplest model I can imagine that incorporates both within- and between-generation predictability terms. Nevertheless, it yields some surprising results. Specifically, although Table 1 is correct for extreme values of the two predictability terms, one cannot generalize from these extreme values to intermediate predictability values.

What is learning?

Perhaps the most common definition of learning is "the modification of behavior by experience," with a list of exceptions (for further discussion, see Papaj and Prokopy, 1989; Shettleworth, 1984; Staddon, 1983). A typical exception is the formation of calluses after the experience of running over stony ground, which may lead to faster running, but most people would not call this change "learning." The central feature of this definition is that it poses learning as an alternative to innate, or canalized, behavior (Staddon, 1983). In the model presented here the "modification of behavior by experience" definition of learning is adopted without the usual list of exceptions. As an evolutionary model of learning, the present model is concerned more with outcomes than with mechanisms. It makes no difference to the model whether experience changes behavior via "calluses" or via cognition. Although this paper is motivated by a problem arising in the learning literature, it views learning as an instance of the more general phenomenon of phenotypic plasticity (for recent syntheses, see Stearns, 1989; West-Eberhard, 1989).

THE MODEL

I suppose that a hypothetical population lives in an environment that contains exactly two resources (Figure 1). The varying resource can be in a good state ("G") or a bad state ("B"). If it is good, then exploiting it yields one unit of fitness per period; if it is bad, it provides b units of fitness per period. One of these resources is stable and called "S"; when an individual exploits resource S it obtains s units of fitness per period (1 > s > b). The assumption that the good state of the varying resource provides one unit of fitness can be made without any loss of generality, because any three values can be re-scaled to make this true. Taken together, the two variables s and b summarize the benefits of exploiting these resources. To simplify the discussion, I refer to these terms collectively as the "value" terms.

Figure 1
Diagram of the model's assumptions. The model assumes that an individual may exploit (1) a varying resource that varies between good (marked G) and bad (marked B) states, or (2) a stable resource (marked S). The value of the stable resource is between the values of the good and bad states of the varying resource. Two distinct Markov transition processes control the state of the varying resource. The first Markov process controls within-generation transitions: the state remains the same within a generation (as shown in generations 1 and 2) with probability ω, and the state changes within a generation (as shown in generations 3 and 4) with probability 1 - ω. The second Markov process controls between-generation transitions: the state will be the same at the beginning of a given generation as it was at the end of the previous generation with probability β (as shown in the transition from generations 3 to 4), and the state will change from one generation to the next with probability 1 - β (as shown in the transition from generations 1 to 2).

I suppose that each generation consists of two periods and that generations do not overlap. Two first-order Markov chains determine the state of the varying resource. If the varying resource is in the good state in the first period of the ith generation, then it will be good in the second period with probability ω; similarly, if it is bad in the first period, then it will be bad in the second period with probability ω. The variable ω (½ ≤ ω ≤ 1) represents the environment's within-generation persistence, and it measures the within-generation predictability. A similar variable measures the between-generation predictability: the state at the beginning of the next (i + 1st) generation will be the same as the state at the end of the ith generation with probability β. I call β the between-generation persistence. For both persistence terms (ω and β), a value of ½ represents the least predictable situation and a value of 1 represents the most predictable situation.
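The two persistence terms can be made concrete with a short sketch of the environment process (Python here, although the paper's own program was written in C; the function and variable names below are mine, not the paper's):

```python
import random

def environment_states(omega, beta, n_generations, first_state="B", rng=random):
    """Yield (first_period, second_period) states for each generation.

    omega: within-generation persistence, P[second period = first period]
    beta:  between-generation persistence, P[next generation starts in the
           state that ended the previous generation]
    """
    flip = {"G": "B", "B": "G"}
    state = first_state
    for _ in range(n_generations):
        first = state
        # within-generation transition
        second = first if rng.random() < omega else flip[first]
        yield (first, second)
        # between-generation transition acts on the generation's last state
        state = second if rng.random() < beta else flip[second]

# With omega = beta = 1 the environment is absolutely fixed:
fixed = list(environment_states(1.0, 1.0, 4))
```

Setting omega = 1 and beta = 0 produces a deterministic alternation between generations, which illustrates how change can occur purely between generations while within-generation predictability stays perfect.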
The model animal is a haploid. This genetic system is frequently used to simplify phenotypic models of evolution. There are three alternative alleles—V, S, and L—which code for the following strategies. Individuals carrying V and S are specialists: an individual carrying V always exploits the varying resource, and an individual carrying S always exploits the stable resource. An individual carrying allele L "learns" whether to exploit the varying or stable resource. If it experiences a good state in the first period, then it exploits the varying resource again; but if it experiences a bad state in the first period, then it exploits the stable resource in the second period. I assume that the learning tactic is more expensive to implement than the two fixed tactics, and use the variable c (which is measured in the same normalized units as b and s) to represent this cost. I assume that this cost is small (probably much less than one in the normalized units used here) and non-negative.
The proportion of allele V in the population is p, the proportion of allele S is q, and the proportion of L is r (p + q + r = 1). In a given generation, one of four environmental permutations must occur: GG, BB, GB, BG (where the first letter refers to good or bad in the first period and the second letter to the second period). Table 2 shows the fitnesses of each genotype in each environmental permutation. Using these fitnesses one can write conditional difference equations (four sets of three simultaneous difference equations) that reflect the changes in allele frequency expected under each permutation (Table 2). The problem is now one of studying the long-term dynamics of the 12 difference equations in a situation in which a Markov chain, determined by the two environmental persistence terms (ω and β), controls which one of the four sets of difference equations applies in a given generation. A solution of this problem would tell us how ω and β influence the evolution of the learning allele. A general analytical solution to this problem presents significant technical difficulties; however, a limited analytical solution is possible for some values of ω and β.
A limited analytical result: ω's dominance
The environment can change from one of the
four states GG, GB, BG, BB in generation i
Table 2
Fitness and dynamical consequences of the three strategies (genotypes)

Fitness of each strategy in each environmental permutation:

Strategy            GG       BB           GB           BG
V: Fix on varying   2        2b           1 + b        1 + b
S: Fix on stable    2s       2s           2s           2s
L: Learn            2 - c    b + s - c    1 + b - c    b + s - c

Dynamics: in each permutation the population mean fitness is
W = p(V fitness) + q(S fitness) + r(L fitness),
and the allele frequencies in the next generation (marked with a prime) are
p' = p(V fitness)/W, q' = q(S fitness)/W, and r' = r(L fitness)/W.
For example, in permutation GG, W = 2p + 2sq + (2 - c)r and p' = 2p/W; in permutation BG, W = (1 + b)p + 2sq + (b + s - c)r and r' = (b + s - c)r/W.
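Each entry of Table 2 is a standard haploid selection update: an allele's frequency is weighted by its fitness in the realized environmental permutation and renormalized by the population mean fitness. A minimal sketch (Python; function names are mine):

```python
def generation_fitness(env, b, s, c):
    """Per-generation fitnesses of (V, S, L) under permutation env,
    where env is one of 'GG', 'BB', 'GB', 'BG' (Table 2)."""
    fit_v = {"GG": 2.0, "BB": 2 * b, "GB": 1 + b, "BG": 1 + b}[env]
    fit_s = 2 * s
    fit_l = {"GG": 2 - c, "BB": b + s - c, "GB": 1 + b - c, "BG": b + s - c}[env]
    return fit_v, fit_s, fit_l

def update(p, q, r, env, b, s, c):
    """One application of the Table 2 difference equations."""
    wv, ws, wl = generation_fitness(env, b, s, c)
    wbar = p * wv + q * ws + r * wl   # population mean fitness
    return p * wv / wbar, q * ws / wbar, r * wl / wbar
```

Note that frequencies remain normalized automatically, since each is divided by the same mean fitness.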
to any other (of the four states) in generation i + 1. A first-order Markov chain governs these transitions (Table 3). In every case except when the within- and between-generation persistences are both at their highest levels (ω = β = 1), it is possible to reach any state from any other state; as long as this is true, the Markov chain shown in Table 3 will approach a stationary probability distribution. That is, given a sufficiently large number of generations one can specify the overall relative frequencies of each of the four states. The stationary distribution of this Markov chain is Pr(GG) = ω/2, Pr(GB) = (1 - ω)/2, Pr(BG) = (1 - ω)/2, and Pr(BB) = ω/2, which is independent of the between-generation persistence β. This fact is easily verified by multiplying the row vector [ω/2, (1 - ω)/2, (1 - ω)/2, ω/2] and the transition matrix shown in Table 3. This multiplication returns the original vector, and so this operation confirms the stationary distribution (Feller, 1950).

Table 3
The transition matrix for the system modeled in the text

                                     To
From    GG              GB                  BG                  BB
GG      ωβ              (1 - ω)β            (1 - ω)(1 - β)      ω(1 - β)
GB      ω(1 - β)        (1 - ω)(1 - β)      (1 - ω)β            ωβ
BG      ωβ              (1 - ω)β            (1 - ω)(1 - β)      ω(1 - β)
BB      ω(1 - β)        (1 - ω)(1 - β)      (1 - ω)β            ωβ

The stationary distribution can be used to calculate an expected value of the logarithm of fitness for each allele. The Appendix shows that the allele with the highest expected logarithm of fitness should evolve to fixation. The importance of the expected logarithm of fitness has been discussed at length by Karlin and Lieberman (1974, 1975). Following this argument the relevant expected values are

V(ω) = (ω/2)[log(2) + log(2b) - 2 log(1 + b)] + log(1 + b),    (1)

S = log(2s),    (2)

L(ω) = (ω/2)[log(2 - c) - log(1 + b - c)] + [log(1 + b - c) + log(b + s - c)]/2.    (3)
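The verification step described above (multiplying the stationary row vector by the Table 3 transition matrix and recovering the same vector) can be checked mechanically. This sketch (Python; names mine) uses exact rational arithmetic so the equality is not obscured by floating-point rounding:

```python
from fractions import Fraction

def transition_matrix(omega, beta):
    """Table 3, with rows and columns ordered GG, GB, BG, BB."""
    w, b = omega, beta
    # Rows for states whose second period is G (GG, BG) and B (GB, BB):
    row_ends_G = [w * b, (1 - w) * b, (1 - w) * (1 - b), w * (1 - b)]
    row_ends_B = [w * (1 - b), (1 - w) * (1 - b), (1 - w) * b, w * b]
    return [row_ends_G, row_ends_B, row_ends_G, row_ends_B]

def stationary(omega):
    """Pr(GG), Pr(GB), Pr(BG), Pr(BB) -- independent of beta."""
    return [omega / 2, (1 - omega) / 2, (1 - omega) / 2, omega / 2]

# Check pi * P = pi exactly for an arbitrary (omega, beta) pair:
w, b = Fraction(3, 4), Fraction(9, 10)
P, pi = transition_matrix(w, b), stationary(w)
assert all(sum(pi[i] * P[i][j] for i in range(4)) == pi[j] for j in range(4))
```

Because beta cancels out of every column sum, the same check passes for any beta, which is exactly the independence claimed in the text.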
To simplify the discussion I refer to these expected values of the logarithm of fitness simply as expectations. Hence V(ω) will be called the "expectation" of the fix-on-varying strategy; S will be called the "expectation" of the fix-on-stable strategy; and L(ω) will be called the "expectation" of the learning strategy. To determine when learning will evolve, one must determine the conditions in which the expectation of the learning strategy, L(ω), is greater than the expectations of both of the nonlearning strategies. A mathematically equivalent question is: when is the expectation of the learning strategy greater than the expectation of the best nonlearning strategy?

The first step in answering this question is to consider how the within-generation persistence term (ω) affects the expectations of the two nonlearning strategies. Equation 1 shows that the expectation of the fix-on-varying strategy, V(ω), is a linear function of ω with a negative slope (applying Jensen's inequality to the terms in square brackets shows that the slope is negative). The expectation of the fix-on-stable strategy (Equation 2) does not depend on ω, so S would be a line of zero slope when plotted against ω. Combining these facts shows that the expectation of the best nonlearning strategy either decreases or stays the same as ω increases.

Equation 3 shows that the expectation of the learning strategy, L(ω), is also a linear function of ω, but with a positive slope; so one may conclude that the expectation of the learning strategy increases with ω. To summarize, we know that the expectation of learning always increases with ω, and that the expectation of nonlearning decreases or stays the same; so there must be a critical within-generation persistence value, ω*, such that for within-generation persistences greater than ω* the learning strategy has the highest expectation, and for within-generation persistences less than ω* one of the nonlearning strategies has the highest expectation. The critical ω value can be calculated (for a given set of c, b, and s values) using the equations above: ω* is the larger of (1) the ω that satisfies both Equation 1 and Equation 3 and (2) the ω that satisfies both Equation 2 and Equation 3.

The existence of this critical persistence value, ω*, is a key result of the model, because the value of ω* completely determines the predicted outcome. There are three cases: (1) if ω* is less than ½, then the model predicts learning at all levels of within-generation persistence (½ ≤ ω ≤ 1); (2) if ω* is greater than ½ but less than 1, then the model predicts nonlearning at low and learning at high within-generation persistence values; (3) if ω* is greater than 1, then the model predicts nonlearning at all levels of within-generation persistence (½ ≤ ω ≤ 1).
In short, when the stationary distribution is valid, knowledge of the critical within-generation persistence value ω* is sufficient to predict whether learning will evolve. The obvious next step is to ask what determines ω*. Equations 1-3 above indicate that ω* is determined by three terms: s, the fitness gained from exploiting the stable resource for one period; b, the fitness gained from exploiting the bad state of the varying resource; and c, the direct cost of the learning strategy. One simple way to summarize the effects of these terms on ω* is to plot ω* as a function of s and b for several values of c; however, this operation is complicated by the restriction that 1 > s > b. To overcome this problem I introduce a new variable, a = b/s, that represents the fitness gained from exploiting the bad state of the varying resource as a proportion of s. For example, if a equals 1, the bad state of the varying resource has the same value as the stable resource, but if a equals ½, the bad state of the varying resource has half the value of the stable resource. This means that all cases that satisfy the restriction 1 > s > b can be plotted in the square region 1 > s > 0 and 1 > a > 0.
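Since Equations 1-3 are lines in ω, the critical value ω* is just the larger of two line crossings. A sketch of that calculation (Python; function name mine), which reproduces the predicted ω* values reported in Table 4:

```python
from math import log

def omega_star(s, a, c=0.0):
    """Critical within-generation persistence from Equations 1-3.

    s: value of the stable resource; a = b/s; c: direct cost of learning.
    Learning has the highest expected log fitness for omega > omega_star.
    """
    b = a * s
    # Equation 1 as slope * omega + intercept:
    slope_v = 0.5 * (log(2) + log(2 * b)) - log(1 + b)
    int_v = log(1 + b)
    # Equation 2 (independent of omega):
    exp_s = log(2 * s)
    # Equation 3 as slope * omega + intercept:
    slope_l = 0.5 * (log(2 - c) - log(1 + b - c))
    int_l = 0.5 * (log(1 + b - c) + log(b + s - c))
    crossing_v = (int_v - int_l) / (slope_l - slope_v)  # learning overtakes fix-on-varying
    crossing_s = (exp_s - int_l) / slope_l              # learning overtakes fix-on-stable
    return max(crossing_v, crossing_s)
```

For example, omega_star(0.3, 0.5) is approximately 0.6983, the Table 4 entry for s = 0.3, a = 0.5, c = 0. Raw crossings can fall outside [½, 1]; the 0.5 and 1.0 entries in Table 4 appear to be such values reported at the boundary of the feasible range.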
Figure 2 plots critical within-generation persistence values, ω*, against the s and a variables for four values of the direct cost term c. The figure only plots values of ω* in the range ½ to 1: (1) values less than ½ represent (a, s) combinations in which learning is favored for all conditions the model allows, including when ω = ½; the figure represents these conditions as a flat "floor," as seen in panels A and B. (2) Values greater than 1 represent conditions in which learning is never favored, since ω cannot be larger than one; the figure represents these nonlearning regions as a flat "ceiling." The surfaces shown in Figure 2 represent the lower limit of the set of (a, s, ω) points that favor learning: points on or below these surfaces represent conditions in which nonlearning is favored, while points above these surfaces represent conditions in which learning is favored.

Figure 2
The effect of the fitness increments (s, a, and c) on ω*. Each panel shows the critical within-generation persistence value, ω* (learning is predicted for within-generation persistence values greater than ω*), as a function of s (the value of the stable resource) and a (the value of the bad state of the varying resource expressed as a proportion of s). Points above the surface represent conditions in which learning is favored. The four panels show the same calculations for different values of the direct costs of learning c: (A) c = 0.0; (B) c = 0.01; (C) c = 0.055; (D) c = 0.1.

All four panels of Figure 2 show a trough running diagonally from the origin. This trough represents the region in which learning is favored (in the sense that this is the region where there are the most points above the surface). A comparison of the four panels shows that increasing the direct costs of learning, c, decreases the region in which learning is favored, because the trough becomes shallower as c increases. However, the direct costs of learning have little effect on the position of the trough with respect to the s and a axes. With respect to the s and a axes, the trough runs roughly along a line where s is somewhat larger than a, and it is wide at low s and a values but narrow at high values. This corresponds to the region in which the two nonlearning (or "averaging") strategies (fix-on-varying in Equation 1, and fix-on-stable in Equation 2) have roughly the same expectations (see Stephens, 1987, for a similar result).

Figure 2 also shows that when the direct costs of learning, c, are low, there are (a, s) points where learning is favored even though there is no within-generation persistence, ω = ½. The flat floors in Figure 2A,B show these regions. In these conditions, learning cannot be favored because it helps keep track of changing environments; instead, it seems to be favored here because it reduces the variance in fitness. Variance can affect these calculations via the well-documented variance-sensitivity of the expected logarithm of fitness (Gillespie, 1977). The fix-on-varying strategy receives the payoffs associated with the extreme states G and B, whereas the learner receives a mixture of G, B, and the intermediate S state. (Of course, the fix-on-stable strategy yields the lowest variance, 0.) This example suggests that learning can sometimes evolve even when there is no within-generation persistence; however, the generality of this result needs further analysis. It may be an artifact of the alternatives to learning that this model assumes.
Simulation studies

The analysis above suggests that within-generation persistence, ω, dominates the evolution of learning as long as there is some change; that is, as long as (ω, β) ≠ (1, 1). This domination occurs because the Markov chain that governs environmental change approaches a stationary distribution that is independent of β as long as (ω, β) ≠ (1, 1). However, convergence to this stationary distribution is only guaranteed given infinite time. In the neighborhood of the absolute fixity point, (ω, β) = (1, 1), convergence to the stationary distribution can take so many generations that it would be dangerous to assume that the stationary distribution applies. Put another way, within-generation persistence dominates the evolution of learning given that the two persistence terms are sufficiently far from the absolute fixity point; but we do not know how far is sufficient. To investigate this problem I have used computer simulations. The simulations also help confirm and extend the approximate analysis presented in the preceding section.
The simulation program was written in the C programming language. For a given set of parameters (s, a, c, ω, and β), 500 simulations were performed. Each such simulation always began with all alleles at equal frequency, p = q = r = ⅓. Given this information, the states in a given generation were determined using computer-generated random numbers, and the difference equations in Table 2 were applied for 3000 generations or until one of the alleles reached a fixation criterion (frequency > 1 - 10⁻⁶). The output from these 500 simulations was the average value of r, the allele frequency of learners. For a given triplet of s, a, and c values, the program calculated the average r for each of 256 combinations of within- (ω) and between-generation (β) persistence. These 256 (ω, β) pairs were chosen to complete a regular 16 by 16 grid over the ranges ½ ≤ ω ≤ 1 and ½ ≤ β ≤ 1.

Table 4 shows the conditions simulated, as well as the critical within-generation persistence values, ω*, which were predicted by the stationary distribution discussed in the previous section. The conditions studied here correspond to Figure 2A and C.
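The simulation procedure just described can be sketched as a single replicate (Python rather than the paper's C; names mine). It combines the two-chain environment process with the Table 2 selection updates:

```python
import random

def simulate(s, a, c, omega, beta, n_gen=3000, rng=None):
    """One replicate of the simulation described in the text.

    Returns the final frequency r of the learning allele. Starts at
    p = q = r = 1/3 with the varying resource in the bad state, and stops
    early if any allele passes the fixation criterion 1 - 1e-6.
    """
    rng = rng or random.Random()
    b = a * s
    p = q = r = 1.0 / 3.0
    flip = {"G": "B", "B": "G"}
    state = "B"
    for _ in range(n_gen):
        first = state
        second = first if rng.random() < omega else flip[first]
        env = first + second
        # Table 2 fitnesses for V, S, and L in this permutation
        wv = {"GG": 2.0, "BB": 2 * b, "GB": 1 + b, "BG": 1 + b}[env]
        ws = 2 * s
        wl = {"GG": 2 - c, "BB": b + s - c, "GB": 1 + b - c, "BG": b + s - c}[env]
        wbar = p * wv + q * ws + r * wl
        p, q, r = p * wv / wbar, q * ws / wbar, r * wl / wbar
        if max(p, q, r) > 1 - 1e-6:
            break
        state = second if rng.random() < beta else flip[second]
    return r
```

At the absolute fixity point with the resource stuck in the bad state (omega = beta = 1, as here), the run is deterministic: the fix-on-stable allele fixes and r collapses toward zero, matching the opportunity-cost argument discussed below.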
Figures 3 and 4 summarize the results of these simulations. Comparing the predictions of Table 4 with the figures shows that the stationary distribution, which is completely determined by the within-generation persistence term, ω, is a good predictor of the simulation results. However, the stationary distribution consistently fails to predict the correct results in the neighborhood of the absolute fixity point, i.e., (ω, β) ≈ (1, 1).

Table 4
Summary of conditions used in simulations: predicted ω* values

       c = 0.0 (Figure 3)        c = 0.055 (Figure 4)
a      s = 0.3     s = 0.7       s = 0.3     s = 0.7
0.1    0.5         1.0           0.5         1.0
0.5    0.6983      0.8245        0.8189      1.0
0.8    0.8759      0.6445        1.0         0.8666

Generally speaking, learning does not evolve at the absolute fixity point, as one might expect. However, there is one case in which learning is selectively neutral at the absolute fixity point: when the varying resource is fixed in the good state and the direct costs of learning, c, are 0. This is equivalent to saying that there are no costs of learning, because in this case there are neither direct nor opportunity costs of learning. If the varying resource were fixed in the bad state, then there would be an opportunity cost to the learning strategy. This is true because the learning strategy loses relative fitness by checking the varying resource, whereas the nonlearning fix-on-stable strategy does the right thing immediately. However, there are no opportunity costs to learning if the varying resource is fixed in the good state, because the learning strategy behaves in exactly the same way as the nonlearning fix-on-varying strategy.

Figure 3
The relationship between within-generation persistence (ω), between-generation persistence (β), and the average gene frequency of the learning allele (r), as found by simulation. This figure shows the results when there are no direct costs of learning, c = 0.0, and when the varying resource is initially in the bad state. The variable s represents the relative fitness obtained by exploiting the stable resource for one time unit, and the variable a represents the fitness obtained from exploiting the bad state of the varying resource. The variable a is expressed in units of s, so, for example, a = 0.5 implies that the bad state of the varying resource is half as good as s.

Figure 4
The relationship between within-generation persistence (ω), between-generation persistence (β), and the average gene frequency of the learning allele (r), as found by simulation. This figure shows the results when there is a small direct cost of learning, c = 0.055, and when the varying resource is initially in the bad state. The variables s and a are defined as in Figure 3.
RESULTS AND DISCUSSION
The model presented above is best viewed as
a worked example that shows how predictability, unpredictability, and value might combine
to affect the evolution of learning. My model
emphasizes the pattern of environmental
change—specifically, how this pattern relates
to an animal's life history. Table 1, and the
logic behind it, motivated this model, and it
is helpful to compare the results to Table 1.
The model has revealed several points that are
consistent with Table 1, some that seem to
disagree with the table, and others that seem
to bear no particular relationship to it.
Results consistent with Table 1

When the critical within-generation persistence value, as predicted by the stationary distribution, is strictly between ½ and 1 (½ < ω* < 1), the results, as shown in Figures 3 and 4, are roughly in agreement with Table 1: learning does not evolve when there is no within-generation persistence. Moreover, learning generally does not evolve in the absolute fixity case (although it can have neutral selective value). There are no conditions in which learning is more likely to evolve than when between-generation persistence is low and within-generation persistence is high.
Results different from Table 1

The result most strongly contradicting Table 1 is that learning can evolve when there is no within-generation persistence. This occurs when both s and a are low [e.g., the (a, s) = (0.1, 0.3) case in Figures 3 and 4], and these low s and a values coincide with cases in which the critical within-generation persistence value, ω*, predicted by the stationary distribution is less than ½. Second, Table 1 suggests that the absolute fixity case represents a quarter of the parameter space, but the model indicates that the absolute fixity case is very restricted and special. Indeed, a very small amount of "change" is required to promote the evolution of learning. This has important implications for the comparative study of learning. Investigators may have better luck looking for the absence of learning abilities in highly variable environments than in relatively fixed environments, because one may need data from several lifetimes of observation to establish that an environment is fixed enough to discourage learning.
Results that supplement Table 1
The model reveals two details that seem to have no particular bearing on Table 1. The first of these is the dominance of the within-generation persistence term, ω. This result is particularly interesting in contrast to the conventional argument that "learning is an adaptation to change." This model suggests that there are two qualitatively different regions in persistence space, the ω-β plane. In the neighborhood of absolute fixity, increasing environmental change does promote the evolution of learning as the conventional argument suggests; moreover, in this region it does not matter whether this change comes within or between generations. However, the conventional wisdom must be modified in a curious way: change promotes learning in environments that change very little. Outside the absolute fixity neighborhood, increasing fixity within generations (ω) promotes the evolution of learning and the degree of change or fixity between generations (β) is totally immaterial. The key to understanding the asymmetric effects of the two persistence terms is the observation that within-generation change can substitute for between-generation change in providing the change necessary to promote the evolution of learning, but the environmental regularity that promotes learning must come within an organism's lifetime.
It is equally incomplete and misleading to
argue that the function of learning is the detection of regularity as it is to flatly state that
learning is an adaptation to change. Any complete model of learning must include both
change and regularity. The present model
suggests that the most correct statement one
can make is that learning is an adaptation to
within-lifetime regularity and some environmental unpredictability; this unpredictability
may occur either within or between generations.
Why aren't ω and β the same?
This model treats within- and between-generation persistence as independent entities.
Although the processes controlling change
within generations could be the same as those
that control change between generations, this
is not necessarily the case. Consider the life
history of a temperate-zone insect with annual, nonoverlapping generations: the processes that relate the environment's state in
the autumn to its state in the spring may well
be different from the processes that relate the
environment's state on one summer day to its
state on the next. Similarly, dispersal may
weaken the predictive relationship between
parental and offspring environments.
Treating within- and between-generation
persistence as separate entities is the most
general approach. If we knew that a particular function related within-generation persistence to between-generation persistence, then we could use the results derived here for arbitrary ω and β to find the effects of this relationship. My results suggest that learning is most likely to evolve at intermediate levels of environmental persistence when the relationship between within- and between-generation persistence satisfies the following conditions: (1) β increases if ω increases and (2) β = 1 if ω = 1. The suggestion, made by earlier workers (Johnston and Turvey, 1980; Slobodkin and Rapoport, 1974), that learning is an adaptation to intermediate levels of environmental predictability is, therefore, a special case of this model that applies when between-generation persistence is related to within-generation persistence in a particular way. It remains to be seen how common this type of relationship is.
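The independence of the two persistence terms is easy to make concrete. The sketch below is my own construction, not the paper's Table 3 chain: a binary environmental state passed through two-period generations, persisting across the generation boundary with probability β and from an individual's first period to its second with probability ω. The two measured agreement frequencies track ω and β separately, showing that the two knobs can be turned independently.

```python
import random

def simulate(omega, beta, generations=100_000, seed=0):
    """Binary state; two-period generations. The state persists from a
    parent's last period to its offspring's first period with probability
    beta, and from the offspring's first period to its second with
    probability omega. Returns the observed within- and between-generation
    agreement frequencies."""
    rng = random.Random(seed)
    state = 0  # parent's period-2 state
    within_same = between_same = 0
    for _ in range(generations):
        first = state if rng.random() < beta else 1 - state
        second = first if rng.random() < omega else 1 - first
        within_same += (first == second)
        between_same += (first == state)
        state = second
    return within_same / generations, between_same / generations

# High within-generation regularity can coexist with any degree of
# between-generation change, and vice versa.
w, b = simulate(omega=0.95, beta=0.5)
print(round(w, 2), round(b, 2))  # within ≈ 0.95, between ≈ 0.5
```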
By separating within- and between-generation persistence terms, my model has shown
that it is not enough to consider overall environmental predictability; one must consider
the relationship between predictability and an
organism's life history. To see the significance
of this relationship, it may help to consider
what a model based on the idea that learning
is most likely to evolve at intermediate levels
of overall predictability (Johnston and Turvey,
1980; Slobodkin and Rapoport, 1974) might
predict. Under the assumptions of the present
model, one can derive a measure of overall
environmental persistence by considering the
mean number of generations for which the
varying resource will remain in a given state.
The probability that the varying resource will
stay the same from one generation to the next
Stephens • Evolution of animal learning
85
is ωβ and the probability that it will change is 1 - ωβ. So, the number of generations of constancy is a geometrically distributed random variable with mean

ωβ / (1 - ωβ).   (4)

On average, an environment with no within-generation persistence (ω = 1/2) and complete between-generation persistence (β = 1) will have the same overall persistence as an environment with complete within-generation persistence (ω = 1) and no between-generation persistence (β = 1/2). Indeed, both of these environments would stay the same for an average of one generation (Equation 4), but in the first instance the changes occur within generations, while in the second instance they occur between generations. These two cases are clearly very different in the context of the model developed here, although any model based on overall environmental predictability would miss this important distinction.
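Equation 4 is easy to check numerically. The sketch below (the function name and setup are mine) simulates the geometric run length directly for the two contrasting environments just described; both have the same per-generation staying probability ωβ = 1/2 and hence the same mean of one generation of constancy.

```python
import random

def mean_run(stay, trials=200_000, seed=1):
    """Mean number of consecutive generations for which the state persists,
    when the per-generation probability of staying the same is `stay`.
    This is a geometric random variable with mean stay / (1 - stay)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        while rng.random() < stay:  # state persists another generation
            total += 1
    return total / trials

for omega, beta in [(0.5, 1.0), (1.0, 0.5)]:
    stay = omega * beta
    predicted = stay / (1 - stay)  # Equation 4
    print(omega, beta, predicted, round(mean_run(stay), 2))
```

Both parameter pairs give ωβ = 1/2, so Equation 4 predicts one generation of constancy in each case, and the simulated means agree, even though the two environments differ completely in what a learner can exploit within its lifetime.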
Although I have emphasized the effects of predictability on the evolution of learning, it should be clear that the values associated with the resources that are available for an organism to "learn about" play an important role in the evolution of learning. My model offers a simple result: learning is most useful when the two nonlearning strategies yield roughly equal fitnesses. I speculate that this result may be general because it seems to stem from the simple fact that learning is a mixed strategy that is affected by changes in underlying value terms (e.g., s and b), but it is always affected less strongly by these changes than is an alternative strategy.
Contrasting models
Behavioral and evolutionary ecologists have, of course, considered many of the issues that arise in this paper before. Two categories of models seem especially relevant: tracking and phenotypic plasticity.
Tracking and incomplete information
Foraging theorists have long been concerned with the so-called incomplete information problems in which decision-makers must evaluate alternatives in terms of both potential food gain and potential information gain (see Stephens and Krebs, 1986, for review). Although incomplete information problems arise throughout behavioral ecology, they have been most widely studied in foraging theory. Both behavioral ecologists (e.g., Pulliam, 1981) and animal psychologists (e.g., Kamil, 1987) have argued that this literature might be a starting
point for interaction between animal psychology and behavioral ecology. Indeed, as Shettleworth (1984) and Kamil (1987) have argued, this literature helps one see the natural and economic context of many learning phenomena. For example, models in this literature (e.g., Green, 1980; McNamara, 1982; Oaten, 1977) ask whether one should expect foragers to attend to experience that might reveal something about patch quality.

However, most of this literature reveals little about the effects of environmental change on learning. This is because most of these models simply do not consider differing degrees of change, although there is invariably some spatial or temporal uncertainty in each of these incomplete information models. The exception is the literature of environmental tracking (Arnold, 1978; Bobisud and Potratz, 1976; Estabrook and Jespersen, 1974; Shettleworth et al., 1988; Stephens, 1987; Tamm, 1987). The tracking literature considers the economics of periodically checking the state of a varying resource, and it explicitly considers differing degrees of variability. Moreover, this literature considers learning in the sense that the forager is free either to adopt a nontracking strategy or to use its experience to determine its behavior. The tracking literature considers only within-generation effects, and it finds that "keeping track" is most worthwhile when environmental persistence is highest (Stephens, 1987), as the present model would lead one to expect.
Learning and phenotypic plasticity
The present model views learning as a type of
phenotypic plasticity, and, as argued above,
this view is consistent with the common definition of learning as the modification of behavior by experience. Although there can be little argument that the lessons of the present model apply to learning, they may also apply to other phenomena within the broad category of phenotypic plasticity.
Although phenotypic plasticity has attracted much attention recently (for reviews see Stearns, 1989, and accompanying papers; West-Eberhard, 1989), relatively little of this recent work focuses on the question of the origin of phenotypic plasticity (exceptions are Harvell, 1986; Lively, 1986; Via and Lande, 1985). However, like the incomplete information literature, few of these papers have considered varying degrees of environmental predictability. For example, the classical work of Levins (1968) considers only two types of environmental change (coarse- versus fine-grained). Many papers in this literature conclude that phenotypic plasticity ought to be expected under very broad conditions (Cavalli-Sforza, 1974; Levins, 1968; Via and Lande,
1985). When viewed in the context of the
present model, it is clear that many of these
papers have considered types of environmental change that are strongly biased in favor of
phenotypic plasticity. For example, Via and
Lande (1985) consider a case in which individuals spend their entire lives in the same
unchanging habitat, but their offspring disperse at random to different habitats where
the offspring, in turn, are supposed to spend
their entire lives. Thus, Via and Lande have
considered a case of complete within-generation predictability and low between-generation predictability; so their conclusion that
phenotypic plasticity ought to be expected almost universally is of dubious generality.
Moreover, this pattern of environmental
change is commonly assumed in the phenotypic plasticity literature (e.g., Cavalli-Sforza,
1974; Lively, 1986).
The incomplete information and phenotypic plasticity literatures have both occasionally
fallen into the trap of overgeneralizing about
the effects of environmental change. When
one says that a given feature of the environment is constant, this can mean only one thing.
However, if one says that a feature of the
environment changes, this can mean practically anything other than constancy: pure randomness, wave-like behavior, chaos, etc. My
model makes the point that it is dangerous to
try to deduce the general effects of change
from a model that considers only one or two
patterns of change. The details of how the
environment changes can have important effects on the evolution of phenotypic plasticity.
An evolutionary theory
of learning?
Many workers have called for a more evolutionary approach to the study of animal learning (Kamil and Yoerg, 1982; Shettleworth,
1983; Staddon, 1983), and these students of
the "biological boundaries" of learning have
focused much of their attention on phenomena, such as the celebrated Garcia effect (Garcia and Koelling, 1966), that appear to require
an evolutionary explanation (Seligman, 1970;
Seligman and Hager, 1972; Shettleworth,
1984). There has been relatively little discussion about what kinds of learning specializations might be expected by an evolutionary
theory of learning. Indeed, many theoretical
treatments of "the evolution of learning" are
actually amplifications of the strained analogy
between trial-and-error learning and the process of natural selection (Plotkin and OdlingSmee, 1979; Slobodkin and Rapoport, 1974;
Staddon and Simmelhag, 1971). The simple
model presented here represents a prelimi-
nary step toward an evolutionary theory of
learning.
Too simple?
I have made the simplest assumptions that
allow a meaningful discussion of learning.
These assumptions may be too restrictive (they
are certainly numerous). One would like to
build a model that allows for many periods
within a generation, and perhaps for more
complicated transition processes from state to
state, or perhaps a diploid genetic system. The
assumption that a generation consists of two
periods is an especially useful simplification
because it allows unambiguous specification
of what it means to "learn": a learner attends to experience in the first period to determine its behavior in the second. One assumption
that may exclude many important instances of
learning is the assumption of nonoverlapping
generations. This assumption excludes learning by so-called cultural transmission (Boyd
and Richerson, 1985, 1988; Cavalli-Sforza and
Feldman, 1981; Rogers, 1988); in learning by
cultural transmission between-generation
regularity may be significant even though the
present model (of learning by individual experience) suggests it is unimportant.
A persistent question
Although there is much that can be done to
pursue an evolutionary theory of learning, our
ignorance of how natural environments
change and persist is a hurdle that must be
overcome. It is all very well to assert that animals should be able to learn about states that
change between generations and persist within generations; but what environmental states
follow this pattern? A crucial and seldom studied question in evolutionary studies of learning is the following: How do environments
change and persist, and how are these patterns of change and persistence related to animal life histories?
APPENDIX
The expected value of the
logarithm of fitness
This appendix justifies (1) using the stationary
distribution of the Markov chain (Table 3) to
calculate the expected value of the logarithm
of fitness, and (2) using these expectations to
deduce the long-term outcome of the model.
Although many students of population genetics have presented models demonstrating that natural selection will maximize the expectation of the logarithm of fitness (see Karlin and Lieberman, 1974, for review and citations), I include this appendix in an effort to make this paper self-contained, and because I have been unable to find a version of this argument that is directly applicable to the case studied here (haploid inheritance, three alleles, infinite population size, Markov chain controls fitness values). This appendix is based on the arguments of Karlin and Lieberman (1974), who studied the diploid case for infinite population sizes.
Step 1. The difference equations in Table 2 can be rewritten in the form

p_{n+1} = w_v^{(n)} p_n / W_n,
q_{n+1} = w_s^{(n)} q_n / W_n,   (5)
r_{n+1} = w_l^{(n)} r_n / W_n,

where the common denominator is the population mean fitness

W_n = w_v^{(n)} p_n + w_s^{(n)} q_n + w_l^{(n)} r_n,   (6)

and the fitness values w_v^{(n)}, w_s^{(n)}, w_l^{(n)} are superscripted to indicate that a different realization of each value is possible in each generation. In the present model the stochastic process governing these fitness values is the Markov chain shown in Table 3.

Step 2. A transformation of the system of difference equations above can be obtained by writing

u_n = r_n / q_n,   (7)

v_n = p_n / r_n.   (8)

This transformation is suggested because the three terms in Equation 5 have the same denominator. By taking advantage of this fact, we find that

u_{n+1} = (w_l^{(n)} / w_s^{(n)}) u_n,
v_{n+1} = (w_v^{(n)} / w_l^{(n)}) v_n,

or, taking logarithms, we have

log(u_{n+1}) = log(w_l^{(n)}) - log(w_s^{(n)}) + log(u_n),   (9)

log(v_{n+1}) = log(w_v^{(n)}) - log(w_l^{(n)}) + log(v_n).   (10)

Using this result we can write

log(u_{n+1}) = Σ_{i=0..n} log(w_l^{(i)}) - Σ_{i=0..n} log(w_s^{(i)}) + log(u_0),   (11)

log(v_{n+1}) = Σ_{i=0..n} log(w_v^{(i)}) - Σ_{i=0..n} log(w_l^{(i)}) + log(v_0).   (12)

Step 3. Each of the fitness terms can, in general, take one of four possible values depending on the environmental state (GG, GB, BG, or BB). However, we know that when n is large the Markov chain that governs the environmental state approaches a stationary distribution that specifies the long-term relative frequencies of the four environmental states (and hence the corresponding fitness values for each allele). Let E[log(w_k)] denote the expected value of the logarithm of fitness for the k allele (k = {l, s, v}), where the stationary distribution is used to calculate the expectation. For large n the stationary distribution allows us to write

Σ_{i=0..n} log(w_l^{(i)}) ≈ n E[log(w_l)],   (13)

Σ_{i=0..n} log(w_s^{(i)}) ≈ n E[log(w_s)],   (14)

Σ_{i=0..n} log(w_v^{(i)}) ≈ n E[log(w_v)].   (15)

Therefore, for large n, Equations 11 and 12 can be written as

log(u_{n+1}) = n(E[log(w_l)] - E[log(w_s)]) + log(u_0),   (16)

log(v_{n+1}) = n(E[log(w_v)] - E[log(w_l)]) + log(v_0),   (17)

so, as n approaches infinity, the absolute values of both log(u_{n+1}) and log(v_{n+1}) also approach infinity (provided the expectations in each difference are unequal), and if E[log(w_l)] is greater than both E[log(w_s)] and E[log(w_v)], then log(u_{n+1}) approaches positive infinity and log(v_{n+1}) approaches negative infinity.

Step 4. Notice that the condition in which the learning allele is fixed, r_n = 1, p_n = q_n = 0, is equivalent to the condition u_n = ∞, v_n = 0, or log(u_n) = ∞, log(v_n) = -∞. Therefore the argument above shows that we can expect the learning allele to evolve to fixation when E[log(w_l)] is greater than both E[log(w_s)] and E[log(w_v)]. Exactly analogous results can be found for the other two alleles by choosing different transformations in Step 2 above.
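The Appendix's conclusion can be illustrated by iterating the haploid recursion of Equation 5 directly. In the sketch below, all names and numbers are mine, and the fitness values are drawn i.i.d. from a hypothetical two-state distribution as a stand-in for the stationary Markov chain of Table 3: allele s has the higher arithmetic-mean fitness, but the learning allele l has the highest expected log fitness and therefore goes to fixation.

```python
import random

def haploid_selection(fitness_draw, n_gens=2000, seed=2):
    """Iterate p' = w_v p / W, q' = w_s q / W, r' = w_l r / W, where W is
    the population mean fitness, starting from equal allele frequencies."""
    rng = random.Random(seed)
    p = q = r = 1.0 / 3.0
    for _ in range(n_gens):
        wv, ws, wl = fitness_draw(rng)
        wbar = wv * p + ws * q + wl * r  # common denominator (mean fitness)
        p, q, r = wv * p / wbar, ws * q / wbar, wl * r / wbar
    return p, q, r

def draw(rng):
    """Hypothetical fitnesses (wv, ws, wl) in two equally likely states."""
    return (1.0, 2.2, 1.5) if rng.random() < 0.5 else (1.0, 0.4, 1.0)

# E[log w]: wv -> 0, ws -> 0.5*(log 2.2 + log 0.4) < 0, wl -> 0.5*log 1.5 > 0,
# so the learning allele r fixes even though E[ws] = 1.3 exceeds E[wl] = 1.25.
p, q, r = haploid_selection(draw)
print(round(r, 6))  # -> 1.0
```

The allele that wins is the one favored by the geometric-mean (log) criterion, not the arithmetic-mean criterion, which is exactly the point of Steps 3 and 4.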
I thank Tom Caraco, Steve Dunbar, Tom Getty, Tony Joern, Timothy Johnston, Al Kamil, Don Kramer, Marc Mangel, Sara Shettleworth, John Staddon, and Tony Zera for their comments. The completion of this work was partially supported by the University of Nebraska, the University of Nebraska Foundation, and a Presidential Young Investigator Award (BNS-8958228) from the National Science Foundation. I am grateful to these organizations for their support.
REFERENCES
Arnold SJ, 1978. The evolution of a special class of modifiable behaviors in relation to environmental pattern.
Am Nat 112:415-427.
Bobisud LI, Potratz CJ, 1976. One-trial versus multi-trial
learning for a predator encountering a model-mimic
system. Am Nat 110:121-128.
Boyd R, Richerson PJ, 1985. Culture and the evolutionary
process. Chicago: University of Chicago Press.
Boyd R, Richerson PJ, 1988. An evolutionary model of
social learning: the effects of spatial and temporal variation. In: Social learning: psychological and biological
perspectives (Zentall TR, Galef BG Jr, eds). Hillsdale,
New Jersey: Erlbaum; 29-48.
Cavalli-Sforza L, 1974. The role of plasticity in biological
and cultural evolution. Ann NY Acad Sci 231:43-59.
Cavalli-Sforza L, Feldman M, 1981. Cultural transmission
and evolution. Princeton, New Jersey: Princeton University Press.
Estabrook GF, Jespersen DC, 1974. The strategy for a
predator encountering a model-mimic system. Am Nat
108:443-457.
Feller W, 1950. An introduction to probability theory
and its applications. New York: Wiley.
Garcia J, Koelling RA, 1966. Relation of cue to consequence in avoidance learning. Psychonomic Sci 4:123-124.
Gillespie J, 1977. Natural selection for variances in offspring numbers: a new evolutionary principle. Am Nat
111:1010-1014.
Gray L, 1981. Genetic and experiential differences affect foraging behavior. In: Foraging behavior: ecological,
ethological and psychological approaches (Kamil AC,
Sargent TD, eds). New York: Garland STPM Press;
455-473.
Green RF, 1980. Bayesian birds: a simple example of
Oaten's stochastic model of optimal foraging. Theor
Popul Biol 18:244-256.
Harvell CD, 1986. The ecology and evolution of inducible
defenses in a marine bryozoan: cues, costs and consequences. Am Nat 128:810-823.
Johnston TD, 1982. The selective costs and benefits of
learning: an evolutionary analysis. Adv Study Behav 12:
65-106.
Johnston TD, Turvey MT, 1980. An ecological metatheory for theories of learning. In: The psychology of learning and motivation: advances in research and theory,
vol. 14 (Bower GH, ed). New York: Academic Press;
147-205.
Kamil AC, 1987. A synthetic approach to the study of
animal intelligence. In: Comparative perspectives in
modern psychology (Leger DW, ed). Lincoln: University of Nebraska Press; 257-308.
Kamil AC, Yoerg SJ, 1982. Learning and foraging behavior. In: Perspectives in ethology, vol. 5 (Bateson
PPG, Klopfer PH, eds). New York: Plenum; 325-364.
Karlin S, Lieberman U, 1974. Random temporal variation
in selection intensities: case of large population size.
Theor Popul Biol 6:355-382.
Karlin S, Lieberman U, 1975. Random temporal variation
in selection intensities: one-locus two-allele model. J
Math Biol 2:1-17.
Levins R, 1968. Evolution in changing environments.
Princeton, New Jersey: Princeton University Press.
Lively CM, 1986. Canalization versus developmental conversion in a spatially variable environment. Am Nat
128:561-572.
Mackintosh NJ, 1983. General principles of learning. In:
Animal behaviour vol. 3. Genes, development and
learning (Halliday T, Slater PJB, eds). New York: W.
H. Freeman; 149-177.
McNamara JM, 1982. Optimal patch use in a stochastic
environment. Theor Popul Biol 21:269-288.
Oaten A, 1977. Optimal foraging in patches: the case for
stochasticity. Theor Popul Biol 12:263-285.
Papaj DR, Prokopy RJ, 1989. Ecological and evolutionary
aspects of learning in phytophagous insects. Annu Rev
Entomol 34:315-350.
Plotkin HC, Odling-Smee FJ, 1979. Learning, change,
and evolution: an enquiry into the teleonomy of learning. Adv Study Behav 10:1-41.
Pulliam HR, 1981. Learning to forage optimally. In: Foraging behavior: ecological, ethological and psychological approaches (Kamil AC, Sargent TD, eds). New York:
Garland STPM Press; 379-388.
Rogers AR, 1988. Does biology constrain culture? Am
Anthropol 90:819-831.
Seligman MEP, 1970. On the generality of the laws of
learning. Psych Rev 77:406-418.
Seligman MEP, Hager JL (eds), 1972. Biological boundaries of learning. New York: Appleton-Century-Crofts.
Shettleworth SJ, 1983. Function and mechanism in learning. In: Advances in analysis of behaviour, vol. 3 (Zeiler
MD, Harzem P, eds). London: Wiley; 1-39.
Shettleworth SJ, 1984. Learning and behavioural ecology.
In: Behavioural ecology: an evolutionary approach, 2nd
ed (Krebs JR, Davies NB, eds). Oxford: Blackwell Scientific; 170-194.
Shettleworth SJ, Krebs JR, Stephens DW, Gibbon J, 1988.
Tracking a fluctuating environment: a study of sampling. Anim Behav 36:87-105.
Slobodkin LB, Rapoport A, 1974. An optimal strategy of
evolution. Q Rev Biol 49:181-200.
Staddon JER, 1983. Adaptive behavior and learning. New
York: Cambridge University Press.
Staddon JER, Simmelhag VL, 1971. The superstition experiment: a reexamination of its implications for the
principles of adaptive behavior. Psych Rev 78:3-45.
Stearns SC, 1989. The evolutionary significance of phenotypic plasticity. Bioscience 39:436-444.
Stephens DW, 1987. On economically tracking a variable
environment. Theor Popul Biol 32:15-25.
Stephens DW, Krebs JR, 1986. Foraging theory. Princeton, New Jersey: Princeton University Press.
Tamm S, 1987. Tracking varying environments: sampling
by hummingbirds. Anim Behav 35:1725-1734.
Via S, Lande R, 1985. Genotype-environment interaction and the evolution of phenotypic plasticity. Evolution 39:505-522.
West-Eberhard MJ, 1989. Phenotypic plasticity and the
origins of diversity. Annu Rev Ecol Syst 20:249-278.