Behavioral Ecology
doi:10.1093/beheco/arl008
Advance Access publication 19 June 2006
Learning rules for optimal selection in a
varying environment: mate choice revisited
Edmund J. Collins,ᵃ John M. McNamara,ᵃ and David M. Ramseyᵇ
ᵃCentre for Behavioural Biology and Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, United Kingdom and ᵇInstitute of Mathematics and Computer Science, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, PL-50-370 Wrocław, Poland
The quality of a chosen partner can be one of the most significant factors affecting an animal’s long-term reproductive success.
We investigate optimal mate choice rules in an environment where there is both local variation in the quality of potential mates
within each local mating pool and spatial (or temporal) variation in the average quality of the pools themselves. In such
a situation, a robust rule that works well across a variety of environments will confer a significant reproductive advantage. We
formulate a full Bayesian model for updating information in such a varying environment and derive the form of the rule that
maximizes expected reward in a spatially varying environment. We compare the theoretical performance of our optimal learning
rule against both fixed threshold rules and simpler near-optimal learning rules and show that learning is most advantageous
when both the local and environmental variances are large. We consider how optimal simple learning rules might evolve and
compare their evolution with that of fixed threshold rules using genetic algorithms as minimal models of the relevant genetics.
Our analysis points up the variety of ways in which a near-optimal rule can be expressed. Finally, we describe how our results
extend to the case of temporally varying environments. Key words: Bayesian model, evolution of learning rules, genetic algorithm,
sequential search, sexual selection, spatially varying environment. [Behav Ecol 17:799–809 (2006)]
The quality of a chosen partner can be one of the most
significant factors affecting an animal’s long-term reproductive success. As a result, the performance of different mate
choice rules has been, and continues to be, of central interest
to behavioral ecologists (Janetos 1980; Parker 1983; Real 1990;
Simao and Todd 2002; Hutchinson and Halupka 2004; Wong
and Candolin 2005).
Here we investigate optimal mate choice rules in a varying
environment. We focus on modeling the case where only one
sex is discriminating. This restricted focus enables us to clearly
set out the principles underlying the structure and evolution of
robust rules and the logic of why and when learning can be
advantageous. Thus, we aim to combine precise quantitative
results with qualitative motivation. Moreover, we expect the intuition and insight developed to apply to more general optimal
selection problems and also to carry over to more realistic game
theoretic mate selection models where both sexes are choosy.
For simplicity, we refer throughout to the choosy sex as
female; our analysis applies equally well when the roles are
reversed (Real 1990). We assume that each individual female encounters a possibly unlimited sequence of potential
mates, whose qualities are assumed to be chosen at random
from a given (local) distribution, and we assume that costs of
some form are associated with searching (Alatalo et al. 1988;
Milinski and Bakker 1992). Under these assumptions, at any
given time the female will have an acceptance threshold that
will determine whether or not she accepts the current male.
Previous mate choice models have in general assumed that
each female knew (i.e., was fully adapted to) the local distribution of male qualities, which in turn implicitly assumed that
this local distribution remained constant from generation to
generation (Janetos 1980; Parker 1983; Real 1990). For this
setting, Real (1990) gives a comprehensive treatment of both
sequential search rules and ‘‘best-of-n’’ rules with costs. He
shows that a sequential search rule is optimal in terms of
expected reward and will always dominate the best-of-n rule
when the possible number of encounters is unlimited.
In reality, the distribution of male qualities may differ from
place to place and from season to season (Svensson and
Sinervo 2004). A mate choice rule that is optimal when qualities have one distribution may perform very badly when they
have another (Mazalov et al. 1996; Luttbeg 2002). A female
that inherits a robust rule that works well across a variety of
environments will thus have a significant reproductive advantage. There may be even more advantage in an adaptive rule
that learns from experience to respond optimally to the local
distribution, and there is experimental evidence that female
choice is affected by the quality of previously encountered
males (Bakker and Milinski 1991; Milinski and Bakker 1992;
Downhower and Lank 1994; Collins 1995).
Real (1990) recognized both the appropriateness of, and
the difficulties inherent in, a Bayesian approach to this adaptive choice model but did not address its solution. Since
then, there has been only limited progress in characterizing
and evaluating the form of the optimal rule in the presence
of variability and uncertainty. Apart from some simulation
comparisons (Mazalov et al. 1996; Luttbeg 2002; Hutchinson
and Halupka 2004), there are few clear theoretical results
(Dombrovsky and Perrin 1994; Mazalov et al. 1996). This paper is the first to give a full Bayesian treatment of the optimal
learning rule for mate choice in a varying environment.
Our model incorporates 2 sources of variability that may lead
a female to be uncertain about the quality of a given male relative to others she might meet—variation of individual male
qualities within the local pool of potential mates (residual variability) and variation of the average quality of the pools themselves, say from place to place in a given season or from season to
season (environmental variability). The different types of rules
allowed under our model range from rules that specify a
completely fixed acceptance threshold, or rules where the
acceptance threshold may change with each encounter but only
in some predetermined way that is independent of the actual
qualities encountered (but may, e.g., depend on the number of
encounters), through to much more complex fully adaptive or
learning rules where the acceptance threshold at any encounter
depends crucially on the female’s own past experience.
Within this framework, we are able to fully characterize and
quantify the optimal (learning) rule. Perhaps more significantly, we are able to draw clear qualitative conclusions about
how the relative strengths of the 2 sources of variation affect
the relative success of each type of rule and also how these
relative strengths affect the trade-off between the complexity
of a rule and its expected reward in the presence of evolutionary selection pressures. Thus, our emphasis is on optimality
rather than on heuristics (Simao and Todd 2002; Hutchinson
and Halupka 2004).
More complicated models are possible, which allow for game
theoretic aspects, perhaps due to both sexes being choosy or to
the distribution of male qualities changing over the mating
season as females remove the best males (McNamara and
Collins 1990; Collins and McNamara 1993; Johnstone 1997;
Bergstrom and Real 2000) or allow for courtship (Wong and
Candolin 2005) or imperfect observation of male qualities, so
that female choice rules need to take into account the costs and
benefits of time spent in further assessment or reassessment of
previously encountered males (Luttbeg 1996; Fawcett and
Johnstone 2003b). We do not address these issues here.
We start by specifying the details of our model and developing a clearer intuition of the advantages and disadvantages of
rules with a fixed acceptance threshold, gaining insight from
the asymmetric shape of the plot of expected net gain against
threshold. We formulate a full Bayesian model for updating
information in a varying environment and derive the form of
the corresponding optimal learning rule that maximizes expected reward in a spatially varying environment. We explore
when learning is valuable—either theoretically valuable or
valuable in a more practical setting that takes into account
robustness and adaptability under evolutionary pressures.
Thus, we first identify the conditions that might favor learning
under our model by comparing the theoretical performance
of our optimal learning rule against both fixed threshold rules
and simpler near-optimal learning rules. We then consider
how optimal simple learning rules might evolve and compare
their evolution with that of fixed threshold rules using genetic
algorithms (GAs) as minimal models of the relevant genetics.
Our analysis points up the variety of ways in which a near-optimal rule can be expressed. Finally, we describe how our
results extend to the case of temporally varying environments.
MODELS FOR MATE CHOICE IN A VARYING
ENVIRONMENT
Our starting point is a model similar to that of previous authors
(Janetos 1980; Parker 1983; Real 1990). Each potential mate is
characterized by a value representing his quality or reproductive fitness, and this value can be immediately recognized by
the female. The female encounters a sequence of males with
individual fitness values x1, x2, x3, . . ., where there is no limit to
the potential number of encounters. She sequentially accepts
or rejects each male according to his fitness value and cannot
recall previously rejected males. The female stops searching if
and when she finally accepts a male. We assume that individual
male fitness values can be modeled as independent observations from the N(μ, γ²) distribution, that is, the normal distribution with mean μ and variance γ². For clarity of presentation,
we assume the time between encounters, measured in appropriate units, has constant value 1. Finally, we assume that the
female incurs a fixed positive search cost c for each time unit
that elapses up to and including the time a male is finally
chosen, reflecting time or energetic costs incurred. The overall
net fitness gain to the female is the fitness value of the male
eventually chosen minus the total search costs.
Previous authors have used this basic model to study optimal mate choice rules for females in a constant environment, where each female in each generation independently encounters males with fitness values drawn from the same N(μ, γ²) distribution with known values of μ and γ². We extend this model to study optimal mate choice rules in environments where the value of μ experienced by different females can vary, either within each generation or across generations.
We now define a spatially varying environment to be one in which the value of μ varies from female to female within a given generation, according to a normal distribution with mean μ₀ and variance σ₀². The values of γ², μ₀, and σ₀² are assumed to be known and constant from generation to generation. We can gain insight into the spatially varying model by interpreting the average fitness values seen by females as being influenced by factors specific to the local environment or patch, so that μ represents a local average male fitness value. Indeed, we will sometimes speak of males "in a given location," though this is by no means the only interpretation of how this variability might occur.
We define a temporally varying environment to be one in which the value of μ is the same for all females within a given generation but now varies from generation to generation according to a normal distribution. Again we write μ₀ and σ₀² for the mean and variance of this distribution, and we assume that the values of γ², μ₀, and σ₀² are known and constant from generation to generation. The temporally varying model describes cases where the average male fitness value is essentially the same across locations in a given generation, but where this average value changes appreciably from generation to generation because of seasonal factors that affect all locations equally.

In both these models, we interpret μ₀ as the long-run average male fitness value, we interpret σ₀² as the environmental variance (the variance of μ about this value μ₀, either within generations or across generations according to the context), and we interpret γ² as the residual variance (the variability of local male fitness values around their local average).
In many cases, quantities of interest will depend linearly on μ₀ and depend on σ₀² and γ² only through the value of the ratios c/σ₀, c/γ, or σ₀/γ. In the numerical examples used to illustrate our results, we will therefore mainly focus on fixed values for the parameters μ₀ and c (taking μ₀ = 5 and c = 0.1) and a fixed range of values for the parameters σ₀² and γ² (σ₀ = 0.5, 1, 2 and γ = 0.5, 1, 2), giving an appropriate range of illustrative values.
Finally, we note that many of our qualitative results remain
the same if we relax the distributional assumptions to allow
distributions with roughly the same shape as the normal or if
we assume mortality or discounted costs (Real 1990). Similarly, the analysis is essentially unchanged if we allow the times
between encounters to be continuous variables, randomly
chosen from some given distribution.
FIXED THRESHOLD RULES IN CONSTANT AND
VARYING ENVIRONMENTS
A mate choice rule that is particularly attractive because of its
simplicity is a fixed threshold rule with threshold T of the form:
    accept a prospective mate with fitness value x if and only if x ≥ T.    (1)
In particular, Real (1990) showed that a fixed threshold rule
is optimal when the environment is constant and the value of μ is fixed (corresponding to the case σ₀ = 0 in our model). Here we review the constant environment case in order to develop insight into how the performance of these fixed threshold rules is affected when the environment varies.

Figure 1. Plot of the expected overall net fitness gain V(T) against the fixed threshold T, for a range of values of the search cost c. For clarity, the negative gains are omitted. For each value of c, the maximum value of V(T) occurs at the value T* corresponding to the point of intersection with the line of unit slope through the origin. The parameter values are μ = 5, γ = 1, and c = (from bottom left) 1, 0.5, 0.1, 0.01, 0.001.
In a constant environment, the mate choice problem is stationary in that the expected net future gain of continuing under an optimal rule is constant, independent of the current
stage n and the current and previous observed male values.
This follows from the fact that the values of successive males
are independently drawn from the same known distribution
and that recall is not allowed. Denote the expected net future
gain under an optimal rule by V*. As with any similar sequential search model, an optimal female action at each stage is to accept the current prospective male if and only if the reward from mating is greater than or equal to the expected net future gain under an optimal rule. It follows that the optimal rule in a constant environment is a fixed threshold rule and that the optimal threshold T* at each stage is exactly equal to V*.
In our model, we have specifically excluded recall of previously rejected males as an option. However, because the optimal rule uses a fixed threshold, we see that in a constant
environment, the recall of previously rejected potential mates
would be of no benefit; a male deemed unacceptable when first
encountered would have a fitness value below the fixed threshold and thus would always be unacceptable in the future.
An explicit expression for the expected overall net fitness
gain, V(T), under a fixed threshold rule with threshold T, is
given by the formula

    V(T) = μ + γ φ((T − μ)/γ) / [1 − Φ((T − μ)/γ)] − c / [1 − Φ((T − μ)/γ)],    (2)
where φ and Φ are, respectively, the probability density function and distribution function of the N(0, 1) distribution. Here 1 − Φ((T − μ)/γ) is the probability a randomly chosen male will exceed the threshold T, and 1/[1 − Φ((T − μ)/γ)] is the expected search time (i.e., the number of males encountered up to and including the one accepted). Figure 1 shows typical plots of V(T) for a range of c values; in each case, the graph takes value μ − c for T small, increases slowly with increasing T, and then falls steeply for larger values of T.
Because T* = V* = V(T*), the values of T* and V* can be read off from the graph as the coordinates of the point of intersection with the line of unit slope through the origin.
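To make this concrete, here is a minimal numerical sketch (ours, not from the paper; Python with numpy and scipy is assumed throughout these examples). It evaluates Equation 2 and exploits the fixed-point property T* = V(T*) to locate the optimal threshold.

    import numpy as np
    from scipy.stats import norm

    def expected_net_gain(T, mu, gamma, c):
        """V(T) of Equation 2: expected value of the male eventually
        accepted minus the expected search cost, under threshold T."""
        z = (T - mu) / gamma
        p_accept = 1.0 - norm.cdf(z)                 # P(male exceeds T)
        mean_accepted = mu + gamma * norm.pdf(z) / p_accept
        expected_search_time = 1.0 / p_accept        # geometric waiting time
        return mean_accepted - c * expected_search_time

    def optimal_fixed_threshold(mu, gamma, c, n_iter=200):
        """Iterate T <- V(T): since V(T) <= V(T*) = T*, with equality only
        at T*, the iteration climbs monotonically to the fixed point."""
        T = mu
        for _ in range(n_iter):
            T = expected_net_gain(T, mu, gamma, c)
        return T

    # With the paper's illustrative values mu = 5, gamma = 1, c = 0.1,
    # this returns roughly 5.90 (cf. Figure 1).
    print(optimal_fixed_threshold(5.0, 1.0, 0.1))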
To motivate the shape of the graph of V(T), note that V(T) is
the difference between the expected value of the male eventually accepted and the expected cost of searching. Both of these
terms are increasing functions of the threshold T. For T small,
the expected value of the male accepted takes a value near μ,
and the expected search time takes a value near 1. As T increases, the expected value of the male accepted also increases,
initially at a faster rate than the expected search time (and
hence the expected search cost), so V(T) increases. Eventually,
the rate of increase of the expected search time (and cost)
becomes so large that the expected cost of using a higher
threshold starts to outweigh the increase in the expected value
of the male accepted, until finally the costs dominate the expected return.
The important feature of this graph is its emphatically asymmetric shape; it is quite flat for values of T less than the
optimal value T* but falls very steeply for values of T greater than T*. The expected overall net gain for a female using a fixed threshold less than T* is close to that under an optimal rule, even if T is appreciably less than T*. However, her expected overall net gain drops sharply if she uses a fixed threshold significantly greater than T*. We would therefore expect that selective pressure on females using a fixed threshold would lead them to err on the low side rather than the high side in choosing the threshold value.
When male values come from the N(μ, γ²) distribution, V* and T* are given explicitly by

    V* = T* = μ + γ ψ⁻¹(c/γ),    (3)

where ψ(x) = φ(x) − x(1 − Φ(x)) for real values of x (Real 1990). Thus, the fixed threshold used by the optimal rule is of the form

    T* = μ + d*.    (4)

Here μ is the mean male fitness value and d* = γ ψ⁻¹(c/γ) is a relative threshold representing how good a male has to be, relative to the known mean, to be acceptable. Note that the optimal fixed threshold T* and hence V* both increase linearly with μ for fixed γ and c.
Consider 2 females following optimal rules in 2 environments that differ only in the mean male fitness value, say μ₁ and μ₂. Although the optimal thresholds T₁* = μ₁ + γψ⁻¹(c/γ) and T₂* = μ₂ + γψ⁻¹(c/γ) would differ, the relative thresholds and the search time distribution would be the same, so the observed pattern of searching and acceptance would look exactly the same in both cases. However, the actual thresholds used and the expected value of the mate actually obtained would differ by an amount μ₁ − μ₂, reflecting the difference in the overall mean fitness values.

Moreover, if by mistake a female in an environment where the mean was μ₂ used the fixed threshold T₁* that was optimal for an environment where the mean was μ₁, then we see from the shape of the graph of V(T) that she would do only slightly worse than the optimal if μ₁ was less than μ₂, because then T₁* < T₂*; but she would do very much worse if μ₁ was significantly greater than μ₂, as then T₁* would be significantly greater than T₂*.
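This asymmetry is easy to check numerically; the snippet below (ours, reusing expected_net_gain from the sketch above) evaluates V(T; μ) when the threshold is tuned to the wrong mean.

    # Environment parameters: gamma = 1, c = 0.1, so d* is about 0.90.
    gamma, c, d_star = 1.0, 0.1, 0.90

    # Threshold adapted to mu1 = 4 used where the true mean is mu2 = 5:
    # T1* < T2*, and the loss is mild (about 5.55 versus the optimal 5.90).
    print(expected_net_gain(4.0 + d_star, 5.0, gamma, c))

    # Threshold adapted to mu1 = 6 used where the true mean is mu2 = 5:
    # T1* > T2*, and the loss is severe (about 3.80, driven by search costs).
    print(expected_net_gain(6.0 + d_star, 5.0, gamma, c))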
BAYESIAN MODEL FOR A VARYING ENVIRONMENT
Now consider a female searching for a mate in a spatially or temporally varying environment. Under the varying environment model, the local male fitness values are drawn from the N(μ, γ²) distribution, where the local variance γ² is known but the value of the local mean μ is unknown. Before the female encounters her first male, her only information about μ is that it has been randomly drawn from the N(μ₀, σ₀²) distribution, where μ₀ and σ₀² are known.
Initially, her best estimate is to assume that μ takes the value μ₀. However, with each male she encounters, she gains information that she can use to update her estimate of μ and more generally update her overall uncertainty about the value of μ.
This process of updating can be precisely quantified by
modeling the process in a Bayesian framework.
Under the Bayesian model, the distribution of μ across different locations or generations can be interpreted as a subjective prior distribution, representing the female's initial beliefs about the unknown value of μ in her locality. Our assumptions imply that this distribution is N(μ₀, σ₀²), where μ₀ is now interpreted as the known prior mean of this subjective distribution and σ₀² is interpreted as the prior variance (so 1/σ₀² is the prior precision, giving a measure of how closely her beliefs are concentrated around the mean).
Each time the female encounters another male, Bayes' theorem specifies how her beliefs prior to the encounter should be updated and modified to give a posterior distribution that incorporates the information provided by the newly observed male fitness value. Consider a female who has just encountered her nth male. Denote the fitness values observed so far by x₁, . . ., xₙ, with mean x̄ₙ = (x₁ + ··· + xₙ)/n. Standard results (DeGroot 1970) show that her posterior distribution is the N(μₙ, σₙ²) distribution, so her information state can be summarized by the posterior mean μₙ and the posterior variance σₙ² (or posterior precision 1/σₙ²), where for n = 1, 2, 3, . . .

    μₙ = bₙμ₀ + (1 − bₙ)x̄ₙ    and    1/σₙ² = 1/σ₀² + n/γ²,    (5)
and the weights bₙ have the form

    bₙ = (γ²/σ₀²) / (γ²/σ₀² + n).    (6)

We see that the actual value of μₙ changes randomly at each encounter, depending, through x̄ₙ, on the value of xₙ observed. However, the uncertainty about μ, represented by σₙ², decreases deterministically with each extra observation because 1/σₙ² increases by 1/γ² irrespective of the observed values x₁, . . ., xₙ.

The weight bₙ given to the initial estimate μ₀ also decreases deterministically with each extra observation, again independently of the observed values x₁, . . ., xₙ. Note that the weight depends only on the value of n and the relative values of the residual (local) variance γ² and the environmental variance σ₀² and not on their absolute values. As the local variability γ² decreases relative to σ₀², less weight is given to μ₀ and more weight is given to the data. This is to be expected because γ² decreasing means the local data are less variable and hence a more reliable indicator of μ. Indeed, if γ² was very small, then every male in a location would have roughly the same value, and the first male value observed would be a much better indicator of local male values than μ₀. Alternatively, if σ₀² was very small, then there would be very little variation in the value of μ seen by different females, and the problem would reduce to the setting of a constant environment with known mean μ₀.

OPTIMAL MATE CHOICE IN A SPATIALLY VARYING ENVIRONMENT

In a spatially varying environment, the male fitness values encountered by an individual female are independent observations from the N(μ, γ²) distribution, where the value of μ varies independently from female to female (or location to location) according to the N(μ₀, σ₀²) distribution.

Because the randomness affects each female in the population independently of the other population members, it is appropriate to base fitness measures on the overall net fitness gain to each individual female (Houston and McNamara 1999). Thus, we define an optimal mate choice rule in a spatially varying environment to be one that maximizes the expected net fitness gain, where this expectation is taken first over the male values experienced for each value of μ and then over the distribution of μ across locations.

Recall that an optimal action for a female at each stage is to accept the current prospective male if and only if the reward from mating is greater than or equal to the expected net future gain under an optimal rule. Consider a female who has just encountered her nth male with observed fitness value xₙ and is deciding whether to accept or reject this potential mate. We saw in the previous section that, having observed xₙ, her current state of information is characterized by the values μₙ and σₙ². Define V*(μₙ, σₙ²) to be the expected future net fitness gain if she continues searching under an optimal rule, starting from the state (μₙ, σₙ²). Then, the optimal action at stage n is

    accept xₙ if and only if xₙ ≥ T*(μₙ, σₙ²), where T*(μₙ, σₙ²) = V*(μₙ, σₙ²).    (7)

Following DeGroot (1970, p. 336–41), the optimal threshold for fixed c can be expressed as μₙ plus a term that depends only on σₙ², while Equation 5 shows that, for fixed γ² and σ₀², σₙ² itself depends only on n. Thus, for fixed c, γ², and σ₀², there is a corresponding sequence of fixed values d(0), d(1), d(2), . . ., such that the dependence of the optimal threshold and optimal future net fitness gain on the current information state (μₙ, σₙ²) has the simple form

    V*(μₙ, σₙ²) = T*(μₙ, σₙ²) = μₙ + d(n),    (8)

where d(n) depends on n but not on μ₀ or x₁, . . ., xₙ and hence not on μₙ. DeGroot (1970) gives an iterative procedure for numerically calculating d(0), d(1), d(2), . . ., but there is no simple explicit expression for the sequence of values.

We call the rule defined by these thresholds an optimal learning rule. In particular, because the female starts the process with an initial prior distribution N(μ₀, σ₀²), the overall expected net fitness gain under an optimal rule is given by

    V*(μ₀, σ₀²) = μ₀ + d(0).    (9)

In the form given by Equation 8 above, the female compares xₙ with a threshold in which the value of μₙ already incorporates the information provided by xₙ. From Equations 5 and 6, we can express μₙ in terms of xₙ and μₙ₋₁ and hence rewrite the inequality xₙ ≥ μₙ + d(n) in the equivalent form xₙ ≥ μₙ₋₁ + d(n)σₙ₋₁²/σₙ². This provides an equivalent rule for the optimal action at the nth stage of the form:

    accept xₙ if and only if xₙ ≥ μₙ₋₁ + e(n),    (10)

where e(n) = d(n)σₙ₋₁²/σₙ². Again, because σₙ² changes deterministically with n, e(n) depends only on n (for given c, γ², and σ₀²), and both d(n) and e(n) can be shown to converge to the same limit γψ⁻¹(c/γ) as n becomes large. Figure 2 displays the typical form of e(n) for a range of environmental variances.
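The updating in Equations 5 and 6 amounts to a few lines of code; a minimal sketch (ours):

    import numpy as np

    def posterior(x, mu0, sigma0_sq, gamma_sq):
        """Posterior N(mu_n, sigma_n^2) for the local mean mu after
        observing male values x = [x1, ..., xn] (Equations 5 and 6)."""
        n = len(x)
        b_n = (gamma_sq / sigma0_sq) / (gamma_sq / sigma0_sq + n)
        mu_n = b_n * mu0 + (1.0 - b_n) * np.mean(x)           # weighted average
        sigma_n_sq = 1.0 / (1.0 / sigma0_sq + n / gamma_sq)   # deterministic
        return mu_n, sigma_n_sq

    # Prior N(5, 1), residual variance 1, three males observed:
    print(posterior([6.2, 5.1, 5.8], mu0=5.0, sigma0_sq=1.0, gamma_sq=1.0))
    # -> posterior mean 5.525, posterior variance 0.25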
Figure 2. Plot of the relative threshold e(n) used by the optimal learning rule against n, for a range of values of the environmental variance σ₀². The parameter values are c = 0.1, γ² = 1 (so γψ⁻¹(c/γ) = 0.90), and σ₀ = (from bottom left) 0.5, 1, 2. At n = 10, the values of e(n) in the figure lie between 0.91 and 0.92.

Consider now how the action specified by the optimal learning rule changes in practice over a sequence of encounters. From Equation 10, the optimal action for the female at the nth encounter is to compare the new observation xₙ with
a threshold composed of the previous Bayes estimate of the mean (the posterior mean μₙ₋₁) plus the relative threshold e(n). Before her first encounter, her estimate of the mean is μ₀, and the value of the relative threshold e(1) is set somewhat high (Figure 2). The learning process enables her to take advantage of cases where the local value of μ is higher than the long-run average male fitness value μ₀ and not rush to accept the first male whose value, though high, may well be exceeded in later encounters. However, as she starts to encounter further males, the weights bₙ ensure that the posterior mean adjusts rapidly toward the observed mean-to-date x̄ₙ₋₁, whereas at the same time, the relative threshold e(n) falls rapidly to the value γψ⁻¹(c/γ) that is optimal in a constant environment (see Equation 3). By this stage, the threshold has adjusted to a value that would be optimal in a constant environment with the same values of γ and c but where the local mean μ had a value equal to μₙ₋₁. This enables the female to adapt quickly in cases where the local value of μ is lower than the long-run average male fitness value μ₀ and not incur unnecessary costs by continuing with a threshold that is too high.
The threshold defined by the optimal learning rule is free to rise or fall during a search, depending on the values x₁, x₂, x₃, . . . experienced by the searcher. However, because the female only accepts mates of value higher than her current threshold, each time she decides to continue searching, the rejected male must have had a relatively low fitness value. This in turn feeds into her posterior mean μₙ, and it can be shown analytically that the expected value of the threshold used falls over time, where the expectation is over females still searching at that stage. Thus, the current posterior mean and the current threshold of a female still searching at the nth stage will also be relatively low. Figure 3 gives some indication of how thresholds fall on average as searching continues. For each n, what is plotted is the average threshold used by all females still searching at the nth stage, where the expectation is taken first with respect to the (relatively low) male values that must have been experienced to date and then with respect to the N(μ₀, σ₀²) distribution for the local mean μ. Although the comment of Real (1990) that "No simple, smooth behaviour can be predicted" certainly does apply to the changing thresholds of any fixed focal female, we see that these average thresholds do fall smoothly and predictably as the number of males encountered increases.

Figure 3. Plot of the expected threshold for the optimal learning rule against the number of prospective mates seen, for a range of values of the environmental variance σ₀². The parameter values are μ₀ = 5, c = 0.1, γ = 1, and σ₀ = (from left) 0.5, 1, 2.
Indeed, as n increases, we see from Equations 5 and 6 that more and more weight is given to the increasing amount of local data. Moreover, in the Bayesian context, the local value of μ will eventually be learned as more and more observations are taken, and the problem then reduces to the known (constant) environment model discussed above. This is exactly reflected in the behavior of the thresholds specified by the optimal learning rule; as n tends to infinity, the weight given to the observations tends to 1, x̄ₙ tends to μ, and d(n) tends to γψ⁻¹(c/γ), so that the optimal threshold tends to μ + γψ⁻¹(c/γ), agreeing with the threshold in Equation 3 that is optimal for the corresponding constant environment model.
Finally, we can gain some insight by comparing the optimal learning rule with more simplistic rules. One estimate of the local mean μ, based only on the long-run average value and ignoring the local observations, would be μ₀. Another estimate, based only on the current local observations x₁, . . ., xₙ and ignoring the long-run average, would be the local observed average x̄ₙ = (x₁ + ··· + xₙ)/n. Analogy with the constant environment model suggests 2 simple but extreme possible mate choice rules: 1) accept xₙ if and only if xₙ ≥ μ₀ + d′ and 2) accept xₙ if and only if xₙ ≥ x̄ₙ + d″, where d′ and d″ are appropriately chosen relative thresholds. The first rule ignores the data and compares xₙ with an absolute standard μ₀. It might fail to take advantage of good locations or lead to a prolonged and costly search in poor locations. The second rule is a purely relative rule, comparing xₙ only with the female's own experience and ignoring the long-run average. It might be misled into poor decisions by unusually high or low early observations. Both rules have their drawbacks, and we can see from Equations 5 and 10 that the optimal rule depends on a weighted average of these 2 simple extreme estimates, where the Bayesian analysis precisely identifies the optimal weights.
THE REPRODUCTIVE ADVANTAGE OF LEARNING
In a spatially varying environment, the optimal learning rule
specified by Equation 7 or 10 gives greater expected reward
than any other rule. It certainly does better than any fixed threshold rule, but on the other hand, it is also much more complex than a fixed threshold rule. There may be real costs associated with, say, the increased neurological capacity required to successfully implement more complex rules. Although such costs are not explicitly modeled here, their possible presence prompts us to compare the reproductive advantage of the optimal learning rule with that of the best fixed threshold rule, so as to identify what type of conditions favor learning in a spatially varying environment.

Table 1. Expected reward using the optimal learning rule and the best fixed threshold rule, for a range of values of the environmental variance σ₀² and the residual variance γ². The parameter values are μ₀ = 5 and c = 0.1.

                 Optimal learning rule           Best fixed threshold rule
                 σ₀ = 0.5   σ₀ = 1   σ₀ = 2      σ₀ = 0.5   σ₀ = 1   σ₀ = 2
    γ = 0.5        5.14       5.09     5.06        5.02       4.91     4.90
    γ = 1          5.80       5.71     5.65        5.73       5.28     4.92
    γ = 2          7.43       7.32     7.19        7.39       7.06     5.97
The numerical results set out in Table 1 show that, for fixed values of the residual variance γ², the expected reward under the optimal learning rule decreases as the environmental variance σ₀² increases. Conversely, for fixed values of σ₀², it increases as γ² increases. The reward from using the best fixed threshold rule behaves in a similar way; it falls with σ₀² for fixed γ², and it increases with γ² for fixed σ₀². Moreover, the difference between the expected reward under the optimal learning rule and that under the best fixed threshold rule increases appreciably as σ₀² increases. The expected reward under the optimal learning rule was calculated using Expression 9, and the other values were evaluated by repeated numerical integration. Overall, we see that the optimal learning rule has the greatest advantage when both the residual and environmental variances are relatively large.
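The paper does not spell out this integration, so the following is our reconstruction for the fixed threshold case: average V(T; μ) from Equation 2 over the N(μ₀, σ₀²) distribution of local means and maximize over T (reusing expected_net_gain from the earlier sketch). One caution: in locations with μ far below T, the expected search cost of a fixed threshold rule grows explosively, so the quadrature limits matter; the parameter pair used here (σ₀ = 0.5, γ = 1) keeps the integrand well behaved.

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad
    from scipy.optimize import minimize_scalar

    def mean_reward_fixed(T, mu0, sigma0, gamma, c):
        """Expected reward of threshold T, averaged over mu ~ N(mu0, sigma0^2)."""
        integrand = lambda mu: (expected_net_gain(T, mu, gamma, c)
                                * norm.pdf(mu, mu0, sigma0))
        value, _ = quad(integrand, mu0 - 8 * sigma0, mu0 + 8 * sigma0)
        return value

    best = minimize_scalar(lambda T: -mean_reward_fixed(T, 5.0, 0.5, 1.0, 0.1),
                           bounds=(3.0, 7.0), method="bounded")
    # Should land near the corresponding Table 1 entry (5.73).
    print(best.x, -best.fun)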
Figure 4 compares the performance of the 2 types of rules in terms of the percentage increase in expected reward from using the optimal learning rule rather than the best fixed threshold rule, for a range of values of σ₀² and γ². Although the absolute values of the expected rewards increase with γ² and decrease with σ₀², the percentage increases do not behave in quite the same way, as seen, for example, in the plotted values at σ₀ = 1. This is because the baseline (the reward from the fixed threshold rule) also increases, but at a different rate from that for the optimal learning rule. However, a similar overall picture emerges: learning is most advantageous when the environmental variance and the residual variance are both large.
Figure 4. Percentage increase in expected reward from using the optimal learning rule rather than the best fixed threshold rule, for a range of values of the environmental variance σ₀² and the residual variance γ². The parameter values are μ₀ = 5; c = 0.1; σ₀ = 0.5, 1, 2; and γ = 0.5, 1, 2.

To gain more insight into why and when learning is advantageous, it is helpful to consider what happens for extreme values of the variance parameters. When the environmental variance σ₀² is very small, the environment is relatively constant, and the mean is roughly the same in all locations, so (Real 1990) the optimal fixed threshold will perform well throughout. Conversely, if the environmental variance σ₀² is very large, then the mean value of the prospective mates is much more variable across locations, and it is advantageous to use the observed values seen during the search to provide a much better estimate of the local mean value. If the local residual variance γ² is very small, there is very little variation in the male values around the local mean. Now a rule that accepts the first male encountered is optimal, as it minimizes cost without sacrificing potential reward, and under this rule,
the expected fitness of the mate accepted is equal to the local mean. Indeed, it does not matter much which male is chosen because all males in a given locality are of similar quality. Conversely, when γ² is very large, male values show high local variability; a fixed threshold rule will still do well when σ₀² is small, but when σ₀² is large, it may fail to take advantage of the local possibility for high rewards, and there is then strong selection pressure for rules that learn. This explains why the optimal learning rule has significantly greater advantage only when both the residual and environmental variances are large.
The complexity of the optimal learning rule leads us to investigate whether there are other learning rules that retain its reproductive advantage but are significantly less complex. With this in mind, we define the class of simple learning rules to be the set of rules of the form:

    accept xₙ if and only if xₙ ≥ μₙ₋₁ + d,

for some fixed relative threshold d. Such rules retain the capacity of the optimal learning rule to learn the value of the local mean from the observed male values, using this information in the form of the Bayes estimate μₙ₋₁. However, they use a simple fixed relative threshold d instead of the more complicated optimal relative threshold e(n) that changes deterministically at each stage. Note that the initial value of the Bayes estimate is just the long-run average male fitness value μ₀, and each updating involves only the new observation and the value of the ratio of the environmental variance σ₀² to the residual variance γ². Because μ₀, σ₀², and γ² are all assumed to remain constant from generation to generation, the population might more easily evolve a simple learning rule that was adapted to these known values.
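Simple learning rules are also cheap to simulate. The sketch below (our construction) runs a single female's search; averaging the returned net gain over many females, with μ drawn from N(μ₀, σ₀²), estimates a rule's expected reward for any given d.

    import numpy as np

    def search_simple_learning(mu_local, d, mu0, sigma0_sq, gamma_sq, c, rng):
        """One search under: accept x_n iff x_n >= mu_{n-1} + d.
        Returns (overall net gain, number of males encountered)."""
        k = gamma_sq / sigma0_sq      # prior weight gamma^2 / sigma0^2
        n_rejected, running_sum = 0, 0.0
        while True:
            # Bayes estimate mu_{n-1} from the males rejected so far
            # (Equations 5 and 6): (k*mu0 + sum(x)) / (k + n_rejected).
            mu_prev = (k * mu0 + running_sum) / (k + n_rejected)
            x = rng.normal(mu_local, np.sqrt(gamma_sq))
            if x >= mu_prev + d:
                return x - c * (n_rejected + 1), n_rejected + 1
            n_rejected += 1
            running_sum += x

    rng = np.random.default_rng(0)
    # One female in a good patch (mu = 6) under the prior N(5, 1), gamma = 1:
    print(search_simple_learning(6.0, 0.95, 5.0, 1.0, 1.0, 0.1, rng))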
In our computations, the simple learning rule did nearly as well as the optimal learning rule: its performance was within 1% of that of the optimal learning rule across the full parameter range, though the difference increased very slightly as σ₀ increased. Thus, the numerical results and comments above apply equally to comparing the best simple learning rule with the best fixed threshold rule. This will be particularly relevant when we discuss the evolution of rules.
Figure 5. (a) Plot showing the dependence of the best fixed threshold on σ₀ and γ. (b) Plot showing the dependence of the fixed relative threshold for the best simple learning rule on σ₀ and γ. In both cases, the plots are over the range 0.5 ≤ γ ≤ 2, with parameter values μ₀ = 5, c = 0.1, and σ₀ = 0.5, 1, 2.

Figure 6. Plot of the average search time against the local value of μ for the optimal learning rule and the best fixed threshold rule. The rules used for illustration were the optimal rules for the parameter values μ₀ = 5, c = 0.1, σ₀ = 2, and γ = 2.
We now look at the thresholds themselves and, in particular, how the best fixed threshold and the fixed relative threshold of the best simple learning rule depend on the (relative) values of the environmental variance σ₀² and the residual variance γ². Recall, from Figure 1, that a fixed threshold can perform very badly if it is even slightly higher than the optimal value for its local environment. To give a good average performance across the full distribution of μ values, an inflexible fixed threshold must balance accepting lower than optimal values in cases when μ is large against incurring prohibitively high search costs when μ is small. We see from Figure 5a that the value of the best fixed threshold has the following properties: 1) it is below the long-run average male fitness value (here μ₀ = 5) for a range of σ₀ values when γ is small because there is nothing to learn, 2) it increases significantly with γ for each fixed σ₀, and 3) it decreases slightly as σ₀ increases for each fixed γ, to guard against a low local value of μ, where the decrease is more pronounced for small values of γ than for larger values. Similarly, the value of the fixed relative threshold for the best simple learning rule increases significantly with γ for each fixed σ₀ but changes very little with σ₀ for each fixed γ (increasing, but only relatively slowly, as σ₀ increases).
The fixed relative threshold of a simple learning rule is itself a compromise between the different stage-dependent values of the optimal relative threshold e(n) (see Figure 2). In the cases considered here, e(n) is decreasing in n and quickly approaches its limiting value of γψ⁻¹(c/γ), so the best value of d is always slightly above this limit. For comparison with the fixed threshold values in Figure 5a, note that a simple learning rule with relative threshold d starts off with initial acceptance threshold μ₀ + d. After that, its exact value changes randomly in response to the observations. Taking μ₀ = 5 here, we find that the best simple learning rule starts off initially with a higher threshold than the best fixed threshold rule, for all values of γ and σ₀. This enables it to take advantage of situations where the local value of μ is high. However, the learning component (μₙ₋₁) allows it to adjust quickly, reduce its threshold, and avoid excessive search costs in cases where the observed male fitness values indicate that the local value of μ is low. This is well illustrated in Figure 6, where we see that the average search time under the best fixed threshold rule increases dramatically in locations where μ is low. Thus, in terms of the average search time, the best fixed threshold rule does only slightly better than the optimal learning rule in locations where μ is high but is strikingly worse than the optimal learning rule in locations where μ is low.
EVOLUTION OF RULES
We have analyzed optimal learning rules, but can rules of this
type easily evolve? To investigate this, we simulated the evolution of learning rules using a GA. A GA can be regarded as
a minimal model of the genetics rather than a realistic representation (Axelrod 1987). In each simulation, we followed
a population of size 250 for 6000 generations. Crossover and
mutation were allowed, and rules were coded using a Gray
code, so mutations were always to neighboring thresholds.
Generally convergence had occurred within the first 1000
generations, and the results shown are averages over the last
5000 of the 6000 generations. Details can be found in Ramsey
(1994), but these details are not important because results
were robust to parameters such as the mutation rate and the
fraction of the population replaced each generation.
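For concreteness, here is a deliberately minimal GA caricature (ours; the authors' implementation, with crossover and Gray coding, is described in Ramsey 1994). It evolves a single fixed acceptance threshold in a constant environment, with mutation stepping to a neighboring point on a discrete threshold grid, mimicking the locality that the Gray code provides.

    import numpy as np

    rng = np.random.default_rng(1)
    GRID = np.linspace(2.0, 8.0, 256)        # candidate thresholds
    MU, GAMMA, COST = 5.0, 1.0, 0.1          # constant environment

    def fitness(idx):
        """Net gain from one simulated search with fixed threshold GRID[idx]."""
        n = 0
        while True:
            n += 1
            x = rng.normal(MU, GAMMA)
            if x >= GRID[idx] or n >= 10_000:  # cap guards against huge searches
                return x - COST * n

    pop = rng.integers(0, GRID.size, size=250)
    for generation in range(500):
        scores = np.array([fitness(i) for i in pop])
        parents = pop[np.argsort(scores)][125:]      # keep the better half
        children = rng.choice(parents, size=125)     # clonal reproduction here
        flip = rng.random(125) < 0.05                # 5% mutation rate
        children[flip] += rng.choice([-1, 1], size=int(flip.sum()))
        pop = np.concatenate([parents, np.clip(children, 0, GRID.size - 1)])

    # Mean evolved threshold: noisy, and typically a little below T* = 5.90.
    print(GRID[pop].mean())

Drawing μ afresh from N(μ₀, σ₀²) inside each fitness evaluation gives the spatially varying version of the same experiment.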
We first considered a constant environment. Here we took
a rule to be described by a single parameter: the acceptance
threshold. In this case, the average population fitness rapidly
converges to near maximum values. In particular, evolved
rules easily outperformed the rule of choosing the first male
encountered when the variance γ² was appreciably large.
Evolved thresholds, however, fluctuated more markedly and
tended to be significantly below the optimal threshold. This is
consistent with Figure 1; there is low selection pressure below
the optimal threshold but high selection pressure to reduce
the threshold when it is above the optimal threshold.
We also considered these fixed threshold rules in a spatially varying environment. Evolved thresholds decreased as σ₀² increased and increased as γ² increased, as theory predicts (see Figure 5). Thresholds were again below the optimal fixed threshold.
To allow learning in a spatially varying environment, we evolved rules whose form is motivated by the simple learning rules previously considered. In particular, from Equations 7 and 8, an nth male of quality xₙ is accepted under the optimal learning rule if and only if xₙ ≥ μₙ + d(n), where, by rewriting Equation 5, μₙ is given by

    μₙ = [qμ₀ + n(1 − q)x̄ₙ] / [q + n(1 − q)],    where q = (γ²/σ₀²) / (γ²/σ₀² + 1).

With this in mind, we specify a rule by 3 parameters m₀, d, and r, where 0 < r < 1. Under a rule with these parameters, the nth male with quality xₙ is accepted if and only if

    xₙ ≥ mₙ + d,    where    mₙ = [rm₀ + n(1 − r)x̄ₙ] / [r + n(1 − r)].

Thus, the parameter m₀ plays the role of a prior estimate of the overall mean μ₀, d represents a fixed relative threshold, and r plays the role of q, reflecting the relative weight given to the prior mean m₀ and the observed mean x̄ₙ and thus the speed of learning. We considered values of γ² and σ₀² lying in the range {0.25, 0.5, 1, 2, 4}. We evolved rules repeatedly for each of the 25 combinations of these parameters. When such rules evolve, there is initially strong selection pressure against high values of d. Convergence is slower than for fixed threshold rules, but there was convergence by generation 1000.
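The decision step of this 3-parameter rule is compact; a sketch (ours), with the limiting cases noted in comments:

    import numpy as np

    def accept_nth(x_seen, m0, d, r):
        """Decide on the nth male. x_seen holds all values observed so far,
        with x_seen[-1] the current candidate (included in the running mean,
        as in the optimal rule's Equation 8 form)."""
        n = len(x_seen)
        m_n = (r * m0 + n * (1.0 - r) * np.mean(x_seen)) / (r + n * (1.0 - r))
        return x_seen[-1] >= m_n + d

    # r -> 1 recovers a fixed threshold rule with threshold m0 + d;
    # r -> 0 ignores m0 and judges each male against the running mean alone.
    print(accept_nth([6.4], m0=5.0, d=0.9, r=0.5))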
As noted above, the theory predicts that when γ² is small, it does not matter much which male is chosen because they are all of similar quality, and the selection pressure is low provided achieved thresholds are sufficiently low. For the evolution of rules, this neutrality means that there may be significant drift in evolved parameters, and it may be difficult for the more complex rules to home in on exactly the right combinations of rule parameters. Taken together with the continuing effects of mutation and crossover, this means there may still be noticeable variation and possibly underperformance within a population using complex rules, even when it is close to convergence. When γ² is large, the theory predicts that the performance of the best fixed threshold rule will be comparable to that of the optimal learning rule when σ₀² is small but may be significantly worse than the optimal learning rule when σ₀² is large, as there is then strong selection pressure for rules that learn. Figure 7 clearly illustrates these points for γ² large, with low selection pressure leading to evolved fixed threshold rules actually doing slightly better than evolved simple learning rules when σ₀² is small but doing significantly worse as σ₀² increases.
Figure 7. Comparison of the average reward obtained under the evolved fixed threshold rules and the evolved simple learning rules for a range of values of the environmental variance σ₀². The rules were evolved using environmental parameter values μ₀ = 5, c = 0.1, γ = 2, and (a) σ₀ = 0.5, (b) σ₀ = 1, and (c) σ₀ = 2. The results shown are averages over the last 5000 of 6000 generations.
In all cases, regardless of the strength of selection, there was high variability in parameters and correlations in parameters across successive runs with the same parameter values. When σ₀² is low, we see from Figure 8a that m₀ and d are highly negatively correlated. This correlation can be understood as follows. In this case, the environment is approximately constant. Thus, learning is not valuable, and fixed threshold rules perform well. A learning rule with r = 1 is just a fixed threshold rule with threshold m₀ + d. This threshold can be achieved by different combinations of the 2 parameters m₀ and d. In particular, a range of values of m₀ can achieve the optimal threshold provided that as m₀ is increased d is decreased to compensate. Of course, the behavior observed under a rule depends on how the threshold changes with experience. Thus, high variability in m₀ and d does not necessarily imply high variability in what would be observed.
When the environmental variance σ₀² is large, Figure 8b shows that there is a high negative correlation between the values of r and d used by members of the evolved population. The intuition is that the female needs to be very responsive to the locally observed values because the mean of the distribution varies so much from location to location. For example, if μ is low and the relative threshold d is high, then the female may take a very long time to accept a male; she will incur considerable search costs unless she quickly lowers her threshold in response to encounters with poor-quality males, and this responsiveness corresponds to a low value of r. However, if d is lower, the female does not need to be so responsive. Thus, there are a variety of different ways of achieving near-optimal behavior, ranging from adapting fast with a high relative threshold to adapting more slowly but with a lower relative threshold, and a good simple learning rule must use a low r value if its d value is high and vice versa. Our results indicate that different populations could have found different ways of solving the same problem, for example, using different combinations of values of r and d, and these different r and d values will result in observable differences in behavior. A corollary of this is that observed differences in female behavior between populations do not necessarily imply that females are facing different problems or even using different types of rule.

Figure 8. Correlations between the evolved parameters for simple learning rules, across 20 successive runs. (a) Correlation between m₀ and d for rules evolved using environmental parameter values μ₀ = 5, c = 0.1, γ = 2, and σ₀ = 0.5. (b) Correlation between r and d for rules evolved using environmental parameter values μ₀ = 5, c = 0.1, γ = 1, and σ₀ = 2.
TEMPORALLY VARYING ENVIRONMENTS

When fluctuations are temporal as opposed to spatial, all females in the population experience the same value of μ in a given year. Thus, if μ varies (i.e., σ₀² > 0), all females are subject to the same environmental fluctuations, and natural selection maximizes geometric mean fitness (Lewontin and Cohen 1969). To define the fitness of a particular rule, let V(μ) be the expected payoff to a female using the rule when the environmental mean is μ. Then, the geometric mean fitness of the rule is G = exp(g), where

    g = E{log V(μ)}.    (11)

The expectation in Equation 11 is an average over possible values of μ.

Because of the logarithm in Expression 11, geometric mean fitness is highly sensitive to low values of V(μ). This means that it is more important to avoid poor performance for some values of μ than to achieve good performance for other values. We can therefore expect the optimal learning rule to be conservative, avoiding risks so as to perform reasonably in all years.
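Equation 11 is straightforward to estimate by Monte Carlo. The sketch below (ours) draws yearly means from N(μ₀, σ₀²) and applies the criterion to any payoff function V(μ), for example expected_net_gain(T, μ, γ, c) from the earlier sketch for a fixed threshold rule; the guard for non-positive payoffs makes the conservatism of the criterion explicit.

    import numpy as np

    def geometric_mean_fitness(V, mu0, sigma0, n_samples=2_000, seed=2):
        """G = exp(E[log V(mu)]) over mu ~ N(mu0, sigma0^2), Equation 11."""
        rng = np.random.default_rng(seed)
        payoffs = np.array([V(mu) for mu in rng.normal(mu0, sigma0, n_samples)])
        if np.any(payoffs <= 0):
            # log V is undefined: a rule that can produce a non-positive
            # expected payoff in some year has vanishing geometric mean fitness.
            return 0.0
        return float(np.exp(np.mean(np.log(payoffs))))

    # A modest fixed threshold survives; the same call with T = 5.9 typically
    # returns 0 here, because in bad years (low mu) the high threshold drives
    # the expected payoff negative through runaway search costs.
    print(geometric_mean_fitness(lambda mu: expected_net_gain(4.7, mu, 1.0, 0.1),
                                 mu0=5.0, sigma0=0.5))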
In this situation, it is impractical to find optimal learning rules with this fitness criterion. Instead, we evolved rules using the GA. When fixed threshold rules are evolved, we see from Figure 9 that the resulting thresholds are significantly lower than in a spatially varying environment with the same parameters, especially for high σ₀². This is as expected because low thresholds reduce the risk of a very low payoff in bad years (low μ). However, when simple learning rules are evolved, there is surprisingly little difference from the spatially varying case. The exception is that r is significantly lower. This is so that in a bad year the threshold is lowered quickly in response to encounters with poor-quality males, so reducing the risk of a long and costly search.

Figure 9. Comparison of the average thresholds used by fixed threshold rules evolved in spatially varying environments with those evolved in temporally varying environments. The rules were evolved using environmental parameter values μ₀ = 5, c = 0.1, and γ = 2 and (a) σ₀ = 0.5, (b) σ₀ = 1, and (c) σ₀ = 2.
DISCUSSION AND COMPARISON WITH
PREVIOUS MODELS
We introduce a model for optimal mate choice in situations
where the observed values of prospective mates are subject to
2 types of variations—variation of individual observations
about the (local) mean and variation of the local mean about
some long-run average value.
Each female observes prospective mates in a single location,
rather than the multiple patch model of Hutchinson and
Halupka (2004). In contrast to the ‘‘full knowledge’’ models
of Real (1990) and Janetos (1980), we assume that the local
mean is unknown and can vary from location to location or
season to season. Under our model, the local male population
is possibly infinite and is observed sequentially with no possibility of recall, so the best-of-n rule (Janetos 1980) is not relevant, and finite population models (Dombrovsky and Perrin
1994) are excluded. The value of each male encountered is
observed exactly, obviating the need for reinspection or a
comparative Bayes approach (Luttbeg 1996, 2002; Fawcett and
Johnstone 2003b). We emphasize an optimality framework
with a single choosy sex, rather than a game theoretic framework (McNamara and Collins 1990; Johnstone 1997; Bergstrom
and Real 2000; Fawcett and Johnstone 2003a), and our
optimality criterion is based on a single measure combining
both the value of the mate chosen and the search costs, as
opposed to simply maximizing the value of the mate (Mazalov
et al. 1996) or maximizing the probability of choosing the
best mate (Dombrovsky and Perrin 1994).
In a constant environment, where a fixed threshold rule is
known to be optimal (Real 1990), we show that there will be
strong selective pressure against females that use thresholds
higher than this optimal value, but there may be only weak
selective pressure against females that use lower than optimal
thresholds. In a varying environment, the long-run performance of a mate choice rule must be evaluated in terms of
its average performance over the whole range of environmental conditions and not just by its performance in a single fixed
environment. We show that the optimal rule is a learning rule,
under which the acceptance threshold is composed of the
current estimate of the mean plus a relative threshold that
depends only on the number of males encountered to date,
and demonstrate how learning enables the female to take advantage of the local information through the changing weight
given to the local mean. We compare the performance of the
optimal learning rule with both fixed threshold rules and simpler learning rules, analyzing the effect of different levels of
variability on the performance of these rules. We show that the
relative advantage of learning is greatest when both residual
and environmental variances are large because this results in
high local variability about a very variable mean and therefore
strong selection pressure for learning. Although the acceptance threshold of an individual focal female may vary unpredictably over her search (Real 1990), we show that under the
optimal learning rule, the expected threshold used falls
smoothly as a function of the number of males encountered,
when averaged over females still searching at that stage. Results for temporally varying environments are not presented in
detail as they are qualitatively similar to those for spatially
varying environments. However, they do indicate even greater
selective pressure against setting thresholds too high.
We also break new ground in quantitatively and qualitatively
exploring the evolution of these different rules, again looking
at the effect of different levels of variability. The simulated
evolutions indicate that evolved fixed thresholds will be below
the optimal fixed threshold in both constant and spatially
varying environments, reflecting the selective pressure against
erring on the high side. In spatially varying environments, the
evolved thresholds decrease as the environmental variance
increases and increase as the residual variance increases. For
simple learning rules, our results indicate high variability in
the evolved rule parameters for different members of the
population, emphasizing the variety of ways in which a near-optimal rule can be expressed.
Overall, Mazalov et al. (1996) comes perhaps closest to our
analysis in spirit and content. They consider a finite horizon
sequential learning model when local male values have a normal distribution with unknown mean (and possibly variance).
They derive an optimal learning rule, in some ways similar to
ours in structure, based on updating estimates of the local
mean and variance and compare it with the fully adapted rule
that would be optimal if the parameters were known. However, despite the common elements, our treatment differs
from theirs in several crucial respects.
Firstly, their model has no search costs, so their criterion for
an optimal rule is just that it maximizes the expected value of
the male obtained, and expected search times do not influence the choice of an optimal rule. Our optimality criterion
combines both rewards and costs into a complete overall measure of a rule’s performance. This leads us to very different
conclusions about the properties of an optimal rule and the
relative performance of fixed and learning rules in a varying
environment.
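In symbols (our notation; for concreteness take a constant cost c per prospective mate inspected, though the argument requires only that search is costly): if a rule accepts a male of quality X after inspecting N males, Mazalov et al. (1996) in effect maximize E[X] alone, whereas our criterion is the expected net reward

$$ V = \mathrm{E}[X - cN], $$

averaged over the distribution of environments as well as over the sampling variation within each pool.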
To illustrate the effect of the difference in optimality criterion, consider the following oversimplified analysis of how
a fixed rule fully adapted to a given local mean would perform
relative to an optimal learning rule in an environment with
a different local mean when the horizon is large. For the
example computed in Mazalov et al. (1996), very roughly
speaking, the fixed rule has a similar performance to their
learning rule when the actual local mean is much lower than
the value to which the rule is adapted (essentially because it
may eventually accept a relatively high-quality male, though
with small probability on each encounter) but performs significantly worse than the learning rule when the local mean is
higher than the value to which the rule is adapted (now it may
accept a relatively low-quality male). In a similar situation,
under our model and optimality criterion, the discussion earlier in this paper indicates just the opposite—the fixed rule
has a much worse performance than our learning rule when
the actual local mean is much lower than the value the rule is
adapted to (it may be penalized for the long search time
resulting from an over-high threshold) but has a similar performance to the optimal learning rule when the local mean is
higher than the adapted value (its low search cost may compensate for a low value of the male accepted).
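The asymmetry can be made explicit by a back-of-envelope calculation (ours, using standard results for a geometric search length and a truncated normal; c is again an assumed cost per encounter). A fixed threshold t in a pool with local mean λ and residual standard deviation σ accepts each male independently with probability p = 1 − Φ(z), where z = (t − λ)/σ, so

$$ \mathrm{E}[N] = \frac{1}{p}, \qquad \mathrm{E}[X \mid X \ge t] = \lambda + \sigma\,\frac{\phi(z)}{1 - \Phi(z)}, $$

and the net reward is E[X | X ≥ t] − c/p. As λ falls below the value the rule is adapted to, z grows, p collapses, and the cost term c/p dominates; as λ rises above it, p approaches 1, the cost becomes negligible, and the rule obtains a male worth roughly λ, much as the learning rule does.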
A second crucial difference is that Mazalov et al. (1996)
assume that females have no prior knowledge of the parameters of the local distribution of male quality, and parameter
estimation is based purely on the observed values. This has the
particular disadvantage that under their model, the female is
not allowed to accept the first male encountered, however
high his value, but is constrained to use this value as an
initial estimate of the local mean against which to judge
the next encounter and update as appropriate. We give a
fully Bayesian treatment of inference for the local mean, assuming that females have adapted to the relevant long-run
parameters.
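For completeness, the conjugate update underlying such a treatment is standard (DeGroot 1970). In our notation, if male qualities within a pool are independent N(λ, σ²) given the local mean λ, and λ itself is drawn from the long-run distribution N(μ₀, ν²), then after observing x₁, …, xₙ with sample mean x̄,

$$ \lambda \mid x_1, \ldots, x_n \sim N\!\left( \frac{n\nu^2\,\bar{x} + \sigma^2 \mu_0}{n\nu^2 + \sigma^2},\; \frac{\nu^2 \sigma^2}{n\nu^2 + \sigma^2} \right), $$

so the posterior mean is a weighted average of x̄ and μ₀ whose weight on the local data grows with n; in particular, the female has a well-defined estimate (namely μ₀) before her first encounter and can accept the first male if he is good enough.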
Finally, we note that our comparisons of different types of
rules are made under the assumption that recall is not allowed. The advantage of learning may be even greater if recall
of previously rejected prospective mates is possible. In a constant environment, λ is effectively known, and the observations are independent given λ, so they carry no extra
information (about λ), and a learning rule can do no better
than a fixed threshold rule. In a varying environment, the
observations do carry information about λ. A learning rule
can take advantage of this information to lower its threshold
in response to low observed male fitness values. In such cases,
a prospective mate rejected earlier may later seem much more
attractive. However, a fixed threshold rule can never take advantage of this information; a prospective mate that was not
acceptable at one point in time would always have a value
below the fixed threshold and would never be acceptable despite any indications that the local mean value was unexpectedly low. In particular, a learning rule will be favored over
a fixed threshold rule in cases where recall costs and search costs
are very low and recall is reliable, such as a female bird choosing a mate at a lek.
We thank Alasdair Houston, Barney Luttbeg, and Peter Todd for
their comments on earlier drafts of this paper. John McNamara acknowledges the support of the Leverhulme Trust. We also thank the
editor and 2 anonymous referees for their helpful and perceptive
comments.
REFERENCES
Alatalo RV, Carlson A, Lundberg A. 1988. The search cost in mate choice of the pied flycatcher. Anim Behav 36:289–91.
Axelrod R. 1987. The evolution of strategies in the iterated prisoner’s dilemma. In: Davis L, editor. Genetic algorithms and simulated annealing. Los Altos, CA: Morgan Kaufmann Publishers Inc. p 32–41.
Bakker TCM, Milinski M. 1991. Sequential mate choice and the previous male effect in sticklebacks. Behav Ecol Sociobiol 29:205–10.
Bergstrom CT, Real LA. 2000. Towards a theory of mutual mate choice: lessons from two-sided matching. Evol Ecol Res 2:493–508.
Collins EJ, McNamara JM. 1993. The job-search problem with competition: an evolutionarily stable dynamic strategy. Adv Appl Probab 25:314–33.
Collins SA. 1995. The effect of recent experience on female choice in zebra finches. Anim Behav 49:479–86.
DeGroot MH. 1970. Optimal statistical decisions. New York: McGraw-Hill.
Dombrovsky Y, Perrin N. 1994. On adaptive search and optimal stopping in sequential mate choice. Am Nat 144:355–61.
Downhower JF, Lank DB. 1994. Effect of previous experience on mate choice by female mottled sculpins. Anim Behav 47:369–72.
Fawcett TW, Johnstone RA. 2003a. Mate choice in the face of costly competition. Behav Ecol 14:771–9.
Fawcett TW, Johnstone RA. 2003b. Optimal assessment of multiple cues. Proc R Soc Lond B Biol Sci 270:1637–43.
Houston AI, McNamara JM. 1999. Models of adaptive behaviour. Cambridge, UK: Cambridge University Press.
Hutchinson JMC, Halupka K. 2004. Mate choice when males are in patches: optimal strategies and good rules of thumb. J Theor Biol 231:129–51.
Janetos AC. 1980. Strategies of female mate choice: a theoretical analysis. Behav Ecol Sociobiol 7:107–12.
Johnstone RA. 1997. The tactics of mutual mate choice and competitive search. Behav Ecol Sociobiol 40:51–9.
Lewontin RC, Cohen D. 1969. On population growth in a randomly varying environment. Proc Natl Acad Sci USA 62:1056–60.
Luttbeg B. 1996. A comparative Bayes tactic for mate assessment and choice. Behav Ecol 7:451–60.
Luttbeg B. 2002. Assessing the robustness and optimality of alternative decision rules with varying assumptions. Anim Behav 63:805–14.
Mazalov V, Perrin N, Dombrovsky Y. 1996. Adaptive search and information updating in sequential mate choice. Am Nat 148:123–37.
McNamara JM, Collins EJ. 1990. The job-search problem as an employer-candidate game. J Appl Probab 28:815–27.
Milinski M, Bakker TCM. 1992. Costs influence sequential mate choice in sticklebacks, Gasterosteus aculeatus. Proc R Soc Lond B Biol Sci 250:229–33.
Parker GA. 1983. Mate quality and mating decisions. In: Bateson P, editor. Mate choice. Cambridge, UK: Cambridge University Press. p 141–66.
Ramsey DM. 1994. Models of evolution, interaction and learning in sequential decision processes [dissertation]. Bristol, UK: Department of Mathematics, University of Bristol.
Real L. 1990. Search theory and mate choice. I. Models of single-sex discrimination. Am Nat 136:376–404.
Simao J, Todd PM. 2002. Modelling mate choice in monogamous mating systems with courtship. Adapt Behav 10:113–36.
Svensson EI, Sinervo B. 2004. Spatial scale and temporal component of selection in side-blotched lizards. Am Nat 163:726–34.
Wong BBM, Candolin U. 2005. How is female mate choice affected by male competition? Biol Rev Camb Philos Soc 80:559–71.