REPORTS

State-Dependent Learned Valuation Drives Choice in an Invertebrate

Lorena Pompilio,* Alex Kacelnik,† Spencer T. Behmer‡

Humans and other vertebrates occasionally show a preference for items remembered to be costly or experienced when the subject was in a poor condition (this is known as a sunk-costs fallacy or state-dependent valuation). Whether these mechanisms shared across vertebrates are the result of convergence toward an adaptive solution or evolutionary relicts reflecting common ancestral traits is unknown. Here we show that state-dependent valuation also occurs in an invertebrate, the desert locust Schistocerca gregaria (Orthoptera: Acrididae). Given the latter's phylogenetic and neurobiological distance from those groups in which the phenomenon was already known, we suggest that state-dependent valuation mechanisms are probably ecologically rational solutions to widespread problems of choice.

Animal decision-making is often modeled using the assumption that choices are based on the fitness consequences that each choice yields. Fitness gains, in turn, depend on both the intrinsic properties of the options and the state of the subject at the time of the choice. Recently, however, studies in humans and other vertebrates (1, 2) have shown that understanding the adaptive significance of learning mechanisms may be the key to progress in functional modeling of decision-making, because preferences more closely reflect the subject's state at the time of learning than at the time of choice. Classical learning models (3) do not address the subject's state, but recent treatments of evaluative incentive behavior do (4) and are compatible with the approach taken here.
Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK.
*Present address: Departamento de Ecología, Genética y Evolución, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Pabellón II Ciudad Universitaria, C1428EHA, Buenos Aires, Argentina.
†To whom correspondence should be addressed. E-mail: [email protected]
‡Present address: Department of Entomology, Texas A&M University, College Station, TX 77843-2475, USA.

A recent theoretical model linking learning to decision-making (5) proposes that anomalies of choice behavior in which past investments rather than expected returns dominate preference (examples include Sunk Costs, Work Ethics, and the Concorde Fallacy) result from a decelerated function of value (fitness or utility) versus objective payoff, combined with a mechanism of choice that is dependent on the remembered benefit previously yielded by each option (Fig. 1). Although some utility functions can be accelerated or sigmoid, because "desperados" in dire states would accrue smaller marginal gains from resources than would better-off individuals, most surviving organisms operate beyond this extreme zone, and hence the assumption of decelerated gains has very wide justification. In summary, if two sources (L and H) yielding the same objective payoff (ML = MH; M, magnitude) are systematically encountered when the individual is in different states (low or high reserves for L and H, respectively), then the source encountered when needs are greater (L) will yield larger value gains (VL > VH). According to the model, although gains depend jointly on payoff magnitude and present state, it is the remembered gains, rather than remembered payoff magnitudes or states, that drive future preferences.
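The arithmetic behind this prediction can be sketched numerically. The snippet below is not from the paper: the logarithmic form of value() and the reserve levels are assumed purely for illustration, since the model requires only that the value-versus-state curve be decelerated (concave).

```python
import math

def value(state):
    # Assumed concave ("decelerated") value-versus-state curve; the model
    # requires only diminishing returns, not this particular log form.
    return math.log(state)

M = 2.0                 # identical objective payoff, M_L = M_H
low, high = 2.0, 8.0    # assumed reserve levels when each option is experienced

V_L = value(low + M) - value(low)     # value gain remembered for option L
V_H = value(high + M) - value(high)   # value gain remembered for option H

# Concavity makes the same payoff worth more to a low-state subject, so the
# remembered gain satisfies V_L > V_H and option L is later preferred.
assert V_L > V_H
print(round(V_L, 3), round(V_H, 3))  # prints: 0.693 0.223
```

Any other concave curve (square root, saturating exponential) gives the same ordering; only the size of the gap between V_L and V_H changes.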
The adaptive advantages of such a mechanism are not obvious because, at least under experimental conditions, it can produce irrational preferences: Starlings can prefer a more delayed over a more immediate reward even when they have explicit knowledge of the delays involved (6), and rats can frantically operate a lever or chain that delivers food or water rewards even when they are neither hungry nor thirsty (7). Supporting evidence for incentive or state-dependent learning comes from the mammal and bird species that have been studied so far, but there is an open question as to whether this mechanism of learned value assignment was an early vertebrate acquisition or a wider phenomenon, perhaps universally present because it confers selective advantages. We tested whether such state-dependent valuation learning occurs in a grasshopper, an animal with a simpler nervous system (8) than that of the vertebrates in which these effects are known. Grasshoppers make particularly good test subjects for studying and modeling individual decision-making because they forage for themselves and are capable of learning (9, 10). Additionally, much is known about how changes in their nutritional state affect their feeding behavior (11). We manipulated nutritional state both at the time of learning and at the time of preference testing. We trained grasshoppers so that they encountered each of two options under different nutritional states: low (option L) and high (option H). Each option consisted of an odor (lemon grass or peppermint) paired during learning with a food item (a small piece of seedling wheat). Food items were of the same size and quality in both options, and each odor was always associated with the same state for each subject. Individuals received an equal number of reinforced trials with each option over a 3-day training regime (fig. S1). After training was completed, individual grasshoppers were presented with a choice between the two options.
Half of the subjects had the test in the low state and the other half in the high state (12). We considered four possible outcomes. The first of these, Magnitude Priority, states that if choices depend on the intrinsic properties of the options, no systematic preference will be observed between odors because the food items were identical. The second, Value Priority, states that if choices are controlled by past gains, preference should be for the option experienced in the low state during training, regardless of state at the time of choice.

Fig. 1. Putative mechanisms of valuation learning as a function of a subject's state. The ordinate is a currency that is assumed to correlate to adaptive value, and the abscissa is a metric of state, here assumed to be the level of accumulated reserves. The plot illustrates consequences for a subject that encounters two food sources (L and H), each when the subject is in either of two states: low or high, respectively. The magnitudes of the outcomes are labeled ML and MH and are represented as arrows causing positive state displacements. The value (or benefit) of each outcome (VL and VH) is the vertical displacement that corresponds to each change in state. The first derivative (marginal rate) of the value function at each initial state is indicated by the slopes of the tangents SL and SH. The inset shows that the subject's representation of the magnitude of rewards (m) may differ from the objective metrics of the outcomes (in the example, ML = MH but mL > mH). Models of learning may use M, S, V, or m as being directly responsible for value assignment.

www.sciencemag.org SCIENCE VOL 311 17 MARCH 2006
The third, State Priority, stipulates that if options are valued by association with the desirability of the state they evoke, the option preferred should be that experienced in the high state during training, regardless of state during choice. The fourth, State-Option Association, stipulates that choice should favor the source met under the same state at the time of training. Thus, subjects may choose option L under state low and option H under state high.

A majority of the grasshoppers preferred option L (the stimulus to which the grasshoppers were trained when in a state of low reserves) regardless of their state at the time of testing (Fig. 2). Averaged across all test subjects, the mean preference (±SE) for option L was 0.71 ± 0.06. These results indicate a significant preference for option L (t1,15 = 3.60, P < 0.01; one-sample t test against indifference). Preference was not affected by the state of the subject at the time of testing (Fig. 2) or by odor bias, and the state-by-odor interaction was not significant [analysis of variance (ANOVA): F1,15 = 0.09, P > 0.77; F1,15 = 0.01, P > 0.92; and F1,15 = 0.01, P > 0.91, respectively]. There also was no left arm–right arm positioning effect (paired-samples t test: t1,15 = 0.17, P > 0.86).

Next, we considered whether the speed of learning during training, as measured by latencies to contact and eat the reward, might have anticipated the preference results. A repeated-measures ANOVA indicated that latencies to start eating decreased across the 3 days of training (F2,15 = 15.00, P < 0.01; fig. S2), but averaged over time, the latencies between the L and H options were similar (F1,15 = 0.08, P > 0.78), and no significant option-by-day interaction was observed (F2,15 = 0.91, P > 0.42). Finally, we considered the possibility that each grasshopper preferred the option for which it had a shorter latency during training (regardless of whether shorter latencies were exhibited for the L or H option) (12). We found, however, no association between latencies and choices (Pearson's correlation index, r = 0.19, P = 0.48).

Fig. 2. Percentage of choices for option L exhibited by each subject under the two different states (low and high reserves) during testing. The figure shows individual data points (black circles) and means (±SE) (gray circles) with respect to indifference (dashed line). For both states, the percentage of choices for option L was significantly higher than indifference.

Our experiment supports the idea that in this insect, the benefit gained at the time of training affects later preference even when the magnitudes of the rewards are equal (namely, Value Priority). The Value Priority outcome can be mediated by two very different mechanisms that could be labeled Perception Distortion and Remembered Value. Perception Distortion states that the energetic state at the time of training influences the distal mechanisms of perception, so that the memory of the properties of the options is altered: Equal payoffs are perceived as being different (in Fig. 1, mL > mH). Under the Remembered Value mechanism, the memory for the magnitudes is accurate, but the animal attaches different subjective attractiveness to each option, depending on its state while learning. These considerations may apply to similar anomalies of decision-making in all animals, including humans. In grasshoppers, there is evidence favoring the Perception Distortion mechanism, because preference for the option experienced when reserves are low could be explained by peripheral gustatory responses underlying feeding behavior.
In these animals, as time since the last feed increases, nutrient levels in the hemolymph drop, and as a consequence, mouthpart taste receptors become increasingly sensitive to key depleted nutrients (13, 14). This means that at a neurological level, a grasshopper with low reserves will receive greater feedback when it contacts a food item (15–17). Similarly, through digestive adaptations, individuals may extract more nutrients from identical food items when in greater need (18), and later choices may be governed by the memory of the postabsorptive gain (or the sensory adaptation consequent on the gains) and not of the objective features of the food items. The latter route for Perception Distortion could in theory also apply to vertebrates, but the available evidence does not point in this direction, at least for starlings. In their case, peripheral adjustments leading to either distorted representations or distorted perceptions due to rapid absorptive adaptations are both unlikely. This is because in learned valuation effects, starlings' preference between equally delayed rewards is not accompanied by alterations in the pecking rate (suggesting that neither the perception of the magnitude of the reward nor timing was altered) (1, 5, 6). Thus, although similar behavioral outcomes are observed in starlings and grasshoppers, it is possible that different underlying mechanisms drive state-dependent learned valuation in each species. This difference supports the view that state-dependent learned valuation has intrinsic, although not yet identified, adaptive advantages and has probably emerged and persisted in distant species via convergent evolution. State-dependent valuation may be computationally more efficient than remembering the attributes of each option and weighing them against current nutritional state.
This may reduce errors and help when decisions need to be made quickly and where neural constraints limit the amount of information that can be processed (19, 20). State-dependent valuation can cause suboptimal choices if there is a difference between the choice circumstances and the circumstances for learning about each option, as in our experiment. In particular, for there to be a cost, there must be a correlation between state and the probability of encounter with each option when options are met singly. This would occur because when these options are met simultaneously and a choice takes place, information about past gains could be misleading. Outside these probably rare circumstances, the mechanisms do not favor suboptimal alternatives (21). It could be argued that even if state-dependent valuation causes frequent and costly suboptimal choices in nature, it may persist because of neural or psychological constraints or because the cost associated with the development of a different mechanism is higher than the cost of using such a metric. The latter possibility cannot be discarded, but we do not favor it as a working hypothesis. Ultimately, it would be ideal to measure the prevalence of different learning and choice circumstances in natural environments, but until that becomes possible, progress can be made by modeling the theoretical ecological worlds under which state-dependent valuation would be evolutionarily stable when used in competition with animals that form preferences based on the absolute properties of their options.

References and Notes
1. B. Marsh, C. Schuck-Paim, A. Kacelnik, Behav. Ecol. 15, 396 (2004).
2. T. S. Clement, J. R. Feltus, D. H. Kaiser, T. R. Zentall, Psychon. Bull. Rev. 7, 100 (2000).
3. R. A. Rescorla, A. R. Wagner, in Classical Conditioning II: Current Research and Theory, A. H. Black, W. F. Prokasy, Eds. (Appleton-Century-Crofts, New York, 1972), pp. 64–99.
4. B. W. Balleine, in The Behavior of the Laboratory Rat: A Handbook with Tests, I. Q. Whishaw, B. Kolb, Eds. (Oxford Univ. Press, Oxford, 2006), pp. 436–446.
5. A. Kacelnik, B. Marsh, Anim. Behav. 63, 245 (2002).
6. L. Pompilio, A. Kacelnik, Anim. Behav. 70, 571 (2005).
7. A. Dickinson, B. W. Balleine, in Steven's Handbook of Experimental Psychology, vol. 3, Learning, Motivation and Emotion, ed. 3, H. Pashler, R. Gallistel, Eds. (Wiley, New York, 2002), pp. 497–533.
8. M. Burrows, The Neurobiology of an Insect Brain (Oxford Univ. Press, Oxford, 1996).
9. R. Dukas, E. A. Bernays, Proc. Natl. Acad. Sci. U.S.A. 97, 2637 (2000).
10. S. T. Behmer, C. E. Belt, M. S. Shapiro, J. Exp. Biol. 208, 3463 (2005).
11. S. J. Simpson, D. Raubenheimer, Adv. Study Behav. 29, 1 (2000).
12. Materials and methods are available as supporting material on Science Online.
13. S. J. Simpson, M. S. J. Simmonds, W. M. Blaney, S. James, Appetite 17, 141 (1991).
14. S. J. Simpson, C. L. Simpson, J. Exp. Biol. 168, 269 (1992).
15. J. D. Abisgold, S. J. Simpson, J. Exp. Biol. 135, 215 (1988).
16. C. L. Simpson, S. Chyb, S. J. Simpson, Entomol. Exp. Appl. 56, 259 (1990).
17. R. F. Chapman, Annu. Rev. Entomol. 48, 455 (2003).
18. R. M. Sibly, in Physiological Ecology: An Evolutionary Approach to Resource Use, C. R. Townsend, P. Calow, Eds. (Blackwell, Oxford, 1981), pp. 109–139.
19. L. A. Real, Am. Nat. 140, S108 (1992).
20. E. A. Bernays, Annu. Rev. Entomol. 46, 703 (2001).
21. G. Gigerenzer, Behav. Brain Sci. 27, 336 (2004).
22. We thank C. Nadell and C. Spatola Rossi for help during data collection; C. Schuck-Paim and M. S. Shapiro for helpful discussion; and T. Dickinson, P. Harvey, T. Reil, O. Lewis, and two anonymous referees for valuable feedback on the manuscript.
Supporting Online Material
www.sciencemag.org/cgi/content/full/311/5767/1613/DC1
Materials and Methods
Figs. S1 and S2
References

15 December 2005; accepted 1 February 2006
10.1126/science.1123924

An Equivalence Principle for the Incorporation of Favorable Mutations in Asexual Populations

Matthew Hegreness,1,2* Noam Shoresh,1* Daniel Hartl,2 Roy Kishony1,3†

Rapid evolution of asexual populations, such as that of cancer cells or of microorganisms developing drug resistance, can include the simultaneous spread of distinct beneficial mutations. We demonstrate that evolution in such cases is driven by the fitness effects and appearance times of only a small minority of favorable mutations. The complexity of the mutation-selection process is thereby greatly reduced, and much of the evolutionary dynamics can be encapsulated in two parameters—an effective selection coefficient and effective rate of beneficial mutations. We confirm this theoretical finding and estimate the effective parameters for evolving populations of fluorescently labeled Escherichia coli. The effective parameters constitute a simple description and provide a natural standard for comparing adaptation between species and across environments.

1Bauer Center for Genomics Research, Harvard University, Cambridge, MA 02138, USA. 2Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA. 3Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA.
*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: [email protected]

Spontaneous beneficial mutations are the fuel for adaptation, the source of evolutionary novelty, and one of the least understood aspects of biology. Although adaptation is everywhere—cancer invading tissues, bacteria escaping drugs, viruses switching from livestock to humans—beneficial mutations are notoriously difficult to study (1, 2). Theoretical and experimental advances have been made in recent years by focusing on the distribution of fitness effects of spontaneous beneficial mutations (3–8). Mapping the options for improvement available to single organisms, however, is insufficient for understanding the adaptive course of an entire population, especially in asexual populations of microorganisms or cancer cells where multiple mutations often spread simultaneously (9–16). Here, we use modeling and experimental results to show that the seeming additional complication of having multiple lineages competing within a population leads in fact to a drastic simplification: Regardless of the distribution of mutational effects available to individuals, a population's adaptive dynamics can be approximated by an equivalent model in which all favorable mutations confer the same fitness advantage, which we call the effective selection coefficient. We provide experimental estimates of the effective selection coefficient and the corresponding effective rate of beneficial mutations for laboratory populations of Escherichia coli, and we demonstrate the predictive power of these effective parameters.

First, we use numerical simulations to demonstrate the simplification that emerges in a population large enough and a mutation rate high enough that clonal interference (17–19)—competition among lineages carrying favorable mutations—is common. In an evolving population, most beneficial mutations are rapidly lost to random genetic drift (20, 21). Of the remaining mutant lineages, some increase in frequency slightly, only to decline as more fit lineages appear and expand in the population (10, 16, 17, 22). The evolutionary path taken by the population as a whole is determined by successful mutations that escape stochastic loss and whose frequencies rise above some minimal level. Using a population genetics model that includes mutation, selection, drift, as well as clonal interference (23), we explore the distribution of these successful mutations for several underlying distributions of beneficial mutations (Fig. 1), including an exponential distribution as suggested by Gillespie's (8) and Orr's (3) use of extreme value theory. The salient feature of Fig. 1 is that very dissimilar underlying distributions—exponential, uniform, lognormal, even an arbitrary distribution—all yield a similar distribution of successful mutations (24).

Fig. 1. Successful mutations cluster around a single value, irrespective of the shape of the underlying mutational distribution. The main panels show the probability density of four underlying distributions of available mutations as a function of the selection coefficient: (A) exponential; (B) uniform; (C) lognormal; (D) arbitrary. (Insets) The corresponding distributions of successful mutations, defined here as those whose lineages constitute at least 10% of the population at any time before the ancestral genotype diminishes to less than 1%. All simulations were done with a beneficial mutation rate of μb = 10^-5 and population size Ne = 2 × 10^6 and were replicated 1000 times (23).

Moreover, the distribution of successful mutations has a simple form, peaked around a single value. This fitness value is typical of those mutations whose effects are not so small that they are lost through competition with more fit lineages, but are also not so large that they are impossibly rare. The unimodal shape motivates the hypothesis that an equivalent model that allows mutations with only a single selective value might approximate the behavior of the entire distribution of beneficial mutations. We investigate whether the adaptive dynamics observed in evolving E. coli populations can be reproduced by an equivalent model with only a single value, a Dirac delta function of mutational effects.

We rely on a classic strategy for characterizing beneficial mutations in coevolving subpopulations that differ initially only by a selectively neutral marker. The spread of mutations is monitored through changes in the marker ratio (22, 25–29). Our experimental technique uses constitutively expressed variants of GFP (green fluorescent protein)—YFP (yellow fluorescent protein) and CFP (cyan fluorescent protein)—as neutral markers. All experimental populations start with equal numbers of YFP and CFP E. coli cells (NY and NC) and evolve for 300 generations through serial transfers while adapting to glucose minimal medium. The expected behavior of the marker-ratio trajectories depends upon the rate at which beneficial mutations appear in a population. When beneficial mutations are rare, mutant lineages arise and fix one at a time (8, 17). The spread of each individual mutant lineage shows as a line of
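The bookkeeping that defines a "successful" mutation in Fig. 1 can be illustrated with a toy model. The sketch below is a simplified stand-in for the authors' population-genetics model (23): genetic drift is omitted, the population size and mutation rate are scaled down for speed, and the exponential distribution's mean is an assumed value. It only demonstrates the selection criterion (a lineage reaching 10% before the ancestor falls below 1%) under clonal interference.

```python
import random

def sample_s_exponential(rng, mean=0.03):
    """Draw a beneficial selection coefficient; the mean is assumed."""
    return rng.expovariate(1.0 / mean)

def successful_mutations(sample_s, N=100_000, U=1e-4, max_gen=2000, seed=0):
    """Toy clonal-interference model (drift omitted for brevity).

    Lineages are [selection coefficient, frequency] pairs. Each generation,
    frequencies are reweighted by relative fitness (1 + s) and renormalized,
    and U*N new mutant lineages are seeded at frequency 1/N. A mutation is
    'successful' if its lineage reaches 10% of the population at some time
    before the ancestral genotype drops below 1%.
    """
    rng = random.Random(seed)
    lineages = [[0.0, 1.0]]   # the ancestor: s = 0, frequency 1
    peaks = [1.0]             # maximum frequency each lineage has reached
    for _ in range(max_gen):
        # selection: reweight every lineage by (1 + s), then renormalize
        total = sum(f * (1.0 + s) for s, f in lineages)
        for lin in lineages:
            lin[1] *= (1.0 + lin[0]) / total
        # mutation: U*N new single-cell lineages per generation
        for _ in range(int(U * N)):
            lineages.append([sample_s(rng), 1.0 / N])
            peaks.append(1.0 / N)
        for i, (_, f) in enumerate(lineages):
            if f > peaks[i]:
                peaks[i] = f
        if lineages[0][1] < 0.01:   # ancestor effectively replaced
            break
    return [s for (s, _), peak in zip(lineages[1:], peaks[1:]) if peak >= 0.10]

winners = successful_mutations(sample_s_exponential)
print(len(winners), "successful mutations:", [round(s, 3) for s in winners])
```

Swapping in a uniform or lognormal sampler for sample_s_exponential is the toy analog of panels (B) and (C): in repeated runs, the winners' selection coefficients cluster in a narrow band even though the underlying distributions differ widely.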