Autonomous Agents and Multi-Agent Systems © 2005 Springer Science+Business Media, Inc. Manufactured in The Netherlands. DOI: 10.1007/s10458-005-4948-2

Autonomous Adaptive Agents for Single Seller Sealed Bid Auctions

ANTHONY BAGNALL AND IAIN TOFT
School of Computing Sciences, University of East Anglia, Norwich, UK
ajb,[email protected]

Published online: XXX

Abstract. In developing open, heterogeneous and distributed multi-agent systems, researchers often face the problem of facilitating negotiation and bargaining amongst agents. It is increasingly common to use auction mechanisms for negotiation in multi-agent systems. The choice of auction mechanism and the bidding strategy of an agent are of central importance to the success of the agent model. Our aim is to determine the best agent learning algorithm for bidding in a variety of single seller auction structures, both in static environments where a known optimal strategy exists and in complex environments where the optimal strategy may be constantly changing. In this paper we present a model of single seller auctions and describe three adaptive agent algorithms that learn strategies through repeated competition. We experiment in a range of auction environments of increasing complexity to determine how well each agent performs, in relation to an optimal strategy in cases where one can be deduced, or in relation to each other in other cases. We find that, with a uniform value distribution, a purely reactive agent based on Cliff's ZIP algorithm for continuous double auctions (CDA) performs well, although it is outperformed in some cases by a memory based agent based on the Gjerstad Dickhaut agent for CDA.

Keywords: adaptive agents, auctions, zero intelligence plus

1. Introduction

The dramatic increase in the quantity of goods and services sold via auctions has fuelled greater interest in the study of protocols for auction structure and strategies for agent bidding.
The potential for software agents in e-commerce, and in auctions in particular, is enormous [20, 52]. Agent systems can be employed as a practical mechanism by which individuals and companies may more usefully engage in online commercial activity. The applications for autonomous, adaptive agents that can compete and learn effectively in real time online auctions are numerous. For example, Market Based Control (MBC) [8] systems have been applied to a variety of applications, including air conditioning [24], network bandwidth, telecommunications [12] and Advanced Life Support Systems (ALS) [34]. Agent simulations may also serve as a theoretical economic testbed to study the effect of alternative market mechanisms on competitor behaviour, as demonstrated by Grossklags et al. [16], Farmer et al. [11] and Brewer et al. [6]. The majority of adaptive agent research in simulated auctions has focused on algorithms for bidding in double auctions, i.e. auctions with multiple buyers and sellers [9, 13, 20, 46]. Double auctions, and particularly continuous double auctions (CDA), are an economic mechanism known to be very efficient at allocating resources [42] and are widely used in online and offline markets. Agents for simulated CDA can offer insights into the effect of market structure on behaviour and offer the possibility of automated traders in real world markets [40]. However, CDA are not the only form of auction gaining in popularity. Auctions with a single seller or buyer, which we refer to as single-sided auctions (or simply as auctions when the meaning is unambiguous), are a format more familiar to the majority of people.
The success of consumer to consumer auction sites such as eBay has meant that many individuals have competed in a single-sided auction and are aware of the strategic issues involved in bidding. Also, the growth of business to business auction services such as Freight Traders (Freight Traders Ltd, http://www.freight-traders.com/) has meant that many companies are considering shifting their procurement methods from a request to tender system to one that employs auctions. Business to consumer auctions are also growing in popularity, as witnessed by the decision by Google to handle their sale of shares via an online single-sided Dutch auction [53]. Single-sided auctions are also a crucial mechanism in many agent based systems such as the Contract Net Protocol [41]. Single-sided auctions can have many formats. The four most commonly researched formats are:

– The open ascending price or English auction, where bidders submit increasing bids until no bidder wishes to submit a higher bid;
– The open descending price or Dutch auction, where the price moves down from a high starting point until a bidder bids, at which point the auction terminates;
– The first-price sealed-bid auction (FPSB), where each bidder submits a single bid and the highest bidder gets the object and pays the amount he bid;
– The second-price sealed-bid auction (SPSB), where each bidder submits a single bid and the highest bidder gets the object and pays the second highest bid (also known as a Vickrey auction).

The strategies and learning issues in single-sided auctions are fundamentally different to those in a CDA. Our research aim is to study the behaviour of autonomous adaptive agents in alternative protocols for single-sided auctions.
Some of the key issues in the study of auctions are:

– what are the optimal strategies for a given auction structure;
– how do agents learn the optimal strategy; and
– how does the restriction of information prevent agents from learning a strategy?

These questions have been addressed through auction theory [48], field studies [31, 38], experimental lab studies [2, 26, 42] and, in CDA, agent simulations [9, 13, 14, 20, 22, 23, 46]. Our broad aim is to extend the study of agent simulations into the area of single-sided auctions. It is our belief that, prior to examining agent behaviour in complex, dynamic, multi-adaptive agent systems, a proposed agent architecture should be tested in learning environments where a known optimal strategy exists. Agents should be able to solve simpler learning problems before being applied to complex problems of the same type. Auction theory provides us with a class of single-sided auctions where, under some fundamental assumptions, there is provably optimal behaviour. This class of auction has also been extensively used in experimental studies with human agents. In Section 2 we describe the sealed-bid auction mechanism and the private values model commonly used in theoretical analysis and experimental studies. The question we address is: how well do learning mechanisms developed for the CDA perform in single-sided auctions?
More specifically, the objectives of this paper are to:

– specify a sealed-bid auction format often used in economics experiments (for example, see [31, 38]) as an agent problem, and simulate auctions with agents following a provably optimal strategy;
– adapt popular agent architectures for the CDA, Cliff's Zero Intelligence Plus agents [9] and Gjerstad Dickhaut agents [13], for FPSB and SPSB auctions;
– determine the level of complexity of agents required to learn the optimal strategy when competing against a population of non-adaptive agents following the known optimal strategy;
– evaluate how well the agent algorithms perform when competing against each other in One vs. Many and Many vs. Many scenarios.

In Section 3 we review some of the adaptive agent mechanisms used in the CDA. In Section 4 we describe how we have adapted these algorithms for sealed-bid auctions to form three new adaptive agent types. Section 5 describes a sequence of five sets of experiments in environments of increasing complexity, beginning with four different adaptive agent architectures competing against a population of optimal agents (agents following the optimal strategy, described in Section 2.3). Finally, in Section 6, we discuss the differences in performance of the three adaptive agents and discuss how we intend to extend this research.

2. The auction model

The first game theory analysis of auctions by Vickrey [48] concentrated on describing behaviour in a small number of basic models. In the 40 years since Vickrey's seminal work, auction theory has developed to consider a wider class of models and behaviours (see [27, 28, 32] for reviews of auction theory).
The key elements in describing an auction are: a specification of the assumptions about the parameters applicable to an agent; a description of the protocol, or market rules, under which the auction operates; and a description of the information available to the agent before, during and after an auction. We adopt a common specification, the Private Values Model (PVM), described in Section 2.1. The PVM was initially proposed by Vickrey [48] and has been widely adopted in auction theory [28] and experimental studies [42]. Under the PVM, first and second price sealed-bid auctions are strategically equivalent to Dutch and English auctions, respectively [48]. Hence we restrict our attention to market rules that define FPSB and SPSB auctions (given in Section 2.2). The optimal strategies for agents under the PVM are presented in Section 2.3. The information made available to an agent in order to reach this optimal behaviour is described in Section 2.4.

2.1. Private value model (PVM)

The PVM involves an auction of N interested bidders. Each bidder i has a valuation x_i of the single object. Each x_i is an observation of an independent, identically distributed random variable X_i with range [0, φ] (φ is the universal maximum price) and distribution function F. The benefit of this model is that for certain auction mechanisms and assumptions there is provably optimal behaviour. Hence, the PVM allows us to measure the performance of adaptive agents and assess under what conditions learning is most effective. This is a necessary precursor to studying more interesting (and realistic) scenarios where the assumptions of the PVM concerning the competitors' behaviour do not necessarily hold true.

2.2.
Auction protocols

Since we restrict our attention to FPSB and SPSB auctions, each agent can submit at most one bid in any auction. An agent i forms a bid b_i with a bid function

β_i : [0, φ] → ℝ⁺,  β_i(x_i) = b_i.

The set of all bids for a particular auction is denoted B = {b_1, b_2, ..., b_N}. For both FPSB and SPSB, the winning agent, w, is the highest bidder,

w = arg max_{0<i≤N} b_i.  (1)

So the bid of the winning agent is b_w. The price paid by the winning agent, p, depends on the auction structure. In a FPSB, the price paid is the highest bid,

p = max_{0<i≤N} b_i,  (2)

hence p = b_w. In a SPSB, the price paid is the second highest bid,

p = max_{0<i≤N, i≠w} b_i.  (3)

Proponents of multi-agent systems believe that trading agents will first be used where the allocation problem is simple, interactions are repeated frequently and the goods traded are of relatively low value [30]. Sealed bid auctions have the benefit of being fast, well studied and private (in that the agents are unaware of the other bids). Sealed bid auctions have been used in a variety of contexts such as e-commerce and networking (e.g. [10, 24, 51]).

2.3. Optimal strategies in FPSB and SPSB auctions

In a sealed-bid auction an agent's profit (or reward) is

r(x_i) = x_i − p  if i = w,
r(x_i) = 0        otherwise.  (4)

All agents are risk neutral, i.e. they are attempting to find the strategy that maximises their profit. A dominant equilibrium strategy is a bid function that maximises the profit of the agent independent of the other agents' bids. Thus if a dominant equilibrium strategy exists, it clearly provides the optimal bid strategy.
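The winner determination and pricing rules of Equations 1–3 can be sketched in code. The following is a minimal illustration, not the simulator used in the paper; the function name and return convention are our own.

```python
# Sketch of the Section 2.2 market rules: given the sealed bids
# B = {b_1, ..., b_N}, the winner w is the highest bidder (Equation 1)
# and the price p depends on the auction structure (Equations 2 and 3).
from typing import List, Tuple

def clear_auction(bids: List[float], second_price: bool) -> Tuple[int, float]:
    """Return (winner index w, price p) for a FPSB or SPSB auction."""
    w = max(range(len(bids)), key=lambda i: bids[i])        # Equation 1
    if second_price:
        # SPSB: price is the highest bid excluding the winner (Equation 3)
        p = max(b for i, b in enumerate(bids) if i != w)
    else:
        # FPSB: the winner pays its own bid (Equation 2)
        p = bids[w]
    return w, p
```

For the bid set {0.2, 0.9, 0.5}, agent 1 wins in both formats, paying 0.9 under FPSB and 0.5 under SPSB.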
However, in general the optimal strategy will differ depending on the strategy of the opponents, and hence no dominant equilibrium exists. In this case we loosen our definition of optimal to include symmetric equilibrium strategies. A symmetric equilibrium is a Nash equilibrium in which all bidders follow the same strategy. Hence a bid function, β*, is a symmetric equilibrium if any one agent can do no better (in terms of maximising expected reward) than follow β* when all other agents use β*. We define the optimal strategy for an auction as the dominant equilibrium, if such a strategy exists. If there is no dominant strategy, we define the optimal as a symmetric equilibrium strategy, if such an equilibrium exists.

2.3.1. First Price Sealed Bid. The symmetric equilibrium strategy in a FPSB auction for agent i is to select a bid equal to the expected value of the largest of the other agents' private values, given that the largest of these values is less than the private value of agent i, i.e. the agent should bid as high as all the other agents' values (not bids) without exceeding its own value. More formally, let Y_1, Y_2, ..., Y_{N−1} be the order statistics of the other agents' values X_1, X_2, ..., X_{N−1}. The symmetric equilibrium strategy for an agent with value x is to bid the expected value of the largest of the other values, i.e.

β(x) = E(Y_{N−1} | Y_{N−1} < x).  (5)

The optimal strategy is dependent on the form and assumed commonality of the value distribution function F and the independence of the bidders' values. When F is a uniform distribution on [0, 1] the symmetric equilibrium strategy is

β(x) = ((N − 1)/N) · x.  (6)

The optimal strategy when competing against agents known to be following a suboptimal strategy is to bid the expected value of the highest of the other bids.
This optimal bidding strategy may be a highly complex function of value with no analytical solution. For example, consider the Zero Intelligence Constrained (ZI-C) agents used by Gode and Sunder in the CDA [14]. Each ZI-C agent bids uniformly at random on the range from 0 to its value x. We can thus consider the bid from a ZI-C agent as a random variable W defined as the product of two uniformly distributed random variables with range [0, 1],

W = Y · X.

From [43] we can deduce that the distribution of bids has density

h(w) = −ln(w).  (7)

The distribution of the largest bid of n ZI-C agents is the cumulative distribution function (cdf) of the largest order statistic of n independent, identically distributed random variables W_1, W_2, ..., W_n with density given by Equation 7. The cdf of the largest order statistic, Y_n, is given by

G(y_n) = (F(y_n))^n = (−y_n(ln(y_n) − 1))^n.

The optimal strategy is

β*_{ZI-C}(x) = E(Y_n | Y_n < x) = (∫_0^x y · g(y) dy) / G(x),

which is

β*_{ZI-C}(x) = (n / (x(ln(x) − 1))^n) ∫_0^x y · ln(y) · (y(ln(y) − 1))^{n−1} dy.  (8)

The optimal strategy against ZI-C agents (which we could also have derived from expected profits in SPSB auctions via the revenue equivalence theorem) is not a linear function of value. Figure 1 shows how the function β*_{ZI-C} tails off for higher values, indicating that a greedier strategy should be adopted.

2.3.2. Second Price Sealed Bid. The symmetric equilibrium strategy in a SPSB auction for an agent with value x is simply to bid x, i.e.

β(x) = x.  (9)

The optimal strategy for a SPSB auction is independent of the form of the distribution function F and does not require that all bidders have the same value function. So, for example, the optimal strategy against a population of ZI-C agents is also to bid your value.
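The three optimal strategies above can be evaluated numerically. The sketch below (our own illustration, not the paper's code) implements the FPSB symmetric equilibrium for uniform values (Equation 6), the SPSB dominant strategy (Equation 9), and a simple midpoint-rule integration of the optimal FPSB bid against n ZI-C agents (Equation 8); the step count is an arbitrary accuracy choice.

```python
# Numerical evaluation of the Section 2.3 strategies.
import math

def fpsb_equilibrium(x: float, n_bidders: int) -> float:
    return (n_bidders - 1) / n_bidders * x      # Equation 6

def spsb_optimal(x: float) -> float:
    return x                                    # Equation 9: bid your value

def fpsb_vs_zic(x: float, n: int, steps: int = 10000) -> float:
    """E(Y_n | Y_n < x) for the largest of n ZI-C bids (Equation 8).

    Uses the cdf of one ZI-C bid, F(y) = y(1 - ln y), and its density
    -ln(y); integrates y * g(y) over (0, x) by the midpoint rule, then
    divides by G(x) = (x(1 - ln x))^n.
    """
    total, h = 0.0, x / steps
    for k in range(steps):
        y = (k + 0.5) * h
        g = n * (y * (1.0 - math.log(y))) ** (n - 1) * (-math.log(y))
        total += y * g * h
    return total / (x * (1.0 - math.log(x))) ** n
```

Against 19 ZI-C agents the conditional expectation stays strictly between 0 and the agent's value, and increases with value, consistent with the shape described for Figure 1.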
Proofs and a more complete description of auction formats are given in Krishna [28].

[Figure 1. Optimal bid function when competing in a FPSB auction against a population of 19 ZI-C agents. Axes: private value (horizontal) against bid (vertical).]

2.4. Simulation structure

Agents compete in a series of k auctions. For any auction, each bidder i is assigned a value x_i by sampling F. Allocation of trading rights and private values occurs prior to trade commencing. This practice is consistent with early experimental economics [42]. Prior to bidding in an auction, the PVM assumes that bidder i is aware of:

1. the observed value x_i of random variable X_i;
2. the distribution function common to all bidders, F;
3. the universal maximum value, φ;
4. the number of competitive bidders, N.

Once the auction is complete, any agent i knows:

1. whether they won the auction or not, a boolean variable q;
2. the price the winning agent will pay, p;
3. their own reward, as given in Equation 4.

No other information relating to the other agents' bids is made available. In fact, GD and ZIP_S1 do not use F and N, and none of the agents require the universal maximum value, φ. We allow the agents to know the winning price (but not necessarily the winning bid) but not the identity of the winner, because we believe this is the most commonly adopted real world model, particularly in on-line auctions. We discuss in Section 4 how this restriction of information affects an agent's ability to learn a good strategy. The agent is also unaware of k, the number of auctions in any experiment.

3.
Agents in double auctions

The first prominent research into software agents in auctions was by Gode and Sunder [14], who used agents bidding randomly within a budget constraint (ZI-C agents) to demonstrate that, under certain conditions, the CDA mechanism was able to drive a market consisting of agents operating with no intelligence towards a competitive equilibrium. The ZI-C experiments were effectively a demonstration of the non-tatonnement¹ mechanism of the CDA driving agents along the Marshallian path² to equilibrium. Although zero intelligence has been shown to effectively model real world financial markets [11], it has been demonstrated that in many market scenarios zero intelligence is not enough to drive the market to equilibrium [6, 9]. This has led to a wide variety of agent mechanisms being proposed for double auction simulations. There has also been a shift in research objective, from the economics driven goal of showing under what conditions the market behaves optimally towards the agent based objective of determining which agent structure performs best in competition with other algorithms. A tournament of competing agent algorithms for the CDA is described in Rust et al. [39, 40]. Agents have also been developed for competing in multiple auctions simultaneously [1, 25, 36] and a regular competition, the Trading Agent Competition (TAC) [15], is now held for agents in this type of market.
Many of the proposed algorithms for CDA and multiple simultaneous auctions are non-adaptive, in that they follow a fixed, though not necessarily pure, strategy independent of experience. For example, ZI-C agents for CDA are non-adaptive, as is the SouthamptonTAC agent used for the 2001 TAC [18]. In contrast, the SouthamptonTAC agent for the 2002 TAC is adaptive [19]. Several non-adaptive strategies have been proposed for CDA, principally for benchmarking adaptive algorithms (see, for example, [21, 33]). Our first objective is to evaluate whether adaptive agents can learn to compete effectively against non-adaptive agents in sealed-bid auctions. We base the adaptive agent architectures for FPSB and SPSB auctions on those developed for the CDA. There are three categories of adaptive agent that have been proposed for CDA. The simplest type of agent stores no explicit history of information about past auctions and adapts its behaviour based purely on the outcome of the previous auction. Examples of these history free reactive agents used in CDA simulations include Cliff's Zero Intelligence Plus (ZIP) agents [9], Preist and van Tol's Persistent Shout agents [37], the Q-learning agent (QLA) of Hsu and Soo [21] and the modified ZIP agents of Li and Smith [29] and Tesauro and Das [46]. Walverine [7] uses a reactive form of adjustment for the CDA component of TAC-03. These agents differ in how they utilise the information from the market. The key common feature for our research is that reactive agents adopt a bid function of the form given in Equation 10, and attempt to learn the optimal value of the parameter µ using reinforcement learning based on information about the outcome of the previous auction.
β_i(x_i) = (1 − µ) · x_i  (10)

In Section 4.1 we describe how we adapt this reactive architecture for sealed bid single sided auctions with a family of architectures we denote ZIP_S. The second type of agent for CDA stores some historical information about past auctions and adjusts its strategy based on an estimate of a global picture of auction outcomes. We call these history based agents. Gjerstad and Dickhaut [13] propose an agent structure (GD) that forms an estimate of the probability of each bid winning, a belief function q(b), and the profit to be obtained from each possible bid for a given value, a surplus function s(x, b). Based on these functions, it chooses the bid that maximises its expected profit. The GD agents have been modified by Tesauro and Das [46] (MGD) so that the belief function is truncated based on an estimate of the global minimum and maximum price derived from the previous trading period. GD has been further extended by Tesauro and Bredin [45] to include a mechanism to optimise long term profits using a forward estimate of the profitability of bid success and a dynamic programming formulation of the value function. He et al. [20] and He and Jennings [19] present a fuzzy logic approach towards history based agents. The Fuzzy Logic (FL) agents form fuzzy rules based on the current ask and bid and the median price of the auctions stored in the history. The Speculation agent of Li and Smith [29] also uses a rule based approach to determine a bid based on price predictions derived from historical information. Vytelingum et al. [50] propose an algorithm involving a two stage process of modelling the risk of trades and estimating the equilibrium based on the history of transactions. Stone et al. [44] present an algorithm for the Hotel component of TAC-2000 that uses a form of logistic regression to predict prices based on the previous auction data.
History based agents require greater amounts of memory and take longer to decide on a bid than reactive agents. The third type of agent we consider is even more complex. This class of agent, which we call modelling agents, also stores historic market information, but uses it to form models of the behaviour of other agents and hence find the optimal bid. Hu and Wellman [22, 23] and Vidal and Durfee [49] describe frameworks for alternative levels of modelling, based on the assumed complexity of the other bidders. The p-strategy of Park et al. [33] models the CDA as a Markov chain of states. We do not consider a model based approach for single seller sealed-bid auctions, primarily because in most sealed-bid auction formats an agent is not informed of the identity of the winner or of the bids of the other agents. This makes it impossible to model the competitors, even if the pool of agents remains constant. We restrict our attention to reactive and history based architectures. In Section 4 we give an overview of ZIP and GD agents and describe how these reactive and history based agents can be adapted for FPSB and SPSB single seller auctions.

4. Agents for sealed bid single auctions

The majority of work on agents in auctions has concentrated on double auction, multiple single auction and combinatorial auction models. An exception is Brandt and Weiss [5], where a model including "antisocial" reactive agents is proposed to demonstrate the potential problems with SPSB auctions. In this Section we provide an overview of a class of reactive agents based on the ZIP algorithm [9] and a history based agent, GD [13], from the perspective of a buyer in a CDA.
We describe how we adapt these architectures for single auctions. We cannot simply consider single auctions as a special case of CDAs, because the information available to the agent, and hence the learning problem, is fundamentally different. In double auctions, the agent receives a stream of data of asks, bids and transactions. For a single auction, an agent is only aware of whether it won the auction or not (a boolean variable q) and the price the winner must pay, p. It is not told the bids of any of the other agents, or even necessarily the winning bid. This means it may have to make guesses, estimates or inferences from the data in order to assess the quality of the bids it makes. The estimates and inferences the agent can make from (p, q) are dependent on the auction rules. In a FPSB auction, when the agent did not win (q is false), the agent may infer that the largest bid of the other agents was p. If the agent wins a FPSB auction, it may only infer that all the other bids were less than its own, b_i ≤ p, ∀i. In a SPSB auction, when the agent did not win (q is false), the agent may infer that the second largest bid was p, but it can only conclude that the highest bid must be larger than or equal to p. If the agent wins a SPSB auction, it can infer that the largest bid of the other agents was p. These inferences may be used in the adaptive methods of determining strategy described in Sections 4.1 and 4.2.

4.1. Reactive agents for sealed bid auctions

Commonly, reactive agents for CDA adopt a linear bid function (Equation 10) and attempt to learn a margin, µ, representing the fraction above or below value x at which the agent bids. Broadly speaking, reactive agents update their margin in a two step process.
The first step is to determine (or estimate) what the best bid in the previous auction would have been, and hence the best margin. The second step, if deemed appropriate, is to adjust the margin to be closer to the optimal margin from the previous auction. We denote the agent's a posteriori estimate of the "best" bid for auction j as o_j. The desired margin, d_j, is the margin that would have led to the optimal bid,

d_j = 1 − o_j / x_j.  (11)

The difference between the desired margin and the actual margin for auction j, denoted Δ_j, is used to update the margin for auction j + 1. If the learning rate is α, then

Δ_j = α(d_j − µ_j).  (12)

A normal Widrow–Hoff update would involve updating the margin directly with Δ_j, µ_{j+1} = µ_j + Δ_j. However, large variations in margin can result from the wide range of observable optimal bids. To counter this effect, Cliff's ZIP agents employ a momentum coefficient γ to smooth the update variable Δ_j. This smoothed update, Γ_j, is defined as

Γ_{j+1} = γΓ_j + (1 − γ)Δ_j,  (13)

where γ ∈ [0, 1]. Larger values of γ result in greater smoothing (i.e. they reduce the effect on the margin of the current update). The margin is then updated with Γ_j as given in Equation 14,

µ_{j+1} = µ_j + Γ_j.  (14)

So, the key procedures defining a reactive agent specify:

1. how to estimate the optimal bid from the current information about the market;
2. how to adjust the margin (e.g. whether to use momentum); and
3. under what market conditions to update the margin.

For example, Cliff's ZIP agents: estimate the optimal bid, or target price, from the price of the last transaction perturbed by uniform random noise; use momentum to smooth the update; and only update the margin downwards when an agent has a good to trade [9]. Preist and van Tol's PS agents are very similar, except they adopt a target price based on the lowest offer or highest bid so far seen, perturbed by noise [37].
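The update rule of Equations 10–14 can be sketched as follows. This is a minimal illustration with parameter defaults of our own choosing; we apply the freshly smoothed update in Equation 14, one reasonable reading of the indexing above.

```python
# Sketch of the reactive margin update (Equations 10-14).
class ReactiveMargin:
    def __init__(self, mu: float = 0.1, alpha: float = 0.3, gamma: float = 0.7):
        self.mu = mu          # current margin (Equation 10)
        self.alpha = alpha    # learning rate (Equation 12)
        self.gamma = gamma    # momentum coefficient (Equation 13)
        self.Gamma = 0.0      # smoothed update, initially zero

    def bid(self, x: float) -> float:
        return (1.0 - self.mu) * x                 # Equation 10

    def update(self, o_j: float, x_j: float) -> None:
        d_j = 1.0 - o_j / x_j                      # desired margin, Equation 11
        delta = self.alpha * (d_j - self.mu)       # raw update, Equation 12
        # Momentum smoothing, Equation 13, then the margin step, Equation 14.
        self.Gamma = self.gamma * self.Gamma + (1.0 - self.gamma) * delta
        self.mu += self.Gamma
```

With α = 1 and γ = 0 the update collapses to the plain Widrow–Hoff step: a margin of 0.2 and an estimated best bid of 0.9 at value 1.0 move the margin straight to the desired 0.1.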
In single seller auctions the procedures required are identical, but because of the different nature of the information available to the agent, the same solutions cannot be employed. The key issue of how to estimate the optimal bid for auction j from the data (p_j, q_j) is dependent on the inferences that can be made from the data and the current parameter values for margin, bid and value (µ_j, b_j, x_j). Firstly, if p_j is greater than the agent's value, there is no bid that could have yielded a positive reward. The choice of what to do with the information (p_j, q_j) when p_j > x_j is equivalent to the third CDA design choice of under what market conditions to update the margin. Secondly, when p_j ≤ x_j, there are two distinct cases requiring different approaches:

1. The agent wins (q is true). We characterise this situation as the agent being greedy, hence it increases its margin by estimating the optimal bid to be lower than the bid it made (o_j < b_j).
2. The agent loses (q is false) but could have made a profit (p_j ≤ x_j). In this situation the agent is fearful and becomes more cautious by reducing its margin. It estimates the optimal bid to have been greater than the bid it made in the auction (o_j > b_j).

Under these broad requirements, there are many schemes we could adopt to estimate the optimal bid and adjust the margin. A full investigation of the effects of alternative mechanisms such as momentum, and of parameter settings such as population size and learning rate, can be found in Toft and Bagnall [47]. Here we present two reactive agents. Both agents adjust their margin using momentum, and only update their margin when the price is less than their value. They differ only in how they estimate the optimal bid o_j.
The first, ZIP_S1, is closely modelled on ZIP for CDA. The second agent, ZIP_S2, attempts to estimate the optimal bid more accurately using the PVM assumptions.

4.1.1. ZIP_S1 optimal bid estimates. ZIP_S1 agents select their optimal bid by randomly sampling a range of values given by

o_j = b_j · R + A,  (15)

where R and A are observations of independent random variables with a uniform distribution. The ranges of R and A are determined by whether the agent is in a state of greed or fear.

1. Fear: if the agent loses, but possibly could have won, fear directs the agent to decrease the margin by estimating the optimal bid to be higher than the current bid, hence A ∈ [0.0, A_max] and R ∈ [1.0, R_max].
2. Greed: when the agent wins, its greed makes it increase the margin, hence the optimal bid is lower than the current bid (o_j < b_j). To achieve this we set A ∈ [A_min, 0.0] and R ∈ [R_min, 1.0].

4.1.2. ZIP_S2 optimal bid estimates. ZIP_S1 agents increase or decrease their margin based only on whether the best bid would have been higher or lower than their actual bid. However, the agent can sometimes infer more than this about the optimal bid. The ability of an agent to more accurately estimate what the best bid would have been is dependent firstly on the auction structure and secondly on the assumptions of the PVM. Based purely on the auction structure, the agent can determine the following.

1. Fear: under a FPSB auction, if the agent loses (q is false) but could have made a profit (p_j < x_j), the optimal bid would have been p_j + δ, where δ is the smallest bid increment. Under a SPSB auction the agent only knows that the second highest bid of the other
agents was p_j. Thus the highest bid of the other agents, and hence the optimal bid, must be in the range [p_j, x_j].
2. Greed: under a FPSB auction, if the agent wins (q is true) and pays price p_j, it knows only that the highest bid of the other agents is less than p_j, hence o_j ∈ [0, p_j]. In a SPSB auction, it knows that the highest bid of the other agents is p_j. Because of the SPSB auction rules, the optimal bid would have been any bid greater than p_j. However, the agent makes the assertion that profit could have been maximised by bidding less. We do this because we do not want to give the ZIP_S2 agent a priori knowledge of the optimal strategy and we wish to adopt a common structure between FPSB and SPSB.

In the FPSB fear and SPSB greed scenarios the agent has perfect information in that it knows what bid would have been optimal. In these cases we set the optimal bid to be the maximum bid of the other agents (i.e. the price),

    o_j = p_j

In the FPSB greed and SPSB fear scenarios the agent is faced with imperfect information, since it is not able to calculate exactly the optimal bid. Instead, it estimates the optimal bid by making broad assumptions. We assume: the agent knows the number of bidders, N; the bidders are symmetric; the distribution of private values F is uniform; and the universal maximum value it may take is φ = 1.0.

FPSB Greed (q true, p_j = b). The agent assumes all losing bids are uniformly distributed over the range [0, p_j]. The expected value of the second highest bid is estimated as

    o_j = ((N − 1)/N) · p_j

SPSB Fear (q false, p_j < b). The agent estimates the highest, and winning, bid to be in the range [p_j, φ] by using the expected value of the highest order statistic, given by

    o_j = (N/(N − 1)) · p_j

In Bagnall and Toft [3, 4] and Toft and Bagnall [47] we present experiments with alternative mechanisms for estimating the optimal bid.
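The two estimators can be summarised in a short sketch. The following Python fragment is our illustration, not the authors' implementation; the function names are hypothetical, and the parameter ranges are those reported in the experimental settings of Section 5 (A_max = 0.01, A_min = −0.01, R_max = 1.05, R_min = 0.95).

```python
import random

# Parameter ranges taken from the experimental settings in Section 5.
A_MIN, A_MAX = -0.01, 0.01
R_MIN, R_MAX = 0.95, 1.05

def zip_s1_target(bid, won):
    """ZIP_S1: perturb the last bid stochastically, down on a win (greed),
    up on a loss that could have been a win (fear). Equation 15: o_j = b_j*R + A."""
    if won:   # greed: estimate the optimal bid below the bid just made
        R = random.uniform(R_MIN, 1.0)
        A = random.uniform(A_MIN, 0.0)
    else:     # fear: estimate the optimal bid above the bid just made
        R = random.uniform(1.0, R_MAX)
        A = random.uniform(0.0, A_MAX)
    return R * bid + A

def zip_s2_target(price, won, first_price, n_bidders, delta=0.001):
    """ZIP_S2: refine the estimate using the auction rules and PVM assumptions
    (delta is a hypothetical smallest bid increment)."""
    if first_price:
        if won:   # losing bids assumed uniform on [0, p_j]: expected 2nd highest
            return (n_bidders - 1) / n_bidders * price
        return price + delta      # perfect information: p_j + delta would have won
    if won:       # SPSB greed: the price equals the highest competing bid
        return price
    # SPSB fear: expected highest order statistic above the observed price
    return n_bidders / (n_bidders - 1) * price
```

The four branches of `zip_s2_target` correspond directly to the FPSB/SPSB fear and greed cases above.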
ZIP_S2 is the architecture that makes the most use of the information commonly assumed to be available under the private value model, and as such provides the greatest contrast to ZIP_S1, which utilises very little of the market information to direct learning behaviour.

4.2. GD agents for sealed bid auctions

GD agents for CDA [13] form a strategy based on the recent history of bids and offers. Because we deal with auctions with a single seller and many buyers, we describe GD agents for CDA in terms of buyers only. The premise of GD agents is that if the agent knew the probability of any bid b being accepted (described by a function q(b)) and the surplus for making an accepted bid b when the agent's value is x (given by a surplus function s(x, b)), then the natural risk-neutral strategy to adopt would be to choose the bid that maximises the expected surplus E(x, b) = q(b) · s(x, b). GD agents do not know q(b). Instead, they estimate q(b) with a belief function q̂(b) derived from the auction history using Equation 16,

    q̂(b) = (B(b) + A(b)) / (B(b) + A(b) + R(b)),    (16)

where B(b) is the number of transacted bids at or below b, A(b) is the number of transacted asks at or below b and R(b) is the number of rejected bids at or below b. The surplus is a linear function of price and value, as given by Equation 4. GD agents for CDA assume that the price they will pay will be equal to their bid, hence the surplus function is simply

    s(x, b) = x − b    (17)

The estimation of expected surplus is then given by the function

    E(x, b) = q̂(b) · s(x, b)    (18)

and the strategy of a GD agent is to select the bid that maximises E.
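As a concrete illustration, the belief function of Equation 16 and the maximisation of Equation 18 can be sketched as follows. This is a minimal sketch of ours, assuming the agent evaluates candidate bids on a discrete grid; the history lists and function names are hypothetical.

```python
def gd_belief(b, accepted_bids, accepted_asks, rejected_bids):
    """Equation 16: q_hat(b) = (B(b) + A(b)) / (B(b) + A(b) + R(b)),
    where B, A count transacted bids/asks at or below b and R counts
    rejected bids at or below b."""
    B = sum(1 for x in accepted_bids if x <= b)
    A = sum(1 for x in accepted_asks if x <= b)
    R = sum(1 for x in rejected_bids if x <= b)
    total = B + A + R
    return (B + A) / total if total else 0.0

def gd_choose_bid(value, accepted_bids, accepted_asks, rejected_bids, grid=100):
    """Pick the bid on a grid over [0, value] maximising
    E(x, b) = q_hat(b) * (x - b)  (Equations 17 and 18)."""
    candidates = [value * i / grid for i in range(grid + 1)]
    return max(candidates,
               key=lambda b: gd_belief(b, accepted_bids, accepted_asks,
                                       rejected_bids) * (value - b))
```

The grid search stands in for the cubic-spline interpolation used in the CDA literature; it is only intended to show the decision rule.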
The key features of a GD agent of interest in sealed-bid auctions are the method of estimating the probability of bid success, q(b), and the problem of estimating the surplus for a given bid and value, s(x, b).

4.2.1. Sealed bid GD traders, GD_S. Sealed bid auctions present an intrinsically different problem to the CDA for GD agents. Sealed bid GD agents, GD_S, maintain a history, H, of length m of auction outcomes. Each history entry i consists of a pair h_i = (p_i, b_i), where p_i is the price the winner paid and b_i the winning bid. The problem for a GD agent is to form q(b) and s(x, b) from H. The rules of allocation and information revelation in sealed-bid auctions restrict the quantity of information with which agents may learn about the seller or competing buyers. These restrictions are caused by the structure of sealed-bid auctions, primarily:

1. Sealed Bidding: The sealed nature of bids prevents agents from observing competitors' bids and hence learning about their competitors' valuations or strategies.
2. One Sidedness: Sealed-bid auctions are one sided; the seller is inactive and agents are only concerned with their competing bidders. In CDA, agents may observe sellers' offers and adjust their strategy to increase the likelihood of a profitable trade occurring.
3. One Shot Deal: A sealed bid auction is a one shot deal, unlike other one sided auctions. For example, in an English auction bidders may observe all rejected bids. Agents in sealed-bid auctions are only able to gain useful information once the auction has ended.

These restrictions mean that alternative mechanisms need to be designed for first price and second price GD agents.
A further complexity is added to our simulations by the fact that we sample values on a continuous interval for each auction. CDA experiments are usually conducted with a small number of fixed values (in order to predetermine the supply and demand curves). The effect of this is that the belief function need only be estimated at a fixed, finite number of points to evaluate all possible bids. Typically cubic spline interpolation is used over five to eight points in the memory [13, 45]. We require the agent to estimate the belief function and the surplus function on the entire possible range of bids, and this is a harder learning problem. The agent objective is to learn a bidding function that maps values onto bids. If the domain of this bidding function is a small number of fixed points then the learning problem is easier, even though the range of the bid function is continuous. So, for example, if the values in a FPSB auction are fixed, ZIP_S1 will converge to the optimal much faster than if the values are sampled on [0, 1]. This is demonstrated in Figure 2, which shows the results for a single ZIP_S1 competing against four optimal agents with both fixed and interval values. The optimal strategy is found much faster, and there is much less variation around it. To specify a GD_S agent we describe how it estimates the surplus function and the belief function. Assume, momentarily, that an agent knows both the winning bid and price for any past auction.

Figure 2. Two runs of a five bidder FPSB auction. The lines show the margins when the values are sampled uniformly on [0, 1] and when the values are randomly selected from 0.2, 0.4, 0.6 or 0.8. The straight line at 0.2 is the optimal strategy.
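Under this momentary assumption, the belief function described in Section 4.2.1.2 (Equations 20–22) is simply the empirical cumulative distribution of the stored winning bids, and the agent bids to maximise q̂(b)·(x − b). The following sketch is ours, with hypothetical function names, and assumes a non-empty history.

```python
from bisect import bisect_right

def belief_from_winning_bids(history_winning_bids):
    """Return q_hat, the empirical CDF of the stored winning bids:
    q_hat(b) is the proportion of winning bids at or below b (Equations 20-22)."""
    ys = sorted(history_winning_bids)   # the order statistics y_1 <= ... <= y_m
    m = len(ys)                         # assumes a full, non-empty history
    def q_hat(b):
        return bisect_right(ys, b) / m  # count of y_j <= b, divided by m
    return q_hat

def best_bid(value, q_hat, grid=1000):
    """Maximise E(x, b) = q_hat(b) * (x - b) over a bid grid on [0, value],
    using the surplus s(x, p) = x - p of Equation 19."""
    candidates = [i / grid for i in range(int(value * grid) + 1)]
    return max(candidates, key=lambda b: q_hat(b) * (value - b))
```

For example, with winning bids {0.2, 0.4, 0.6, 0.8} in memory and a value of 0.76, the expected surplus is maximised at a bid of 0.4.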
4.2.1.1. Surplus function. If an agent submits a bid b and then assumes they will win and pay a price p, the surplus function is given by

    s(x, p) = x − p    (19)

4.2.1.2. Belief function. Let the winning bid b_w for auction j be denoted b_{w,j}. If an agent is aware of the winning bids, then an obvious estimate for the probability of a bid b winning is the proportion of winning bids in the history at or below b. More formally, if we let

    T(b, b_{w,j}) = 1 if b_{w,j} ≤ b, 0 otherwise,    (20)

then, assuming the history is full and of length m,

    q̂(b) = (Σ_{j=1..m} T(b, b_{w,j})) / m.

q̂(b) is the empirical cumulative distribution function found from the order statistics of the winning bids in H. So q̂(b) is calculated by sorting the winning bids in H into ascending order,

    Y = y_1, y_2, . . . , y_m    (21)

then estimating q̂(b) at each point y_i using the formula

    q̂(b) = 0 if b < y_1;  j/m if y_j ≤ b < y_{j+1};  1 if b ≥ y_m    (22)

Figure 3. (a) Example belief function, q̂(b), formed from a sample history of length 255 in a FPSB auction. (b) Example surplus function, s(x, b), formed from the same history shown in (a), x = 0.76.

Figure 4. Example expected surplus, E, formed with the belief and surplus functions shown in Figure 3 a, b. The estimate of the optimal bid is approximately 0.6, with an expected profit of just over 0.04.

4.2.1.3. Example.
Figure 3 a, b shows the belief and surplus functions for bids on the range [0.0, 1.0], based on a history of length 255 and a value x = 0.76, when both the winning bid and price are available. Figure 4 shows the expected surplus function E resulting from the belief and surplus functions shown in Figure 3.

4.2.1.4. GD implementation. The GD_S procedure we have described assumes that the agent knows the winning bid and the price for all the auctions in the history. However, the agent is not explicitly told the winning bid, merely the price. In FPSB auctions, an agent pays a price equal to the winning bid, hence b_{w,j} = p_j. The same cannot be applied to SPSB auctions. The price in a SPSB auction is the second highest bid, and unless an agent won the auction it cannot know b_w. In addition, the surplus function is dependent on the auction rules. For a FPSB auction, the winner pays their bid, hence the surplus is as given in Equation 17. However, in SPSB, the price is dependent on the other agents' bids. In SPSB auctions GD agents therefore face problems in forming both belief and payoff functions.

4.2.1.5. SPSB GD agent. In SPSB auctions, the agent is required to estimate the payoff given a bid b, assuming b wins. An agent can only learn about the relationship between winning bids and prices, and hence estimate payoff, when it actually wins a second price auction. The agent records in a separate personal history structure, H_p, data for auctions it has itself won. Consider the agent storing the history of 10 auctions and the personal history of five auctions shown in Table 1. Using the personal history of pairs, the agent is able to determine a least squares regression of bid against payoff. For each pair in the personal history the agent estimates the surplus using its current value and forms a regression model on the resulting points.
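This regression step can be sketched in Python (our reconstruction, not the authors' implementation; function names are hypothetical). As a check, fitting the personal history of Table 1 with a current value x = 0.86 reproduces the fitted line s = −0.8135b + 0.7414 shown in Figure 5(b).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + c; returns (a, c)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def spsb_payoff_model(personal_history, value):
    """Regress the estimated surplus (value - price) on the agent's own winning
    bid, using the personal history of (price, bid) pairs for auctions it won.
    Returns a function estimating s(x, b) for any bid b."""
    bids = [b for _, b in personal_history]
    surpluses = [value - p for p, _ in personal_history]
    slope, intercept = fit_line(bids, surpluses)
    return lambda b: slope * b + intercept
```

The returned linear model is what the agent evaluates, together with its belief function, to score candidate bids.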
Table 1. Small example history and personal history.

    History H               Personal history H_p
    Price p    Bid b        Price p    Bid b
    0.79       0.83         0.68       0.70
    0.68       0.70         0.84       0.90
    0.64       0.66         0.85       0.85
    0.55       0.56         0.76       0.80
    0.84       0.90         0.88       0.95
    0.95       1.00
    0.86       0.90
    0.94       0.99
    0.88       0.92
    0.75       0.78

This regression model can be used to estimate the surplus for any given bid, considering the agent's current valuation. Plots of the personal history pairs and a regression of bid against surplus for the example given in Table 1 can be found in Figure 5. To form a belief function an agent must have knowledge of the winning bids. When the winning bid is unknown an agent estimates it using a regression of price on bid formed from the personal history. Where a sufficient quantity of personal history is unavailable an agent estimates the winning bid to be (N/(N − 1)) · p_j. Figure 6 shows the belief function (a) and a payoff function (b) for the personal history shown in Table 1. With a measure of both belief and estimated payoff for all known and estimated winning bids the agent is able to determine the expected surplus for each bid. Figure 7 shows the expected surplus for the example in Table 1. It is common when an agent has a low private value for the maximum expected surplus to be zero. This occurs when the minimum winning bid in the history is higher than the agent's current value. In this situation the agent could simply not bid. However, this would mean that it may not participate in auctions where a profit could be

Figure 5. (a) Personal history bid and price pairs. (b) Regression model of the payoff function for the personal history pairs, x = 0.86; the fitted line is s = −0.8135b + 0.7414.
made. Instead of simply ignoring auctions when the history is of no use, the GD_S agent adopts a reactive strategy to form a margin. In cases where the maximum expected profit is zero, the agent uses a smoothed estimate of the historical margins adopted in auctions where max(E(x, b)) > 0. It does this by updating a margin estimate, µ_{j+1}, using Equations 12–14 with the margin at which it bids as the desired output, d_j. If the agent then encounters a situation where all the winning bids in the history are above its current value, it bids using the margin µ_j instead.

Figure 6. (a) Belief function. (b) Payoff function derived from sampling points in (a) on the regression model in Figure 5.

Figure 7. Expected profit plot from the belief and payoff functions given in Figure 6 a, b.

5. Results

The aim of these experiments is to determine the ability of the reactive and history based agent architectures described in Section 4 to learn a good strategy in sealed-bid auctions.
We simulate auctions with five types of agents:

– ZI-C (Zero Intelligence Constrained) bid uniformly between 0 and their value;
– ZIP_S1 (Zero Intelligence Plus for single auctions) bid according to the modified ZIP algorithm, described in Section 4.1.1;
– ZIP_S2 (Estimating ZIP) bid according to the modified ZIP algorithm and utilise more information when estimating the optimal bid, as described in Section 4.1.2;
– GD_S (Gjerstad and Dickhaut for single auctions) bid according to the modified GD algorithm described in Section 4.2;
– OPT (Optimal) adopt the symmetric equilibrium optimal margin, given in Section 2.3.

A single run consists of 10,000 auctions (either FPSB or SPSB) with 20 agents bidding for a single good. In each auction each agent is assigned a private value. Each agent's value is an observation of an independent, identically distributed random variable with a uniform density on [0, 1]. An experiment consists of 100 runs with identical settings. Learning parameters are re-initialised at the beginning of each run. In order to understand how each agent performs in alternative scenarios, we alter the number of each type of agent competing in each experiment. By changing the number of each type of agent in the auction we can present the adaptive agents with learning tasks of different levels of complexity. The experiments start with static, non-adaptive problems where the optimal strategy can be deduced. In these more controlled environments we can objectively evaluate an algorithm by measuring its behaviour against the optimal strategy. We then progress to more complex, dynamic environments with no obvious optimal strategy. In these environments we can assess how the learning algorithms perform relative to each other. More specifically, we perform the following five sets of experiments.

1. Homogeneous.
The first set of experiments involves homogeneous populations, i.e. populations of agents of the same type (Section 5.1). These experiments serve to demonstrate some fundamental characteristics of FPSB and SPSB auctions and to test whether each architecture can reach the symmetric equilibrium solution whilst co-evolving with other agents of the same kind.
2. Against Optimal. The second set evaluates how well a single adaptive agent performs when competing against a population of OPT agents (Section 5.2). The objective is to determine whether the adaptive agents can learn the strategy that is provably optimal.
3. Against ZI-C. In contrast to the second set, the third set examines the performance of the five agent architectures when competing against a population of ZI-C agents (Section 5.3). The objective here is to measure performance when competing against a population of irrational, suboptimal agents, which are nevertheless still non-adaptive.
4. Head to Head. The fourth set of experiments involves a series of head to head runs with populations made up of two of the five types of agent (Section 5.4). These experiments are an extension of the previous sets because we examine how adaptive agents perform against other adaptive agents, and we also consider populations with different weightings of agent type. We can hence evaluate which agent type performs best in more complex, dynamic, paired environments.
5. Free for All. The fifth set involves a mixed population of ZIP_S, GD_S and OPT agents competing against each other (Section 5.5). This final set of experiments presents the most complex problem to the adaptive agents and hence gives us the clearest indication of how the alternative architectures may perform in the real world.
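The experimental loop can be sketched as follows: a minimal version of ours with a homogeneous population of OPT agents, whose FPSB symmetric-equilibrium bid for uniform values is ((N − 1)/N)·x (a margin of 1/N = 0.05 for N = 20, consistent with the margins reported below). The function name and structure are our assumptions, not the authors' code.

```python
import random

def run_fpsb(n_agents=20, n_auctions=10_000, seed=0):
    """One run: in each auction every agent draws a private value uniformly on
    [0, 1]; OPT agents bid the symmetric-equilibrium fraction (N-1)/N of value.
    Returns per-agent profits and the average revenue efficiency
    (price paid over the maximum private value)."""
    rng = random.Random(seed)
    profits = [0.0] * n_agents
    efficiency_sum = 0.0
    for _ in range(n_auctions):
        values = [rng.random() for _ in range(n_agents)]
        bids = [(n_agents - 1) / n_agents * v for v in values]
        winner = max(range(n_agents), key=bids.__getitem__)
        price = bids[winner]                # first price: the winner pays its bid
        profits[winner] += values[winner] - price
        efficiency_sum += price / max(values)
    return profits, efficiency_sum / n_auctions
```

With all agents bidding the same linear function of value, the winner is always the agent with the highest value, and the efficiency is exactly (N − 1)/N = 0.95.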
To assess the performance of the agents we use three metrics, all calculated over the final 5000 auctions. The first two are the total profit achieved and the average bidding margin. Profit and margin provide a measure of an individual agent's performance and allow comparison with other agents and the optimal strategy. The third performance measure is the revenue efficiency, defined as the average of the ratio of the price paid to the maximum of the private values over the last 5000 auctions. Efficiency can be used to gauge the effect of bidding strategies on the auctioneer's revenue. The agent parameters used for all experiments are: A_max = 0.01, A_min = −0.01, R_max = 1.05, R_min = 0.95, β = 0.1, γ = 0.7, m = 1000 and µ_0 = 0.5. These parameters were set to be consistent with previously published research. The effects of changes to population size, learning parameters, initial conditions and memory size are evaluated through extensive experimentation in Toft and Bagnall [47]. To summarise, ZIP was found to be insensitive to population size and initial conditions, but sensitive to momentum; low momentum results in 'knee-jerk' reactions to auction outcomes and reduced performance. Larger memory size was found to improve the performance of GD. Generally, the parameter settings that lead to faster convergence tend to result in worse bidding strategies and hence lower profit.

5.1. Homogeneous agent populations

Table 2 shows the results for homogeneous populations of the five agent types. Table 2 demonstrates several characteristics of the auctions.

– The theoretical revenue equivalence of FPSB and SPSB auctions is demonstrated by the average efficiencies and profits of the OPT agents, which are not significantly different.
– ZI-C agents adopt a consistently greedy bidding strategy, which results in greater profits and the lowest market efficiency.
These experiments also illustrate several characteristics concerning the adaptive agents.

– In FPSB auctions ZIP_S1 agents are able to co-adapt towards the symmetric equilibrium strategy with no external guidance or pre-existing equilibrium to guide them. In SPSB, ZIP_S1 learn a strategy that is very close to the optimal. In both auction types they learn a margin that is slightly too greedy. This has a greater impact on profit in SPSB than FPSB.
– In FPSB auctions both the ZIP_S2 and GD agents adopt a margin significantly above that of the symmetric equilibrium strategy, and hence the overall average profit is higher. The fact that ZIP_S2 and GD learn a suboptimal strategy (reflected in the lower market efficiency) means that they could potentially be exploited by an agent that adopts a better strategy.
– In SPSB auctions the GD agents learn a strategy close to the symmetric equilibrium, but tend to adopt a negative margin and bid slightly above their value. This results in occasional negative profit. Hence, SPSB GD agents have a lower average profit compared to other agent strategies (including OPT), and the market of GD agents is the most efficient.

The fact that ZIP agents can co-adapt towards the optimal strategy demonstrates the robustness of the ZIP algorithm. Figure 8 shows the margins of six ZIP_S1 agents in a series of 5000 FPSB auctions. It clearly demonstrates the trend towards optimality, and also shows how this evolution towards the equilibrium is independent of the initial margin. However, the ability of an agent to compete efficiently against duplicates of itself does not provide enough evidence to conclude that ZIP_S1 is better for auctions generally.
Hence we extend our experiments to consider non-homogeneous populations.

5.2. Single adaptive agent vs multiple optimal agents

The results for a single agent competing against 19 optimal agents in FPSB and SPSB auctions are shown in Table 3. The main conclusions that can be drawn from the results presented in Table 3 are as follows:

– In both FPSB and SPSB auctions, the reactive agents (ZIP_S1 and ZIP_S2) learn a margin close to the optimal. However, the profits of ZIP_S1 and ZIP_S2 are in fact significantly worse than the optimal (at the 1% level, using both a two sample paired t-test and a nonparametric Mann–Whitney test). Hence we conclude that the reactive agents tend to be slightly too greedy when competing against optimal agents.

Table 2. Agent and auctioneer statistics for homogeneous scenarios.

            Agents                                       Auctioneer
            Profit             Margin                    Efficiency
FPSB
ZI-C        30.19 (±0.3097)    0.5001 (±0.0009)          77.85% (±0.0016)
ZIP_S1      12.04 (±0.1971)    0.0525 (±0.0008)          94.87% (±0.0008)
ZIP_S2      16.29 (±0.1127)    0.0702 (±0.0005)          93.09% (±0.0005)
GD          15.85 (±1.2018)    0.0716 (±0.0094)          92.97% (±0.0056)
Optimal     11.91 (±0.0077)    0.0500 (±0.0000)          95.02% (±0.0000)
SPSB
ZI-C        60.38 (±0.3685)    0.5000 (±0.0010)          65.20% (±0.0014)
ZIP_S1      13.48 (±0.1653)    0.0074 (±0.0003)          94.31% (±0.0007)
ZIP_S2      13.78 (±0.2439)    0.0090 (±0.0004)          94.15% (±0.0010)
GD          10.64 (±0.4014)    −0.0062 (±0.0020)         95.36% (±0.0019)
Optimal     11.93 (±0.1496)    0.0000 (±0.0000)          95.01% (±0.0006)

Table 3. Agent and auctioneer statistics for One vs. Many optimal scenarios.
            Agent                                        Auctioneer         Optimal agents
            Profit             Margin                    Efficiency         Profit
FPSB
ZI-C        1.39 (±0.3421)     0.500 (±0.0036)           94.81% (±0.0002)   12.43 (±0.0165)
ZIP_S1      11.53 (±0.7215)    0.051 (±0.0026)           95.02% (±0.0001)   11.92 (±0.0301)
ZIP_S2      11.49 (±0.7788)    0.063 (±0.0019)           94.97% (±0.0001)   12.04 (±0.0284)
GD          11.82 (±0.6622)    0.055 (±0.0042)           95.02% (±0.0000)   11.89 (±0.0371)
SPSB
ZI-C        1.16 (±0.3930)     0.500 (±0.0037)           94.56% (±0.0007)   13.06 (±0.1749)
ZIP_S1      11.64 (±1.0835)    0.0058 (±0.0015)          94.99% (±0.0007)   11.99 (±0.1625)
ZIP_S2      11.76 (±0.9946)    0.007 (±0.0016)           94.99% (±0.0005)   11.98 (±0.1350)
GD_S        11.10 (±0.8939)    −0.0109 (±0.0067)         95.10% (±0.0007)   11.70 (±0.1697)

– In FPSB auctions, the GD_S agent makes a profit that is not significantly less than that made by the optimal agents. However, the average margin for GD_S is higher than the optimal margin. This apparent discrepancy can be explained by the fact that GD_S does not in fact attempt to form a linear bid function. Instead it concentrates its memory resources on the most profitable areas, which is consistent with Figure 4. Thus GD_S bids at the optimal for higher values, where there is a greater chance of winning. For low values the agent has very little information about the best bid, since it is very unlikely to win, and hence may make suboptimal bids.
– In SPSB auctions, the GD agent learns a margin that is significantly different to the optimal and hence makes a significantly worse profit. Furthermore, GD is outperformed by both ZIP agents. In contrast to ZIP, the GD agent is too fearful and overbids. This is demonstrated by the fact that the margin for GD in SPSB is often negative, meaning it often bids above its value and hence makes a loss if it wins.

Figure 8. Example of the co-adaptation towards the optimal strategy of 6 ZIP_S1 agents in 5000 FPSB auctions.
The suboptimal performance of the ZIP agents is caused by the fact that, unless the population is very large, an agent bidding close to the optimal is highly unlikely to encounter a scenario where it loses but could have won, and thus is less likely to adjust its margin downward. A detailed evaluation of the behaviour of ZIP and GD agents in auctions against optimal agents is given in Toft and Bagnall [47]. The difference in performance of the GD agents in FPSB and SPSB auctions can be explained by the different information available in each type of auction. In FPSB the GD agent knows the winning bid and can hence form an accurate estimate of the belief and reward functions. However, in SPSB the agent has to estimate the winning bid based on its own wins. The occasions when the agent wins and makes a large profit tend to skew the estimated reward function and make the agent bid too high, to the point that it often adopts a negative margin. If we impose an artificial budget constraint and set the maximum bid to the agent's value, GD performs as well as its FPSB counterpart (i.e. it finds the optimal strategy). However, we do not impose this constraint on behaviour. Our aim is to test how well agent algorithms perform over alternative auction structures, hence we want to minimise the auction specific information included in the learning process. There are auction structures where the optimal strategy is to bid over value (e.g. third price sealed-bid auctions), hence a budget constraint is not always applicable.
We wish the agent to learn through experience that it is suboptimal to bid over value in SPSB, with the minimum domain specific information included in the learning process and without restricting the space of possible bids it may sample. One of our reasons for using SPSB auctions, despite there being a strictly dominant optimal strategy independent of the value function, is that they provide a good test bed for an algorithm. For an algorithm to be considered of wider general use in more complex auction models, we believe that it should be able to converge to the optimal strategy in scenarios where one is known to exist. A further test we adopt is that only information that is trivially obvious to human competitors may be hard coded into bidding algorithms. So, for example, we do not allow negative bids. A budget constraint is not intuitively obvious to most people competing in SPSB auctions (field and experimental studies have observed that human agents sometimes bid above their value in SPSB [26, 31]).

Table 4. Agent and auctioneer statistics for One vs. Many ZI-C scenarios.

            Agent                                        Auctioneer         ZI-C agents
            Profit              Margin                   Efficiency         Profit
FPSB
ZIP_S1      96.59 (±3.7295)     0.1635 (±0.0055)         78.80% (±0.0018)   26.16 (±0.3087)
ZIP_S2      91.66 (±3.7990)     0.1330 (±0.0031)         79.24% (±0.0015)   25.29 (±0.3163)
GD          98.11 (±3.0139)     0.1351 (±0.0051)         78.80% (±0.0018)   25.72 (±0.3669)
OPT         47.97 (±1.2700)     0.0500 (±0.0000)         80.66% (±0.0017)   23.21 (±0.3411)
SPSB
ZIP_S1      207.25 (±6.0608)    0.0693 (±0.0054)         67.81% (±0.0018)   47.25 (±0.5041)
ZIP_S2      178.15 (±5.0170)    0.1503 (±0.0037)         66.93% (±0.0017)   51.60 (±0.4853)
GD          194.94 (±6.8186)    −0.0048 (±0.0211)        68.70% (±0.0025)   42.93 (±0.9232)
OPT         217.02 (±6.4689)    0.0000 (±0.0000)         68.44% (±0.0019)   43.93 (±0.5531)

The experiments indicate that when competing against optimal agents a memory based approach is superior when the agent has sufficient information available.
They also show that the memory-free ZIP agents are more robust, in that they achieve a profit close to the optimal in both scenarios.

5.3. Single adaptive agent vs multiple ZI-C agents

The third set of experiments involves a single adaptive agent competing against a population of ZI-C agents. ZI-C agents follow a greedy strategy, in that they will on average bid much lower than value. We can liken a population of predominantly ZI-C agents to a population of implicitly "colluding" agents. The fact that they all bid much lower than their value means that the winner benefits by paying a lower price. In a homogeneous population of ZI-C agents each agent will win as often as any other agent, and hence they will all make a greater profit than agents following an optimal strategy. This means that the market is much less efficient, as shown in Table 4. It also means that an adaptive agent competing against a population of ZI-C agents should be able to exploit this inefficiency in order to gain larger profits for itself. The results in Table 4 show that both ZIP and GD agents are able to modify their bidding in FPSB auctions to exploit the greedy strategy of ZI-C agents and make a higher profit. In FPSB auctions GD makes a significantly greater profit than ZIP_S1, and ZIP_S1 makes a significantly greater profit than ZIP_S2. GD does better because it is better able to approximate the optimal non-linear bid function for competing against ZI-C (shown in Figure 1). Figure 9 shows the ZIP_S1 and ZIP_S2 strategies in relation to the optimal for values greater than 0.5. The lower margin of ZIP_S2 means it approximates the optimal strategy better for low values. ZIP_S1 is better at exploiting the ZI-C agents for high values.

Figure 9.
Bid functions of ZIP agents compared to the optimal strategy when competing against 19 ZI-C agents in FPSB auctions.

Both ZIP agents win more auctions than would be expected if agents won an equal proportion: on average ZIP_S1 won 14.08% and ZIP_S2 16.3% of the final 5000 auctions. The OPT agent does worse than the adaptive agents, but still performs better than the ZI-C agents. This reinforces the fact that the ZI-C strategy is not an equilibrium. In SPSB auctions, the optimal strategy is to bid value, even against ZI-C agents. As with the first set of SPSB experiments, none of the adaptive agents finds a strategy as good as the optimal, and ZIP_S1 outperforms GD_S. ZIP_S2 now does worst of the three adaptive agent architectures because it adopts too large a margin and hence misses opportunities to win auctions. The GD agent adopts a margin much closer to the optimal, but makes less profit because of its tendency to bid higher than value and risk making a loss. On average the GD agent won 30% of auctions, but on these wins it made losing transactions totalling 15.38. Without these losses, GD would have made profits not significantly different to OPT. The results show that our adaptive agent strategies are able to exploit greedy agents in first and second price auctions to make large profits. GD better approximates the optimal strategy in both auction types, but makes less profit in SPSB because of its tendency to bid higher than value. In addition to testing the learning abilities of the adaptive agents, these experiments demonstrate some of the intrinsic characteristics of FPSB and SPSB auctions. Faced with more sophisticated agents, the optimal strategy for FPSB auctions can be complex and hard to derive.
In contrast, the optimal strategy for SPSB is independent of the bidding function used by the opponents (assuming independence between bidders and auctions). On the other hand, the efficiency results in Table 4 indicate that SPSB auctions are less efficient than FPSB auctions with a population of ZI-C agents.

5.4. Head to head experiments

The experiments with adaptive agents in homogeneous populations and populations of non-adaptive agents allowed us to examine how well the agents perform in scenarios where there is a known optimal strategy. The next step is to examine how the adaptive agents perform in auctions against other adaptive agents. In these cases the environment is dynamic, and there is usually no analytic optimal strategy. Following the methodology of related research [46], we examine One vs. Many and Many vs. Many scenarios. In the following sections we omit results for ZI-C agents as they offer no further insights beyond those presented in Section 5.3.

5.4.1. One vs. Many agents. Tables 5-7 present the results of the One vs. Many experiments. Against a population of ZIP S1 agents we observe that:

- OPT agents perform better than all other agents in both FPSB and SPSB. The symmetric equilibrium strategy is a best response to many ZIP S1. This confirms the observation made in Section 5.1 that populations of many ZIP S1 agents adopt a strategy close to the symmetric equilibrium.
- In both FPSB and SPSB auctions the ZIP S2 agent adopts a greedy strategy and as a result scores lower profits than its ZIP S1 counterparts. This result is consistent with Section 5.2, where ZIP S2 agents were also observed to be greedy.

Table 5. Agent and auctioneer statistics for One vs. Many ZIP S1 scenarios.
FPSB      Agent Profit      Margin             Auctioneer Efficiency   Many ZIP S1 Profit
ZIP S2    12.07 (±0.8642)   0.0637 (±0.0021)   94.77% (±0.0007)        12.28 (±0.1830)
GD        12.29 (±0.7242)   0.0560 (±0.0053)   94.88% (±0.0008)        12.00 (±0.2008)
OPT       12.37 (±0.7966)   0.0500 (±0.0000)   94.88% (±0.0008)        11.99 (±0.1999)
SPSB
ZIP S2    13.43 (±1.0841)   0.0086 (±0.0017)   94.31% (±0.0007)        13.48 (±0.1872)
GD        12.68 (±1.4329)   0.0122 (±0.0089)   94.25% (±0.0009)        13.65 (±0.2294)
OPT       13.61 (±1.0744)   0.0000 (±0.0000)   94.36% (±0.0008)        13.34 (±0.1897)

Table 6. Agent and auctioneer statistics for One vs. Many ZIP S2 scenarios.

FPSB      Agent Profit      Margin              Auctioneer Efficiency   Many ZIP S2 Profit
ZIP S1    16.36 (±0.8663)   0.0611 (±0.0029)    93.16% (±0.0005)        16.09 (±0.1202)
GD        17.11 (±0.9362)   0.0562 (±0.0052)    93.23% (±0.0005)        15.87 (±0.1533)
OPT       16.22 (±0.7085)   0.0500 (±0.0000)    93.27% (±0.0005)        15.81 (±0.1240)
SPSB
ZIP S1    13.83 (±1.1325)   0.0077 (±0.0017)    94.19% (±0.0010)        13.68 (±0.2448)
GD        13.74 (±1.1371)   −0.0016 (±0.0064)   94.21% (±0.0009)        13.62 (±0.2189)
OPT       14.09 (±1.0283)   0.0000 (±0.0000)    94.22% (±0.0010)        13.59 (±0.2437)

- In FPSB, GD does better than the average of the population of ZIP S1 and not significantly worse than the OPT agents. In SPSB the GD agent performs worse than ZIP S2.

Table 6 shows the results for many ZIP S2. As we previously observed in Section 5.1, many ZIP S2 agents adopt a greedy strategy. As a result, competing single agents are able to exploit ZIP S2 agents in both FPSB and SPSB auctions. In FPSB auctions GD agents make the greatest profit. This reinforces our previous observation in Section 5.3 that GD agents are the best at exploiting populations of greedy agents. In SPSB auctions the OPT strategy is of course dominant. ZIP S1 achieves a profit closer to OPT than GD. Table 7 shows the results for many GD agents.
They show that a reactive agent can generally do relatively better than the population of GD agents, although in FPSB this results in lower total profits to the agents. These experiments have generally reinforced our previous observations that, firstly, ZIP S2 is dominated by ZIP S1 and GD S, and secondly, that GD agents perform better in FPSB auctions and ZIP S1 agents do better in SPSB auctions. However, the position is not completely clear cut. For example, in FPSB auctions a single GD agent does better than a population of ZIP S1 agents (Table 5), but conversely a single ZIP S1 agent makes a larger profit than a population of many GD agents (Table 7). This trade-off between the two agents is investigated further in Section 5.4.2.

Table 7. Agent and auctioneer statistics for One vs. Many GD scenarios.

FPSB      Agent Profit      Margin             Auctioneer Efficiency   Many GD Profit
ZIP S1    15.82 (±1.0680)   0.0597 (±0.0031)   93.35% (±0.0034)        15.00 (±0.7260)
ZIP S2    16.16 (±1.4212)   0.0703 (±0.0023)   93.18% (±0.0039)        15.38 (±0.8050)
OPT       19.83 (±0.9362)   0.0500 (±0.0000)   91.83% (±0.0023)        19.19 (±0.5489)
SPSB
ZIP S1    12.16 (±1.0469)   0.0115 (±0.0023)   95.03% (±0.0018)        11.33 (±0.3264)
ZIP S2    11.80 (±1.0705)   0.0084 (±0.0018)   95.13% (±0.0016)        11.13 (±0.3802)
OPT       11.92 (±1.0425)   0.0000 (±0.0000)   95.24% (±0.0015)        10.81 (±0.3432)

Table 8. Agent and auctioneer statistics for Many vs. Many FPSB scenarios.

A vs. B                           A Profit   A Margin   B Profit   B Margin   Auctioneer Efficiency
Many ZIP S1 vs. Many ZIP S2       14.31      0.057      14.02      0.067      93.96%
Many ZIP S1 vs. Many GD           11.88      0.052      12.18      0.059      94.91%
Many ZIP S2 vs. Many GD           13.39      0.066      14.36      0.058      94.10%
Many ZIP S1 vs. Many Optimal      11.71      0.052      12.04      0.050      94.98%
Many ZIP S2 vs. Many Optimal      13.01      0.065      13.54      0.050      94.36%
Many Optimal vs. Many GD          11.89      0.050      11.82      0.059      95.02%

5.4.2. Many vs. Many agents. To investigate further the relative ability of the adaptive agents to perform in mixed populations, we examine how changing the proportion of each agent in the population changes behaviour. Tables 8 and 9 give the results for FPSB and SPSB auctions respectively.

The FPSB results shown in Table 8 demonstrate, firstly, that ZIP S2 is outperformed by all competing strategies as a result of greedy bidding (too high a margin) and, secondly, that GD agents make a significantly larger profit than both of the ZIP S agents. When considered in conjunction with the results in Section 5.4.1, we can conclude that GD agents perform best in populations where there are competitive strategies from which they may learn. When competing with a population of duplicate GD agents they tend towards a suboptimal equilibrium which is potentially exploitable by an agent using a different learning mechanism, as demonstrated by ZIP S1 against 19 GD S (see Table 7). ZIP S1 is only able to exploit GD S when it is in a minority. When the number of ZIP S1 agents in the population rises above two, GD S improves its performance and makes a profit significantly higher than the ZIP S1 agents. Figure 10 shows the average difference in profit (with standard error bars) between ZIP S1 and GD S for populations of between 1 and 19 ZIP S1 agents. It shows that the ZIP S1 agents make a greater profit when there are only one or two in the population, but with three or more ZIP S1 the GD S does better. The GD S advantage remains constant when approximately 20% or more of the population is ZIP S1.

From the SPSB results given in Table 9 it is clear that the GD agents are outperformed by the ZIP S. As in previous experiments, GD agents tend to be too fearful and hence bid too high.
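The cost of this overbidding is easy to check numerically. The sketch below is illustrative only (our own values; 19 rival bids assumed uniform on [0, 1]) and compares bidding value with bidding above value in a second-price auction:

```python
import random

def spsb_profit(own_bid, own_value, rival_bids):
    # Second price rule: win only by beating every rival, pay the best rival bid
    top = max(rival_bids)
    return own_value - top if own_bid > top else 0.0

def expected_profit(own_bid, own_value, n_rivals=19, trials=20000):
    # Fixed seed so both bids face the same rival draws (a paired comparison)
    rng = random.Random(42)
    total = 0.0
    for _ in range(trials):
        rivals = [rng.random() for _ in range(n_rivals)]
        total += spsb_profit(own_bid, own_value, rivals)
    return total / trials

truthful = expected_profit(0.8, 0.8)  # bid value: every win is profitable
overbid = expected_profit(0.9, 0.8)   # the extra wins all pay above value
```

Because the price is set by the best rival bid, the auctions won by the overbid but not by the truthful bid are exactly those where the price exceeds value, which is the loss-making behaviour exhibited by GD here.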
Figure 10. Difference in average profit between GD S and ZIP S1 for an increasing number of ZIP S1 agents in FPSB auctions. The bars represent one standard deviation of the estimate of the mean difference.

5.5. Free for all

The final experiments involved a mixed population of ZIP S1, ZIP S2, GD S and OPT agents. Table 10 shows the results for FPSB and SPSB populations with 5 of each agent. These results confirm our previous observations that:

- In FPSB auctions, GD S makes the greatest profit.
- In SPSB auctions, ZIP S1 and ZIP S2 outperform GD. There is no significant difference between ZIP S1 and ZIP S2.

All the adaptive agents learn a good strategy and the market tends towards the optimal efficiency of 95% (the efficiency was on average 94.61% in FPSB auctions and 94.92% in SPSB auctions). However, despite the theoretical revenue equivalence of FPSB and SPSB auctions, in our experiments there was a significant difference in the efficiency, and hence the revenue, between the two auction formats. A similar difference in the behaviour of human agents in the two auction formats has been observed [31]. In our experiments it is caused by the GD agent bidding too high.

5.6. Alternative value distributions

In common with the majority of auction studies [26], our experiments have been conducted using a uniform value distribution. However, it could be maintained that the performance of the algorithms (in both absolute and relative terms) is simply an artefact of the uniform values. To test the robustness of the results, we repeated a selection of the experiments with two alternative beta value distributions.
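Beta-distributed private values can be drawn directly with the standard library; a minimal sketch (helper name and sample size are ours, values kept on the same [0, 1] support as the uniform experiments):

```python
import random

def beta_values(alpha, beta, n, seed=0):
    # Private values drawn from a Beta distribution on [0, 1]
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(n)]

low_values = beta_values(2, 4, 10000)   # Beta(2,4): mass near low values
high_values = beta_values(4, 2, 10000)  # Beta(4,2): mass near high values
```

The two parameterisations are mirror images of each other, so they probe whether an agent's performance depends on whether most of the bidding action happens at low or high values.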
The first, Beta(2,4), has a left skew and the second, Beta(4,2), has a right skew (see Figure 11 for an example).

Figure 11. Example Beta(2,4) and Beta(4,2) value distributions.

Table 9. Agent and auctioneer statistics for Many vs. Many SPSB scenarios.

A vs. B                           A Profit   A Margin   B Profit   B Margin   Auctioneer Efficiency
Many ZIP S1 vs. Many ZIP S2       13.65      0.008      13.49      0.009      94.25%
Many ZIP S1 vs. Many GD           14.20      0.011      13.09      0.000      94.14%
Many ZIP S2 vs. Many GD           12.11      0.008      12.10      −0.004     94.83%
Many ZIP S1 vs. Many Optimal      12.48      0.006      12.68      0.000      94.71%
Many ZIP S2 vs. Many Optimal      12.50      0.008      12.85      0.000      94.65%
Many Optimal vs. Many GD          10.52      0.000      10.00      −0.011     95.61%

The experiments we ran are summarised in Tables 11 and 12. This subset of the previously conducted experiments was selected to demonstrate that the following general conclusions drawn with a uniform distribution are still valid with a beta value function.

1. In Section 5.2 we conclude that ZIP S1 finds a close to optimal strategy against a population of optimal agents in SPSB auctions, whereas GD agents tend to overbid and hence bid suboptimally. Table 11 shows that this pattern of results is also demonstrated when Beta(2,4) and Beta(4,2) value distributions are used.

Table 10. Profit results for experiments with 5 of each agent type.

FPSB      Average profit   Percentage of total
ZIP S1    12.59            24.80%
ZIP S2    12.31            24.26%
GD S      12.97            25.55%
OPT       12.88            25.38%
SPSB
ZIP S1    11.89            24.99%
ZIP S2    11.88            24.97%
GD S      11.67            24.53%
OPT       12.14            25.51%

Table 11. Results for a single ZIP S1 and GD S competing against 19 optimal agents in SPSB auctions when all agents have Beta value distributions.
Beta(2,4)   Agent Profit      Margin            Auctioneer Efficiency   Optimal agents Profit
ZIP S1      21.284 (±1.852)   0.011 (±0.004)    88.15% (±0.0013)        21.405 (±0.269)
GD          19.248 (±1.927)   −0.011 (±0.009)   88.33% (±0.0013)        21.086 (±0.282)
Beta(4,2)
ZIP S1      9.171 (±0.788)    0.006 (±0.001)    96.02% (±0.0005)        9.363 (±0.120)

Table 12. Results for 10 vs. 10 scenarios with Beta value distributions.

10 ZIP S1 vs. 10 GD   ZIP S1 Profit   ZIP S1 Margin   GD Profit   GD Margin   Auctioneer Efficiency
FPSB Beta(2,4)        19.964          0.118           22.823      0.085       87.61%
FPSB Beta(4,2)        9.219           0.042           10.001      0.030       95.78%
SPSB Beta(2,4)        22.118          0.017           20.217      −0.012      87.89%

2. In Section 5.4.2 we observe that GD agents make a significantly larger profit than ZIP agents in mixed population FPSB auctions. The results shown in Table 12 demonstrate that GD also outperforms ZIP when values from a Beta distribution are used.

Although not exhaustive, the results with values following a Beta distribution give us confidence that the conclusions drawn in Section 5 will hold for a wide class of alternative value functions.

6. Conclusions

In this paper we have presented a model of the class of single seller auctions and have described how we have adapted popular agent architectures for the CDA to single seller auctions: we include two versions of Cliff's ZIP algorithm [9], ZIP S1 and ZIP S2, and a hybridised version of the Gjerstad Dickhaut agent [13], GD S. Single seller auctions present a quite different learning challenge to that of double auctions. This is due to the differences in information revelation and allocation processes in these auction types. We describe two commonly used single seller auction structures, first price and second price sealed-bid auctions.
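The two clearing rules differ only in the price the winner pays; a minimal sketch (illustrative helper names, bids and private values as parallel lists):

```python
def fpsb_outcome(bids, values):
    # First price sealed bid: highest bidder wins and pays their own bid
    winner = max(range(len(bids)), key=bids.__getitem__)
    price = bids[winner]
    return winner, price, values[winner] - price

def spsb_outcome(bids, values):
    # Second price sealed bid: highest bidder wins but pays the
    # second-highest bid, so the price is independent of the winner's bid
    ranked = sorted(range(len(bids)), key=bids.__getitem__, reverse=True)
    winner, price = ranked[0], bids[ranked[1]]
    return winner, price, values[winner] - price
```

For example, with bids [0.5, 0.7, 0.6] and values [0.6, 0.9, 0.8], agent 1 wins under both rules but pays 0.7 under FPSB and 0.6 under SPSB; this is why shading the bid below value affects the price paid only in the first-price format.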
We then present results from an extensive series of experiments (approximately 80 million auctions were simulated for the results presented in this paper) designed to evaluate the ability of the three different learning mechanisms to learn a strategy. The first series of experiments tested how well each agent type could learn in a more controlled, static environment with a known optimal strategy. The second set evaluated how the agents performed in competition with each other.

One of the objectives of agent research into economic scenarios is to gain insight into the market mechanisms being modelled. Our experiments have reinforced observations made in the auction theory literature. Firstly, under the PVM, the FPSB and SPSB market mechanisms drive very simple learning agents towards the symmetric equilibrium solution. Secondly, the experiments with a population of ZI-C agents demonstrated that with irrational agents an FPSB is more efficient than an SPSB, and that it is easier to learn to exploit irrational agents in an SPSB due to the dominant strategy of bidding value. Thirdly, under controlled conditions the theoretical revenue equivalence of FPSB and SPSB can be observed, but in auctions where there are many different kinds of learning agent competing, SPSB auctions are more efficient.

The other objective of this research is to assess how well alternative agent architectures perform in sealed-bid auction simulations. Against a population of agents following the optimal strategy, ZIP S agents learn a good, but suboptimal, strategy in both FPSB and SPSB auctions.
GD S agents learn an optimal strategy in FPSB but not in SPSB auctions unless a budget constraint is imposed. ZIP S and GD S exhibit the same pattern of performance against populations of (suboptimal) ZI-C agents. The adaptive agents mirror the tendency, observed in experimental studies of human competitors, to bid above the dominant or Nash equilibrium [26].

In a homogeneous population, ZIP S1 agents co-adapt towards the symmetric equilibrium solution in FPSB and SPSB auctions, whereas ZIP S2 and GD S converge to a suboptimal, but mutually more profitable, equilibrium. The performance of ZIP S1 in single seller auctions directly mirrors the fact that ZIP agents tend towards the market equilibrium in CDA auctions (see [9]). The convergence of ZIP S2 and GD S to a more profitable equilibrium may seem a desirable outcome, but the equilibrium they reach is not stable: if another agent were to enter the market, it could exploit this fact to make extra profits.

ZIP S1 outperforms ZIP S2 in the large majority of scenarios considered. The major difference between ZIP S1 and ZIP S2 is that, unlike ZIP S1, ZIP S2 uses the magnitude of profits made to alter strategy. The fact that using this extra information to alter strategy results in a worse solution is in itself interesting. ZIP S2 agents tend to overreact to a single extreme result, and as such converge too quickly and get stuck in a local optimum in the strategy space. The fact that ZIP S1 ignores the level of profit means it can explore a greater proportion of the strategy space, ignoring what would seem like highly important recent information, in order to get closer to the global optimum. Unlike ZIP S2, neither GD nor ZIP S1 requires information about the distribution function, F, or the number of agents, N, and yet GD and ZIP S1 consistently outperform ZIP S2.
This suggests that, at least with a uniform distribution of values, the information required to theoretically derive the optimal strategy is not necessary to learn the optimal strategy through experience. Indeed, the extra information about what would have been optimal in the past may actually reduce the quality of the overall strategy.

In competition against each other, the general pattern of performance observed in competition against the non-adaptive agents is repeated. ZIP S1 performs robustly, producing good results in FPSB and SPSB auctions, both in relation to a known optimal strategy and to other agents. ZIP S1 performs better than the other agents in mixed SPSB auctions. GD S generally does better than ZIP S1 in FPSB, reaching the optimal strategy when one is known to exist and performing better than the other agents in the mixed agent auctions, except for the case when a small number of ZIP S1 agents can exploit the tendency of a population of GD S to converge to a suboptimal solution. In SPSB auctions, GD S is outperformed by ZIP S1, because GD S must estimate its belief and reward function from a set of results that may be unrepresentative of the true trends. Generally, GD S agents perform best in populations where there are competitive strategies from which they may learn, whereas ZIP S1 agents tend to be more competitive, but in some environments cannot fine-tune their strategy to do quite as well as agents with a greater memory resource.

This work could be extended in many ways. Given that GD S has a general tendency to learn a fearful strategy (i.e. overbid) and ZIP S1 tends to be too greedy (i.e. underbid), an obvious extension would be to attempt to marry GD S and ZIP S1 using an ensemble approach.
Currently we are experimenting with alternative architectures to do this. We have also experimented with alternative methods for GD S to estimate the belief function in SPSB auctions. There are also many market scenarios we could consider. Porter and Shoham [35] show that the symmetric equilibrium of FPSB auctions is robust even when competing against cheating agents. It would be interesting to see whether this robustness is also evident in populations of learning agents. Another important scenario worth investigating is one in which agents may enter or leave the marketplace. The greater dynamism this scenario presents would provide a new challenge for the agent architectures. We will also consider alternative auction structures, such as English auctions, under models other than the PVM. Recently there have been many architectures proposed for CDA that take into account the timing of bids (see, for example, [19, 45]). These methods could be used as a basis for designing agents for single seller auctions that attempt to model competitors' strategies. We would also like to bridge the gap between our work and the body of research into agents for multiple English auctions and combinatorial auctions (see, for example, [1, 25, 44]). In order to gain widespread credibility as a potential real world application, we believe it is a necessary condition for any architecture proposed for multiple or combinatorial auctions to be shown to perform well in simple auctions against known optimal strategies. Our approach will allow us to test any new algorithms developed on a series of problems of increasing complexity.

Notes

1. A Walrasian tatonnement mechanism is a protocol by which an auctioneer attempts to engineer convergence to equilibria.
2. A Marshallian path is a sequence of trades such that the last trade is necessarily at the equilibrium.

References

1. P. Anthony and N.
R. Jennings, “Developing a bidding agent for multiple heterogeneous auctions,” ACM Trans. Internet Technol., vol. 3, no. 3, pp. 185–217, 2003.
2. D. Ariely, A. Ockenfels, and A. E. Roth, An Experimental Analysis of Ending Rules in Internet Auctions. Max Planck Institute for Research into Economic Systems, Strategic Interaction Group, Discussion Papers on Strategic Interaction, 2002.
3. A. J. Bagnall and I. E. Toft, “An agent model for first price and second price private value auctions,” in Proceedings of the 6th International Conference on Artificial Evolution, pp. 145–156, 2003.
4. A. J. Bagnall and I. E. Toft, “Zero intelligence plus and Gjerstad–Dickhaut agents for sealed bid auctions,” in Workshop on Trading Agent Design and Analysis, part of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2004), pp. 59–64, 2004.
5. F. Brandt and G. Weiss, “Antisocial agents and Vickrey auctions,” in Revised Papers from the 8th International Workshop on Intelligent Agents VIII, Springer-Verlag, pp. 335–347, 2002.
6. P. J. Brewer, M. Huang, B. Nelson, and C. R. Plott, On the Behavioral Foundations of the Law of Supply and Demand: Human Convergence and Robot Randomness. California Institute of Technology, Social Science Working Paper 1079, 1999.
7. S.-F. Cheng, E. Leung, K. M. Lochner, K. O'Malley, D. M. Reeves, L. J. Schvartzman, and M. P. Wellman, “Walverine: a Walrasian trading agent,” Decision Support Syst., vol. 39, pp. 169–184, 2005.
8. S. H. Clearwater, Market-Based Control: A Paradigm for Distributed Resource Allocation. World Scientific Publishing Co., Inc., 1996.
9. D.
Cliff and J. Bruten, Minimal-Intelligence Agents for Bargaining Behaviours in Market-Based Environments. Technical Report, HP Labs, June 1997.
10. K. E. Drexler and M. S. Miller, “Incentive engineering for computational resource management,” in B. A. Huberman (ed.), The Ecology of Computation, 1988.
11. J. D. Farmer, P. Patelli, and I. I. Zovko, The Predictive Power of Zero Intelligence in Financial Markets. AFA 2004 San Diego Meetings, 2004.
12. M. A. Gibney, N. R. Jennings, N. J. Vriend, and J. M. Griffiths, “Market-based call routing in telecommunications networks using adaptive pricing and real bidding,” in Proceedings of the Third International Workshop on Intelligent Agents for Telecommunications Applications, pp. 50–65, 1999.
13. S. Gjerstad and J. Dickhaut, “Price formation in double auctions,” Games Econ. Behav., vol. 22, no. 1, pp. 1–29, 1998.
14. D. K. Gode and S. Sunder, “Allocative efficiency of markets with zero intelligence traders: market as a partial substitute for individual rationality,” J. Polit. Econ., vol. 101, no. 1, pp. 119–137, 1993.
15. A. Greenwald and P. Stone, “Autonomous bidding agents in the trading agent competition,” IEEE Internet Comput., vol. 5, no. 2, pp. 52–60, 2001.
16. J. Grossklags, C. Schmidt, and J. Siegel, Dumb Software Agents on an Experimental Asset Market. Working Paper, School of Information and Management Systems, UC Berkeley, 2000.
17. M. He, H. Leung, and N. R. Jennings, “A fuzzy logic based bidding strategy for autonomous agents in continuous double auctions,” IEEE Trans. Knowledge Data Eng., vol. 15, no. 6, pp. 1345–1363, 2003.
18. M. He and N. R. Jennings, “SouthamptonTAC: an adaptive autonomous trading agent,” ACM Trans. Internet Technol., vol. 3, no. 3, pp. 218–235, 2003.
19. M. He and N. R. Jennings, “Designing a successful trading agent: a fuzzy set approach,” IEEE Trans. Fuzzy Syst., vol. 12, no. 3, pp. 389–410, 2004.
20. M. He, N. R. Jennings, and H.
Leung, “On agent-mediated electronic commerce,” IEEE Trans. Knowledge Data Eng., vol. 15, no. 4, pp. 985–1003, 2003.
21. W. Hsu and V. Soo, “Market performance of adaptive trading agents in synchronous double auctions,” in Proceedings of the 4th Pacific Rim International Workshop on Multi-Agents, Intelligent Agents, Springer-Verlag, pp. 108–121, 2001.
22. J. Hu and M. P. Wellman, “Conjectural equilibrium in multiagent learning,” Machine Learning, vol. 33, 1998.
23. J. Hu and M. P. Wellman, “Online learning about other agents in a dynamic multiagent system,” in Proceedings of the Second International Conference on Autonomous Agents, ACM Press, pp. 239–246, 1998.
24. B. Huberman and S. H. Clearwater, “A multiagent system for controlling building environments,” in Proceedings of the First International Conference on Multiagent Systems, pp. 171–176, 1995.
25. N. R. Jennings, M. He, and A. Prugel-Bennett, “An adaptive bidding agent for multiple English auctions: a neuro-fuzzy approach,” in Proceedings of the IEEE Conference on Fuzzy Systems, Budapest, Hungary, 2004.
26. J. H. Kagel and A. E. Roth (eds.), The Handbook of Experimental Economics. Princeton University Press, 1995.
27. P. Klemperer, “Auction theory: a guide to the literature,” J. Econ. Surveys, vol. 13, no. 3, pp. 227–286, 1999.
28. V. Krishna, Auction Theory. Academic Press: San Diego, California, 2002.
29. L. Li and S. F. Smith, “Speculation agents for dynamic multi-period continuous double auctions in B2B exchanges,” in Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04), 2004.
30. M. Luck, P.
McBurney, and C. Preist, Agent Technology: Enabling Next Generation Computing. A Roadmap for Agent Based Computing. AgentLink, England, 2003.
31. D. Lucking-Reiley, “Using field experiments to test equivalence between auction formats: magic on the internet,” Am. Econ. Rev., 1999.
32. R. P. McAfee and J. McMillan, “Auctions and bidding,” J. Econ. Literature, vol. 25, pp. 699–738, 1987.
33. S. Park, E. H. Durfee, and W. P. Birmingham, “An adaptive agent bidding strategy based on stochastic modelling,” in Proceedings of the Third International Conference on Autonomous Agents, pp. 147–153, 1999.
34. C. W. Pawlowski, A. M. Bell, S. Crawford, W. A. Sethares, and C. Finn, “An adaptive agent bidding strategy based on stochastic modelling,” in Proceedings of the 31st International Conference on Environmental Systems, 2001.
35. R. Porter and Y. Shoham, “On cheating in sealed-bid auctions,” J. Decision Support Syst., vol. 35, pp. 41–54, 2004.
36. C. Preist, A. Byde, and C. Bartolini, “Economic dynamics of agents in multiple auctions,” in Proceedings of the 5th International Conference on Autonomous Agents, pp. 545–551, 2001.
37. C. Preist and M. van Tol, “Adaptive agents in a persistent shout double auction,” in Proceedings of the First International Conference on Information and Computation Economies, ACM Press, pp. 11–18, 1998.
38. A. E. Roth and A. Ockenfels, “Last-minute bidding and the rules for ending second-price auctions: evidence from eBay and Amazon on the internet,” Am. Econ. Rev., 2003 (forthcoming).
39. J. Rust, J. Miller, and R. Palmer, “The double auction market institution: a survey,” in D. Friedman and J. Rust (eds.), The Double Auction Market: Institutions, Theories, and Evidence. Proceedings of the Workshop on Double Auction Markets Held June 1991 in Santa Fe, New Mexico. Santa Fe Institute, Addison-Wesley, 1991.
40. J. Rust, J. Miller, and R. Palmer, “Behaviour of trading automata in a computerized double auction market,” in D. Friedman and J.
Rust (eds.), The Double Auction Market: Institutions, Theories, and Evidence. Proceedings of the Workshop on Double Auction Markets Held June 1991 in Santa Fe, New Mexico. Santa Fe Institute, Addison-Wesley, 1991.
41. R. Smith, “The Contract Net Protocol: high-level communication and control in a distributed problem solver,” IEEE Trans. Computers, vol. C-29, no. 12, pp. 1104–1113, 1980.
42. V. Smith, “An experimental study of market behavior,” J. Polit. Econ., vol. 70, no. 2, pp. 111–137, 1962.
43. M. D. Springer, The Algebra of Random Variables. John Wiley & Sons, 1979.
44. P. Stone, M. L. Littman, S. Singh, and M. Kearns, “ATTac-2000: an adaptive autonomous bidding agent,” J. Artif. Intell. Res., vol. 15, pp. 189–206, 2001.
45. G. Tesauro and D. Bredin, “Strategic sequential bidding in auctions using dynamic programming,” in AAMAS, Bologna, Italy, ACM, 2002.
46. G. Tesauro and R. Das, “High-performance bidding agents for the continuous double auction,” in Third ACM Conference on Electronic Commerce, pp. 206–209, 2001.
47. I. E. Toft and A. J. Bagnall, Adaptive Agents for Simulated Sealed Bid Auctions. Technical Report CMPC04-03, School of Computing Sciences, University of East Anglia, 2004.
48. W. Vickrey, “Counterspeculation, auctions, and competitive sealed tenders,” J. Finance, vol. 16, pp. 8–37, 1961.
49. J. M. Vidal and E. H. Durfee, “The impact of nested agent models in an information economy,” in Proceedings of the Second International Conference on Multi-Agent Systems, pp. 377–384, 1996.
50. P. Vytelingum, R. K. Dash, E. David, and N. R. Jennings, “A risk-based bidding strategy for continuous double auctions,” in Proceedings of the 16th European Conference on Artificial Intelligence, Valencia, Spain, 2004.
51. C. A. Waldspurger, T. Hogg, B. Huberman, J. O. Kephart, and S. Stornetta, “Spawn: a distributed computational economy,” IEEE Trans. Software Eng., vol. 18, no. 2, pp. 103–117, 1992.
52. M. Wooldridge and N. R.
Jennings, “Intelligent agents: theory and practice,” Knowledge Eng. Rev., vol. 10, no. 2, pp. 115–152, 1995.
53. G. Zuckerman, “Google shares prove winners at least for a day,” Wall Street J., 20 Aug. 2004, pp. C1, C3.