Autonomous Adaptive Agents for Single Seller Sealed Bid Auctions

ANTHONY BAGNALL AND IAIN TOFT
School of Computing Sciences, University of East Anglia, Norwich, UK
ajb,[email protected]

Autonomous Agents and Multi-Agent Systems
© 2005 Springer Science+Business Media, Inc. Manufactured in The Netherlands.
DOI: 10.1007/s10458-005-4948-2
Abstract. In developing open, heterogeneous and distributed multi-agent systems researchers often face a problem of facilitating negotiation and bargaining amongst agents. It is increasingly common to use auction mechanisms for negotiation in multi-agent systems. The choice of auction mechanism and the bidding strategy of an agent are of central importance to the success of the agent model. Our aim is to determine the best agent learning algorithm for bidding in a variety of single seller auction structures in both static environments where a known optimal strategy exists and in complex environments where the optimal strategy may be constantly changing. In this paper we present a model of single seller auctions and describe three adaptive agent algorithms to learn strategies through repeated competition. We experiment in a range of auction environments of increasing complexity to determine how well each agent performs, in relation to an optimal strategy in cases where one can be deduced, or in relation to each other in other cases. We find that, with a uniform value distribution, a purely reactive agent based on Cliff's ZIP algorithm for continuous double auctions (CDA) performs well, although it is outperformed in some cases by a memory based agent based on the Gjerstad Dickhaut agent for CDA.

Keywords: adaptive agents, auctions, zero intelligence plus.
1. Introduction
The dramatic increase in the quantity of goods and services sold via auctions has fuelled
greater interest in the study of protocols for auction structure and strategies for agent bidding. The potential for software agents in e-commerce, and in auctions in particular, is
enormous [20, 52]. Agent systems can be employed as a practical mechanism by which
individuals and companies may more usefully engage in online commercial activity. The
applications for autonomous, adaptive agents that can compete and learn effectively in
real time online auctions are numerous. For example, Market Based Control (MBC) [8]
systems have been applied to a variety of applications, including air conditioning [24],
network bandwidth, telecommunications [12] and Advanced Life Support Systems (ALS)
[34]. Agent simulations may also serve as a theoretical economic testbed to study the effect
of alternative market mechanisms on competitor behaviour, as demonstrated by Grossklags
et al. [16], Farmer et al. [11] and Brewer et al. [6].
The majority of adaptive agent research in simulated auctions has focused on algorithms for bidding in double auctions, i.e. auctions with multiple buyers and sellers
[9, 13, 20, 46]. Double auctions and particularly continuous double auctions (CDA)
are an economic mechanism known to be very efficient at allocating resources [42]
and are widely used in online and offline markets. Agents for simulated CDA can
offer insights into the effect of market structure on behaviour and offer the possibility of automated traders in real world markets [40]. However, CDA are not the only form of auction gaining in popularity. Auctions with a single seller or buyer, which we refer to as single-sided auctions (or simply as auctions when the meaning is unambiguous), are a format more familiar to the majority of people. The success of consumer-to-consumer auction sites such as eBay has meant that many individuals have competed in a single-sided auction and are aware of the strategic issues involved in bidding. Also, the growth of business-to-business auction services such as Freight Traders (Freight Traders Ltd, http://www.freight-traders.com/) has meant that many companies are considering shifting their procurement methods from a request-to-tender system to one that employs auctions. Business-to-consumer auctions are also growing in popularity, as witnessed by the decision by Google to handle their sale of shares via an online single-sided Dutch auction [53]. Single-sided auctions are also a crucial mechanism in many agent based systems such as the Contract Net Protocol [41]. Single-sided auctions can have many formats. The four most commonly researched formats are:
– The open ascending price or English auction, where bidders submit increasing bids
until no bidders wish to submit a higher bid;
– The open descending price or Dutch auction, where the price moves down from a high
starting point until a bidder bids, at which point the auction terminates;
– The first-price sealed-bid auction (FPSB), where each bidder submits a single bid, the
highest bidder gets the object and pays the amount he bid;
– The second-price sealed-bid auction (SPSB), where each bidder submits a single bid, the highest bidder gets the object and pays the second highest bid (also known as a Vickrey auction).
The strategies and learning issues in single-sided auctions are fundamentally different
to those in a CDA. Our research aim is to study the behaviour of autonomous adaptive
agents in alternative protocols for single-sided auctions. Some of the key issues in the
study of auctions are:
– what are the optimal strategies for a given auction structure;
– how do agents learn the optimal strategy; and
– how does the restriction of information prevent agents from learning a strategy?
These questions have been addressed through auction theory [48], field studies [31,
38], experimental lab studies [2, 26, 42], and, in CDA, agent simulations [9, 13, 14, 20,
22, 23, 46]. Our broad aim is to extend the study of agent simulations into the area of
single-sided auctions. It is our belief that prior to examining agent behaviour in complex,
dynamic, multi-adaptive agent systems, a proposed agent architecture should be tested in
learning environments where a known optimal strategy exists. Agents should be able to
solve simpler learning problems before being applied to complex problems of the same
type. Auction theory provides us with a class of single-sided auctions where, under some
fundamental assumptions, there is provably optimal behaviour. This class of auction has
also been extensively used in experimental studies with human agents. In Section 2 we
describe the sealed-bid auction mechanism and the private values model commonly used
in theoretical analysis and experimental studies. The question we address is: how well do learning mechanisms developed for the CDA perform in single-sided auctions? More specifically, the objectives of this paper are to:
– specify a sealed-bid auction format often used in economics experiments (for example,
see [31, 38]) as an agent problem, and simulate auctions with agents following a provably optimal strategy;
– adapt popular agent architectures for the CDA, Cliff’s Zero Intelligence Plus agents [9]
and Gjerstad Dickhaut agents [13], for FPSB and SPSB auctions;
– determine the level of complexity of agents required to learn the optimal strategy when
competing against a population of non-adaptive agents following the known optimal
strategy;
– evaluate how well the agent algorithms perform when competing against each other in
One vs. Many and Many vs. Many scenarios.
In Section 3 we review some of the adaptive agent mechanisms used in the CDA. In Section 4 we describe how we have adapted these algorithms for sealed-bid auctions to form three new adaptive agent types. Section 5 describes a sequence of five sets of experiments in environments of increasing complexity, beginning with the adaptive agent architectures competing against a population of optimal agents (agents following the optimal strategy described in Section 2.3). Finally, in Section 6, we discuss the differences in performance of the three adaptive agents and describe how we intend to extend this research.
2. The auction model
The first game theory analysis of auctions by Vickrey [48] concentrated on describing behaviour in a small number of basic models. In the 40 years since Vickrey's seminal work, auction theory has developed to consider a wider class of models and behaviours (see [27, 28, 32] for reviews of auction theory). The key elements in describing an auction are: a specification of the assumptions about the parameters applicable to an agent; a description of the protocol, or market rules, under which the auction operates; and a description of the information available to the agent before, during and after an auction. We adopt a commonly used specification, the Private Values Model (PVM), described in Section 2.1. The PVM was initially proposed by Vickrey [48] and has been widely adopted in auction theory [28] and experimental studies [42]. Under the PVM, first and second price sealed-bid auctions are strategically equivalent to Dutch and English auctions, respectively [48]. Hence we restrict our attention to market rules that define FPSB and SPSB auctions (given in Section 2.2). The optimal strategies for agents under the PVM are presented in Section 2.3. The information made available to an agent in order to reach this optimal behaviour is described in Section 2.4.
2.1. Private value model (PVM)
The PVM involves an auction of $N$ interested bidders. Each bidder $i$ has a valuation $x_i$ of the single object. Each $x_i$ is an observation of an independent, identically distributed random variable $X_i$ with range $[0, \phi]$ ($\phi$ is the universal maximum price) and distribution function $F$. The benefit of this model is that for certain auction mechanisms and assumptions there is provably optimal behaviour. Hence, the PVM allows us to measure the performance of adaptive agents and assess under what conditions learning is most effective. This is a necessary precursor to studying more interesting (and realistic) scenarios where the assumptions of the PVM concerning the competitors' behaviour do not necessarily hold true.
2.2. Auction protocols
Since we restrict our attention to FPSB and SPSB auctions, each agent can submit at most one bid in any auction. An agent $i$ forms a bid $b_i$ with a bid function

$$\beta_i : [0, \phi] \rightarrow \mathbb{R}^+, \qquad \beta_i(x_i) = b_i.$$

The set of all bids for a particular auction is denoted $B = \{b_1, b_2, \ldots, b_N\}$. For both FPSB and SPSB, the winning agent, $w$, is the highest bidder,

$$w = \arg\max_{0 < i \le N} b_i. \qquad (1)$$

So the bid of the winning agent is $b_w$. The price paid by the winning agent, $p$, is dependent on the auction structure. In a FPSB, the price paid is the highest bid,

$$p = \max_{0 < i \le N} (b_i \in B), \qquad (2)$$

hence $p = b_w$. In a SPSB, the price paid is the second highest bid,

$$p = \max_{0 < i \le N,\, i \neq w} (b_i \in B). \qquad (3)$$
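To make the market rules concrete, the following minimal Python sketch (ours, not from the paper; run_auction is a hypothetical name) clears a single sealed-bid auction under either rule.

import random

def run_auction(bids, rule="FPSB"):
    """Clear a sealed-bid auction (Equations 1-3): the highest bidder wins
    and pays the highest (FPSB) or second highest (SPSB) bid."""
    w = max(range(len(bids)), key=lambda i: bids[i])          # Equation 1
    if rule == "FPSB":
        price = bids[w]                                       # Equation 2
    else:
        price = max(b for i, b in enumerate(bids) if i != w)  # Equation 3
    return w, price

# Five bidders with values drawn from F = U[0, 1], each naively bidding its value.
bids = [random.random() for _ in range(5)]
print(run_auction(bids, "FPSB"), run_auction(bids, "SPSB"))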
Proponents of multi-agent systems believe that trading agents will first be used where the allocation problem is simple, interactions are repeated frequently and the goods traded are of relatively low value [30]. Sealed-bid auctions have the benefit of being fast, well studied and private (in that the agents are unaware of the other bids). Sealed-bid auctions have been used in a variety of contexts such as e-commerce and networking (e.g. [10, 24, 51]).
2.3. Optimal strategies in FPSB and SPSB auctions
In a sealed-bid auction an agent's profit (or reward) is

$$r(x_i) = \begin{cases} x_i - p & \text{if } i = w \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$
All agents are risk neutral, i.e. they are attempting to find the strategy that maximises their profit. A dominant equilibrium strategy is a bid function that maximises the profit of the agent independent of the other agents' bids. Thus if a dominant equilibrium strategy exists, it clearly provides the optimal bid strategy. However, in general the optimal strategy will differ depending on the strategy of the opponents, and hence no dominant equilibrium exists. In this case we loosen our definition of optimal to include symmetric equilibrium strategies. A symmetric equilibrium is a Nash equilibrium in which all bidders follow the same strategy. Hence a bid function, $\beta^*$, is a symmetric equilibrium if any one agent can do no better (in terms of maximising expected reward) than follow $\beta^*$ if all other agents use $\beta^*$. We define the optimal strategy for an auction as the dominant equilibrium, if such a strategy exists. If there is no dominant strategy, we define the optimal as a symmetric equilibrium strategy, if such an equilibrium exists.
2.3.1. First Price Sealed Bid. The symmetric equilibrium strategy in a FPSB auction for agent $i$ is to select a bid equal to the expected value of the largest of the other agents' private values, given that the largest of these values is less than the private value of agent $i$; i.e. the agent should bid as high as all the other agents' values (not bids) without exceeding its own value. More formally, let $Y_1, Y_2, \ldots, Y_{N-1}$ be the order statistics of the other agents' values $X_1, X_2, \ldots, X_{N-1}$. The symmetric equilibrium strategy for an agent with value $x$ is to bid the expected value of the largest of the other values, i.e.
$$\beta(x) = E(Y_{N-1} \mid Y_{N-1} < x). \qquad (5)$$

The optimal strategy is dependent on the form and assumed commonality of the value distribution function $F$ and the independence of the bidders' values. When $F$ is a uniform distribution on $[0, 1]$ the symmetric equilibrium strategy is

$$\beta(x) = \frac{N-1}{N} \cdot x. \qquad (6)$$

The optimal strategy when competing against agents known to be following a suboptimal strategy is to bid the expected value of the highest of the other bids. This optimal bidding strategy may be a highly complex function of value with no analytical solution. For example, consider the Zero Intelligence Constrained (ZI-C) agents used by Gode and Sunder in the CDA [14]. Each ZI-C agent bids on a uniform range from 0 to its value $x$. We can thus consider the bid from a ZI-C agent as a random variable defined as the product of two uniformly distributed random variables with range $[0, 1]$,

$$W = Y \cdot X.$$

From Springer [43] we can deduce that the distribution of bids is

$$h(w) = -\ln(w). \qquad (7)$$

The distribution of the largest bid of $n$ ZI-C agents is the cumulative distribution function (cdf) of the largest order statistic of $n$ independent, identically distributed random variables $W_1, W_2, \ldots, W_n$ with density given by Equation 7. The cdf of the largest order statistic, $Y_n$, is given by

$$G(y_n) = (F(y_n))^n = \left(-y_n(\ln(y_n) - 1)\right)^n.$$
The optimal strategy is

$$\beta^*_{ZI\text{-}C}(x) = E(Y_n \mid Y_n < x) = \int_0^x \frac{y \cdot g(y)}{G(x)}\, dy,$$

which is

$$\beta^*_{ZI\text{-}C}(x) = \frac{n}{(x \cdot (\ln(x) - 1))^n} \int_0^x y \cdot \ln(y) \cdot (y \cdot (\ln(y) - 1))^{n-1}\, dy. \qquad (8)$$
The optimal strategy against ZI-C agents (which we could also have derived from expected profits in SPSB via the revenue equivalence theorem) is not a linear function of value. Figure 1 shows how the function $\beta^*_{ZI\text{-}C}$ tails off for higher values, indicating that a greedier strategy should be adopted.
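To illustrate (our sketch, not code from the paper): Equation 6 is immediate, and Equation 8 has no closed form but can be evaluated numerically; beta_zic below uses simple trapezoidal integration, exploiting the fact that the integrand vanishes as y approaches 0.

import math

def beta_uniform(x, n_bidders):
    """Symmetric equilibrium FPSB bid for uniform values (Equation 6)."""
    return (n_bidders - 1) / n_bidders * x

def beta_zic(x, n, steps=10_000):
    """Optimal FPSB bid against n ZI-C opponents (Equation 8), evaluated
    by trapezoidal integration of the integrand on (0, x]."""
    if x <= 0.0:
        return 0.0
    f = lambda y: y * math.log(y) * (y * (math.log(y) - 1.0)) ** (n - 1)
    h = x / steps
    total = sum(f(i * h) for i in range(1, steps)) + 0.5 * f(x)
    return n * h * total / (x * (math.log(x) - 1.0)) ** n

print(beta_uniform(0.8, 20))  # 0.76
print(beta_zic(0.8, 19))      # tails off relative to the linear rule (cf. Figure 1)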
2.3.2. Second Price Sealed Bid. The symmetric equilibrium strategy in a SPSB auction for an agent with value $x$ is simply to bid $x$, i.e.

$$\beta(x) = x. \qquad (9)$$
The optimal strategy for a SPSB auction is independent of the form of the distribution
function F and does not require that all bidders have the same value function. So, for
example, the optimal strategy against a population of ZI-C agents is also to bid your
value. Proofs and a more complete description of auction formats are given in Krishna
[28].
Figure 1. Optimal bid function (bid against value) when competing in a FPSB auction against a population of 19 ZI-C agents.
2.4. Simulation structure
Agents compete in a series of $k$ auctions. For any auction each bidder $i$ is assigned a value $x_i$ by sampling $F$. Allocation of trading rights and private values occurs prior to trade commencing. This practice is consistent with early experimental economics [42]. Prior to bidding in an auction, the PVM assumes that bidder $i$ is aware of:
1. the observed value $x_i$ of random variable $X_i$;
2. the distribution function common to all bidders, $F$;
3. the universal maximum value, $\phi$;
4. the number of competitive bidders, $N$.
Once the auction is complete, any agent i knows:
1. whether they won the auction or not, a boolean variable $q$;
2. the price the winning agent will pay, $p$;
3. their own reward, as given by Equation 4.
No other information relating to the other agents' bids is made available. In fact, GD and ZIP_S1 do not use $F$ and $N$, and none of the agents require the universal maximum value, $\phi$. We allow the agents to know the winning price (but not necessarily the winning bid) but not the identity of the winner, because we believe this is the most commonly adopted real world model, particularly in on-line auctions. We discuss in Section 4 how this restriction of information affects an agent's ability to learn a good strategy. The agent is also unaware of $k$, the number of auctions in any experiment.
3. Agents in double auctions
The first prominent research into software agents in auctions was by Gode and Sunder [14], who used agents bidding randomly within a budget constraint (ZI-C agents) to demonstrate that, under certain conditions, the CDA mechanism was able to drive a market consisting of agents operating under no intelligence towards a competitive equilibrium. The ZI-C experiments were effectively a demonstration of the non-tatonnement¹ mechanism of the CDA driving agents along the Marshallian path² to equilibrium. Despite zero intelligence having been shown to effectively model real world financial markets [11], it has been demonstrated that in many market scenarios zero intelligence is not enough to drive the market to equilibrium [6, 9]. This has led to a wide variety of agent mechanisms being proposed for double auction simulations. There has also been a shift in research objective from the economics driven goal of showing under what conditions the market behaves optimally towards the agent based objective of determining which agent structure performs best in competition with other algorithms. A tournament of competing agent algorithms for the CDA is described in Rust et al. [39, 40]. Agents have also been developed for competing in multiple auctions simultaneously [1, 25, 36] and a regular competition, the Trading Agent Competition (TAC) [15], is now held for agents in this type of market.
Many of the proposed algorithms for CDA and multiple simultaneous auctions are non-adaptive, in that they follow a fixed, though not necessarily pure, strategy independent of experience. For example, ZI-C agents for CDA are non-adaptive, as is the SouthamptonTAC agent used for the 2001 TAC [18]. In contrast, the SouthamptonTAC agent for the 2002 TAC is adaptive [19]. Several non-adaptive strategies have been proposed for CDA, principally for benchmarking adaptive algorithms (see, for example, [21, 33]). Our first objective is to evaluate whether adaptive agents can learn to compete effectively against non-adaptive agents in sealed-bid auctions. We base the adaptive agent architectures for FPSB and SPSB auctions on those developed for the CDA. There are three categories of adaptive agent that have been proposed for CDA.

The simplest type of agent stores no explicit history of information about past auctions and adapts its behaviour based purely on the outcome of the previous auction. Examples of these history-free reactive agents used in CDA simulations include Cliff's Zero Intelligence Plus (ZIP) agents [9], Preist and van Tol's Persistent Shout agents [37], the Q-learning agent (QLA) of Hsu and Soo [21] and the modified ZIP agents of Li and Smith [29] and Tesauro and Das [46]. Walverine [7] uses a reactive form of adjustment for the CDA component of TAC-03.
These agents differ in how they utilise the information from the market. The key common feature for our research is that reactive agents adopt a bid function of the form given in Equation 10, and attempt to learn the optimal value of the parameter $\mu$ using reinforcement learning based on information about the outcome of the previous auction:

$$\beta_i(x_i) = (1 - \mu) \cdot x_i. \qquad (10)$$
In Section 4.1 we describe how we adapt this reactive architecture for sealed-bid single-sided auctions with a family of architectures we denote ZIP_S.
The second type of agent for CDA stores some historical information about past auctions and adjusts its strategy based on an estimate of a global picture of auction outcomes. We call this type of agent history based agents. Gjerstad and Dickhaut [13] propose an agent structure that forms an estimate of the probability of each bid winning, a belief function, $q(b)$, and the profit to be obtained from each possible bid for a given value, a surplus function $s(x, b)$. Based on these functions, it chooses the bid that maximises its expected profit. The GD agents have been modified by Tesauro and Das [46] (MGD) so that the belief function is truncated based on an estimate of the global minimum and maximum price derived from the previous trading period. GD has been further extended by Tesauro and Bredin [45] to include a mechanism to optimise long term profits using a forward estimate of the profitability of bid success and a dynamic programming formulation of the value function. He et al. [20] and He and Jennings [19] present a fuzzy logic approach towards history based agents. The Fuzzy Logic (FL) agents form fuzzy rules based on the current ask and bid and the median price of the auctions stored in the history. The Speculation agent of Li and Smith [29] also uses a rule based approach to determine a bid based on price predictions derived from historical information. Vytelingum et al. [50] propose an algorithm involving a two stage process of modelling the risk of trades and estimating the equilibrium based on the history of transactions. Stone et al. [44] present an algorithm for the Hotel component of TAC-2000 that uses a form of logistic regression to predict prices based on the previous auction data.
History based agents require greater amounts of memory and take longer to decide on a bid than reactive agents. The third type of agent we consider is even more complex. This class of agent, which we call modelling agents, also stores historic market information, but uses it to form models of the behaviour of other agents and hence find the optimal bid. Hu and Wellman [22, 23] and Vidal and Durfee [49] describe frameworks for alternative levels of modelling, based on the assumed complexity of the other bidders. The p-strategy of Park et al. [33] models the CDA as a Markov chain of states. We do not consider a model based approach for single seller sealed-bid auctions, primarily because in most sealed-bid auction formats an agent is not informed of the identity of the winner or of the bids of the other agents. This means it is impossible to model the competitors, even if the pool of agents remains constant. We restrict our attention to reactive and history based architectures. In Section 4 we give an overview of ZIP and GD agents and describe how these reactive and history based agents can be adapted for FPSB and SPSB single seller auctions.
4. Agents for sealed bid single auctions
The majority of work on agents in auctions has concentrated on double auction, multiple single auction and combinatorial auction models. An exception is Brandt and Weiss [5], where a model including "antisocial" reactive agents is proposed to demonstrate the potential problems with SPSB auctions.

In this section we provide an overview of a class of reactive agents based on the ZIP algorithm [9], and a history based agent, GD [13], from the perspective of a buyer in a CDA. We describe how we adapt these architectures for single auctions. We cannot simply consider single auctions as a special case of CDAs, because the information available to the agent, and hence the learning problem, is fundamentally different. In double auctions, the agent receives a stream of data of asks, bids, and transactions. For a single auction, an agent is only aware of whether it won the auction or not (a boolean variable $q$) and the price the winner must pay, $p$. It is not told the bids of any of the other agents or even necessarily the winning bid. This means it may have to make guesses, estimates or inferences from the data in order to assess the quality of the bids it makes.

The estimates and inferences the agent can make from $(p, q)$ are dependent on the auction rules. In a FPSB auction when the agent did not win ($q$ is false), the agent may infer that the largest bid of the other agents was $p$. If the agent wins a FPSB auction, it may only infer that all the other bids were less than its own, $b_i \le p$ for all $i$.

In a SPSB auction when the agent did not win ($q$ is false), the agent may infer that the second largest bid was $p$, but it can only conclude that the highest bid must be larger than or equal to $p$. If the agent wins a SPSB auction it can infer that the largest bid of the other agents was $p$.

These inferences may be used in the adaptive method of determining strategy described in Sections 4.1 and 4.2.
4.1. Reactive agents for sealed bid auctions
Commonly, reactive agents for CDA adopt a linear bid function (Equation 10), and attempt to learn a margin, $\mu$, representing the fraction above or below value $x$ at which the agent bids. Broadly speaking, reactive agents update their margin in a two step process. The first step is to determine (or estimate) what the best bid in the previous auction would have been, and hence the best margin. The second step, if deemed appropriate, is to adjust the margin to be closer to the optimal margin from the previous auction.

We denote the agent's a posteriori estimate of the "best" bid for auction $j$ as $o_j$. The desired margin, $d_j$, is the margin that would have led to the optimal bid,

$$d_j = 1 - \frac{o_j}{x_j}. \qquad (11)$$

The difference between the desired margin and the actual margin for auction $j$, denoted $\Delta_j$, is used to update the margin for auction $j+1$. If the learning rate is $\beta$, then

$$\Delta_j = \beta(d_j - \mu_j). \qquad (12)$$

A normal Widrow-Hoff update would involve updating the margin directly with $\Delta_j$, i.e. $\mu_{j+1} = \mu_j + \Delta_j$. However, large variations in margin can result from the wide range of observable optimal bids. To counter this effect, Cliff's ZIP agents employ a momentum coefficient $\gamma$ to smooth the update variable $\Delta_j$. This new update, $\Gamma_j$, is defined as

$$\Gamma_{j+1} = \gamma \Gamma_j + (1 - \gamma) \Delta_j, \qquad (13)$$

where $\gamma \in [0, 1]$. Larger values of $\gamma$ result in greater smoothing (i.e. reduce the effect on the margin of the current update). The margin is then updated with the smoothed variable as given in Equation 14,

$$\mu_{j+1} = \mu_j + \Gamma_{j+1}. \qquad (14)$$
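A direct transcription of Equations 11-14 (our sketch; o is the estimate o_j of the best bid and Gamma the smoothed update) is:

def update_margin(mu, Gamma, o, x, beta=0.1, gamma=0.7):
    """One reactive margin update (Equations 11-14). beta is the learning
    rate and gamma the momentum coefficient, set as in Section 5."""
    d = 1.0 - o / x                                # desired margin (Equation 11)
    delta = beta * (d - mu)                        # Widrow-Hoff step (Equation 12)
    Gamma = gamma * Gamma + (1.0 - gamma) * delta  # smoothing (Equation 13)
    return mu + Gamma, Gamma                       # margin update (Equation 14)

mu, Gamma = 0.5, 0.0                               # mu_0 = 0.5 as in Section 5
mu, Gamma = update_margin(mu, Gamma, o=0.45, x=0.9)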
So, the key procedures defining a reactive agent specify:
1. how to estimate the optimal bid from the current information about the market;
2. how to adjust margin (e.g. whether to use momentum); and
3. under what market conditions to update the margin.
For example, Cliff’s ZIP agents: estimate the optimal bid, or target price, from the
price of the last transaction perturbed by uniform random noise; use momentum to
smooth the update; and only update the margin downwards when an agent has a good to
trade [9]. Preist and van Tol’s PS agents are very similar, except they adopt a target price
based on the lowest offer or highest bid so far seen, perturbed by noise [37].
In single seller auctions, the procedures required are identical, but because of the
different nature of the information available to the agent, the same solutions cannot be
employed. The key issue of how to estimate the optimal bid for auction j from the data
( p j , q j ) is dependent on the inferences that can be made from the data and the current
parameter values for margin, bid and value (µ j , b j , x j ).
Firstly, if $p_j$ is greater than the agent's value there is no bid that could have yielded a positive reward. The choice of what to do with the information $(p_j, q_j)$ when $p_j > x_j$ is equivalent to the third CDA design choice of under what market conditions to update the margin.

Secondly, when $p_j \le x_j$, there are two distinct cases requiring different approaches:
1. The agent wins ($q$ is true). We characterise this situation as the agent being greedy, hence it increases its margin by estimating the optimal bid to be lower than the bid it made ($o_j < b_j$).
2. The agent loses ($q$ is false) but could have made a profit ($p_j \le x_j$). In this situation the agent is fearful and becomes more cautious by reducing its margin. It estimates the optimal bid to have been greater than the bid it made in the auction ($o_j > b_j$).
Under these broad requirements, there are many schemes we could adopt to estimate the optimal bid and adjust the margin. A full investigation of the effects of alternative mechanisms such as momentum and of parameter settings (e.g. population size and learning rate) can be found in Toft and Bagnall [47]. Here we present two reactive agents. Both agents adjust their margin using momentum, and only update their margin when the price is less than value. They differ only in how they estimate the optimal bid $o_j$. The first, ZIP_S1, is closely modelled on ZIP for CDA. The second agent, ZIP_S2, attempts to more accurately estimate the optimal bid using the PVM assumptions.
4.1.1. ZIP_S1 optimal bid estimates. ZIP_S1 agents select their optimal bid by randomly sampling a range of values given by

$$o_j = b_j \cdot R + A, \qquad (15)$$
where $R$ and $A$ are observations of independent random variables with a uniform distribution. The ranges of $R$ and $A$ are determined by whether the agent is in a state of greed or fear.
1. Fear: if the agent loses, but possibly could have won, fear directs the agent to decrease the margin by estimating the optimal bid to be higher than the current bid, hence $A \in [0.0, A_{max}]$ and $R \in [1.0, R_{max}]$.
2. Greed: when the agent wins, its greed makes it increase the margin, hence the optimal bid is lower than the current bid ($o_j < b_j$). To achieve this we set $A \in [A_{min}, 0.0]$ and $R \in [R_{min}, 1.0]$.
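A minimal sketch of this estimator (ours), using the parameter ranges listed in Section 5:

import random

def zip_s1_target(bid, won, A_max=0.01, A_min=-0.01, R_max=1.05, R_min=0.95):
    """ZIP_S1 estimate of the optimal bid o_j (Equation 15): the agent's own
    bid, perturbed upwards under fear (lost but could have profited) and
    downwards under greed (won)."""
    if won:   # greed: push the estimate below the bid made
        R, A = random.uniform(R_min, 1.0), random.uniform(A_min, 0.0)
    else:     # fear: push the estimate above the bid made
        R, A = random.uniform(1.0, R_max), random.uniform(0.0, A_max)
    return bid * R + A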
4.1.2. ZIP_S2 optimal bid estimates. ZIP_S1 agents increase or decrease their margin based only on whether the best bid would have been higher or lower than their actual bid. However, the agent can sometimes infer more than this about the optimal bid. The ability of an agent to more accurately estimate what the best bid would have been is dependent firstly on auction structure and secondly on the assumptions of the PVM. Based purely on the auction structure, the agent can determine the following.
1. Fear: under a FPSB auction, if the agent loses ($q$ is false) but could have made a profit ($p_j < x_j$), the optimal bid would have been $p_j + \delta$, where $\delta$ is the smallest bid increment. Under a SPSB auction the agent only knows that the second highest bid of the other agents was $p_j$. Thus the highest bid of the other agents, and hence the optimal bid, must be in the range $[p_j, x]$.
2. Greed: under a FPSB auction, if the agent wins ($q$ is true) and pays price $p_j$, it knows only that the highest bid of the other agents is less than $p_j$, hence $o_j \in [0, p_j]$. In a SPSB auction, it knows that the highest bid of the other agents is $p_j$. Because of the SPSB auction rules, the optimal bid would have been any bid greater than $p_j$. However, the agent makes the assertion that profit could have been maximised by bidding less. We do this because we do not want to give the ZIP_S2 agent a priori knowledge of the optimal strategy and we wish to adopt a common structure between FPSB and SPSB.
In the FPSB fear and SPSB greed scenarios the agent has perfect information, in that it knows what bid would have been optimal. In these cases we set the optimal bid to be the maximum bid of the other agents (i.e. the price),

$$o_j = p_j.$$
In the FPSB greed and SPSB fear scenarios the agent is faced with imperfect information, since it is not able to calculate exactly the optimal bid. Instead, it estimates the optimal bid by making broad assumptions. We assume: the agent knows the number of bidders, $N$; the bidders are symmetric; the distribution of private values $F$ is uniform; and the universal maximum value it may take is $\phi = 1.0$.
FPSB Greed ($q$ true, $p_j = b$). The agent assumes all losing bids are uniformly distributed over the range $[0, p_j]$. The expected value of the second highest bid is estimated as

$$o_j = \frac{N-1}{N}\, p_j.$$

SPSB Fear ($q$ false, $p_j < x$). The agent estimates the highest (and winning) bid to be in the range $[p_j, \phi]$ by using the expected value of the highest order statistic, given by

$$o_j = \frac{N}{N-1}\, p_j.$$
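Collecting the four cases gives the following sketch (ours) of the ZIP_S2 estimate; it assumes the agent knows N and the PVM assumptions above:

def zip_s2_target(price, won, rule, N):
    """ZIP_S2 estimate of the optimal bid o_j (Section 4.1.2)."""
    if rule == "FPSB":
        if won:                       # greed: losing bids assumed U[0, p_j]
            return (N - 1) / N * price
        return price                  # fear: perfect information, o_j = p_j
    else:                             # SPSB
        if won:                       # greed: highest rival bid was p_j
            return price
        return N / (N - 1) * price    # fear: expected highest bid above p_j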
In Bagnall and Toft [3, 4] and Toft and Bagnall [47] we present experiments with alternative mechanisms for estimating the optimal bid. ZIP_S2 is the architecture that makes the most use of the information commonly assumed to be available under the private values model, and as such provides the greatest contrast to ZIP_S1, which utilises very little of the market information to direct learning behaviour.
4.2. GD agents for sealed bid auctions
GD agents for CDA [13] form a strategy based on the recent history of bids and offers.
Because we deal with auctions with a single seller and many buyers, we describe GD
agents for CDA in terms of buyers only. The premise of GD agents is that if the agent knew the probability of any bid $b$ being accepted (described by a function $q(b)$) and the surplus for making an accepted bid $b$ when the agent's value is $x$ (given by a surplus function $s(x, b)$), then the natural risk-neutral strategy to adopt would be to choose the bid that maximised the expected surplus

$$E(x, b) = q(b) \cdot s(x, b).$$

GD agents do not know $q(b)$. Instead, they estimate $q(b)$ with a belief function $\hat{q}(b)$ derived from the auction history using Equation 16,

$$\hat{q}(b) = \frac{B(b) + A(b)}{B(b) + A(b) + R(b)}, \qquad (16)$$

where $B(b)$ is the number of transacted bids at or below $b$, $A(b)$ is the number of transacted asks at or below $b$ and $R(b)$ is the number of rejected bids at or below $b$. The surplus is a linear function of price and value, as given by Equation 4. GD agents for CDA assume that the price they will pay will be equal to their bid, hence the surplus function is simply

$$s(x, b) = x - b. \qquad (17)$$

The estimation of expected surplus is then given by the function

$$E(x, b) = \hat{q}(b) \cdot s(x, b), \qquad (18)$$
and the strategy of a GD agent is to select the bid that maximises $E$. The key features of a GD agent of interest in sealed-bid auctions are the method of estimating the probability of bid success, $q(b)$, and the problem of estimating the surplus for a given bid and value, $s(x, b)$.
4.2.1. Sealed bid GD traders, GD_S. Sealed-bid auctions present an intrinsically different problem to the CDA for GD agents. Sealed bid GD agents, GD_S, maintain a history, $H$, of length $m$ of auction outcomes. Each history entry $i$ consists of a pair $h_i = (p_i, b_i)$, where $p_i$ is the price the winner paid and $b_i$ the bid. The problem for a GD agent is to form $q(b)$ and $s(x, b)$ from $H$.

The rules of allocation and information revelation in sealed-bid auctions restrict the quantity of information with which agents may learn about the seller or competing buyers. These restrictions on information are caused by the structure of sealed-bid auctions, primarily:
1. Sealed Bidding: the sealed nature of bids prevents agents from observing competitors' bids, and hence from learning about their competitors' valuations or strategies.
2. One Sidedness: sealed-bid auctions are one sided; the seller is inactive and agents are only concerned with their competing bidders. In CDA, agents may observe sellers' offers and adjust their strategy to increase the likelihood of a profitable trade occurring.
3. One Shot Deal: a sealed-bid auction is a one shot deal, unlike other one sided auctions. For example, in an English auction bidders may observe all rejected bids. Agents in sealed-bid auctions are only able to gain useful information once the auction has ended.
These restrictions mean that alternative mechanisms need to be designed for first price and second price GD agents. A further complexity is added to our simulations by the fact that we sample values on a continuous interval for each auction. CDA experiments are usually conducted with a small number of fixed values (in order to predetermine the supply and demand curve). The effect of this is that the belief function need only be estimated at a fixed finite number of points to evaluate all possible bids. Typically cubic spline interpolation is used over five to eight points in the memory [13, 45]. We require the agent to estimate the belief function and the surplus function on the entire possible range of bids, and this is a harder learning problem. The agent objective is to learn a bidding function that maps values onto bids. If the domain of this bidding function is a small number of fixed points then the learning problem is easier, even though the range of the bid function is continuous. So, for example, if the values in a FPSB auction are fixed, ZIP_S1 will converge to the optimal much faster than if the values are sampled on [0, 1]. This is demonstrated in Figure 2, which shows the results for a single ZIP_S1 agent competing against four optimal agents with both fixed and interval values. The optimal strategy is found much faster, and there is much less variation around it.
To specify a GD_S agent we describe how it estimates the surplus function and the belief function. Assume, momentarily, that an agent knows both the winning bid and the price for any past auction.
Figure 2. Two runs of a five bidder FPSB auction (bidding margin against number of auctions). The lines show the margins when the values are sampled uniformly on [0, 1] and when the values are randomly selected from 0.2, 0.4, 0.6 or 0.8. The straight line at 0.2 is the optimal strategy.
4.2.1.1. Surplus function. If an agent submits a bid $b$ and then assumes it will win and pay a price $p$, the surplus function is given by

$$s(x, p) = x - p. \qquad (19)$$
4.2.1.2. Belief function. Let the winning bid $b_w$ for auction $j$ be denoted $b_{w,j}$. If an agent is aware of the winning bids, then an obvious estimate for the probability of a bid $b$ winning is the proportion of winning bids in the history at or below $b$. More formally, if we let

$$T(b, b_{w,j}) = \begin{cases} 1 & \text{if } b_{w,j} \le b \\ 0 & \text{otherwise,} \end{cases} \qquad (20)$$

then, assuming the history is full and of length $m$,

$$\hat{q}(b) = \frac{\sum_{j=1}^{m} T(b, b_{w,j})}{m}. \qquad (21)$$

$\hat{q}(b)$ is the empirical cumulative distribution function found from the order statistics of the winning bids in $H$. So $\hat{q}(b)$ is calculated by sorting the winning bids in $H$ into ascending order,

$$Y = \langle y_1, y_2, \ldots, y_m \rangle,$$

then estimating $\hat{q}(b)$ at each point $y_i$ using the formula

$$\hat{q}(b) = \begin{cases} 0 & \text{if } b < y_1 \\ \frac{j}{m} & \text{if } y_j \le b < y_{j+1} \\ 1 & \text{if } b > y_m. \end{cases} \qquad (22)$$
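For FPSB auctions, where the price reveals the winning bid, the belief function and the resulting bid choice can be sketched as follows (ours; the paper evaluates the expected surplus over the entire bid range, whereas this sketch searches a simple grid of candidate bids):

from bisect import bisect_right

def belief(b, winning_bids):
    """Empirical belief q_hat(b) (Equations 20-22): the proportion of
    winning bids in the history at or below b."""
    ys = sorted(winning_bids)
    return bisect_right(ys, b) / len(ys)

def gd_s_bid(value, winning_bids, grid=1000):
    """FPSB GD_S sketch: choose the bid maximising the expected surplus
    E(x, b) = q_hat(b) * (x - b), searching b in [0, value]."""
    candidates = (i / grid * value for i in range(grid + 1))
    return max(candidates, key=lambda b: belief(b, winning_bids) * (value - b))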
Figure 3. (a) Example belief function, q̂(b), formed from a sample history of length 255 in a FPSB auction. (b) Example surplus function, s(x, b), formed from the same history shown in (a), with x = 0.76.
Figure 4. Example expected surplus, E, formed with the belief and surplus functions shown in Figure 3(a, b). The estimate of the optimal bid is approximately 0.6, with an expected profit of just over 0.04.
4.2.1.3. Example. Figure 3(a, b) shows the belief and surplus functions for bids on the range [0.0, 1.0], based on a history of length 255 and a value x = 0.76, when both the winning bid and price are available. Figure 4 shows the expected surplus function E resulting from the belief and surplus functions shown in Figure 3.
4.2.1.4. GD implementation. The GD_S procedure we have described assumes that the agent knows the winning bid and the price for all the auctions in the history. However, the agent is not explicitly told the winning bid, merely the price. In FPSB auctions, an agent pays a price equal to the winning bid, hence $b_{w,j} = p_j$. The same cannot be applied to SPSB auctions. The price in a SPSB auction is the second highest bid, and unless an agent won the auction it cannot know $b_w$. In addition, the surplus function is dependent on the auction rules. For a FPSB auction, the winner pays their bid, hence the surplus is as given in Equation 17. However, in SPSB, the price is dependent on the other agents' bids. In SPSB auctions GD agents therefore face problems in forming both belief and payoff functions.
4.2.1.5. SPSB GD agent. In SPSB auctions, the agent is required to estimate the payoff given a bid $b$, assuming $b$ wins. An agent can only learn about the relationship between winning bids and prices, and hence estimate payoff, when it actually wins a second price auction. The agent records in a separate personal history structure, $H_p$, data for auctions it has itself won. Consider the agent storing the history of 10 auctions and the personal history of five auctions shown in Table 1.

Using the personal history of pairs, the agent is able to determine a least squares regression of bid against payoff. For each pair in the personal history the agent estimates the surplus using its current value and forms a regression model on the resulting points.
Table 1. Small example history and personal history.

History H                 Personal history H_p
Price p    Bid b          Price p    Bid b
0.79       0.83           0.68       0.70
0.68       0.70           0.84       0.90
0.64       0.66           0.85       0.85
0.55       0.56           0.76       0.80
0.84       0.90           0.88       0.95
0.95       1.00
0.86       0.90
0.94       0.99
0.88       0.92
0.75       0.78
This regression model can be used to estimate the surplus for any given bid, given the agent's current valuation. Plots of the personal history pairs and a regression of bid against surplus for the example given in Table 1 can be found in Figure 5.
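Worked through in code (our sketch), the least squares fit for the personal history of Table 1 with current value x = 0.86 reproduces the regression line y = -0.8135x + 0.7414 shown in Figure 5(b):

# Personal history H_p from Table 1 as (price, bid) pairs, current value x = 0.86.
personal = [(0.68, 0.70), (0.84, 0.90), (0.85, 0.85), (0.76, 0.80), (0.88, 0.95)]
x = 0.86

# Realised surplus x - p at each recorded bid b.
pts = [(b, x - p) for p, b in personal]

# Ordinary least squares fit of surplus against bid: s(x, b) ~ c1*b + c0.
n = len(pts)
sb = sum(b for b, _ in pts)
ss = sum(s for _, s in pts)
sbb = sum(b * b for b, _ in pts)
sbs = sum(b * s for b, s in pts)
c1 = (n * sbs - sb * ss) / (n * sbb - sb * sb)
c0 = (ss - c1 * sb) / n
print(f"s(x, b) ~ {c1:.4f}*b + {c0:.4f}")  # -0.8135*b + 0.7414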
To form a belief function an agent must have knowledge of the winning bids. When the winning bid is unknown an agent estimates it using a regression of bid on price formed from the personal history. Where a sufficient quantity of personal history is unavailable an agent estimates the winning bid to be $\frac{N}{N-1} \cdot p_j$. Figure 6 shows the belief function (a) and a payoff function (b) for the personal history shown in Table 1.

With a measure of both belief and estimated payoff for all known and estimated winning bids the agent is able to determine the expected surplus for each bid. Figure 7 shows the expected surplus for the example in Table 1.

It is common, when an agent has a low private value, for the maximum expected surplus to be zero. This occurs when the minimum winning bid in the history is higher than the agent's current value. In this situation the agent could simply not bid. However, this would mean that it may not participate in auctions where a profit could be made.
Figure 5. (a) Personal history bid and price pairs. (b) Regression model of the payoff function for the personal history pairs, x = 0.86 (regression line y = -0.8135x + 0.7414).
Figure 6. (a) Belief function. (b) Payoff function derived from sampling the points in (a) on the regression model in Figure 5.
Figure 7. Expected profit plot from the belief and payoff functions given in Figure 6(a, b).
Instead of simply ignoring auctions when the history is of no use, the GD_S agent adopts a reactive strategy to form a margin. In cases where the maximum expected profit is zero, the agent uses a smoothed estimate of the historical margins adopted in auctions where $\max(E(x, b)) > 0$. It does this by updating a margin estimate, $\mu_{j+1}$, using Equations 12-14 with the margin at which it bids as the desired output, $d_j$. If the agent then encounters a situation where all the winning bids in the history are above its current value, it bids using the margin $\mu_j$ instead.
5. Results
The aim of these experiments is to determine the ability of the reactive and history based
agent architectures described in Section 4 to learn a good strategy in sealed-bid auctions.
We simulate auctions with five types of agents:
– ZI-C (Zero Intelligence Constrained) bid uniformly between 0 and their value;
– ZIP_S1 (Zero Intelligence Plus for single auctions) bid according to the modified ZIP algorithm described in Section 4.1.1;
– ZIP_S2 (Estimating ZIP) bid according to the modified ZIP algorithm and utilise more information when estimating the optimal bid, as described in Section 4.1.2;
– GD_S (Gjerstad and Dickhaut for single auctions) bid according to the modified GD algorithm described in Section 4.2;
– OPT (Optimal) adopt the symmetric equilibrium optimal margin, given in Section 2.3.
A single run consists of 10,000 auctions (either FPSB or SPSB) with 20 agents bidding for a single good. In each auction each agent is assigned a private value. Every agent's value is an observation of an independent, identically distributed random variable with a uniform density on [0, 1]. An experiment consists of 100 runs with identical settings. Learning parameters are re-initialised at the beginning of each run. In order to understand how each agent performs in alternative scenarios, we alter the number of each type of agent competing in each experiment. By changing the number of each type of agent in the auction we can present the adaptive agents with learning tasks of different levels of complexity. The experiments start with static, non-adaptive problems where the optimal strategy can be deduced. In these more controlled environments we can objectively evaluate an algorithm by measuring its behaviour against the optimal strategy. We then progress to more complex, dynamic environments with no obvious optimal strategy. In these environments we can assess how the learning algorithms perform relative to each other. More specifically, we perform the following five sets of experiments.
1. Homogeneous. The first set of experiments involves homogeneous populations (i.e. populations of agents of the same type) (Section 5.1). These experiments serve to demonstrate some fundamental characteristics of FPSB and SPSB auctions and to test whether each architecture can reach the symmetric equilibrium solution whilst co-evolving with other agents of the same kind.
2. Against Optimal. The second set evaluates how well a single adaptive agent performs when competing against a population of OPT agents (Section 5.2). The objective is to determine whether the adaptive agents can learn the strategy that is provably optimal.
3. Against ZI-C. In contrast to the second set, the third set examines the performance of the five agent architectures when competing against a population of ZI-C agents (Section 5.3). The objective here is to measure performance when competing against a population of irrational, suboptimal agents, which are nevertheless still non-adaptive.
4. Head to Head. The fourth set of experiments involves a series of head to head runs with populations made up of two of the five types of agent (Section 5.4). These experiments are an extension of the previous sets because we examine how adaptive agents perform against other adaptive agents and we also consider populations with different weightings of agent type. We can hence evaluate which agent type performs best in more complex, dynamic, paired environments.
5. Free for All. The fifth set involves a mixed population of ZIP_S, GD_S and OPT agents competing against each other (Section 5.5). This final set of experiments poses the most complex problem for the adaptive agents and hence gives us the clearest indication of how the alternative architectures may perform in the real world.
To assess performance of the agents we use three metrics, all calculated over the final 5000 auctions. The first two are the total profit achieved and the average bidding margin. Profit and margin provide a measure of an individual agent's performance and allow comparison with other agents and the optimal strategy. The third performance measure is the revenue efficiency, defined as the average, over the last 5000 auctions, of the ratio of the price paid to the maximum of the private values. Efficiency can be used to gauge the effect of bidding strategies on the auctioneer's revenue.

The agent parameters used for all experiments are: $A_{max} = 0.01$, $A_{min} = -0.01$, $R_{max} = 1.05$, $R_{min} = 0.95$, $\beta = 0.1$, $\gamma = 0.7$, $m = 1000$ and $\mu_0 = 0.5$. These parameters were set to be consistent with previously published research. The effects of changes to population size, learning parameters, initial conditions and memory size are evaluated through extensive experimentation in Toft and Bagnall [47]. To summarise, ZIP was found to be insensitive to population size and initial condition, but sensitive to momentum; low momentum results in 'knee-jerk' reactions to auction outcomes and reduced performance. Larger memory size was found to improve the performance of GD. Generally, the parameter settings that lead to faster convergence tend to result in worse bidding strategies and hence lower profit.
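As a sketch (ours; the per-auction record structure is hypothetical), the three performance measures can be computed as:

def performance(records):
    """Profit, average margin and revenue efficiency over the final 5000
    auctions; each record holds the agent's value, bid and reward, plus
    the price paid and the maximum private value in that auction."""
    last = records[-5000:]
    profit = sum(r["reward"] for r in last)
    margin = sum(1.0 - r["bid"] / r["value"] for r in last) / len(last)
    efficiency = sum(r["price"] / r["max_value"] for r in last) / len(last)
    return profit, margin, efficiency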
5.1. Homogeneous agent populations
626
Table 2 shows the results for homogeneous populations of the five agent types. Table 2
demonstrates several characteristics of the auctions.
– The theoretical revenue equivalence of FPSB and SPSB auctions is demonstrated by
the average efficiencies and profits of the OPT agents, which are not significantly
different.
– ZI-C agents adopt a consistently greedy bidding strategy, which results in greater profits and the lowest market efficiency.
These experiments also illustrate several characteristics concerning the adaptive agents.
– In FPSB auctions ZIP_S1 agents are able to co-adapt towards the symmetric equilibrium strategy with no external guidance or pre-existing equilibrium to guide them. In SPSB, ZIP_S1 agents learn a strategy that is very close to the optimal. In both auction types they learn a margin that is slightly too greedy. This has a greater impact on profit in SPSB than FPSB.
– In FPSB auctions both the ZIP_S2 and GD agents adopt a margin significantly above that of the symmetric equilibrium strategy, and hence their overall average profit is higher. The fact that ZIP_S2 and GD learn a suboptimal strategy (reflected in the lower market efficiency) means that they could potentially be exploited by an agent that adopts a better strategy.
– In SPSB auctions the GD agents learn a strategy close to the symmetric equilibrium, but tend to adopt a negative margin and bid slightly above their value. This results in the occasional negative profit. Hence, SPSB GD agents have a lower average profit compared to other agent strategies (including OPT), and the market of GD agents is the most efficient.
The fact that ZIP agents can co-adapt towards the optimal strategy demonstrates the robustness of the ZIP algorithm. Figure 8 shows the margins of six ZIP_S1 agents in a series of 5000 FPSB auctions. It clearly demonstrates the trend towards optimality, and also shows how this evolution towards the equilibrium is independent of the initial margin. However, the ability of an agent to compete efficiently against duplicates of itself does not provide enough evidence to conclude that ZIP_S1 is better for auctions generally. Hence we extend our experiments to consider non-homogeneous populations.
5.2. Single adaptive agent vs multiple optimal agents
The results for a single agent competing against 19 optimal agents in FPSB and SPSB auctions are shown in Table 3. The main conclusions that can be drawn from these results are as follows:
– In both FPSB and SPSB auctions, reactive agents (ZIP S1 and ZIP S2) learn a margin close to the optimal. However, the profits of ZIP S1 and ZIP S2 are in fact significantly worse than the optimal (at the 1% level, using both a paired two-sample t-test and a nonparametric Mann–Whitney test; a check of this kind is sketched below). Hence we conclude that the reactive agents tend to be slightly too greedy when competing against optimal agents.
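As an illustration, tests of this kind can be run with scipy on the per-run profit series; the arrays below are hypothetical stand-ins for the recorded profits, not the experimental data:

# Hypothetical significance check of adaptive vs. optimal profits (not the
# authors' code): paired t-test and Mann-Whitney U on per-run profits.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
zip_profit = rng.normal(11.5, 0.7, size=100)   # stand-in for ZIP profits per run
opt_profit = rng.normal(11.9, 0.7, size=100)   # stand-in for OPT profits per run

t_stat, t_p = stats.ttest_rel(zip_profit, opt_profit)      # paired t-test
u_stat, u_p = stats.mannwhitneyu(zip_profit, opt_profit)   # nonparametric check
print(f"paired t-test p={t_p:.4f}, Mann-Whitney p={u_p:.4f}")
# The difference is significant at the 1% level if both p-values are below 0.01.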
Table 2. Agent and auctioneer statistics for homogeneous scenarios.

         Agent profit       Agent margin        Auctioneer efficiency
FPSB
ZI-C     30.19 (±0.3097)    0.5001 (±0.0009)    77.85% (±0.0016)
ZIP S1   12.04 (±0.1971)    0.0525 (±0.0008)    94.87% (±0.0008)
ZIP S2   16.29 (±0.1127)    0.0702 (±0.0005)    93.09% (±0.0005)
GD       15.85 (±1.2018)    0.0716 (±0.0094)    92.97% (±0.0056)
Optimal  11.91 (±0.0077)    0.0500 (±0.0000)    95.02% (±0.0000)
SPSB
ZI-C     60.38 (±0.3685)    0.5000 (±0.0010)    65.20% (±0.0014)
ZIP S1   13.48 (±0.1653)    0.0074 (±0.0003)    94.31% (±0.0007)
ZIP S2   13.78 (±0.2439)    0.0090 (±0.0004)    94.15% (±0.0010)
GD       10.64 (±0.4014)    −0.0062 (±0.0020)   95.36% (±0.0019)
Optimal  11.93 (±0.1496)    0.0000 (±0.0000)    95.01% (±0.0006)
Table 3. Agent and auctioneer statistics for One vs. Many optimal scenarios.

         Agent profit       Agent margin        Auctioneer efficiency   Optimal agents' profit
FPSB
ZI-C     1.39 (±0.3421)     0.500 (±0.0036)     94.81% (±0.0002)        12.43 (±0.0165)
ZIP S1   11.53 (±0.7215)    0.051 (±0.0026)     95.02% (±0.0001)        11.92 (±0.0301)
ZIP S2   11.49 (±0.7788)    0.063 (±0.0019)     94.97% (±0.0001)        12.04 (±0.0284)
GD       11.82 (±0.6622)    0.055 (±0.0042)     95.02% (±0.0000)        11.89 (±0.0371)
SPSB
ZI-C     1.16 (±0.3930)     0.500 (±0.0037)     94.56% (±0.0007)        13.06 (±0.1749)
ZIP S1   11.64 (±1.0835)    0.0058 (±0.0015)    94.99% (±0.0007)        11.99 (±0.1625)
ZIP S2   11.76 (±0.9946)    0.007 (±0.0016)     94.99% (±0.0005)        11.98 (±0.1350)
GD S     11.10 (±0.8939)    −0.0109 (±0.0067)   95.10% (±0.0007)        11.70 (±0.1697)
– In FPSB auctions, the GD S agent makes a profit that is not significantly less than that made by the optimal agents. However, the average margin for GD S is higher than the optimal margin. This apparent discrepancy can be explained by the fact that GD S does not in fact attempt to form a linear bid function. Instead it concentrates its memory resources on the most profitable areas, which is consistent with Figure 4. Thus GD S bids at the optimal for higher values, where there is a greater chance of winning. For low values the agent has very little information about the best bid, since it is very unlikely to win, and hence may make suboptimal bids.
– In SPSB auctions, the GD agent learns a margin that is significantly different to the optimal and hence makes a significantly worse profit. Furthermore, GD is outperformed by both ZIP agents. In contrast to ZIP, the GD agent is too fearful and overbids. This is demonstrated by the fact that the margin for GD in SPSB is often negative, meaning it often bids above its value and hence makes a loss if it wins (the loss mechanism is spelt out after this list).
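The loss mechanism is simple to state: the SPSB winner pays the second-highest bid p, so for a winner with private value v and bid b,

\[ \pi = v - p, \qquad b > v \ \text{and}\ v < p < b \;\Longrightarrow\; \pi < 0, \]

whereas any bid b \le v can never yield a negative profit; overbidding is therefore the only route to the losses observed here.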
[Figure 8 about here: margin µ (y-axis, −0.5 to 0.5) against auction number (x-axis, 0 to 5000).]
Figure 8. Example of the co-adaptation towards the optimal strategy of 6 ZIP S1 agents in 5000 FPSB auctions.
The suboptimal performance of the ZIP agents is caused by the fact that, unless the population is very large, an agent bidding close to the optimal is highly unlikely to encounter a scenario where it loses but could have won, and thus is unlikely to adjust its margin downward. A detailed evaluation of the behaviour of ZIP and GD agents in auctions against optimal agents is given in Toft and Bagnall [47].
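This asymmetry can be seen in a minimal sketch of a sealed-bid ZIP-style margin update (our reconstruction for illustration, not the authors' implementation; the parameter names and target heuristics are assumptions):

# Illustrative sketch of a ZIP S1-style margin update in a FPSB auction.
def update_margin(margin, won, value, winning_bid,
                  beta=0.05, momentum=0.5, delta_prev=0.0):
    """Nudge the profit margin towards a target implied by the last auction.

    won         -- True if this agent won the auction
    value       -- the agent's private value
    winning_bid -- the winning bid announced by the auctioneer
    """
    if won:
        target = margin * 1.05 + 0.01          # won: aim a little greedier (heuristic)
    elif winning_bid < value:
        # Lost an auction it could profitably have won: aim to just beat
        # the winning bid, i.e. lower the margin. Near the optimum this
        # event is rare, so downward pressure all but disappears.
        target = 1.0 - winning_bid / value
    else:
        return margin, delta_prev              # could not have won: no signal
    delta = momentum * delta_prev + (1 - momentum) * beta * (target - margin)
    return margin + delta, delta

The scarcity of the 'lost but could have won' branch is exactly the effect described above.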
The difference in performance of the GD agents in FPSB and SPSB auctions can be explained by the different information available in each type of auction. In FPSB the GD agent knows the winning bid and can hence form an accurate estimate of the belief and reward functions. However, in SPSB the agent has to estimate the winning bid based on its own wins. The occasions when the agent wins and makes a large profit tend to skew the estimated reward function and make the agent bid too high, to the point that it often adopts a negative margin.
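A minimal reconstruction of the memory-based idea makes the information gap concrete (again illustrative, not the authors' implementation): estimate the probability of winning at each candidate bid from a memory of winning bids, and pick the bid maximising expected surplus. The FPSB/SPSB difference is then a question of which winning bids ever enter that memory:

# Illustrative GD S-style bid choice from a memory of observed winning bids.
import numpy as np

def choose_bid(value, winning_bids, grid_size=101):
    """Pick the bid b maximising (value - b) * P_hat(win | b).

    winning_bids -- remembered winning bids. In FPSB every auction supplies
    one; in SPSB only the agent's own wins do, which biases the estimate.
    """
    history = np.asarray(winning_bids)
    if history.size == 0:
        return 0.0                               # no information yet
    candidates = np.linspace(0.0, 1.0, grid_size)  # full bid space, no budget cap
    # Empirical belief: fraction of remembered winning bids this bid would beat.
    p_win = np.array([(history < b).mean() for b in candidates])
    expected_surplus = (value - candidates) * p_win
    return candidates[np.argmax(expected_surplus)]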
If we impose an artificial budget constraint and set the maximum bid to the agent’s
value, GD performs as well as the FPSB counterpart (i.e. it finds the optimal strategy).
However, we do not impose this constraint on behaviour. Our aim is to test how well
agent algorithms perform over alternative auction structures. Hence we want to minimise
the auction specific information included in the learning process. There are auction structures where the optimal strategy is to bid over value (e.g. third price sealed-bid auctions),
hence a budget constraint is not always applicable. We wish the agent to learn through
experience that it is suboptimal to bid over value in SPSB with the minimum domain
specific information included in the learning process and without restricting the space of
possible bids it may sample. One of our reasons for using SPSB auctions, despite there
being a strictly dominant optimal strategy independent of the value function, is that it
provides a good test bed for an algorithm. For an algorithm to be considered of wider
general use in more complex auction models, we believe that it should be able to converge to the optimal strategy in scenarios where one is known to exist. A further test we
adopt is that only information that is trivially obvious to human competitors may be hard-coded into bidding algorithms. So, for example, we do not allow negative bids. A budget constraint is not intuitively obvious to most people competing in SPSB auctions (field and experimental studies have observed that human agents sometimes bid above their value in SPSB [26, 31]).
Table 4. Agent and auctioneer statistics for One vs. Many ZI-C scenarios.

         Agent profit        Agent margin        Auctioneer efficiency   ZI-C agents' profit
FPSB
ZIP S1   96.59 (±3.7295)     0.1635 (±0.0055)    78.80% (±0.0018)        26.16 (±0.3087)
ZIP S2   91.66 (±3.7990)     0.1330 (±0.0031)    79.24% (±0.0015)        25.29 (±0.3163)
GD       98.11 (±3.0139)     0.1351 (±0.0051)    78.80% (±0.0018)        25.72 (±0.3669)
OPT      47.97 (±1.2700)     0.0500 (±0.0000)    80.66% (±0.0017)        23.21 (±0.3411)
SPSB
ZIP S1   207.25 (±6.0608)    0.0693 (±0.0054)    67.81% (±0.0018)        47.25 (±0.5041)
ZIP S2   178.15 (±5.0170)    0.1503 (±0.0037)    66.93% (±0.0017)        51.60 (±0.4853)
GD       194.94 (±6.8186)    −0.0048 (±0.0211)   68.70% (±0.0025)        42.93 (±0.9232)
OPT      217.02 (±6.4689)    0.0000 (±0.0000)    68.44% (±0.0019)        43.93 (±0.5531)
The experiments indicate that when competing against optimal agents a memory based approach is superior when the agent has sufficient information available. They also show that memory-free ZIP agents are more robust, in that they achieve a profit close to the optimal in both scenarios.
5.3. Single adaptive agent vs multiple ZI-C agents
The third set of experiments involves a single adaptive agent competing against a population of ZI-C agents. ZI-C agents follow a greedy strategy, in that they will on average bid much lower than value. We can liken a population of predominantly ZI-C agents to a population of implicitly “colluding” agents. The fact that they all bid much lower than their value means that the winner benefits by paying a lower price. In a homogeneous population of ZI-C agents each agent will win as often as any other agent, and hence they will all make a greater profit than agents following an optimal strategy. This means that the market is much less efficient, as shown in Table 4. It also means that an adaptive agent competing against a population of ZI-C agents should be able to exploit this inefficiency in order to gain larger profits for itself.
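Concretely, a ZI-C bidder in this setting simply bids at random below its value (following Gode and Sunder's zero-intelligence-constrained rule [14] that agents may not trade at a loss; the function is our sketch):

# Sketch of a ZI-C bid: random, but never above the private value, so the
# agent cannot buy at a loss. On average it bids half its value, which is
# consistent with the 0.50 margins reported in Table 2.
import random

def zic_bid(value: float) -> float:
    return random.uniform(0.0, value)

In a market of such bidders the winner typically pays well below the highest value, which is both the inefficiency visible in Table 4 and the profit opportunity the adaptive agents learn to exploit.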
The results in Table 4 show that both ZIP and GD agents are able to modify their bidding in FPSB auctions to exploit the greedy strategy of ZI-C agents and make a higher profit. In FPSB auctions GD makes a significantly greater profit than ZIP S1, and ZIP S1 makes a significantly greater profit than ZIP S2. GD does better because it is better able to approximate the optimal non-linear bid function for competing against ZI-C (shown in Figure 1). Figure 9 shows the ZIP S1 and ZIP S2 strategies in relation to the optimal for values greater than 0.5. The lower margin of ZIP S2 means it approximates the optimal strategy better for low values; ZIP S1 is better at exploiting the ZI-C agents for high values.
[Figure 9 about here: bid (y-axis, 0.4 to 0.9) against value (x-axis, 0.5 to 1) for ZIP S2, ZIP S1 and the optimal strategy.]
Figure 9. Bid functions of ZIP agents compared to the optimal strategy when competing against 19 ZI-C agents in FPSB auctions.
Both ZIP agents win more auctions than would be expected if agents won an equal proportion: on average ZIP S1 won 14.08% and ZIP S2 16.3% of the final 5000 auctions. The OPT agent does worse than the adaptive agents, but still performs better than the ZI-C agents. This reinforces the fact that the ZI-C strategy is not an equilibrium.
In SPSB auctions, the optimal strategy is to bid value, even against ZI-C agents. As with the first set of SPSB experiments, none of the adaptive agents find a strategy as good as the optimal, and ZIP S1 outperforms GD S. ZIP S2 now does worst of the three adaptive agent architectures because it adopts too large a margin and hence misses opportunities to win auctions. The GD agent adopts a margin much closer to the optimal, but makes less profit because of its tendency to bid higher than value and risk making a loss. On average the GD agent won 30% of auctions, but on these wins it made losing transactions totalling 15.38. Without these losses, GD would have made profits not significantly different to OPT.
Results show that our adaptive agent strategies are able to exploit greedy agents in
first and second price auctions to make large profits. GD better approximates the optimal
strategy in both auction types, but makes less profit in SPSB because of the tendency to
bid higher than value. In addition to testing the learning abilities of the adaptive agents,
these experiments demonstrate some of the intrinsic characteristics of FPSB and SPSB
auctions. Faced with more sophisticated agents, the optimal strategy for FPSB auctions
can be complex and hard to derive. In contrast, the optimal strategy for SPSB is independent of the bidding function used by the opponents (assuming independence between bidders and auctions). On the other hand, the efficiency results in Table 4 indicate that SPSB
auctions are less efficient than FPSB auctions with a population of ZI-C agents.
5.4. Head to head experiments
The experiments with adaptive agents in homogeneous populations and populations of
non-adaptive agents allowed us to examine how well the agents perform in scenarios
where there is a known optimum strategy. The next step is to examine how the adaptive agents perform in auctions against other adaptive agents. In these cases the environment is dynamic, and there is usually no analytic optimal strategy. Following the methodology of related research [46], we examine One vs. Many and Many vs. Many scenarios. In the following sections we omit results for ZI-C agents as they offer no further insights beyond those presented in Section 5.3.
5.4.1. One vs. Many agents. Tables 5–7 present the results of the One vs. Many experiments. Against a population of ZIP S1 agents we observe that:
– OPT agents perform better than all other agents in both FPSB and SPSB. The symmetric equilibrium strategy is a best response to many ZIP S1 . This confirms the observation made in Section 5.1 that populations of many ZIP S1 agents adopt a strategy close
to the symmetric equilibrium.
– In both FPSB and SPSB auctions the ZIP S2 agent adopts a greedy strategy and as a
result scores lower profits than its ZIP S1 counterparts. This result is consistent with
Section 5.2 where ZIP S2 agents were also observed to be greedy.
Table 5. Agent and auctioneer statistics for One vs. Many ZIP S1 scenarios.

         Agent profit       Agent margin        Auctioneer efficiency   Many ZIP S1 profit
FPSB
ZIP S2   12.07 (±0.8642)    0.0637 (±0.0021)    94.77% (±0.0007)        12.28 (±0.1830)
GD       12.29 (±0.7242)    0.0560 (±0.0053)    94.88% (±0.0008)        12.00 (±0.2008)
OPT      12.37 (±0.7966)    0.0500 (±0.0000)    94.88% (±0.0008)        11.99 (±0.1999)
SPSB
ZIP S2   13.43 (±1.0841)    0.0086 (±0.0017)    94.31% (±0.0007)        13.48 (±0.1872)
GD       12.68 (±1.4329)    0.0122 (±0.0089)    94.25% (±0.0009)        13.65 (±0.2294)
OPT      13.61 (±1.0744)    0.0000 (±0.0000)    94.36% (±0.0008)        13.34 (±0.1897)
Table 6. Agent and auctioneer statistics for One vs. Many ZIP S2 scenarios.

         Agent profit       Agent margin        Auctioneer efficiency   Many ZIP S2 profit
FPSB
ZIP S1   16.36 (±0.8663)    0.0611 (±0.0029)    93.16% (±0.0005)        16.09 (±0.1202)
GD       17.11 (±0.9362)    0.0562 (±0.0052)    93.23% (±0.0005)        15.87 (±0.1533)
OPT      16.22 (±0.7085)    0.0500 (±0.0000)    93.27% (±0.0005)        15.81 (±0.1240)
SPSB
ZIP S1   13.83 (±1.1325)    0.0077 (±0.0017)    94.19% (±0.0010)        13.68 (±0.2448)
GD       13.74 (±1.1371)    −0.0016 (±0.0064)   94.21% (±0.0009)        13.62 (±0.2189)
OPT      14.09 (±1.0283)    0.0000 (±0.0000)    94.22% (±0.0010)        13.59 (±0.2437)
– In FPSB, GD does better than the average of the population of ZIP S1 and is not significantly worse than the OPT agents. In SPSB the GD agent performs worse than ZIP S2.
Table 6 shows the results for many ZIP S2 . As we previously observed in Section 5.1,
many ZIP S2 agents adopt a greedy strategy. As a result of this, competing single agents
are able to exploit ZIP S2 agents in both FPSB and SPSB auctions. In FPSB auctions GD
agents make the greatest profit. This reinforces our previous observation in Section 5.3 that GD agents are the best at exploiting populations of greedy agents. In SPSB auctions the OPT strategy is of course dominant. ZIP S1 achieves a profit closer to OPT than GD.
Table 7 shows the results for many GD agents. It shows that a reactive agent can generally do relatively better than the population of GD agents, although in FPSB this
results in lower total profits to the agents.
These experiments have generally reinforced our previous observations that, firstly,
ZIP S2 is dominated by ZIP S1 and GD S , and secondly, that GD agents perform better in
FPSB auctions and ZIP S1 do better in SPSB auctions. However, the position is not completely clear cut. For example, in FPSB auctions a single GD agent does better than a
population of ZIP S1 agents (Table 5), but conversely a single ZIP S1 agent makes a larger
profit than a population of many GD agents (Table 7). This trade-off between the two
agents is investigated further in Section 5.4.2.
Table 7. Agent and auctioneer statistics for One vs. Many GD scenarios.

         Agent profit       Agent margin       Auctioneer efficiency   Many GD profit
FPSB
ZIP S1   15.82 (±1.0680)    0.0597 (±0.0031)   93.35% (±0.0034)        15.00 (±0.7260)
ZIP S2   16.16 (±1.4212)    0.0703 (±0.0023)   93.18% (±0.0039)        15.38 (±0.8050)
OPT      19.83 (±0.9362)    0.0500 (±0.0000)   91.83% (±0.0023)        19.19 (±0.5489)
SPSB
ZIP S1   12.16 (±1.0469)    0.0115 (±0.0023)   95.03% (±0.0018)        11.33 (±0.3264)
ZIP S2   11.80 (±1.0705)    0.0084 (±0.0018)   95.13% (±0.0016)        11.13 (±0.3802)
OPT      11.92 (±1.0425)    0.0000 (±0.0000)   95.24% (±0.0015)        10.81 (±0.3432)
Table 8. Agent and auctioneer statistics for Many vs. Many FPSB scenarios.

A vs. B                         A profit   A margin   B profit   B margin   Auctioneer efficiency
Many ZIP S1 vs. Many ZIP S2     14.31      0.057      14.02      0.067      93.96%
Many ZIP S1 vs. Many GD         11.88      0.052      12.18      0.059      94.91%
Many ZIP S2 vs. Many GD         13.39      0.066      14.36      0.058      94.10%
Many ZIP S1 vs. Many Optimal    11.71      0.052      12.04      0.050      94.98%
Many ZIP S2 vs. Many Optimal    13.01      0.065      13.54      0.050      94.36%
Many Optimal vs. Many GD        11.89      0.050      11.82      0.059      95.02%
5.4.2. Many vs. Many agents. To investigate further the relative ability of the adaptive agents to perform in mixed populations, we examine how changing the proportion of each agent in the population changes behaviour. Tables 8 and 9 give the results for FPSB
and SPSB auctions respectively.
The FPSB results shown in Table 8 demonstrate, firstly, that ZIP S2 is outperformed by all competing strategies as a result of greedy bidding (too high a margin) and, secondly, that GD agents make a significantly larger profit than both of the ZIP S agents.
When considered in conjunction with the results in Section 5.4.1, we can conclude that
GD agents perform best in populations where there are competitive strategies from which
they may learn. When competing with a population of duplicate GD agents they tend
towards a suboptimal equilibrium which is potentially exploitable by an agent using a
different learning mechanism, as demonstrated by ZIP S1 against 19 GD S (see Table 7).
ZIP S1 is only able to exploit GD S when it is in a minority. When the number of ZIP S1 in the population rises above two, GD S improves its performance and makes a profit significantly higher than the ZIP S1 agents. Figure 10 shows the average difference in profit (with standard error bars) between ZIP S1 and GD S for populations of between 1 and 19 ZIP S1 agents. It shows that the ZIP S1 agents make a greater profit when there are only one or two in the population, but with three or more ZIP S1 the GD S does better. The GD S
advantage remains constant when approximately 20% or more of the population is ZIP S1 .
From the SPSB results given in Table 9 it is clear that the GD agents are outperformed
by the ZIP S agents. As in previous experiments, GD agents tend to be too fearful and hence bid
too high.
[Figure 10 about here: GD profit minus ZIP profit (y-axis, −1 to 0.4) against the number of ZIP agents (x-axis, 0 to 20).]
Figure 10. Difference in average profit for GD S and ZIP S1 for an increasing number of ZIP S1 agents in FPSB auctions. The bars represent one standard deviation of the estimate of the mean difference.
5.5. Free for all
The final experiments involved a mixed population of ZIP S1 , ZIP S2 , GD S and OPT
agents. Table 10 shows the results for FPSB and SPSB populations with 5 of each agent.
These results confirm our previous observations that:
– In FPSB auctions, GD S makes the greatest profit.
– In SPSB auctions, ZIP S1 and ZIP S2 outperform GD. There is no significant difference between ZIP S1 and ZIP S2.
All the adaptive agents learn a good strategy and the market tends towards the optimal
efficiency of 95% (the efficiency was on average 94.61% in FPSB auctions and 94.92%
in SPSB auctions). However, despite the theoretical revenue equivalence of FPSB and
SPSB auctions, in our experiments there was a significant difference in the efficiency, and
hence the revenue, between the two auction formats. A similar difference in behaviour of
human agents in the two auction formats has been observed [31]. In our experiments it is
caused by the GD agent bidding too high.
5.6. Alternative value distributions
In common with the majority of auction studies [26], our experiments have been conducted using a uniform value distribution. However, it could be maintained that the performance of the algorithms (in both absolute and relative terms) is simply an artefact of
the uniform values. To test the robustness of the results, we repeated a selection of the
experiments with two alternative beta value distributions. The first, Beta(2,4), has a left
skew and the second, Beta(4,2), has a right skew (see Figure 11 for an example).
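For replication, values under the two alternative distributions can be drawn with numpy (a sketch; the paper does not specify its sampling machinery):

# Draw private values from the two Beta distributions used in this section.
import numpy as np

rng = np.random.default_rng(42)
values_left = rng.beta(2, 4, size=10_000)    # Beta(2,4): mass towards low values
values_right = rng.beta(4, 2, size=10_000)   # Beta(4,2): mass towards high values
print(values_left.mean(), values_right.mean())   # approx. 0.333 and 0.667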
[Figure 11 about here.]
Figure 11. Example Beta(2,4) and Beta(4,2) value distributions.
Table 9. Agent and auctioneer statistics for Many vs. Many SPSB scenarios.

A vs. B                         A profit   A margin   B profit   B margin   Auctioneer efficiency
Many ZIP S1 vs. Many ZIP S2     13.65      0.008      13.49      0.009      94.25%
Many ZIP S1 vs. Many GD         14.20      0.011      13.09      0.000      94.14%
Many ZIP S2 vs. Many GD         12.11      0.008      12.10      −0.004     94.83%
Many ZIP S1 vs. Many Optimal    12.48      0.006      12.68      0.000      94.71%
Many ZIP S2 vs. Many Optimal    12.50      0.008      12.85      0.000      94.65%
Many Optimal vs. Many GD        10.52      0.000      10.00      −0.011     95.61%
The experiments we ran are summarised in Tables 11 and 12. This subset of the previously conducted experiments was selected to demonstrate that the following general conclusions drawn with a uniform distribution are still valid with a Beta value distribution.
1. In Section 5.2 we conclude that ZIP S1 finds a close to optimal strategy against a population of optimal agents in SPSB auctions, whereas GD agents tend to overbid and
hence bid suboptimally. Table 11 shows that this pattern of results is also demonstrated
when Beta(2,4) and Beta(4,2) value distributions are used.
Table 10. Profit results for experiments with 5 of each agent type.

         Average profit   Percentage of total
FPSB
ZIP S1   12.59            24.80%
ZIP S2   12.31            24.26%
GD S     12.97            25.55%
OPT      12.88            25.38%
SPSB
ZIP S1   11.89            24.99%
ZIP S2   11.88            24.97%
GD S     11.67            24.53%
OPT      12.14            25.51%
Table 11. Results for a single ZIP S1 and GD S competing against 19 optimal agents in SPSB auctions when all agents have Beta value distributions.

                     Agent profit       Agent margin       Auctioneer efficiency   Optimal agents' profit
Beta(2,4)  ZIP S1    21.284 (±1.852)    0.011 (±0.004)     88.15% (±0.0013)        21.405 (±0.269)
Beta(2,4)  GD        19.248 (±1.927)    −0.011 (±0.009)    88.33% (±0.0013)        21.086 (±0.282)
Beta(4,2)  ZIP S1    9.171 (±0.788)     0.006 (±0.001)     96.02% (±0.0005)        9.363 (±0.120)
Table 12. Results for 10 vs. 10 scenarios with Beta value distributions (10 ZIP S1 vs. 10 GD).

                  ZIP S1 profit   ZIP S1 margin   GD profit   GD margin   Auctioneer efficiency
FPSB Beta(2,4)    19.964          0.118           22.823      0.085       87.61%
FPSB Beta(4,2)    9.219           0.042           10.001      0.03        95.78%
SPSB Beta(2,4)    22.118          0.017           20.217      −0.012      87.89%
2. In Section 5.4.2 we observe that GD agents make a significantly larger profit than ZIP
agents in mixed population FPSB auctions. The results shown in Table 12 demonstrate
that GD also outperforms ZIP when values from a Beta distribution are used.
Although not exhaustive, the results with values following a Beta distribution give us
confidence that the conclusions drawn in Section 5 will hold for a wide class of alternative value functions.
6. Conclusions
In this paper we have presented a model of the class of single seller auctions and have
described how we have adapted popular agent architectures for the CDA to single seller
auctions: we include two versions of Cliff’s ZIP algorithm [9], ZIP S1 and ZIP S2 , and
a hybridised version of the Gjerstad Dickhaut agent [13], GD S . Single seller auctions
present a quite different learning challenge to that of double auctions. This is due to
the differences in information revelation and allocation processes in these auction types.
We describe two commonly used single seller auction structures, first price and second
price sealed-bid auctions. We then present results from an extensive series of experiments
(approximately 80 million auctions were simulated for the results presented in this paper)
designed to evaluate the ability of the three different learning mechanisms to learn a strategy. The first series of experiments tested how well each agent type could learn in a more controlled, static environment with a known optimal strategy. The second set of experiments evaluated how the agents performed in competition with each other.
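To make the experimental unit concrete, a single auction round in this model can be sketched as follows (our paraphrase of the setup under the PVM with uniform values; the bid functions are placeholders):

# One sealed-bid auction round under the private values model (illustrative).
import random

def run_auction(bid_fns, second_price=False):
    """bid_fns: one bid function per agent, mapping a private value to a bid."""
    values = [random.random() for _ in bid_fns]      # private values ~ U(0, 1)
    bids = [f(v) for f, v in zip(bid_fns, values)]
    ranked = sorted(range(len(bids)), key=bids.__getitem__, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if second_price else bids[winner]   # SPSB vs. FPSB
    profits = [0.0] * len(bids)
    profits[winner] = values[winner] - price         # only the winner trades
    return profits

# e.g. 20 bidders all playing the symmetric FPSB equilibrium b(v) = 19v/20:
profits = run_auction([lambda v: v * 19 / 20] * 20)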
One of the objectives of agent research into economic scenarios is to gain insight into
the market mechanisms being modelled. Our experiments have reinforced observations
made in the auction theory literature. Firstly, under the PVM, the FPSB and SPSB market
mechanisms drive very simple learning agents towards the symmetric equilibrium solution. Secondly, the experiments with a population of ZI-C agents demonstrated that with
irrational agents, a FPSB is more efficient than a SPSB, and that it is easier to learn to
exploit irrational agents in a SPSB due to the dominant strategy of bidding value. Thirdly,
under controlled conditions the theoretical revenue equivalence of FPSB and SPSB can be
observed, but in auctions where there are many different kinds of learning agent competing, SPSB auctions are more efficient.
The other objective of this research is to assess how well alternative agent architectures perform in sealed-bid auction simulations. Against a population of agents following the optimal strategy, ZIP S agents learn a good, but suboptimal, strategy in both FPSB
and SPSB auctions. GD S agents learn an optimal strategy in FPSB but not in SPSB auctions unless a budget constraint is imposed. ZIP S and GD S exhibit the same pattern of
performance against populations of (suboptimal) ZI-C agents. The adaptive agents mirror
the tendency in experimental studies of human competitors to bid above the dominant or
Nash equilibrium [26].
In a homogeneous population, ZIP S1 agents co-adapt towards the symmetric equilibrium solution in FPSB and SPSB auctions, whereas ZIP S2 and GD S converge to a suboptimal – but mutually more profitable – equilibrium. The performance of ZIP S1 in single seller auctions directly mirrors the fact that ZIP agents tend towards the market equilibrium in CDA auctions (see [9]). The convergence of ZIP S2 and GD S to a more profitable equilibrium may seem a desirable outcome, but the equilibrium they reach is not stable, and if another agent were to enter the market it could exploit this fact to make extra
profits.
ZIP S1 outperforms ZIP S2 in the large majority of scenarios considered. The major
difference between ZIP S1 and ZIP S2 is that, unlike ZIP S1 , ZIP S2 uses the magnitude of
profits made to alter strategy. The fact that using this extra information to alter strategy results in a worse solution is in itself interesting. ZIP S2 tend to overreact to a single extreme result, and as such converge too quickly and get stuck in a local optimum
in the strategy space. The fact that ZIP S1 ignores the level of profit means it can explore a greater proportion of the strategy space, ignoring what would seem like highly important recent information, in order to get closer to the global optimum. Unlike ZIP S2, neither GD nor ZIP S1 requires information about the value distribution function, F, or the number of agents, N, and yet both consistently outperform ZIP S2. This suggests
that, at least with a uniform distribution of values, the information required to theoretically derive the optimal strategy is not necessary to learn the optimal through experience.
Indeed, the extra information about what would have been optimal in the past may actually reduce the quality of the overall strategy.
In competition against each other, the general pattern of performance observed against the non-adaptive agents is repeated. ZIP S1 performs robustly, producing
good results in FPSB and SPSB auctions, both in relation to a known optimal strategy
and to other agents. ZIP S1 performs better than the other agents in mixed SPSB auctions.
GD S generally does better than ZIP S1 in FPSB, reaching the optimal strategy when one
is known to exist and performing better than the other agents in the mixed agent auctions, except for the case when a small number of ZIP S1 agents can exploit the tendency
of a population of GD S to converge to a suboptimal solution. In SPSB auctions, GD S is
outperformed by ZIP S1, because GD S is required to estimate its belief and reward functions from a set of results that may be unrepresentative of the true trends.
Generally, GD S agents perform best in populations where there are competitive strategies from which they may learn, whereas ZIP S1 tend to be more competitive, but in some
environments cannot fine-tune their strategy to do quite as well as agents with a greater
memory resource.
This work could be extended in many ways. Given that GD S has a general trend to
learn a fearful strategy (i.e. overbid) and ZIP S1 tend to be too greedy (i.e. underbid),
an obvious extension would be to attempt to marry GD S and ZIP S1 using an ensemble
approach. Currently we are experimenting with alternative architectures to do this. We
have also experimented with alternative methods for GD S to estimate the belief function
in SPSB auctions.
There are also many market scenarios we could consider. Porter and Shoham [35]
show that the symmetric equilibrium of FPSB auctions is robust even when competing against cheating agents. It would be interesting to see whether this robustness is also evident in populations of learning agents. Another important scenario worth investigating is one where agents may enter or leave the marketplace. The greater dynamism this scenario presents
would provide a new challenge for the agent architectures. We will also consider alternative auction structures such as English auctions under models other than the PVM.
Recently there have been many architectures proposed for CDA that take into account the
timing of bids (see, for example [19, 45]). These methods could be used as a basis for
designing agents for single seller auctions that attempt to model competitors’ strategies.
We would also like to bridge the gap between our work and the body of research into
agents for multiple English auctions and combinatorial auctions (for example, see [1, 25,
44]). In order to gain widespread credibility as a potential real world application, we
believe that it is a necessary condition for any architecture proposed for multiple or combinatorial auctions to be shown to perform well in simple auctions against known optimal
strategies. Our approach will allow us to test any new algorithms developed on a series of
problems of increasing complexity.
Notes
1. A Walrasian tatonnement mechanism is a protocol by which an auctioneer attempts to engineer convergence
to equilibria.
2. A Marshallian path is a sequence of trades such that the last trade is necessarily at the equilibrium.
References
1. P. Anthony and N. R. Jennings, “Developing a bidding agent for multiple heterogeneous auctions,” ACM
Trans. Internet Technol., vol. 3, no. 3, pp. 185–217, 2003.
2. D. Ariely, A. Ockenfels, and A. E. Roth, An Experimental Analysis of Ending Rules in Internet Auctions. Max Planck Institute for Research into Economic Systems, Strategic Interaction Group, Discussion Papers on Strategic Interaction, 2002.
3. A. J. Bagnall and I. E. Toft, “An agent model for first price and second price private value auctions,” in Proceedings of the 6th International Conference on Artificial Evolution, pp. 145–156, 2003.
4. A. J. Bagnall and I. E Toft, “Zero intelligence plus and Gjerstad–Dickhaut agents for sealed bid auctions,”
in Workshop on Trading Agent Design and Analysis, part of international conference on autonomous agents
and multiagent systems (AAMAS-2004), pp. 59–64, 2004.
5. F. Brandt and G. Weiss, “Antisocial agents and Vickrey auctions,” in Revised Papers from the 8th International Workshop on Intelligent Agents VIII, Springer-Verlag, pp. 335–347, 2002.
6. P. J. Brewer, M. Huang, B. Nelson, and C. R. Plott, On the Behavioral Foundations of the Law of Supply and Demand: Human Convergence and Robot Randomness. California Institute of Technology, Social Science Working Paper 1079, 1999.
7. S.-F. Cheng, E. Leung, K. M. Lochner, K. O’Malley, D. M. Reeves, L. J. Schvartzman, and M. P. Wellman,
“Walverine: a Walrasian trading agent,” Decision Support Syst., vol. 39, pp. 169–184, 2005.
8. S. H. Clearwater, Market-Based Control: A Paradigm for Distributed Resource Allocation. World Scientific
Publishing Co., Inc., 1996.
9. D. Cliff and J. Bruten, Minimal-Intelligence Agents for Bargaining Behaviours in Market-Based Environments. Technical Report, HP Labs, June 1997.
10. K. E. Drexler and M. S. Miller, “Incentive engineering for computational resource management,” in B. A.
Huberman (ed.), The Ecology of Computation, 1988.
11. J. D. Farmer, P. Patelli, and I. I. Zovko, The Predictive Power of Zero Intelligence in Financial Markets. AFA 2004 San Diego Meetings, 2004.
12. M. A. Gibney, N. R. Jennings, N. J. Vriend, and J. M. Griffiths, “Market-based call routing in telecommunications networks using adaptive pricing and real bidding,” in Proceedings of the Third International Workshop on Intelligent Agents for Telecommunications Applications, pp. 50–65, 1999.
13. S. Gjerstad and J. Dickhaut, “Price formation in double auctions,” Games Econ. Behav., vol. 22, no. 1, pp.
1–29, 1998.
14. D. K. Gode and S. Sunder, “Allocative efficiency of markets with zero intelligence traders: market as a partial substitute for individual rationality,” J. of Political Econ., vol. 101, no. 1, pp. 119–137, 1993.
15. A. Greenwald and P. Stone, “Autonomous bidding agents in the trading agent competition,” IEEE Internet
Comput., vol. 5, no. 2, pp. 52–60, 2001.
16. J. Grossklags, C. Schmidt, and J. Siegel, Dumb Software Agents on an Experimental Asset Market. Working Paper, School of Information and Management Systems, UC Berkeley, 2000.
17. M. He, H. Leung, and N. R. Jennings, “A fuzzy logic based bidding strategy for autonomous agents in continuous double auctions,” IEEE Trans. Knowledge Data Eng., vol. 15, no. 6, pp. 1345–1363, 2003.
18. M. He and N. R. Jennings, “SouthamptonTAC: an adaptive autonomous trading agent,” ACM Trans. Internet Technol., vol. 3, no. 3, pp. 218–235, 2003.
19. M. He and N. R. Jennings, “Designing a successful trading agent: a fuzzy set approach,” IEEE Trans. Fuzzy
Syst., vol. 12, no. 3, pp. 389–410, 2004.
20. M. He, N. R. Jennings, and H. Leung, “On agent-mediated electronic commerce,” IEEE Trans. Knowledge
Data Eng., vol. 15, no. 4, pp. 985–1003, 2003.
21. W. Hsu and V. Soo, “Market performance of adaptive trading agents in synchronous double auctions,” in
Proceedings of the 4th Pacific Rim International Workshop on Multi-Agents, Intelligent Agents, SpringerVerlag, pp. 108–121, 2001.
22. J. Hu and M. P. Wellman, “Conjectural equilibrium in multiagent learning,” Machine Learning, vol. 33,
1998.
23. J. Hu and M. P. Wellman, “Online learning about other agents in a dynamic multiagent system,” in Proceedings of the Second International Conference on Autonomous Agents, ACM Press, pp. 239–246, 1998.
24. B. Huberman and S. H. Clearwater, “A multiagent system for controlling building environments,” in Proceedings of the First International Conference on Multiagent Systems, pp. 171–176, 1995.
25. N. R. Jennings, M. He, and A. Prugel-Bennett, “An adaptive bidding agent for multiple english auctions:
a neuro-fuzzy approach,” in Proceedings of the IEEE Conference on Fuzzy Systems, Budapest, Hungary,
2004.
26. J. H. Kagel and A. E. Roth (eds.), The Handbook of Experimental Economics. Princeton University Press,
1995.
27. P. Klemperer, “Auction theory: a guide to the literature,” J. Econ. Surveys, vol. 13, no. 3,
pp. 227–286, 1999.
28. V. Krishna, Auction Theory, Academic Press: San Diego, California, 2002.
29. L. Li and S. F. Smith, “Speculation agents for dynamic multi-period continuous double auctions in B2B
exchanges,” in Proceedings of the 37th Annual Hawaii International Conference on System Sciences
(HICSS’04), 2004.
30. M. Luck, P. McBurney, and C. Preist, Agent Technology: Enabling Next Generation Computing. A Roadmap for Agent Based Computing. AgentLink, England, 2003.
31. D. Lucking-Reiley, “Using field experiments to test equivalence between auction formats: magic on the internet,” Am. Econ. Rev., 1999.
32. R. P. McAfee and J. McMillan, “Auctions and bidding,” J. Econ. Literature, vol. 25, pp. 699–738, 1987.
33. S. Park, E. H. Durfee, and W. P. Birmingham, “An adaptive agent bidding strategy based on stochastic modelling,” in Proceedings of the Third International Conference on Autonomous Agents, pp. 147–153, 1999.
34. C. W. Pawlowski, A. M. Bell, S. Crawford, W. A. Sethares, and C. Finn, “An adaptive agent bidding strategy based on stochastic modelling,” in Proceedings of 31st International Conference on Environmental Systems, 2001.
35. R. Porter and Y. Shoham, “On cheating in sealed-bid auctions,” J. Decision Support Syst., vol. 35, pp. 41–
54, 2004.
36. C. Preist, A. Byde, and C. Bartolini, “Economic dynamics of agents in multiple auctions,” in Proceedings of
the 5th International conference on Autonomous Agents, 2001, pp. 545–551.
37. C. Preist and M. van Tol, “Adaptive agents in a persistent shout double auction,” in Proceedings of the First
International Conference on Information and Computation Economies, ACM Press, pp. 11–18, 1998.
38. A. E. Roth and A. Ockenfels, “Last-minute bidding and the rules for ending second-price auctions: evidence from eBay and Amazon on the internet,” Am. Econ. Rev., 2003 (forthcoming).
39. J. Rust, J. Miller, and R. Palmer, “The double auction market institution: a survey,” in D. Friedman and J.
Rust, (eds.), The Double Auction Market Institutions, Theories, and Evidence: Proceedings of the Workshop
on Double Auction Markets Held June, 1991 in Santa Fe, New Mexico. Santa Fe Institute, Addison-Wesley,
1991.
40. J. Rust, J. Miller, and R. Palmer, “Behaviour of trading automata in a computerized double auction market,”
in D. Friedman and J. Rust, (eds.), The Double Auction Market Institutions, Theories, and Evidence: Proceedings of the Workshop on Double Auction Markets Held June, 1991 in Santa Fe, New Mexico. Santa Fe
Institute, Addison-Wesley, 1991.
41. R. Smith, “The Contract Net Protocol: high-level communication and control in a distributed problem
solver,” IEEE Trans. Computers, vol. C-29, no. 12, pp. 1104–1113, 1980.
42. V. Smith, “An experimental study of market behavior,” J. Polit. Econ., vol. 70, no. 2, pp. 111–137, 1962.
43. M. D. Springer, The Algebra of Random Variables, John Wiley & Sons, 1979.
44. P. Stone, M. L. Littman, S. Singh, and M. Kearns, “ATTac-2000: an adaptive autonomous bidding agent,” J.
Artif. Intell. Res., vol. 15, pp. 189–206, 2001.
45. G. Tesauro and D. Bredin, “Strategic sequential bidding in auctions using dynamic programming,” in AAMAS, Bologna, Italy, 2002, ACM.
46. G. Tesauro and R. Das, “High-performance bidding agents for the continuous double auction,” in Third
ACM Conference on Electronic Commerce, pp. 206–209, 2001.
47. I. E. Toft and A. J. Bagnall, Adaptive Agents for Simulated Sealed Bid Auctions. Technical Report CMPC04-03, School of Computing Sciences, University of East Anglia, 2004.
48. W. Vickrey, “Counterspeculation, auctions, and competitive sealed tenders,” J. Finance, vol. 16, pp. 8–37,
1961.
49. J. M. Vidal and E. H. Durfee, “The impact of nested agent models in an information economy,” in Proceedings of the Second International Conference on Multi-Agent Systems, pp. 377–384, 1996.
50. P. Vytelingum, R. K. Dash, E. David, and N. R. Jennings, “A risk-based bidding strategy for continuous double auctions,” in Proceedings of the 16th European Conference on Artificial Intelligence, Valencia,
Spain, 2004.
51. C. A. Waldspurger, T. Hogg, B. Huberman, J. O. Kephart, and S. Stornetta, “Spawn: a distributed computational economy,” IEEE Trans. Software Eng., vol. 18, no. 2, pp. 103–117, 1992.
52. M. Wooldridge and N. R. Jennings, “Intelligent agents: theory and practice,” Knowledge Eng. Rev., vol. 10,
no. 2, pp. 115–152, 1995.
53. G. Zuckerman, “Google shares prove winners at least for a day,” Wall Street J., 20 Aug. 2004, C1, C3.