Decision-theoretic Learning of Agent Models in an Information Economy.

Deision-theoreti Learning of Agent Models in an Information
Eonomy
Christopher H. Brooks
and Edmund
Artiial Intelligene Laboratory
University of Mihigan
Ann Arbor, MI 48109
fhbrooks,durfeegumih.edu
Abstrat
We demonstrate how a produer of information goods
an use a suessively omplex series of models to
learn the preferenes of onsumers eÆiently. We provide metris for estimating the preision, auray, and
learning omplexity of dierent models, thereby providing a produer with the metris needed to apply
deision theory in seleting a sequene of models. We
present experimental results demonstrating the eetiveness of this approah, and disuss urrent researh
on extending this idea to learning preferenes over ategories or strategies of other agents.
Introdution
The ombination of eletroni distribution of information goods via the World Wide Web and the use of
automated agents to buy and sell goods on behalf of
human users has led to the study of agent-based information eonomies. Our previous work in this area
has examined the problem of how sellers in an information eonomy an learn the preferenes of a onsumer
population (Brooks et al. 1999; Brooks, Durfee, & Das
2000). In partiular, we study problems where the onsumer population is nonstationary. We assume that
eah information goods produer is interested in maximizing its aggregate prot. Sine a produer will only
have a limited number of iterations to interat with a
onsumer population, it is faed with a dilemma: what
should it learn, and how long should it learn, so as to
maximize aggregate prot?
In this paper, we present a deision-theoreti approah that an information goods produer an use to
eÆiently learn onsumer preferenes. It is based on
the idea of model renement. Roughly speaking, a produer sequentially onstruts a set of models of varying auray and omplexity. It begins with a simple
model and then swithes to more omplex and aurate models as information is aquired. The diÆult
aspet of this problem is determining how to ompare
models. We introdue three metris, auray, preision
and omplexity, for estimating how many data points
are needed to learn a partiular model to a given degree
This
work was supported in part by an IBM University
Partnership Grant and by the National Siene Foundation
under grant IIS-9872057.
Copyright 2001, Amerian Assoiation for Artiial Intelligene (www.aaai.org). All rights reserved.
H. Durfee of auray. Additionally, sine a produer is making
a sequene of modeling deisions, it must be able to
estimate how data aquired in learning one model an
be applied to other models. We also desribe heuristis
that an information goods produer an use to make
this estimation. We show the deision proedure that
a produer should go through to selet a series of models, and present empirial evidene for this strategy's
eetiveness.
Modeling
We assume that a produer has aess to a set of N
artiles that it an oer for sale to a onsumer population. Consumers may hoose the artiles they want
from this set, subjet to the prie shedule the produer
sets. Produers then have the problem of seleting a
prie shedule and pries within that shedule.
In this work, we have onsidered six prie shedules:
linear priing, where onsumers pay a xed prie for
eah artile, pure bundling, where onsumers pay a
xed prie for all N artiles, two-part tari, where onsumers pay a subsription fee plus a per-artile prie,
mixed bundling, where onsumers have a hoie between a per-artile prie and a bundle prie, blok priing, where onsumers pay a per-artile prie of p1 for the
rst j artiles and per-artile prie p2 for the remaining
artiles, and nonlinear priing, where onsumers pay a
dierent prie for eah artile.
We model onsumers using a model originally introdued by Chuang and Sirbu (1997). The model onsists
of two parameters: w, whih indiates a onsumer's
value for its most-preferred artile, and k , whih indiates the fration of the N artiles available for whih
the onsumer has a positive valuation. The valuation
Vj (i) of the ith most-preferred artile by onsumer j is
a linear funtion of these variables, expressed by:
V (j ) =
w (1
0
j
i
j
1 ) if i 1 k N
k N
if i
j
1 > kj N
(1)
Even though a single onsumer's valuation funtion is
quite simple, if onsumers are heterogeneous in either
w or k , the aggregate demand will ontain nonlinearities. If we plot the number of artiles purhased against
onsumer valuation of that quantity of artiles, we an
visualize prie shedules as tting a nonlinear urve,
with prot being the area under that urve. Figure
family. We refer to this as a model's auray. For a
given family of models MF , let M 0 be the model within
this family that maximizes preision over a olletion of
input sets S . We an then dene the auray aM of a
given model M as:
X jM 0 (s) M (s)j
aM = 1
(3)
jM 0(s)j Pr(s)
s2S
Price paid for n articles
Linear Pricing
Pure Bundling
Two-part tariff
Mixed Bundling
Block Pricing
Nonlinear
Number of articles purchased (n)
Figure 1: An illustration of prie shedules as approximations of a nonlinear urve.
1 gives an illustration of this, with onsumer demand
denoted as 'Nonlinear' (the nonlinear priing shedule
an t demand exatly), and other shedules apturing
dierent portions of the available prot.
Model Renement
We refer to the sequential seletion of progressively
omplex models and aurate models as model renement. To desribe model renement, we must introdue
some terminology.
We begin with the onsumer population's reation
funtion. This is a funtion : p ! R. p is a prie
vetor, indiating the amount harged for eah artile
oered. This funtion indiates the amount the onsumer population is willing to pay for any oering. We
refer to a partiular p as an input set. Typially, a produer will only be onerned with modeling a reation
funtion over some partiular input sets.
A model is a representation of this reation funtion.
Eah model has an input set i that maps to an output
in R. All models whih use the same inputs are referred
to as a model family. For example, there is the family of
linear priing models, the family of pure bundling models, and the family of nonlinear priing models. Within
a family, models dier only in the partiular values assigned to their parameters.
Model families have two harateristis of interest.
The rst is preision. Preision indiates how losely
the best model in a family approximates the reation
funtion over a set of relevant input sets. In order to
measure preision, we must know both how dierent a
model is from the reation funtion for eah relevant
input set, and also how likely those input sets are to
our. Let S be a olletion of input sets, and P r(s) be
the probability of a partiular input set s 2 S ourring.
We an then ompute the preision pM F of a model
family MF as:
X jM (s) (s)j
pM F = argmax 1
Pr(s)
(2)
(s)
M 2M F
s2S
In other words, this is the fration of available prot
that the best model in this family an hope to apture
over the relevant input sets. (If S is ontinuous, we
integrate rather than sum.)
We would also like to be able to measure how good a
partiular model is, relative to the optimal model of its
The third fator we must onsider in omparing model
families is the expeted time (measured in the number
of samples) needed to learn a model from that family. We refer to this as a model family's omplexity.
The omplexity C of a model family estimates how
good a solution is (as a fration of optimal), as a funtion of the number of data points observed. Formally,
C (MF; n; S ) = aM says that, after n prie/prot points
are observed using a model M 2 MF with input sets
from S , the auray of M is aM .
Empirially, we have found that a sigmoid funtion
works well for approximating omplexity, at least when
applied to priing data. The prot landsapes tend to
be omposed of large plateaus, dotted with hills that
are easy to loate and asend initially, but whih have
multiple peaks, so we would like to approximate omplexity with a funtion that inreases steeply initially,
then tails o as it approahes an asymptote. We have
had good results when applying the following formula
as an estimate of the solution quality after n iterations:
pM F
(4)
C (MF; n; S ) =
n
1 + e log2 NdMF
where dM F is the dimensionality of the model family.
For example, a produer that knew it would have 20
iterations to interat with a onsumer population ould
use this formula to predit its performane using twopart tari. Two-part tari has a preision of 0.90 for
the problem desribed previously, and its dimensionality is 2. Let us assume that pries are within the
range [0,100℄. Plugging in these numbers, we nd that
C (T wopart; 20; p ) = 0:736. So, in the 20th iteration, a
produer using 2-part tari would extrat 01::736
152 = 64%
of the available prot (in expetation). By integrating
equation 4 over [0,19℄, we get the total prot (per ustomer). A produer using 2-part tari for 20 iterations
an expet to earn a total undisounted prot of 12.09,
or 52% of the available prot. Similarly, a produer using blok priing would have C (Blok; 20; p ) = 0:695
and a umulative prot of 11.79, so two-part tari
would be a better model. As the number of iterations
inreases, C will inrease for eah model and blok priing's higher preision will make it a more attrative
hoie.
Choosing a model
If the produer is to hoose a shedule and then interat
with a population for n iterations, it would use equation
4 to determine the prot at the urrent iteration and
at n iterations. By integrating this equation, it an
determine the expeted prot for eah model and hoose
the model with the highest expeted aggregate prot.
But what if the produer wants to hange shedules
as it learns? In order to determine the optimal sequene
Experimental Results
In order to test whether a produer using model updating does in fat outperform a produer using a stati
model, we onduted a set of experiments omparing
the two sorts of produers. We reated a set of produers that used stati models (one for eah prie shedule), and had them learn the optimal pries for their
partiular shedule when interating with an unknown
onsumer population having w = 10 and k drawn from
U [0; 0:7℄. Eah produer used Amoeba (Press 1992),
an implementation of the Nelder-Mead algorithm for
nonlinear hilllimbing, to learn the optimal pries. We
ompared these results to a deision-theoreti produer
that simultaneously maintained multiple models, whih
were also learned with Amoeba. A graph omparing the
performane of an adaptive produer to stati produers is shown in Figure 2.
As we an see from the graph, the deision-theoreti
learner (denoted as 'Adaptive Priing') is able to outperform most of the stati shedules for the majority
of the experiment. The shedules used by the adaptive
learner are indiated at the top of the gure. It begins
by trying a number of dierent shedules to ollet data,
and then settles on linear priing for approximately 35
iterations. It then swithes to two-part tari for about
60 iterations before moving on to blok priing. Finally, at around 300 iterations, it swithes to nonlinear
priing.
Also, we note that there is no stati shedule that
dominates adaptive priing over the ourse of the entire
experiment. This implies that if a produer is unsure
of how long it will have to interat with a population,
adaptive priing is an eetive approah. In addition,
adaptive priing performs partiularly well in the 'intermediate' range of 10-200 iterations. In this range,
adjusting
Linear
Two-part
Block
Nonlinear
16000
Linear Pricing
Pure Bundling
Two-part tariff
Mixed Bundling
Block Pricing
Nonlinear
Adaptive Pricing
14000
12000
Cumulative Profit per Iteration
of models to use, a produer must onsider not only
the prot a model an allow it to earn, but also what
data aquired in the ontext of that model allows it to
learn about other models. In the most general ase, a
produer would have to onsider all possible sequenes
of shedules. We avoid this problem by having the produer make a greedy estimate of eah shedule's benet;
at eah iteration, it hooses the shedule whih it believes will yield the greatest umulative prot. To make
this eetive, we make one simple assumption: the inputs of a model family at dimension d are a subset of
the inputs of a model family at dimension d + 1. For
example, linear priing an be thought of as two-part
tari with the subsription fee set to 0. This allows a
learning produer to perform a greedy searh through
the spae of model families. The question to onsider is
how muh is learned in the more omplex model from
data olleted in the simpler model. Our solution is to
weight the number of data points by the ratio of dimensionalities of the two models. That is, if a produer
olleted n data points from a model of dimensionality d, we would treat this as n d+d k data points for the
purposes of evaluating the omplexity of a model of
dimensionality d + k . This reets the fat that, by
learning a d dimensional model, one is also learning a
d-dimensional subspae of the larger d + k -dimensional
model.
10000
8000
6000
4000
2000
0
1
10
100
1000
Iteration
Figure 2: Cumulative prot per artile, per period, per
ustomer (N = 10) (averaged over 10 runs)
learning is able to play an important role. If a produer
has only a few iterations to interat with a population,
there is not enough time to learn muh. Conversely, if a
produer has a long time to interat with a given population, shedules an be learned exatly and long-run
steady-state prots will dominate.
Current Researh
We are also interested in extending this idea beyond
prie shedules to onsumer preferenes over types of
artiles. If a produer must also hoose what artiles
to oer to a population and it has aess to a (possibly
inomplete) taxonomy desribing ategories of artiles,
it ould employ this proedure, initially oering broad
ategories and rening its oerings to more spei ategories as information is aquired. As with prie shedules, the hallenge is in applying knowledge aquired
with one model to the updating of other models.
We would also like to be able to apply this tehnique
to populations that learn or adapt ontinuously. A produer that notied a hange in the population ould
temporarily shift to a simpler model, tune this model
to the new population, and then revert to a more omplex shedule.
Referenes
Brooks, C. H.; Fay, S.; Das, R.; MaKie-Mason, J. K.;
Kephart, J. O.; and Durfee, E. H. 1999. Automated strategy searhes in an eletroni goods market:
Learning and omplex prie shedules. In Proeedings
of ACM EC-99, 31{40.
Brooks, C. H.; Durfee, E. H.; and Das, R. 2000.
Prie wars and nihe disovery in an information eonomy. In Proeedings of ACM Conferene on Eletroni
Commere (EC-00).
Chuang, J. C.-I., and Sirbu, M. A. 1997. Network delivery of information goods: Optimal priing of artiles
and subsriptions. In Proeedings of the Conferene
on Eonomis of Digital Information and Intelletual
Property.
Press, W. H. 1992. Numerial Reipes. Cambridge
University Press.

Download Report

Decision-theoretic Learning of Agent Models in an Information Economy.

Paperzz.com

Your Paperzz