Random Choice over a Continuous Set of Options

Hannes Malmberg
Licentiate Thesis in Mathematical Statistics at Stockholm University, Sweden
Licentiate Thesis: Random Choice over a Continuous Set of Options∗
Hannes Malmberg†
May 15, 2013
Abstract
Random choice theory has traditionally modeled choices over a finite number of options. This thesis generalizes the literature by studying the limiting behavior of choice models as the number of options approaches a continuum.
The thesis uses the theory of random fields, extreme value theory and point processes to calculate this limiting behavior. For a number of distributional assumptions, we give analytic expressions for the limiting probability distribution of the characteristics of the best choice. In addition, we outline a straightforward extension of our theory which would significantly relax the distributional assumptions needed to derive analytical results.
Some examples from commuting research are discussed to illustrate
potential applications of the theory.
∗ Licentiate of Philosophy thesis. The thesis will be presented on Wednesday 5 June 2013, at 15.15 in room 5:306, Department of Mathematics, Stockholm University, Kräftriket.
† Department of Mathematics, Div. of Mathematical Statistics, Stockholm University.
Acknowledgements
First, I would like to thank my supervisor and co-author Ola Hössjer for all the enjoyable mathematical discussions, at the whiteboard and over email, which have pushed this project towards completion. With his help, I have learned how to do mathematical research as this project has progressed from a Bachelor Thesis to a Master Thesis, and lastly to this Licentiate Thesis.
I also would like to thank my second supervisor Dmitrii Silvestrov. Dmitrii’s
courses Probability Theory IV and Stochastic Processes IV led to a quantum leap in my understanding of probability theory. Moreover, during my
research, his thorough reading and constructive criticism have repeatedly
forced me to clarify and restructure my ideas.
I would also like to thank Pieter Trapman for reading and commenting
on the manuscript, as well as my very close friend Zihan Hans Liu for never
failing to expand my mathematical horizons when I talk to him.
List of Papers
This thesis consists of two papers:
1. MALMBERG, H., HÖSSJER, O.: Argmax over Continuous Indices of
Random Variables – An Approach Using Random Fields, submitted to
Applied Probability Trust.
2. MALMBERG, H., HÖSSJER, O.: Extremal Behaviour, Weak Convergence and Argmax Theory for a Class of Non-Stationary Marked Point
Processes, submitted to Extremes.
In both papers, the authors collaborated on developing the general structure of the ideas and held discussions to overcome problems arising as the work progressed. H. Malmberg developed most of the exact statements and provided the proofs. O. Hössjer read the papers a large number of times, and following these read-throughs, theoretical extensions were developed in joint discussions.
Introduction
1 Background
Imagine a person who has decided to start a new job, and who is about to
choose a place of residence. Two counteracting tendencies exist. On the one
hand, living close to the job is preferable as costs associated with transport
increase with distance. On the other hand, the area per radial segment
increases the further away you go from your workplace. There is more area
between 100 m and 110 m from your job than between 0 m and 10 m. Thus,
the probability of finding a good house in a given radial segment increases
with distance from the job. How do these two tendencies interact to shape
the statistical behavior of residential choice?
The residential choice problem belongs to a class of problems which this
thesis addresses. We develop a framework for discussing questions of choice
where the set of choice options is continuous, and where there is a random
element in the choice process.
In this introduction, we give a brief overview of motivating empirical regularities and previous theory, introduce our setup and describe our results. We conclude with a discussion of potential future developments. To allow us to focus on the mathematical intuition, some technical details are left out of this introductory chapter. Interested readers are referred to the papers for formal definitions and proofs.
1.1 Empirical Motivations
When discussing how people make choices over continuous variables such as
residential location, two important observations stand out.
First, people make very different choices, and not all of the variation can be explained by observed individual characteristics. This suggests that a statistical approach is appropriate. Second, there are statistical regularities which warrant a search for an explanatory model. Figure 1 summarizes some salient features. The figure shows commuting distances in the Swedish labor market from Kungsholmen, Stockholm. The distribution is unimodal and skewed to the right. It is approximately gamma distributed over short distances, with a somewhat thicker tail than a gamma distribution. The pattern in Figure 1 can be found in other similar applications, such as the distance traveled to school.
Figure 1: Histogram over commuting distances in Kungsholmen, Stockholm. [Histogram titled "Histogram of avstkort"; horizontal axis: avstkort, from 0 to 10000; vertical axis: Density, from 0 to 0.00025.]
The fact that people make different choices, but that these choices display regular features when aggregated, suggests that there is value in attempting to develop a statistical theory to explain the underlying choice process. This is the aim of this thesis.
2 Theoretical preliminaries
2.1 Random choice theory
In our model setting, an agent makes a zero-one choice concerning every
point in a continuous space – in that sense we model discrete choices over a
continuum of choice options. We follow the tradition in economics and model
these discrete choice problems as random. This stems from an aim to predict
the proportion of people selecting a specific option (or collection of options
in our case), which differs from traditional demand analysis where we want
to explain how much consumers buy of a particular good.
The probabilistic theory of choice started with Luce (1959) who posited
a collection of axioms from which he derived the logit model for choice
probabilities. The axiomatic approach was later partially subsumed under
an approach based on utility maximization with unobservable characteristics/preferences (McFadden, 1980). In this literature, subjects are assumed
to value choices according to the expression
$$U_i = h(x_i) + \varepsilon_i, \qquad i = 1, \dots, n_0, \qquad (1)$$
where the x_i are the (non-random) characteristics of option i. It can be shown that in this model, the probability of selecting alternative i is
$$\frac{e^{h(x_i)}}{\sum_{j=1}^{n_0} e^{h(x_j)}}$$
if the ε_i's are independent and standard Gumbel-distributed. Thus, we can derive logit probabilities from the assumption of utility maximization by making appropriate distributional assumptions.
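The logit formula can be checked numerically. The Python sketch below (standard library only) draws i.i.d. standard Gumbel noise by inverse-transform sampling, maximizes h(x_i) + ε_i over a small set of options, and compares the empirical choice frequencies with the closed-form logit probabilities. The utility values in `h` and all function names are illustrative assumptions, not taken from the thesis.

```python
import math
import random


def gumbel(rng: random.Random) -> float:
    """Standard Gumbel draw via inverse transform: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def logit_probabilities(h: list[float]) -> list[float]:
    """Closed form exp(h_i) / sum_j exp(h_j)."""
    top = max(h)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in h]
    total = sum(weights)
    return [w / total for w in weights]


def simulated_choice_frequencies(h: list[float], reps: int, seed: int = 0) -> list[float]:
    """Empirical frequency with which each option maximizes h_i + eps_i."""
    rng = random.Random(seed)
    counts = [0] * len(h)
    for _ in range(reps):
        utilities = [v + gumbel(rng) for v in h]
        best = max(range(len(h)), key=utilities.__getitem__)
        counts[best] += 1
    return [c / reps for c in counts]


if __name__ == "__main__":
    h = [0.0, 0.5, 1.0, -1.0]  # hypothetical deterministic utilities h(x_i)
    print(logit_probabilities(h))
    print(simulated_choice_frequencies(h, reps=20_000))
```

With a few tens of thousands of replications the two lists typically agree to about two decimal places.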
This approach to probabilistic choice is called random utility theory and
has been extended to more functional forms, distributional assumptions and
applications since McFadden’s initial contribution (Ben-Akiva and Lerman,
1985, Anderson et al., 1992, Train, 2009).
Our thesis can be viewed as an extension of this framework in two directions. First, the x_i's are random variables in our setup. Second, we let the number of options n_0 go to infinity, and study the continuous limit of a sequence of discrete choice models.
2.2 Extreme Value Theory
Extreme value theory is a branch of mathematics studying the asymptotic properties of the sequence of random variables
$$M_n = \max_{1 \le i \le n} Z_i,$$
where $\{Z_i\}_{i=1}^{\infty}$ is a sequence of random variables. The foundational theorem in the literature deals with the case when the random variables Z_i are independent and identically distributed (Fisher and Tippett, 1928, Gnedenko, 1941). This theorem states that if there exist sequences of real numbers a_n > 0 and b_n such that for all y,
$$P\left(\frac{M_n - b_n}{a_n} \le y\right) \to G(y), \qquad (2)$$
for a non-degenerate distribution function G, then G belongs to one of the following parametric families F of distribution functions:
$$G_0(z) = \exp\left\{-\exp\left(-\frac{z-b}{a}\right)\right\}, \qquad z \in \mathbb{R}, \qquad (3)$$
$$G_{-\alpha}(z) = \begin{cases} 0, & z \le b, \\ \exp\left\{-\left(\frac{z-b}{a}\right)^{-\alpha}\right\}, & z > b, \end{cases} \qquad (4)$$
$$G_{\alpha}(z) = \begin{cases} \exp\left\{-\left(-\frac{z-b}{a}\right)^{\alpha}\right\}, & z \le b, \\ 1, & z > b, \end{cases} \qquad (5)$$
where a, b and α are constants, of which a and α are constrained to be positive. The functional form together with α determines F, while a and b are the scale and location parameters within F. These three functional forms are the Gumbel (3), Fréchet (4) and Weibull (5) families, respectively.
The theorem means that insofar as the maximum of a collection of random variables converges after a suitable sequence of affine transformations, the resulting distribution belongs to a small class of distributions. Different distributions of Z_i yield different limiting distributions G, and by modifying a_n and b_n it is easy to see that any combination of a and b can be attained as a limit in (2) within the family F.
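As a concrete instance of (2), take Z_i i.i.d. standard exponential, a_n = 1 and b_n = log n. Then P(M_n − log n ≤ y) = (1 − e^{−y}/n)^n for y ≥ −log n, which tends to the standard Gumbel distribution function G_0 with a = 1, b = 0. The sketch below evaluates both expressions on a grid of y values; the exponential example and the function names are our own choices for illustration.

```python
import math


def exact_cdf_normalized_max(n: int, y: float) -> float:
    """P(max of n i.i.d. Exp(1) variables minus log(n) <= y), computed exactly."""
    z = y + math.log(n)  # threshold on the unnormalized maximum
    if z <= 0:
        return 0.0
    return (1.0 - math.exp(-z)) ** n


def gumbel_cdf(y: float) -> float:
    """Standard Gumbel distribution function G_0 with a = 1 and b = 0."""
    return math.exp(-math.exp(-y))


if __name__ == "__main__":
    ys = [k / 10 for k in range(-30, 61)]  # y from -3 to 6
    for n in (10, 100, 10_000):
        gap = max(abs(exact_cdf_normalized_max(n, y) - gumbel_cdf(y)) for y in ys)
        print(n, gap)  # the sup-distance shrinks as n grows
```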
Classical extreme value theory has been extended in a number of different directions. Most similar to our project in Paper 1 is the work on relaxing the assumption that the Z_i's are identically distributed while retaining the assumption of independence (Weissman, 1975, Horowitz, 1980). Our approach in Paper 1 also connects to the study of Gumbel random fields. For a recent treatment of Gumbel random fields, see Robert (2013). We refer to Leadbetter et al. (1983) and Resnick (2007) for a more comprehensive treatment of extreme value theory.
2.3 Concomitants of extreme order statistics
The research area in statistics which is most closely related to our problem
(and to random utility theory) is the theory of concomitants of extreme order
statistics (David and Galambos, 1974, Nagaraja and David, 1994, Ledford
and Tawn, 1998). This theory deals with the asymptotic behavior of the object
$$X_{[n:n]} = X_{I_n},$$
where (X_1, U_1), ..., (X_n, U_n) is a sequence of i.i.d. random vectors in which the U_i's are real-valued, the X_i's belong to a general space, and $I_n = \arg\max_{1 \le i \le n} U_i$. A difference from (1) is that not only U_i, but also X_i, is random.
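Computationally, the concomitant X_[n:n] is simply the characteristic attached to the largest U. A minimal sketch follows; the sampling model (X uniform on [0, 1], U = X plus a standard exponential error) and the function names are hypothetical choices made only to have something concrete.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def concomitant_of_max(xs: Sequence[T], us: Sequence[float]) -> T:
    """Return X_{[n:n]} = X_{I_n}, where I_n = argmax_i U_i."""
    if len(xs) != len(us) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    i_n = max(range(len(us)), key=us.__getitem__)
    return xs[i_n]


def sample_pairs(n: int, rng: random.Random) -> tuple[list[float], list[float]]:
    """Hypothetical model: X ~ Uniform(0, 1) and U = X + Exp(1), i.i.d. over i."""
    xs = [rng.random() for _ in range(n)]
    us = [x + rng.expovariate(1.0) for x in xs]
    return xs, us


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [concomitant_of_max(*sample_pairs(500, rng)) for _ in range(500)]
    print(sum(draws) / len(draws))  # average characteristic of the best option
```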
3 Description of Papers
The two papers in the thesis answer a similar question with similar results. The difference between them lies in the strategies they employ to perform the main step of the derivation. Thus, all but two subsections in this section are common to both papers.
3.1 Problem formulation
For presentational clarity, we describe a single problem setup, although it differs somewhat between the papers. The setup presented here is used in Paper 2. It differs from the setup used in Paper 1, but the differences are minor enough that the methodology of Paper 1 can be explained using the setup in Paper 2.
There is a set Ω ⊆ Rk of choice characteristics, and a distribution Λ
on Ω giving the relative prevalence of different characteristics. For each
characteristic x ∈ Ω, there is a conditional probability distribution of utility
$$\mu(\cdot\,; x) = P(U \in \cdot \mid X = x).$$
We can define the bivariate distribution of characteristics and utilities on rectangular sets A × B with A ⊆ Ω and B ⊆ R as
$$P((X, U) \in A \times B) = \int_A \mu(B; x)\, d\Lambda(x). \qquad (6)$$
Our basic building block will be a sequence of choice alternatives
$$(X_1, U_1), \dots, (X_n, U_n), \qquad (7)$$
which are independent and identically distributed according to the bivariate distribution given in (6). For each fixed n, we define
$$I_n = \arg\max_{1 \le i \le n} U_i$$
as the index of the alternative with the highest utility, and $X_{I_n} = X_{[n:n]}$ as the characteristics vector of this alternative. For a fixed n, the probability distribution of the characteristics vector of the selected alternative is
$$C_n(\cdot) = P(X_{I_n} \in \cdot). \qquad (8)$$
In both papers, we look for the asymptotic properties of the sequence C_n. In particular, we look for a probability measure C on Ω such that
$$C_n \Rightarrow C, \qquad (9)$$
where ⇒ stands for weak convergence of probability measures on (Ω, B(Ω)), and B(Ω) is the σ-algebra of Borel subsets of Ω (for an extensive treatment of weak convergence, see Billingsley, 1971).
Later in the introduction, we will sometimes write Cn and C to denote
random variables, the laws of which are given by (8) and (9). It will be clear
from the context when this has been done.
3.2 Random fields
In both papers, we study the asymptotic behavior of (8) through an intermediate mathematical object, namely the random field M_n, which in Paper 1 is defined by
$$M_n(A) = \sup_{1 \le i \le n,\ X_i \in A} U_i, \qquad A \subseteq \Omega, \qquad (10)$$
with the convention that the supremum of the empty set is −∞. In words, this random field takes subsets A of the characteristics space Ω as arguments and returns a real number: supplied with the argument A, it returns the value of the best offer having a characteristic vector in the set A ⊆ Ω.
We will define an important functional from the set of random fields to
the set of probability measures on Ω by
$$F(\cdot\,; M) = P(M(\cdot) > M(\cdot^c)), \qquad (11)$$
where ·c stands for the complement of · in Ω. Intuitively, F (A; M ) gives the
probability that the best offer, corresponding to max field M , belongs to A.
We connect the random fields (10) and the functional (11) to our problem
by observing that
$$C_n(\cdot) = P(M_n(\cdot) > M_n(\cdot^c)) = F(\cdot\,; M_n). \qquad (12)$$
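Identity (12) holds realization by realization: the best offer lies in A exactly when the best offer in A beats the best offer in A^c (ties have probability zero for continuous utilities). The sketch below checks this on simulated data, representing A by an indicator function. The model (X uniform on [0, 1], U = 2X plus a standard exponential error), the set A = [0, 0.5] and the function names are illustrative assumptions.

```python
import math
import random
from typing import Callable, Sequence


def max_field(xs: Sequence[float], us: Sequence[float],
              in_a: Callable[[float], bool]) -> float:
    """M_n(A) = sup{U_i : X_i in A}; the sup of the empty set is -inf, as in (10)."""
    return max((u for x, u in zip(xs, us) if in_a(x)), default=-math.inf)


def estimate_both_sides(n: int, reps: int, in_a: Callable[[float], bool],
                        seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimates of C_n(A) and F(A; M_n), computed on the same draws.

    Hypothetical model: X_i ~ Uniform(0, 1) and U_i = 2 X_i + Exp(1).
    """
    rng = random.Random(seed)

    def in_complement(x: float) -> bool:
        return not in_a(x)

    hits_c = hits_f = 0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        us = [2.0 * x + rng.expovariate(1.0) for x in xs]
        best = max(range(n), key=us.__getitem__)
        hits_c += in_a(xs[best])  # event {X_{I_n} in A}
        hits_f += max_field(xs, us, in_a) > max_field(xs, us, in_complement)  # {M_n(A) > M_n(A^c)}
    return hits_c / reps, hits_f / reps


if __name__ == "__main__":
    def in_a(x: float) -> bool:
        return x <= 0.5

    print(estimate_both_sides(n=200, reps=2000, in_a=in_a))  # the two estimates coincide
```

Because both estimators count the same event on the same draws, they agree exactly, not just approximately.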
Equation (12) suggests that we can study the asymptotic behavior of Cn by
studying the asymptotic behavior of Mn . Indeed, we seek to find a random
field M such that Mn , or a monotone transformation of Mn , converges to
M in an appropriate way. The correct sense of convergence is one which
makes (11) continuous as a function from the set of random fields to the set
of probability measures, where the topology on probability measures is that
of weak convergence. If we can demonstrate such continuity, we can conclude
that
$$F(\cdot\,; M_n) \Rightarrow F(\cdot\,; M) \qquad (13)$$
on (Ω, B(Ω)) whenever M_n converges to M. Combining (13) with (12), we then get
$$C_n(\cdot) \Rightarrow F(\cdot\,; M),$$
and insofar as F(·; M) is known, we have characterized the limit of C_n.
The method outlined above is the approach used in both papers. The two papers differ in how they find M, and in how they demonstrate that the sense in which the finite-sample max fields converge to M ensures continuity of F.
3.3 Paper 1
In Paper 1, we define a notion of convergence $\xrightarrow{m}$ on the space of random fields. We say that
$$M_n \xrightarrow{m} M$$
if there exists a sequence of strictly increasing functions g_n such that, for each A ⊆ Ω satisfying some regularity conditions,
$$g_n(M_n(A)) \Rightarrow M(A).$$
In Paper 1, we show that this notion of convergence makes (11) a continuous mapping from the set of random fields to the set of probability measures on Ω.
Having made assumptions on the bivariate distribution of (X_i, U_i), we use extreme value theory to derive M. The sequence of strictly increasing functions g_n is defined as
$$g_n(z) = \frac{z - b_n}{a_n},$$
where a_n and b_n are the normalizing sequences described in Section 2.2. This method works because the number of offers having characteristics in A increases to infinity as n → ∞, and since their associated utilities are conditionally independent, we can apply extreme value theory. The main difficulty is that traditional extreme value theory assumes that the U_i are identically distributed, whereas the distribution of the U_i's varies with characteristics in our setup. A large part of the theoretical work in Paper 1 involves dealing with this variation in the distribution of U_i.
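How the non-identical distributions are handled can be seen in a simple case. Suppose, as a hypothetical example, that X_i is uniform on Ω = [0, 1], U_i = m(X_i) + ε_i with ε_i standard exponential, and m ≤ 1 on Ω. For u ≥ 1 one has P(M_n(A) ≤ u) = (1 − e^{−u} w(A))^n with w(A) = ∫_A e^{m(x)} dx, so with a_n = 1 and b_n = log n,
P(M_n(A) − log n ≤ y) = (1 − e^{−y} w(A)/n)^n → exp(−e^{−y} w(A)),
a Gumbel law whose location log w(A) depends on A. For disjoint A and A^c we take the limits to be independent, and then P(M(A) > M(A^c)) = w(A)/(w(A) + w(A^c)). The sketch evaluates these formulas for m(x) = x and A = [0, s]; all names are ours.

```python
import math


def w(lo: float, hi: float) -> float:
    """w(A) = integral of exp(m(x)) over A = [lo, hi], with m(x) = x (hypothetical)."""
    return math.exp(hi) - math.exp(lo)


def finite_n_cdf(n: int, y: float, weight: float) -> float:
    """P(M_n(A) - log n <= y); exact whenever y + log n >= max m = 1."""
    return (1.0 - math.exp(-y) * weight / n) ** n


def limit_cdf(y: float, weight: float) -> float:
    """Gumbel limit exp(-exp(-y) w(A)), i.e. location log w(A)."""
    return math.exp(-math.exp(-y) * weight)


def limiting_probability_in_a(s: float) -> float:
    """P(M(A) > M(A^c)) for independent Gumbel limits, A = [0, s], A^c = (s, 1]."""
    wa, wc = w(0.0, s), w(s, 1.0)
    return wa / (wa + wc)


if __name__ == "__main__":
    wa = w(0.0, 0.5)
    for n in (100, 10_000):
        print(n, finite_n_cdf(n, 0.5, wa), limit_cdf(0.5, wa))
    print(limiting_probability_in_a(0.5))  # equals (e^0.5 - 1) / (e - 1)
```

The last quantity is exactly the exponential tilting that reappears in the examples of Section 3.5.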
3.4 Paper 2
In Paper 2, we place a second intermediate object between the sequence (7) and the quantity (8). In this paper, we observe that (7) can be viewed as a point process on the product space Ω × R. Defining the set function
$$\delta_{(x,u)}(F) = \begin{cases} 1, & (x,u) \in F, \\ 0, & (x,u) \notin F, \end{cases}$$
with F a Borel subset of Ω × R, we define our point process as
$$\xi_n(\cdot) = \sum_{i=1}^{n} \delta_{(X_i,\, g_n(U_i))}(\cdot), \qquad (14)$$
where
$$g_n(z) = \frac{z - b_n}{a_n}$$
is a sequence of strictly increasing functions, coinciding with the normalization from Section 2.2 for the distribution of U | X = x, assuming that the same normalization g_n can be used for all x ∈ Ω.
In this setup, we can define the random field
$$M_{\xi_n}(A) = \sup\{u \in \mathbb{R} : \xi_n(A \times [u, \infty)) > 0\}, \qquad A \subseteq \Omega, \qquad (15)$$
that is, the largest number u such that the point process ξ_n has a point in the set A × [u, ∞), with the supremum of the empty set again equal to −∞. As the positions of points are random, M_{ξ_n}(A) is a random variable for each A ⊆ Ω, and thus M_{ξ_n} is a random field with subsets of Ω as arguments. Again, we have
$$C_n(\cdot) = P(M_{\xi_n}(\cdot) > M_{\xi_n}(\cdot^c)). \qquad (16)$$
Notice that the max fields in Papers 1 and 2 are slightly different, since M_{ξ_n}(A) = g_n(M_n(A)); they differ only by the monotone transformation g_n. Because g_n is strictly increasing, (16) is equivalent to the definition in (8).
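The relation M_{ξ_n}(A) = g_n(M_n(A)) can be checked directly by storing the point process (14) as its list of atoms (X_i, g_n(U_i)) and evaluating (15) literally. In this sketch the normalization g_n(z) = z − log n (that is, a_n = 1 and b_n = log n, a choice suited to exponential errors), the sampling model and the function names are assumptions made for illustration.

```python
import math
import random
from typing import Callable

Atom = tuple[float, float]  # a point (x, g_n(u)) of xi_n in Omega x R


def point_process(xs: list[float], us: list[float],
                  g: Callable[[float], float]) -> list[Atom]:
    """Atoms of xi_n = sum_i delta_{(X_i, g_n(U_i))}, as in (14)."""
    return [(x, g(u)) for x, u in zip(xs, us)]


def count(atoms: list[Atom], in_a: Callable[[float], bool], u: float) -> int:
    """xi_n(A x [u, inf)): the number of atoms in A x [u, inf)."""
    return sum(1 for x, v in atoms if in_a(x) and v >= u)


def max_field_from_points(atoms: list[Atom], in_a: Callable[[float], bool]) -> float:
    """M_xi(A) = sup{u : xi(A x [u, inf)) > 0}, evaluated as in (15).

    When the set is non-empty its supremum is attained at the level of an
    atom, so it suffices to test atom levels; an empty set gives -inf.
    """
    feasible = [v for _, v in atoms if count(atoms, in_a, v) > 0]
    return max(feasible, default=-math.inf)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200

    def g_n(z: float) -> float:
        return z - math.log(n)  # a_n = 1, b_n = log n (assumed normalization)

    def in_a(x: float) -> bool:
        return x <= 0.5

    xs = [rng.random() for _ in range(n)]
    us = [2.0 * x + rng.expovariate(1.0) for x in xs]
    atoms = point_process(xs, us, g_n)
    m_n_a = max(u for x, u in zip(xs, us) if in_a(x))  # M_n(A) from (10)
    print(max_field_from_points(atoms, in_a), g_n(m_n_a))  # the two values agree
```

The literal evaluation is quadratic in n; it is written this way only to mirror definition (15).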
In Paper 2, we study the limiting behavior of (16) by studying the limiting behavior of (14). Building on the connection between extreme value theory and point processes described in Resnick (2007), we find a Poisson process ξ such that
$$\xi_n \overset{p}{\Rightarrow} \xi,$$
where $\overset{p}{\Rightarrow}$ denotes convergence in the point process sense. We show that point process convergence implies that M_{ξ_n} converges to M_ξ, defined as in (15), in such a way that
$$F(\cdot\,; M_{\xi_n}) \Rightarrow F(\cdot\,; M_\xi),$$
and we can thus conclude that
$$C_n(\cdot) \Rightarrow P(M_\xi(\cdot) > M_\xi(\cdot^c)),$$
as required. Insofar as P(M_ξ(·) > M_ξ(·^c)) is easy to characterize, we have succeeded with our aim of finding the asymptotic behavior of C_n.
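To make the limiting object concrete, consider the exponential example of Section 3.5 with Λ uniform on [0, 1] and g_n(z) = z − log n. Since n·P(X ∈ dx, U − log n > u) = e^{m(x)−u} dx for large n, a natural candidate (our derivation for this special case, not a statement from the papers) is a Poisson process ξ with intensity dx · e^{m(x)−u} du on [0, 1] × R. The sketch below simulates its highest point exactly: points of a dominating process with intensity e^{M−u} du are generated from the top down and thinned with probability e^{m(x)−M}. The choice m(x) = 2x, the bound M = 2 and all names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329
M_MAX = 2.0  # an upper bound for m on Omega = [0, 1], needed for thinning


def m(x: float) -> float:
    """Hypothetical deterministic utility component on Omega = [0, 1]."""
    return 2.0 * x


def top_point_of_limit_process(rng: random.Random) -> tuple[float, float]:
    """(location, level) of the highest point of the Poisson process with
    intensity dx * exp(m(x) - u) du on [0, 1] x R.

    v_k = E_1 + ... + E_k is a unit-rate Poisson process, so u_k = M_MAX - log(v_k)
    lists the points of the dominating process from the top down. Each point
    gets a uniform location x and is kept with probability exp(m(x) - M_MAX);
    the first kept point is the top point of the thinned process.
    """
    v = 0.0
    while True:
        v += rng.expovariate(1.0)
        x = rng.random()
        if v > 0.0 and rng.random() < math.exp(m(x) - M_MAX):
            return x, M_MAX - math.log(v)


def simulate(reps: int, s: float, seed: int = 0) -> tuple[float, float]:
    """Frequency of the top location in A = [0, s], and the mean of M_xi(Omega)."""
    rng = random.Random(seed)
    in_a = 0
    total_level = 0.0
    for _ in range(reps):
        x, u = top_point_of_limit_process(rng)
        in_a += x <= s
        total_level += u
    return in_a / reps, total_level / reps


if __name__ == "__main__":
    w_a = (math.exp(1.0) - 1.0) / 2.0      # integral of e^{2x} over [0, 0.5]
    w_omega = (math.exp(2.0) - 1.0) / 2.0  # integral of e^{2x} over [0, 1]
    freq, mean_level = simulate(20_000, 0.5)
    print(freq, w_a / w_omega)  # P(M_xi(A) > M_xi(A^c)) against the tilted mass
    print(mean_level, math.log(w_omega) + EULER_GAMMA)  # Gumbel mean, location log w
```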
3.5 Examples
We apply our theory by making assumptions on the distribution of characteristics Λ and the conditional probability measures of utility µ(·; x) for x ∈ Ω.
In this section, we write Γ(k, λ) to denote a gamma distribution with density function λ^k x^{k−1} e^{−xλ}/Γ(k), where Γ(·) denotes the gamma function. We assume that the distribution of characteristics Λ on Ω has a density function λ(x), and that utility is given by
$$U_i = m(X_i) + \varepsilon_i,$$
with m(x) being a regression function and ε_i ∼ Γ(1, 1), so that the errors are exponentially distributed. In this case, the limit
$$\lim_{n \to \infty} C_n(\cdot),$$
in the sense of weak convergence, has the density function
$$f(x) = \frac{e^{m(x)}\lambda(x)}{\int_\Omega e^{m(y)}\lambda(y)\,dy}, \qquad x \in \Omega.$$
This means that the process of taking the best choice in the limit has the
effect of exponentially tilting the initial distribution with em(x) , thereby attaching more weight to points with high deterministic utility component.
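The tilting can be checked by simulation. In the sketch below, Λ is uniform on Ω = [0, 1] and m(x) = 2x; both are hypothetical choices. It compares a Monte Carlo estimate of C_n([0, s]) with the limit ∫_0^s e^{m} / ∫_0^1 e^{m}, the latter computed with a midpoint rule. The function names are ours.

```python
import math
import random
from typing import Callable


def tilted_probability(m: Callable[[float], float], s: float, grid: int = 10_000) -> float:
    """Limit C(A) = int_A e^m / int_Omega e^m for Lambda = Uniform(0, 1) and A = [0, s].

    Uses a midpoint rule on a regular grid (an assumption of this sketch).
    """
    xs = [(k + 0.5) / grid for k in range(grid)]
    weights = [math.exp(m(x)) for x in xs]
    numerator = sum(wt for x, wt in zip(xs, weights) if x <= s)
    return numerator / sum(weights)


def simulated_choice_probability(m: Callable[[float], float], s: float, n: int,
                                 reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of C_n([0, s]) with X_i ~ Uniform(0, 1), U_i = m(X_i) + Exp(1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        best_x, best_u = 0.0, -math.inf
        for _ in range(n):
            x = rng.random()
            u = m(x) + rng.expovariate(1.0)
            if u > best_u:
                best_x, best_u = x, u
        hits += best_x <= s
    return hits / reps


if __name__ == "__main__":
    def m(x: float) -> float:
        return 2.0 * x  # hypothetical regression function

    print(tilted_probability(m, 0.5))  # equals 1 / (1 + e) for this m
    print(simulated_choice_probability(m, 0.5, n=200, reps=3000))
```

Without tilting (m constant), the best offer would fall in [0, 0.5] with probability 1/2; with m(x) = 2x the limit drops to about 0.27.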
In particular, let us return to our example in Section 1 on residential
location choice, with Ω = R², x = (x_1, x_2) and linear cost, that is,
$$m(x) = -c\|x\|,$$
where ‖·‖ is the Euclidean distance from the origin. In this case, we get the choice density
$$\frac{e^{-c\|x\|}\lambda(x)}{\int_{\mathbb{R}^2} e^{-c\|s\|}\lambda(s)\,ds}.$$
If, furthermore, we have constant (improper) population density λ ≡ 1, the density of the distance from the job ‖x‖ is
$$c^2 \|x\| e^{-c\|x\|},$$
which is a Γ(2, c) distribution. This result agrees with the empirical pattern of a unimodal commuting length distribution with a skew to the right.
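The step from the planar choice density to the distance density is a change to polar coordinates. A short derivation, assuming c > 0 and λ ≡ 1:

```latex
\int_{\mathbb{R}^2} e^{-c\|s\|}\,ds
  = \int_0^{2\pi}\!\int_0^{\infty} e^{-cr}\, r\,dr\,d\theta
  = \frac{2\pi}{c^2},
\qquad
P(\|X\| \le \rho)
  = \frac{c^2}{2\pi}\int_0^{2\pi}\!\int_0^{\rho} e^{-cr}\, r\,dr\,d\theta
  = \int_0^{\rho} c^2 r\, e^{-cr}\,dr .
```

Differentiating in ρ gives the density c²ρe^{−cρ}, which is the Γ(k, λ) density defined above with k = 2 and λ = c (since Γ(2) = 1). Its mode is 1/c and its mean is 2/c, so the distance distribution is unimodal with a right skew.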
4 Discussion
In this thesis, we build two frameworks to analyze the asymptotic behavior
of the choice random variable Cn . However, there are theoretical challenges
remaining. In particular, we have only found tractable results for a small
number of distributional assumptions. Going forward, the main priority is to relax these strict assumptions.
The problem arises because the limiting distribution of C_n depends strongly on the tail behavior of the stochastic utility component. Essentially, if the tail of ε_i is too thin, the best choice will, in the limit, always be determined by the deterministic component m(x) of utility. Simply put, with thin tails, the stochastic element can never beat the deterministic element. In these cases C_n converges to a degenerate distribution, supported on the set of characteristics
$$\arg\max_{x \in \Omega} m(x)$$
with a maximal deterministic utility component.
We can illustrate this by going back to our standard residential choice
example with Ω = R2 and m(x) = −c||x||. Assume that the stochastic
component of utility is normally distributed. In that case, Cn → (0, 0)
almost surely as n → ∞. That is, with probability 1, we will live right next
to our job.
The idea going forward is that although Cn converges to a degenerate
distribution, the way in which it converges is interesting. We never have an
infinite amount of residential offers, but merely a large finite number. Thus,
we would like to extend our framework to look for sequences hn such that
hn (Cn )
converges to a non-degenerate random variable.
We believe this is a promising avenue of research, and tentative results suggest that it might, for example, be possible to get a gamma distribution for all stochastic disturbances ε_i which lie in the domain of attraction of the Gumbel distribution, which is a very large family of random variables.
If this aim can be achieved, the theory will have become a very general
tool to map distributional assumptions in random utility theory to outcome
predictions of the characteristics of the best choice.
References
Anderson, S. P., De Palma, A., and Thisse, J.-F. (1992). Discrete choice theory of product differentiation. MIT Press.
Ben-Akiva, M. and Lerman, S. (1985). Discrete choice analysis: theory and application to travel demand, volume 9. MIT Press.
Billingsley, P. (1971). Weak convergence of measures. SIAM, Philadelphia.
David, H. and Galambos, J. (1974). The asymptotic theory of concomitants
of order statistics. Journal of Applied Probability, pages 762–770.
Fisher, R. A. and Tippett, L. H. C. (1928). Limiting forms of the frequency
distribution of the largest or smallest member of a sample. In Mathematical
Proceedings of the Cambridge Philosophical Society, volume 24, pages 180–
190. Cambridge Univ Press.
Gnedenko, B. (1941). Limit theorems for the maximal term of a variational
series. In Doklady Akad. Nauk SSSR.
Horowitz, J. (1980). Extreme values from a nonstationary stochastic process:
an application to air quality analysis. Technometrics, 22(4):469–478.
Leadbetter, M. R., Lindgren, G., and Rootzén, H. (1983). Extremes and
related properties of random sequences and processes. Springer Verlag.
Ledford, A. W. and Tawn, J. A. (1998). Concomitant tail behaviour for
extremes. Advances in Applied Probability, 30(1):197–215.
Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. John Wiley and Sons, New York.
McFadden, D. (1980). Econometric models for probabilistic choice among
products. Journal of Business, 53(3):13–29.
Nagaraja, H. N. and David, H. A. (1994). Distribution of the maximum
of concomitants of selected order statistics. The Annals of Statistics,
22(1):478–494.
Resnick, S. I. (2007). Extreme values, regular variation, and point processes.
Springer, New York.
Robert, C. Y. (2013). Some new classes of stationary max-stable random
fields. Statistics & Probability Letters, 83(6):1496–1503.
Train, K. E. (2009). Discrete choice methods with simulation. Cambridge
University Press, Cambridge, 2nd edition.
Weissman, I. (1975). Extremal processes generated by independent nonidentically distributed random variables. The Annals of Probability, pages
172–177.
Argmax over Continuous Indices of Random
Variables - An Approach Using Random
Fields
Hannes Malmberg∗
Ola Hössjer†
May 13, 2013
Abstract
In commuting research, it is common to model choices as optimization over a discrete number of random variables. In this paper
we extend this theory from the discrete to the continuous case, and
consider the limiting distribution of the location of the best offer as
the number of offers tends to infinity.
Given a set Ω ⊂ R^k of possible offers, we seek a distribution over Ω, the argmax measure of the best offer. It depends on Λ, the sampling distribution of offer locations, and a measure index µ, which assigns to each point x ∈ Ω a probability distribution of offers.
This problem is closely related to the argmax theory of marked point processes, although we consider deterministic sequences of points in space to allow for greater generality. We first define a finite sample argmax measure and then give conditions under which it converges as the number of offers tends to infinity.
To this end, we introduce a max-field of best offers and use continuity properties of this field to calculate the argmax measure. We
demonstrate the usefulness of the method by giving explicit formulas
for the limiting argmax distribution for a large class of models, including exponential independent offers with a deterministic, additive
disturbance term. Finally, we illustrate the theory by simulations.
Key words: Argmax distribution, commuting, extreme value theory,
exponential offers, marked point processes, max field.
∗ Department of Mathematics, Div. of Mathematical Statistics, Stockholm University.
† Department of Mathematics, Div. of Mathematical Statistics, Stockholm University.
1 Introduction
In commuting research, choices are offered at various points in space and are assumed to have random value. It is of interest to determine which offer is optimal and to derive the statistical properties of the choice, both in terms of the value of the offer and its position in space. This is related to random utility theory, the branch of economics that has dealt most with commuting decisions. Random utility theory postulates that we value options according to a deterministic component and a stochastic disturbance term, see Manski and McFadden (1981).
In this paper, we focus in particular on the positional distribution of the best offer, continuing studies initiated in Malmberg (2011, 2012). To this end, we create a mathematical formalism of maximization over a potentially infinite number of random offers. More formally, let Ω ⊆ R^k be a Borel measurable set, and let P_R denote the set of probability measures on R. We index a set of distributions by µ : Ω → P_R, where µ(x) is the distribution of offers at location x ∈ Ω. Such an indexation can, for example, state that the distribution of offers becomes shifted to the left the further away from the origin we are, due to travelling costs. In addition, we have a population distribution Λ on Ω, giving us the relative number of offers we can expect from different locations.
The task is to define the probability distribution of the location of the best
offer when the relative intensity of offers is provided by Λ, and the relative
quality of offers by µ. We build the theory by first defining the probability distribution of the location of the best offer for finite samples and then defining a limiting distribution as the number of offers tends to infinity.
It turns out that the distribution has very interesting mathematical properties, and that for particular choices of µ, including exponential distributions
with deterministic additive disturbances, this limit is also very explicit and
interpretable. In the process of answering our posed question, some theoretical tools are developed and results are derived that are interesting in
their own right. In the end, we show that the theory can potentially be extended in a number of interesting directions.
There is quite an extensive literature on random utility theory for finite choice sets, see for instance Marley and Colonius (1992), Mattsson et al. (2011) and references therein. Our paper represents a generalization to non-finite choice sets. In this, it has similar aims as Ben-Akiva et al. (1985), Resnick and Roy (1989), and Dagsvik (1994). The main difference is that our approach does not depend on the "independence of irrelevant alternatives" assumption on the final distribution of choices. Instead, we let the number of alternatives grow, where each alternative has a random utility with arbitrary distribution. In our case, the continuous logit model is a special case when this arbitrary distribution is an exponential distribution with deterministic, spatially varying shifts.
The location of offers can be viewed as the realization of a point process,
cf. Cox and Isham (1980) and Diggle (2003). The value associated to each
offer is a mark, and hence the joint sequence of locations and values of all
offers becomes a marked point process (Jacobsen, 2006). Our results are
closely related to an asymptotic theory for the argmax (or the position of
the largest record) of a marked point process as the intensity of the point
process tends to infinity. The limiting argmax distribution coincides with
Λ when the offer distribution µ(x) ≡ H is independent of location. On
the other hand, when µ(x) varies with x we get a non-stationary sequence
of marks, which, under certain conditions, yields an associated non-trivial
limiting argmax distribution.
The theory which is most closely related mathematically to the one presented
in this paper is the theory of concomitants of extreme order statistics, see
for example Ledford and Tawn (1998) and the references therein. The main
difference is that we consider limits of deterministic point processes in contrast to large samples from explicitly bivariate distributions. Moreover, we
treat the case when µ corresponds to homoscedastic regression in particular
detail. To the best of our knowledge ours is the first attempt to apply an approach using random fields to the analysis of concomitants of extreme order
statistics.
2 Defining the argmax measure
In this section, we provide the definition of the argmax measure with respect
to µ and Λ. We will first introduce some relevant concepts needed to state
the definition.
Definition 1 Let Ω ⊆ R^k and let
$$\mu : \Omega \to P_{\mathbb{R}},$$
where P_R is the space of probability measures on R. Then µ is called an absolutely continuous measure index on Ω if, for each x ∈ Ω, µ(x) is an absolutely continuous probability measure on R with respect to Lebesgue measure.
Unless otherwise stated, µ refers to an absolutely continuous measure index
and Ω is a subset of Rk .
We now introduce the basic building block of our theory: the argmax measure associated with a deterministic set of points. Throughout the discussion, point sequences N^n = {x_{n1}, x_{n2}, ..., x_{nn}} are multisets, i.e. the x_{ni}'s are not necessarily distinct for fixed n.
Definition 2 An indexed random vector $Y^{N^n} = (Y_{n1}, \dots, Y_{nn})$ with respect to µ is a random vector on R^n with independent components, where each component Y_{ni} has marginal distribution µ(x_{ni}).
Unless there is ambiguity, we omit the superscript N^n.
Definition 3 The point process argmax measure $\tilde T_\mu^{N^n}$ is defined as
$$\tilde T_\mu^{N^n}(A) = P\left(\max_{1 \le i \le n:\, x_{ni} \in N^n \cap A} Y_{ni} \ge \max_{1 \le i \le n} Y_{ni}\right) = P(X_n \in A) \qquad (1)$$
for all Borel measurable sets A ⊆ Ω, where
$$X_n = \arg\max_{x_{ni} \in N^n} Y_{ni} \qquad (2)$$
is the almost surely unique argmax of {Y_{ni}}.
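Definition 3 can be evaluated by Monte Carlo for any finite multiset N^n once we can sample from each µ(x_{ni}). The sketch below does this; the names and the sampler signature are our own. As a sanity check it uses a location-independent offer distribution µ(x) ≡ Exp(1). The marks are then exchangeable, so each of the n points is equally likely to win and $\tilde T_\mu^{N^n}(A)$ reduces to the proportion of points in A, counting multiplicities. The multiset and the set A are arbitrary.

```python
import random
from typing import Callable, Sequence

Sampler = Callable[[float, random.Random], float]  # draws Y ~ mu(x) for a location x


def argmax_measure_mc(points: Sequence[float], sample_mark: Sampler,
                      in_a: Callable[[float], bool], reps: int,
                      seed: int = 0) -> float:
    """Monte Carlo estimate of T~(A) = P(X_n in A) for a fixed multiset of points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        marks = [sample_mark(x, rng) for x in points]  # independent Y_ni ~ mu(x_ni)
        winner = max(range(len(points)), key=marks.__getitem__)
        hits += in_a(points[winner])
    return hits / reps


def exp_mark(x: float, rng: random.Random) -> float:
    """Location-independent offers: mu(x) = Exp(1) for every x (x is unused)."""
    return rng.expovariate(1.0)


if __name__ == "__main__":
    multiset = [0.1, 0.1, 0.1, 0.4, 0.7, 0.9]  # the point 0.1 has multiplicity three

    def in_a(x: float) -> bool:
        return x < 0.5

    print(argmax_measure_mc(multiset, exp_mark, in_a, reps=60_000))  # should be near 4/6
```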
We use the convention of putting a ∼ on top of objects having (deterministic)
empirical distributions as arguments, and drop ∼ for their large sample limits.
We write Q_Ω to denote the set of finite multisets on Ω. With this notation, $\tilde T_\mu$ is a function from Q_Ω to $\mathbb{R}_+^{B(\Omega)}$, the family of non-negative set functions on the Borel σ-algebra on Ω. We use the family of non-negative set functions because we want to be able to consider mappings which may take values that are not probability measures.
Even though N^n is a deterministic set of points, it can typically be thought of as the realization of a point process. If so, we condition on the randomness associated with that process. In any case, it is convenient to define the empirical distribution function
$$P^{N^n}(A) = \frac{\#\{A \cap N^n\}}{n}$$
for all Borel sets A ⊆ Ω, where points are counted with multiplicity.
Definition 4 For a probability distribution Λ, we define the point sequence domain of convergence as
$$\mathcal{N}^\Lambda = \left\{\{N^n\} : P^{N^n} \Rightarrow \Lambda\right\},$$
i.e. the class of point sequences whose empirical distributions converge to Λ.
We have now introduced the concepts needed to define the argmax measure.
Definition 5 (Limiting argmax measure.) A probability measure $T_\mu^\Lambda$ such that
$$\tilde T_\mu^{N^n} \Rightarrow T_\mu^\Lambda \qquad (3)$$
for all $\{N^n\}_{n \in \mathbb{N}} \in \mathcal{N}^\Lambda$ is called an argmax measure with respect to µ and Λ. Here (and everywhere else in the paper), ⇒ refers to weak convergence.
3
Calculating the argmax measure
In this section, we will develop a method for calculating the argmax measure. To each $N^n$ we attach a particular random field $\tilde M^{N^n}$, and we then derive asymptotic properties of $\tilde T^{N^n}$ by considering the asymptotic behavior of $\tilde M^{N^n}$. We first introduce random fields and define the relevant terms. Thereafter, a notion of convergence of random fields is introduced, and we prove a result connecting this convergence with convergence to an argmax measure.
3.1
Random fields in an argmax context
We write a random field over the σ-algebra of $\Omega$ as
$$M : S\times\mathcal{B}(\Omega)\to\mathbb{R},$$
where $S$ is a generic sample space and $\mathcal{B}(\Omega)$ denotes the Borel σ-algebra on $\Omega$. Thus, for fixed $s$, $M(s,\cdot)$ is a set function on $\mathcal{B}(\Omega)$, and for fixed $A\in\mathcal{B}(\Omega)$, $M(\cdot,A)$ is a random variable taking values in $\mathbb{R}$. We sometimes write $M(A)$ as shorthand for $M(\cdot,A)$, and we write $M(s,A)$ for a particular realization of the random variable $M(\cdot,A)$. We will write $\mathcal{M}_{\mathcal{B}(\Omega)}$ to denote the set of all random fields over $\mathcal{B}(\Omega)$.
Our most important random field will be
$$\tilde M^{N^n}(A) = \max_{1\le i\le n:\, x_{ni}\in N^n\cap A} Y_{ni}, \qquad A\in\mathcal{B}(\Omega), \tag{4}$$
with the convention that the maximum over the empty set is $-\infty$. We note that $\tilde M$ is a function from $Q_\Omega$ to $\mathcal{M}_{\mathcal{B}(\Omega)}$. The following operator on random fields is also important.
Definition 6 The pseudo-argmax measure $F : \mathcal{M}_{\mathcal{B}(\Omega)}\to\mathbb{R}_+^{\mathcal{B}(\Omega)}$ is defined by
$$F(A; M) = P\big(M(A)\ge M(\Omega)\big)$$
for all $A\in\mathcal{B}(\Omega)$.
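We note that $F(\cdot; M)$ is a set function in $\mathbb{R}_+^{\mathcal{B}(\Omega)}$. It is clear from our definitions that
$$F(\cdot; \tilde M^{N^n}) = \tilde T^{N^n},$$
which is illustrated in the following commutative diagram:
$$\begin{array}{ccc} Q_\Omega & \xrightarrow{\ \tilde M\ } & \mathcal{M}_{\mathcal{B}(\Omega)} \\ & \searrow{\scriptstyle \tilde T} & \big\downarrow{\scriptstyle F} \\ & & \mathbb{R}_+^{\mathcal{B}(\Omega)} \end{array}$$
We will use this commutative property to derive convergence of $\tilde T$ in Definition 3 from convergence of $\tilde M$.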
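The identity $F(\cdot; \tilde M^{N^n}) = \tilde T^{N^n}$ in fact holds draw by draw: for continuous offers, the event $\tilde M^{N^n}(A)\ge\tilde M^{N^n}(\Omega)$ occurs exactly when the argmax lies in $A$ (up to ties, which have probability zero). The minimal NumPy sketch below checks this on simulated replicates. The helper name `empirical_max_field`, the offers $m(x) = x$ plus $\mathrm{Exp}(1)$ noise and the uniformly drawn points are illustrative assumptions.

```python
import numpy as np

def empirical_max_field(points, Y, in_A):
    """M~(A) of equation (4) for every replicate (row) of Y; -inf over an empty set."""
    mask = in_A(points)
    if not mask.any():
        return np.full(Y.shape[0], -np.inf)
    return Y[:, mask].max(axis=1)

rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 1.0, size=50)                    # one fixed realisation of N^n
Y = pts + rng.exponential(1.0, size=(5_000, pts.size))  # illustrative offers, m(x) = x
in_A = lambda x: (x > 0.2) & (x < 0.7)
everything = lambda x: np.ones(x.shape, dtype=bool)     # indicator of Omega

# Event behind F(A; M~): M~(A) >= M~(Omega)
event_F = empirical_max_field(pts, Y, in_A) >= empirical_max_field(pts, Y, everything)
# Event behind T~(A): the argmax location falls in A
event_T = in_A(pts[np.argmax(Y, axis=1)])
F_hat = event_F.mean()                                  # Monte Carlo value of both sides
```

Because the two events coincide replicate by replicate, their Monte Carlo frequencies agree exactly, not merely approximately.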
3.2
Max-fields
When considering the asymptotic properties of $\tilde M^{N^n}$, we have to worry about two things. Firstly, although we know that $F(\cdot; \tilde M^{N^n})$ is a probability measure on $\Omega$, we do not know that this is true for an arbitrary candidate limiting random field. Thus, we need a set of conditions on $M$ ensuring that $F(\cdot; M)$ is a probability measure. Secondly, to discuss limiting behavior we need a notion of convergence of random fields under which $F$ is continuous with respect to the weak topology on $\mathbb{R}_+^{\mathcal{B}(\Omega)}$.
Definition 7 Let $M : S\times\mathcal{B}(\Omega)\to\mathbb{R}$ be a random field over $\mathcal{B}(\Omega)$. We call $M$ an (independence) max-field if the following seven properties hold:
1. $M(A)$ and $M(B)$ are independent random variables whenever $A\cap B = \emptyset$;
2. If $I = A\cup B$, then $M(I) = \max\{M(A), M(B)\}$;
3. $|M(A)| < \infty$ almost surely or $M(A) = -\infty$ almost surely;
4. If $A_1\supseteq A_2\supseteq\cdots$ and $\bigcap_n A_n = \emptyset$, then $M(A_n)\to-\infty$ almost surely;
5. $M(\emptyset) = -\infty$;
6. If $M(A) = -\infty$ almost surely, then $M(\Omega\setminus A) > -\infty$ almost surely;
7. If $M(A) > -\infty$ almost surely, then the distribution of $M(\cdot, A)$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}$.
The set of assumptions is not minimal; for example, 7 implies 6. The properties are nevertheless listed separately, as they are used individually in the proof of the following lemma.
Lemma 1 If the random field $M$ is a max-field, then the pseudo-argmax measure $F(\cdot; M)$ is a probability measure on $\mathcal{B}(\Omega)$.
Proof. To prove that $F(\cdot; M)$ is a probability measure, we first note that $F(A; M)\in[0,1]$ for all $A\in\mathcal{B}(\Omega)$. Furthermore, $M(\emptyset) = -\infty$ and $M(\Omega) > -\infty$ by properties 5 and 6. Hence,
$$F(\Omega; M) = P\big(M(\Omega) > M(\emptyset)\big) = 1.$$
We need to demonstrate countable additivity. As a first step, we establish finite additivity. Let $A_1,\ldots,A_n$ be disjoint measurable sets with union $A$. We introduce the residual set $A_{n+1} = \Omega\setminus\bigcup_{i=1}^n A_i$ and the events
$$B_i = \{M(A_i) > M(\Omega\setminus A_i)\}, \qquad i = 1, 2, \ldots, n+1.$$
It is evident that $F(A_i; M) = 0$ if $M(A_i) = -\infty$, so let us assume that this is not the case. By absolute continuity, the $B_i$'s are almost surely disjoint. Hence,
$$F(A; M) = P\Big(\bigcup_{i=1}^n B_i\Big) = \sum_{i=1}^n P(B_i) = \sum_{i=1}^n F(A_i; M).$$
For countable additivity, it suffices to show that if $A_1\supseteq A_2\supseteq A_3\supseteq\cdots$ with $\bigcap_n A_n = \emptyset$, then $F(A_n; M)\to 0$. By Definition 7, $M(A_n)\to-\infty$ almost surely. Furthermore,
$$\max\{M(A_n), M(\Omega\setminus A_n)\} = M(\Omega) > -\infty$$
almost surely. Hence,
$$F(A_n; M) = P\big(M(A_n) > M(\Omega\setminus A_n)\big)\to 0,$$
and the proof is complete.
□
3.3
Derivation of calculation methods
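We define a notion of convergence on $\mathcal{M}_{\mathcal{B}(\Omega)}$ under which the pseudo-argmax measure map
$$F : \mathcal{M}_{\mathcal{B}(\Omega)}\to\mathbb{R}_+^{\mathcal{B}(\Omega)}$$
is continuous with respect to the weak topology on $\mathbb{R}_+^{\mathcal{B}(\Omega)}$. Later, this will give us a method to calculate the argmax measure.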
Definition 8 A sequence of max-fields $M_n$ on $\mathcal{B}(\Omega)$ is said to m-converge to the max-field $M$ if there exists a sequence $g_n:\mathbb{R}\to\mathbb{R}$ of strictly increasing functions such that
$$g_n(M_n(A))\Rightarrow M(A) \tag{5}$$
for all $A$ with $F(\partial A; M) = 0$.
Theorem 1 Let $\{M_n\}$ and $M$ be max-fields such that $M_n\xrightarrow{m}M$. Then
$$F(\cdot; M_n)\Rightarrow F(\cdot; M),$$
where $F(\cdot; M)$ is the pseudo-argmax measure.
Before proving the theorem, we state an important corollary illustrating how
it can be used.
Corollary 1 Suppose there exists a max-field $M^\Lambda$ such that $\tilde M^{N^n}\xrightarrow{m}M^\Lambda$ for all $\{N^n\}\in\mathcal{N}^\Lambda$. Then the argmax measure $T^\Lambda$ exists and is given by
$$T^\Lambda = F(\cdot; M^\Lambda). \tag{6}$$
Proof of corollary. We note that
$$\tilde T^{N^n} = F(\cdot; \tilde M^{N^n})$$
and apply Theorem 1 to conclude that
$$\tilde T^{N^n}\Rightarrow F(\cdot; M^\Lambda)$$
for all $\{N^n\}\in\mathcal{N}^\Lambda$. By Definition 5, $T^\Lambda$ is the argmax measure.
□
Proof of Theorem 1. Let $A\subseteq\Omega$ be measurable with $F(\partial A; M) = 0$. We seek to show that $F(A; M_n)\to F(A; M)$, and consider three cases.
Case 1. $M(A), M(A^c) > -\infty$ a.s. By the assumption of m-convergence, and since $\partial A^c = \partial A$, we can find a sequence of strictly increasing functions $g_n$ such that
$$g_n(M_n(A))\Rightarrow M(A), \qquad g_n(M_n(A^c))\Rightarrow M(A^c)$$
hold simultaneously. As $g_n(M_n(A))$ and $g_n(M_n(A^c))$ are independent for all $n$, this means that
$$g_n(M_n(A)) - g_n(M_n(A^c))\Rightarrow M(A) - M(A^c).$$
By Definition 7, $M(A)$ and $M(A^c)$ are absolutely continuous with respect to Lebesgue measure and independent, and therefore their difference is absolutely continuous. Hence,
$$\begin{aligned} F(A; M_n) &= P\big(M_n(A) > M_n(A^c)\big)\\ &= P\big(g_n(M_n(A)) > g_n(M_n(A^c))\big)\\ &= P\big(g_n(M_n(A)) - g_n(M_n(A^c)) > 0\big)\\ &\to P\big(M(A) - M(A^c) > 0\big)\\ &= F(A; M), \end{aligned}$$
where we use absolute continuity to conclude that 0 is a continuity point of the distribution of $M(A) - M(A^c)$. Therefore, we get
$$F(A; M_n)\to F(A; M).$$
Case 2. $M(A) = -\infty$ a.s. From Definition 7, $M(A^c) > -\infty$ almost surely, which means that $F(A; M) = 0$. Furthermore,
$$g_n(M_n(A))\Rightarrow-\infty, \qquad g_n(M_n(A^c))\Rightarrow M(A^c) > -\infty.$$
Given $\varepsilon > 0$, we can find $K$ such that $P(M(A^c) > K) = 1-\varepsilon$, and $n_0$ such that for all $n\ge n_0$, $P(g_n(M_n(A)) < K) > 1-\varepsilon$ and $P(g_n(M_n(A^c)) > K) > 1-2\varepsilon$. Then, for all $n\ge n_0$, $P(M_n(A) > M_n(A^c)) < 3\varepsilon$. As $\varepsilon$ was arbitrary, we get
$$F(A; M_n)\to 0 = F(A; M).$$
Case 3. $M(A^c) = -\infty$ a.s. We use $F(A; M_n) = 1 - F(A^c; M_n)$ and Case 2 (applied to $A^c$) to conclude that
$$F(A; M_n)\to 1.$$
Furthermore, $F(A; M) = 1$, as
$$F(A; M) = P\big(M(A) > M(A^c)\big) = 1,$$
and we get that $F(A; M_n)\to F(A; M)$ in this case as well.
□
4
Argmax measure for homoscedastic regression models
The result in Corollary 1 shows that the methods developed in the previous section give a workable way of calculating the argmax measure, insofar as it is possible to find a max-field $M_\mu^\Lambda$ to which $\tilde M_\mu^{N^n}$ m-converges for all $\{N^n\}\in\mathcal{N}^\Lambda$.
In this section we make the particular choice
$$Y_{ni} = m(x_{ni}) + \varepsilon_{ni} \tag{7}$$
for $i = 1,\ldots,n$, where $m:\Omega\to\mathbb{R}$ is a given deterministic regression function and $\{\varepsilon_{ni}\}$ are independent and identically distributed (i.i.d.) error terms with a common distribution function $H$. This is a homoscedastic regression model, corresponding to a measure index
$$\mu(x) = H(\cdot - m(x)). \tag{8}$$
In order to find the limiting behavior of the empirical max-field $\tilde M_\mu^{N^n}$ defined in (4), we note that for all $A$ with $\Lambda(A) > 0$, $|A\cap N^n|\to\infty$ as $n\to\infty$, which means that the maximum is taken over a large number of independent random variables. Thus, the natural choice is to apply extreme value theory.
We divide the exposition into four parts. First we state a classical result in extreme value theory for $m\equiv0$, together with its specific counterpart for offers with an exponential distribution $H\sim\mathrm{Exp}(s)$ with mean $s$. The second part develops the extreme value theory for exponential offers with varying $m(x)$, in order to find a max-field $M_\mu^\Lambda$ to which $\tilde M_\mu^{N^n}$ m-converges for an appropriate sequence $g_n$ of monotone transformations. In the third part, Corollary 1 is applied in order to calculate the argmax measure $T_\mu^\Lambda$. The fourth part considers, more briefly, distributions $H$ other than the exponential.
4.1
Some extreme value theory
The following theorem is a key result in extreme value theory; see for instance Fisher and Tippett (1928), Gnedenko (1943), Leadbetter et al. (1983), Gumbel (2004) and Resnick (2008).
Theorem 2 (Fisher-Tippett-Gnedenko Theorem.) Let $\{Y_n\}$ be a sequence of independent and identically distributed (i.i.d.) random variables and let $M_n = \max\{Y_1, Y_2, \ldots, Y_n\}$. If there exist sequences $\{a_n\}$ and $\{b_n\}$ with $a_n > 0$ such that
$$\lim_{n\to\infty}P\Big(\frac{M_n - b_n}{a_n}\le x\Big) = G(x)$$
for all $x\in\mathbb{R}$, where $G$ is a non-degenerate distribution function, then $G$ belongs to either the Gumbel, the Fréchet, or the Weibull family.
Under a wide range of distributions of $Y_n$, convergence does occur, and for most common distributions the limit is the Gumbel$(\gamma,\beta)$ law, whose distribution function has the form
$$G(x;\gamma,\beta) = \exp\Big(-\exp\Big(-\frac{x-\gamma}{\beta}\Big)\Big)$$
for some parameters $\gamma$ and $\beta > 0$ and $x\in\mathbb{R}$. We can give a more precise statement of Gumbel convergence, with $a_n = 1$ and $b_n = \log(n)$, when the random variables $Y_i$ have a standard exponential distribution; see for instance Resnick (2008) for a proof.
Proposition 1 Let $\{Y_i\}_{i=1}^n$ be a sequence of i.i.d. random variables with $Y_i\sim\mathrm{Exp}(s)$. Then
$$\max_{1\le i\le n}Y_i - s\log(n)\Rightarrow\mathrm{Gumbel}(0,s).$$
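Proposition 1 is easy to check by simulation. The sketch below assumes Python 3 with NumPy; the values $s = 2$, $n = 200$ and 10,000 replicates are arbitrary illustrative choices. It compares the centred maximum with the Gumbel$(0,s)$ law through its mean $s\gamma_E$, where $\gamma_E\approx0.5772$ is the Euler-Mascheroni constant (`np.euler_gamma`), and through its distribution function at 0, which equals $e^{-1}$.

```python
import numpy as np

def gumbel_cdf(x, loc=0.0, scale=1.0):
    """Distribution function G(x; loc, scale) of the Gumbel law."""
    return np.exp(-np.exp(-(x - loc) / scale))

rng = np.random.default_rng(2)
s, n, reps = 2.0, 200, 10_000                 # illustrative mean, sample size, replicates
Y = rng.exponential(s, size=(reps, n))        # rows of i.i.d. Exp(s) variables
centred = Y.max(axis=1) - s * np.log(n)       # max_i Y_i - s log(n)

emp_mean, theory_mean = centred.mean(), s * np.euler_gamma     # Gumbel(0, s) mean
emp_cdf0, theory_cdf0 = np.mean(centred <= 0.0), gumbel_cdf(0.0, 0.0, s)  # = exp(-1)
```

For finite $n$ the exact value $P(\max_i Y_i\le s\log n) = (1-1/n)^n$ is already very close to $e^{-1}$.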
4.2
Exponential offers
It turns out that the argmax theory for homoscedastic regression models depends crucially on the error distribution $H$, and the exponential distribution is an important boundary between lighter- and heavier-tailed distributions. Therefore, we treat $H\sim\mathrm{Exp}(s)$ separately in this subsection.
4.2.1
Limiting max-field with varying m(x)
Ordinary extreme value theory assumes that the random variables are independent and identically distributed. In our case, however, the random variables are not identically distributed, as the additive term $m(x)$ varies over space (for references on the theory of extremes of non-identically distributed random variables, see for example Weissman (1975), Horowitz (1980) and Hüsler (1986)). Thus, we prove a result characterizing the max-field when $H\sim\mathrm{Exp}(s)$ and $m(x)$ varies.
Theorem 3 Let $\tilde M_\mu^{N^n}(A)$ be as defined in (4), with $Y_{ni} - m(x_{ni})\sim\mathrm{Exp}(s)$ independently for $i = 1,\ldots,n$ and $s > 0$. Suppose $\Lambda$ is a probability measure on the Borel σ-algebra on $\Omega$ and that the following properties hold:
1. $m$ is bounded;
2. $\{N^n\}_{n\ge1}\in\mathcal{N}^\Lambda$;
3. $\Lambda(\bar D_m) = 0$, where $D_m = \{x\in\Omega : m \text{ is discontinuous at } x\}$ and $\bar D_m = \mathrm{closure}(D_m)$.
Then (5) holds with $g_n(y) = y/s - \log(n)$, i.e.
$$\tilde M_\mu^{\prime N^n}(A) := \tilde M_\mu^{N^n}(A)/s - \log(n)\Rightarrow M_\mu^\Lambda(A)$$
for all $A$ with $\Lambda(\partial A) = 0$, where
$$M_\mu^\Lambda(A)\sim\log\Big(\int_A e^{m(x)/s}\,\Lambda(dx)\Big) + \mathrm{Gumbel}(A). \tag{9}$$
Here $\mathrm{Gumbel}(A)$ means that the marginal distribution of $M_\mu^\Lambda(A)$ is a constant plus a standard Gumbel$(0,1)$ random variable. We only give the marginal distributions of the limiting random field, and do not specify its full set of finite-dimensional distributions, since the only property of the joint distributions that we will need is that the random variables $M_\mu^\Lambda(A)$ and $M_\mu^\Lambda(A^c)$ are independent. This holds because $\tilde M_\mu^{\prime N^n}(A)$ and $\tilde M_\mu^{\prime N^n}(A^c)$ are independent for all pre-limit random variables. For more discussion on Gumbel random fields, see Robert (2013).
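The marginal limit (9) can be checked numerically. The sketch below, assuming Python 3 with NumPy, takes the illustrative choices $\Omega = [0,1]$, equispaced points (so that $P^{N^n}\Rightarrow U(0,1)$), $m(x) = x$, $s = 0.5$ and $A = [0.5,1]$; none of these values come from the thesis. For $m(x) = x$ and uniform $\Lambda$, the location term has the closed form $\log\int_a^b e^{x/s}dx = \log\big(s(e^{b/s} - e^{a/s})\big)$, and the standard Gumbel part contributes mean $\gamma_E$ and variance $\pi^2/6$.

```python
import numpy as np

rng = np.random.default_rng(3)
s, n, reps = 0.5, 400, 5_000
pts = (np.arange(n) + 0.5) / n            # equispaced, empirical law -> U(0, 1)
m = lambda x: x                           # illustrative bounded regression function
a, b = 0.5, 1.0                           # A = [a, b]
in_A = (pts >= a) & (pts <= b)

# Only offers located in A matter for M~(A)
Y = m(pts[in_A]) + rng.exponential(s, size=(reps, in_A.sum()))
scaled = Y.max(axis=1) / s - np.log(n)    # g_n(M~(A)) with g_n(y) = y/s - log(n)

# Location term of (9), closed form for m(x) = x and Lambda = U(0, 1)
loc = np.log(s * (np.exp(b / s) - np.exp(a / s)))
emp_mean, theory_mean = scaled.mean(), loc + np.euler_gamma
emp_var, theory_var = scaled.var(), np.pi**2 / 6
```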
Proof. After the standardization $Y_{ni}\leftarrow Y_{ni}/s$ (and correspondingly $m\leftarrow m/s$), we may without loss of generality assume $s = 1$.
Let $A\subset\Omega$ with $\Lambda(\partial A) = 0$. We note that $P^{N^n}$ converges weakly to $\Lambda$ when both measures are restricted to $A\cap\bar D_m^c$, and that on this set $m$ is a continuous bounded function. Thus, by the properties of weak convergence (cf. e.g. Billingsley 1999), we get
$$\begin{aligned} \frac1n\sum_{1\le i\le n:\,x_{ni}\in A}e^{m(x_{ni})} &= \int_{A\cap\bar D_m^c}e^{m(x)}\,dP^{N^n}(x) + \frac1n\sum_{1\le i\le n:\,x_{ni}\in A\cap\bar D_m}e^{m(x_{ni})}\\ &\to\int_{A\cap\bar D_m^c}e^{m(x)}\,d\Lambda(x) + 0\\ &= \int_A e^{m(x)}\,d\Lambda(x). \end{aligned} \tag{10}$$
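The last sum on the first line tends to 0, since with
$$\bar m = \sup_{x\in\Omega}m(x) \tag{11}$$
we get
$$\begin{aligned} \frac1n\sum_{1\le i\le n:\,x_{ni}\in A\cap\bar D_m}e^{m(x_{ni})} &\le \frac1n\sum_{1\le i\le n:\,x_{ni}\in A\cap\bar D_m}e^{\bar m}\\ &= \frac1n\,nP^{N^n}(A\cap\bar D_m)\,e^{\bar m}\\ &\to\Lambda(A\cap\bar D_m)\,e^{\bar m}\\ &\le\Lambda(\bar D_m)\,e^{\bar m}\\ &= 0, \end{aligned}$$
where in the second-to-last step we utilized that
$$\Lambda\big(\partial(A\cap\bar D_m)\big)\le\Lambda(\partial A) + \Lambda(\partial\bar D_m)\le\Lambda(\partial A) + \Lambda(\bar D_m) = 0 + 0 = 0,$$
since $\bar D_m$ is a closed set. We can now use (10) to derive the limiting max-field directly.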
With $g_n(y) = y - \log(n)$, put $Z_n = \log P\big(g_n(\tilde M_\mu^{N^n}(A))\le y\big)$. For $n$ large enough that $y + \log(n) > \bar m$, it holds that
$$\begin{aligned} Z_n &= \log P\big(\tilde M_\mu^{N^n}(A)\le y + \log(n)\big)\\ &= \sum_{1\le i\le n:\,x_{ni}\in A}\log\big(1 - \exp(-y - \log(n) + m(x_{ni}))\big)\\ &= -\exp(-y)\,\frac1n\sum_{1\le i\le n:\,x_{ni}\in A}\exp(m(x_{ni})) + e(n)\\ &\to -\exp(-y)\int_A\exp(m(x))\,\Lambda(dx)\\ &= -\exp\Big(-y + \log\int_A\exp(m(x))\,\Lambda(dx)\Big), \end{aligned}$$
where we recognize the last line as the logarithm of a Gumbel distribution function with additive term $\log\int_A\exp(m(x))\,\Lambda(dx)$, as required. Thus, we have proved our result provided we can verify that the error term $e(n)\to0$.
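To show this, we note that
$$e(n) = \sum_{1\le i\le n:\,x_{ni}\in A}\Big[\log\big(1 - \exp(-y - \log(n) + m(x_{ni}))\big) + \exp(-y - \log(n) + m(x_{ni}))\Big].$$
Using the well-known inequality
$$|\log(1-x) + x|\le\frac{x^2}{1-x}, \qquad 0\le x<1,$$
we get that
$$|e(n)|\le\sum_{1\le i\le n:\,x_{ni}\in A}\frac{\exp(-2y - 2\log(n) + 2m(x_{ni}))}{1 - \exp(-y - \log(n) + m(x_{ni}))}\to0,$$
since $m$ is bounded, so that each of the at most $n$ terms is $O(n^{-2})$. This proves our result.
□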
Proposition 2 The random field defined by
$$M(A) = \log\Big(\int_A e^{m(x)/s}\,\Lambda(dx)\Big) + \mathrm{Gumbel}(A)$$
is a max-field in the sense of Definition 7 when $m$ and $\Lambda$ satisfy the conditions of Theorem 3.
Proof. Property 1 clearly holds, as $M(A)$ and $M(B)$ are measurable with respect to independent σ-algebras. Property 2 can be shown to hold by the max-stability of the Gumbel distribution (for disjoint $A$ and $B$, the maximum of independent unit-scale Gumbel variables with locations $\log\int_A e^{m/s}d\Lambda$ and $\log\int_B e^{m/s}d\Lambda$ is again a unit-scale Gumbel variable with location $\log\int_{A\cup B}e^{m/s}d\Lambda$). Property 3 holds as $m$ is bounded. Properties 4 and 5 hold as $\lim_{x\to0}\log(x) = -\infty$. Property 6 can be verified directly from the expression for $M$, and Property 7 holds as the Gumbel distribution is absolutely continuous.
□
4.2.2
Argmax distribution
In Corollary 1, it was shown that the limiting behavior of $\tilde M_\mu^{N^n}$ determines the argmax measure. Thus, we can use the limit derived in Theorem 3 together with Proposition 2 and Corollary 1 to derive the argmax measure associated with $\mu$ and $\Lambda$.
Theorem 4 Let $\mu(x) = m(x) + \mathrm{Exp}(s)$ and let $\Lambda$ be a probability measure on $\Omega$. Suppose that $\Lambda$ and $m$ jointly satisfy the conditions in Theorem 3. Then the argmax measure $T_\mu^\Lambda$ exists and is given by the exponentially tilted distribution
$$T_\mu^\Lambda(A) = C\int_A e^{m(x)/s}\,\Lambda(dx), \tag{12}$$
where
$$C = \Big(\int_\Omega e^{m(x)/s}\,\Lambda(dx)\Big)^{-1} \tag{13}$$
is a normalizing constant. In particular, if $\Lambda$ has a density function $\lambda$ with respect to Lebesgue measure $\nu$ on $\Omega$, then $T_\mu^\Lambda$ has the density function
$$t_\mu^\Lambda(x) = C\lambda(x)\exp(m(x)/s) \tag{14}$$
for $x\in\Omega$, i.e. $T_\mu^\Lambda(A) = \int_A t_\mu^\Lambda(x)\,\nu(dx)$ for all Borel sets $A\subset\Omega$.
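A minimal NumPy sketch of Theorem 4 in action, under illustrative assumptions not taken from the thesis: $\Lambda = U(0,1)$, $m(x) = 2x$, $s = 1$, one fixed i.i.d. realisation of $n = 500$ points, and 4,000 replicates of the offers. It compares the simulated mean of the argmax location with the mean of the tilted density (14), which here equals $(e^2+1)/\big(2(e^2-1)\big)\approx0.6565$; the normalizing constant (13) is obtained by midpoint quadrature.

```python
import numpy as np

rng = np.random.default_rng(4)
s, n, reps = 1.0, 500, 4_000
m = lambda x: 2.0 * x                    # illustrative regression function
lam = lambda x: np.ones_like(x)          # Lambda = U(0, 1), density on [0, 1]

pts = rng.uniform(0.0, 1.0, size=n)      # one fixed realisation of N^n
Y = m(pts) + rng.exponential(s, size=(reps, n))
X = pts[np.argmax(Y, axis=1)]            # argmax location in each replicate

# Density (14): t(x) = C lam(x) exp(m(x)/s), normalised by midpoint quadrature on [0, 1]
grid = (np.arange(10_000) + 0.5) / 10_000
w = lam(grid) * np.exp(m(grid) / s)
t = w / (w.sum() / grid.size)            # divide by the integral of w, i.e. multiply by C
theory_mean = np.sum(grid * t) / grid.size
emp_mean = X.mean()
```

Means are compared rather than full densities only to keep the check short; a histogram of `X` against `t` gives the same picture as the figures in Section 5.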
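Proof. After standardizing the data, $Y_{ni}\leftarrow Y_{ni}/s$, we may without loss of generality assume that $s = 1$. Proposition 2 states that $M_\mu^\Lambda$, defined as in Theorem 3, is a max-field. In order to find its pseudo-argmax measure, we let $G(x) = G(x;0,1) = e^{-e^{-x}}$ denote the distribution function of a standard Gumbel distribution and put $L(A) = \log\int_A e^{m(x)}\,d\Lambda(x)$. Then
$$\begin{aligned} F(A; M_\mu^\Lambda) &= P\big(M_\mu^\Lambda(A) > M_\mu^\Lambda(\Omega\setminus A)\big)\\ &= \int_{-\infty}^{\infty}P\big(M(A)\in dr\big)\,P\big(M(\Omega\setminus A) < r\big)\\ &= \int_{-\infty}^{\infty}G'(r - L(A))\,G(r - L(\Omega\setminus A))\,dr\\ &= \int_{-\infty}^{\infty}e^{-r+L(A)}\,e^{-e^{-r+L(A)}}\,e^{-e^{-r+L(\Omega\setminus A)}}\,dr\\ &= e^{L(A)}\int_{-\infty}^{\infty}\exp(-r)\exp\big(-e^{-r+L(\Omega)}\big)\,dr\\ &= C\int_A e^{m(x)}\,\Lambda(dx) \end{aligned}$$
for all Borel sets $A$.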
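Then note that Theorem 3 implies that
$$\tilde M_\mu^{N^n}(A) - \log(n)\Rightarrow M_\mu^\Lambda(A) \tag{15}$$
holds for $\{N^n\}_{n\ge1}\in\mathcal{N}^\Lambda$ and all Borel sets $A$ with $\Lambda(\partial A) = 0$. Since $e^{m(x)} > 0$, the formula just derived shows that $F(\partial A; M_\mu^\Lambda) > 0$ whenever $\Lambda(\partial A) > 0$, so every $A$ with $F(\partial A; M_\mu^\Lambda) = 0$ also satisfies $\Lambda(\partial A) = 0$. Consequently, $\tilde M_\mu^{N^n}\xrightarrow{m}M_\mu^\Lambda$. Finally, Corollary 1 implies that the argmax measure $T_\mu^\Lambda = F(\cdot; M_\mu^\Lambda)$ exists and is given by (12).
□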
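The key step of the proof is a "Gumbel race": if $G_1, G_2$ are independent standard Gumbel variables, then $P\big(L(A) + G_1 > L(\Omega\setminus A) + G_2\big) = e^{L(A)}/\big(e^{L(A)} + e^{L(\Omega\setminus A)}\big) = e^{L(A) - L(\Omega)}$. A minimal Monte Carlo check with NumPy's `Generator.gumbel`, where the values $e^{L(A)} = 0.3$ and $e^{L(\Omega\setminus A)} = 1.2$ are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200_000
L_A, L_Ac = np.log(0.3), np.log(1.2)      # hypothetical values of L(A) and L(Omega \ A)

# M(A) = L(A) + G1 and M(Omega \ A) = L(Omega \ A) + G2, with G1, G2 i.i.d. standard Gumbel
wins = (L_A + rng.gumbel(0.0, 1.0, N)) > (L_Ac + rng.gumbel(0.0, 1.0, N))
F_hat = wins.mean()
F_theory = np.exp(L_A) / (np.exp(L_A) + np.exp(L_Ac))   # = exp(L(A) - L(Omega)) = 0.2
```

The same identity underlies the multinomial logit model, which is one way to read (12) as a continuous-choice analogue.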
Theorem 4 is remarkably simple and explicit. It turns out that this is due to the memoryless property of the exponential distribution. Indeed, suppose $\{x_{ni}\}_{i=1}^n$ is an i.i.d. sample from $\Lambda$, with $n$ large. Recall definition (11) of $\bar m$, put $I = \arg\max_{1\le i\le n}Y_{ni}$ and assume for simplicity $s = 1$. Then, for any $i = 1,\ldots,n$,
$$\begin{aligned} P(I = i) &\approx P(Y_{ni}\ge\bar m)\,P(I = i\mid Y_{ni}\ge\bar m)\\ &\approx e^{-(\bar m - m(x_{ni}))}\Big/\big(nP(m(X) + \varepsilon\ge\bar m)\big)\\ &\propto e^{m(x_{ni})}\Big/\Big(n\int_\Omega e^{m(x)}\,P^{N^n}(dx)\Big)\\ &\approx e^{m(x_{ni})}\Big/\Big(n\int_\Omega e^{m(x)}\,\Lambda(dx)\Big). \end{aligned}$$
In the first step we utilized that $\max_{1\le i\le n}Y_{ni}\ge\bar m$ holds with probability close to 1 when $n$ is large, and in the second step we approximated the number of $i$ for which $Y_{ni} = m(x_{ni}) + \varepsilon_{ni}\ge\bar m$ by
$$\sum_{i=1}^n 1\{m(x_{ni}) + \varepsilon_{ni}\ge\bar m\}\approx nP(m(X) + \varepsilon\ge\bar m),$$
where $\{(x_{ni},\varepsilon_{ni})\}_{i=1}^n$ is an i.i.d. sample from $\Lambda\times\mathrm{Exp}(1)$. Finally, we used the memoryless property of the exponential distribution to deduce that all indices $i$ with $Y_{ni}\ge\bar m$ have the same conditional probability of being the argmax, i.e. of $I = i$.
4.3
Non-exponential offers
In the previous subsection, we found that with $m$ fixed, exponentially distributed offers give a one-parameter family of argmax distributions, indexed by $s > 0$. We will now provide arguments for other error distributions, and find that the exponential case is the borderline between lighter- and heavier-tailed distributions. Loosely speaking, for light-tailed distributions only the extremal behavior of $m$ determines the asymptotic argmax distribution, whereas $m$ has no asymptotic impact for heavy-tailed distributions.
4.3.1
Light-tailed error distributions
Formally, the light-tailed case corresponds to the class of distributions for which the moment generating function of the error term is finite on the whole real line. For simplicity, we assume that the support of the continuous distribution $H$ has an upper bound
$$K = \sup\{x;\,H(x) < 1\} < \infty,$$
and that $m$ is not constant. Applying the identity transformation $g_n(y) = y$, we deduce that
$$\tilde M_\mu^{N^n}(A)\Rightarrow M_\mu^\Lambda(A) = K + \sup_{x\in A}m(x).$$
The limiting max-field $M_\mu^\Lambda$ is degenerate in the sense that $M_\mu^\Lambda(A)$ has a one-point distribution, so that the absolute continuity Property 7 of Definition 7 is violated. Therefore we cannot use Theorem 1 to deduce the argmax measure, but have to employ a more direct argument.
Given any $\varepsilon > 0$, we let $h(x) = H'(x)$ and define the measure
$$\Lambda_\varepsilon(A) = C\int_A\frac{h(K - \varepsilon + \bar m - m(x))}{H([K - \varepsilon + \bar m - m(x), K])}\,\Lambda(dx),$$
with $h(x) = 0$ if $x > K$, $\bar m$ as in (11), the convention $H([K', K]) = 0$ when $K' > K$, and $C = C(\varepsilon)$ a normalizing constant chosen so that $\Lambda_\varepsilon(\Omega) = 1$.
Assume further that a limit measure $\Lambda_{\max}$ exists, supported on the set
$$\Omega_{\max} = \{x\in\Omega;\,m(x) = \bar m\}$$
where $m$ is maximal, such that
$$\Lambda_\varepsilon\Rightarrow\Lambda_{\max}\quad\text{as }\varepsilon\to0. \tag{16}$$
It is reasonable to expect that $\Lambda_\varepsilon$ approximates the conditional distribution of $X_n$ given that $Y_{n:n} = \max_{1\le i\le n}Y_{ni} = \bar m + K - \varepsilon$ (a more formal argument is provided below). Hence (16) suggests that
$$T_\mu^\Lambda = \Lambda_{\max}, \tag{17}$$
since $Y_{n:n}$ tends in probability to $\bar m + K$ as $n$ grows. In order to establish (17) according to Definition 5, however, we need a slightly stronger condition than (16), as the following theorem reveals:
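Theorem 5 For any $\varepsilon > 0$, put
$$P_\varepsilon^{N^n}(A) = C_n\int_A\frac{h(K - \varepsilon + \bar m - m(x))}{H([K - \varepsilon + \bar m - m(x), K])}\,P^{N^n}(dx),$$
where $C_n = C_n(\varepsilon)$ is a normalizing constant ensuring that $P_\varepsilon^{N^n}(\Omega) = 1$, and
$$Q_n(\varepsilon) = \int_\Omega H([K - \varepsilon + \bar m - m(x), K])\,P^{N^n}(dx).$$
Assume that
$$P^{N^n}_{Q_n^{-1}(c/n)}\Rightarrow\Lambda_{\max}\quad\text{as }n\to\infty \tag{18}$$
uniformly for all $c\in(0,\bar c]$, for any $\bar c > 0$, with $Q_n^{-1}$ the inverse function of $Q_n$. Then (17) holds.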
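Proof: According to Definition 5, we need to prove $\tilde T_\mu^{N^n}\Rightarrow\Lambda_{\max}$ for any $\{N^n\}_{n\ge1}\in\mathcal{N}^\Lambda$. Let $Z_n = \bar m + K - Y_{n:n}$. We first note that
$$\begin{aligned} P(X_n = x_{ni}\mid Z_n) &\propto h(\bar m + K - m(x_{ni}) - Z_n)\prod_{j\ne i}H(\bar m + K - m(x_{nj}) - Z_n)\\ &\propto\frac{h(\bar m + K - m(x_{ni}) - Z_n)}{H(\bar m + K - m(x_{ni}) - Z_n)}, \end{aligned}$$
where $X_n$ is defined as in (2). By conditioning on $Z_n$ we notice that
$$\tilde T_\mu^{N^n}(A) = \int_0^\infty P_\varepsilon^{N^n}(A)\,F_{Z_n}(d\varepsilon). \tag{19}$$
Furthermore, $Z_n$ has the property that
$$nQ_n(Z_n)\Rightarrow\mathrm{Exp}(1).$$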
Indeed, for $x > 0$, we can use the monotonicity of $Q_n$ to deduce that
$$\begin{aligned} P(nQ_n(Z_n)\le x) &= P\big(Z_n\le Q_n^{-1}(x/n)\big)\\ &= 1 - \prod_{i=1}^n\Big(1 - H\big([K + \bar m - m(x_{ni}) - Q_n^{-1}(x/n), K]\big)\Big)\\ &\to 1 - e^{-x}, \end{aligned}$$
where the last step uses the well-known fact that
$$\prod_{i=1}^n(1 - a_{n,i})\to e^{-a}$$
if $a_{n,i}\ge0$, $\lim_{n\to\infty}\sum_{i=1}^n a_{n,i} = a$ and $\lim_{n\to\infty}\max_i a_{n,i} = 0$. These conditions hold in our case, as
$$\sum_{i=1}^n H\big([K + \bar m - m(x_{ni}) - Q_n^{-1}(x/n), K]\big) = nQ_n\big(Q_n^{-1}(x/n)\big) = x$$
and $\lim_{n\to\infty}\max_i H\big([K + \bar m - m(x_{ni}) - Q_n^{-1}(x/n), K]\big) = 0$, assuming that $H$ has no point mass at $K$.
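Thus, $nQ_n(Z_n)\Rightarrow\mathrm{Exp}(1)$, and we conclude the proof by performing the change of variable $c = nQ_n(\varepsilon)$ in (19) to get
$$\tilde T_\mu^{N^n}(A) = \int_0^\infty P^{N^n}_{Q_n^{-1}(c/n)}(A)\,F_{nQ_n(Z_n)}(dc). \tag{20}$$
Letting $e(c,n) = \big|\Lambda_{\max}(A) - P^{N^n}_{Q_n^{-1}(c/n)}(A)\big|$, which by (18) tends to 0 uniformly on $(0,\bar c]$ for any $\bar c > 0$, we get that
$$\big|\tilde T_\mu^{N^n}(A) - \Lambda_{\max}(A)\big|\le\sup_{c\in(0,\bar c]}e(c,n)\,P\big(nQ_n(Z_n)\in(0,\bar c]\big) + P\big(nQ_n(Z_n)\notin(0,\bar c]\big),$$
which can be made arbitrarily small by first choosing $\bar c$ large and then $n$ large. Thus, our proof is complete.
□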
4.3.2
Heavy-tailed error distributions
It can be shown that the class of heavy-tailed distributions corresponds to those for which the moment generating function is infinite for all positive arguments. For simplicity, we consider the class of Pareto distributions with shape parameter $\alpha > 0$ and scale parameter 1, i.e.
$$H(x) = \mathrm{Pareto}(x;\alpha,1) = 1 - x^{-\alpha}$$
for $x\ge1$. Then Theorem 2 holds with $b_n = 0$, $a_n = n^{1/\alpha}$, and
$$G(x) = \mathrm{Frechet}(x;\alpha,1,0) = \exp(-x^{-\alpha})$$
for $x > 0$, a Fréchet distribution with shape parameter $\alpha$, scale parameter 1 and location parameter 0. Since $a_n$ increases with $n$ at a polynomial rate, it turns out that any local variation of the bounded function $m$ has no impact on the asymptotic max-field, as the following result reveals:
Theorem 6 Let $\tilde M_\mu^{N^n}(A)$ be as defined in (4), with $Y_{ni} - m(x_{ni})\sim\mathrm{Pareto}(\alpha,1)$ independently for $i = 1,\ldots,n$. Suppose $\Lambda$ is a probability measure on the Borel σ-algebra on $\Omega$ and that properties 1-3 of Theorem 3 hold. Then (5) holds with $g_n(y) = y/n^{1/\alpha}$, i.e.
$$\tilde M_\mu^{N^n}(A)/n^{1/\alpha}\Rightarrow M_\mu^\Lambda(A) = \Lambda(A)^{1/\alpha}\,\mathrm{Frechet}_\alpha(A) \tag{21}$$
for all $A$ with $\Lambda(\partial A) = 0$. Moreover, the argmax measure exists and is given by
$$T_\mu^\Lambda = \Lambda. \tag{22}$$
In the notation, $\mathrm{Frechet}_\alpha(A)$ refers to a Frechet$(\alpha,1,0)$-distributed random variable for any Borel set $A\subset\Omega$, which is independent of $\mathrm{Frechet}_\alpha(B)$ for $B$ such that $A\cap B = \emptyset$.
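Proof. We begin with (21). Let $A$ be a measurable set with $\Lambda(\partial A) = 0$. Then, if $F_{n,A}$ is the distribution function of $\tilde M_\mu^{N^n}(A)/n^{1/\alpha}$ and $\varepsilon_{n,i} = Y_{n,i} - m(x_{n,i})$ denotes the Pareto error term, we have
$$\begin{aligned} \log F_{n,A}(y) &= \log\prod_{x_{n,i}\in A}P\big(\varepsilon_{n,i} + m(x_{n,i})\le n^{1/\alpha}y\big)\\ &= \sum_{1\le i\le n:\,x_{n,i}\in A}\log\big(1 - (n^{1/\alpha}y - m(x_{n,i}))^{-\alpha}\big)\\ &= \sum_{1\le i\le n}\frac{I(x_{n,i}\in A)}{n}\,n\log\Big(1 - \frac{(y - n^{-1/\alpha}m(x_{n,i}))^{-\alpha}}{n}\Big)\\ &= \sum_{1\le i\le n}f(n,i)\,h(n,i), \end{aligned} \tag{23}$$
where $f(n,i) = n\log\big(1 - (y - n^{-1/\alpha}m(x_{n,i}))^{-\alpha}/n\big)$ and $h(n,i) = I(x_{n,i}\in A)/n$. As $m$ is bounded, $f(n,i)\to-y^{-\alpha}$ uniformly over $i$. Therefore, we get
$$\begin{aligned} \lim_{n\to\infty}\log F_{n,A}(y) &= \lim_{n\to\infty}\sum_{1\le i\le n}f(n,i)\,h(n,i)\\ &= -y^{-\alpha}\lim_{n\to\infty}\sum_{1\le i\le n}\frac{I(x_{n,i}\in A)}{n}\\ &= -y^{-\alpha}\Lambda(A), \end{aligned} \tag{24}$$
where the last step uses weak convergence of $P^{N^n}$ to $\Lambda$. After exponentiation we recognize the right-hand side, as required, as the distribution function of $\Lambda(A)^{1/\alpha}\,\mathrm{Frechet}_\alpha(A)$.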
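It remains to prove (22). To this end, we notice that the pseudo-argmax measure of $M_\mu^\Lambda$ equals
$$\begin{aligned} F(A; M_\mu^\Lambda) &= P\big(\Lambda(A)^{1/\alpha}\mathrm{Frechet}_\alpha(A) > \Lambda(A^c)^{1/\alpha}\mathrm{Frechet}_\alpha(A^c)\big)\\ &= P\big(\Lambda(A)\,\mathrm{Frechet}_1(A) > \Lambda(A^c)\,\mathrm{Frechet}_1(A^c)\big)\\ &= \Lambda(A), \end{aligned} \tag{25}$$
where the last line follows from the properties of the Fréchet distribution. Indeed, if $X, Y\sim\mathrm{Frechet}_1$ independently, then
$$\begin{aligned} P\big(\Lambda(A)X > \Lambda(A^c)Y\big) &= \int_0^\infty\frac{\Lambda(A)}{y^2}\exp\big(-\Lambda(A)y^{-1}\big)\exp\big(-\Lambda(A^c)y^{-1}\big)\,dy\\ &= \Lambda(A)\int_0^\infty\frac{1}{y^2}\exp(-y^{-1})\,dy\\ &= \Lambda(A). \end{aligned} \tag{26}$$
Since $F(\cdot; M_\mu^\Lambda) = \Lambda$, it follows from (21) that $\tilde M_\mu^{N^n}\xrightarrow{m}M_\mu^\Lambda$. Hence, by Corollary 1, $T_\mu^\Lambda = F(\cdot; M_\mu^\Lambda) = \Lambda$ exists.
□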
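Identities (25)-(26) can be checked by simulation for several shape parameters at once. The sketch below assumes Python 3 with NumPy and a hypothetical value $\Lambda(A) = 0.35$. Fréchet variables are generated by inverse transform: if $E\sim\mathrm{Exp}(1)$, then $E^{-1/\alpha}$ has distribution function $\exp(-x^{-\alpha})$. The helper name `frechet` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def frechet(alpha, size):
    # If E ~ Exp(1), then E**(-1/alpha) has cdf exp(-x**(-alpha)): Frechet(alpha, 1, 0)
    return rng.standard_exponential(size) ** (-1.0 / alpha)

lam_A, N = 0.35, 200_000                  # hypothetical Lambda(A); Lambda(A^c) = 0.65
F_hat = {}
for alpha in (0.5, 1.0, 5.0):
    lhs = lam_A ** (1.0 / alpha) * frechet(alpha, N)
    rhs = (1.0 - lam_A) ** (1.0 / alpha) * frechet(alpha, N)
    F_hat[alpha] = np.mean(lhs > rhs)     # Monte Carlo estimate of F(A; M) in (25)
```

Each estimate should be close to $\Lambda(A)$ whatever the value of $\alpha$, in line with (22).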
5
Examples
We will investigate the accuracy of the asymptotic results for the homoscedastic regression model by simulation, generating $n$ random points $x_i$ on $\mathbb{R}$ or $\mathbb{R}^2$ according to some probability distribution $\Lambda$. Then we generate
$$Y_i = m(x_i) + \varepsilon_i, \qquad i = 1,\ldots,n,$$
for some predefined function $m$, where $\{\varepsilon_i\}_{i=1}^n$ are i.i.d. random variables with distribution $H$. We then record the max $M = Y_{n:n} = \max_{1\le i\le n}Y_i$ and the argmax $X = x_{[n:n]}$. We repeat the procedure 10,000 times and draw either histograms or QQ-plots for $X$ and/or $M$ together with their theoretically predicted densities. In the first two examples we consider exponential offers with $H\sim\mathrm{Exp}(1)$. Throughout the examples section, $s$ refers to the mean of the exponential distribution.
Example 1 (Optimal exponential offers in one dimension.) We display density plots of $X$ for two one-dimensional examples: $\Lambda\sim\mathrm{Weibull}(2,1)$ with $m(x) = \sqrt{x+1}$ (Figure 1), and $\Lambda\sim\mathrm{LogN}(0,1)$ with $m(x) = -x^2$ (Figure 2). In both cases our theoretical prediction (14) bears out.
□
Figure 1: Histogram of the argmax distribution when the sampling distribution is $\Lambda\sim\mathrm{Weibull}(2,1)$, $m(x) = \sqrt{x+1}$, $s = 1$ and $n = 1000$. The solid curve is the asymptotic density (14).
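A compact version of the Example 1 experiment can be sketched as follows, assuming Python 3 with NumPy. It assumes $\Lambda\sim\mathrm{Weibull}(2,1)$ with shape 2 and scale 1 (NumPy's `Generator.weibull(2.0)`), $m(x) = \sqrt{x+1}$, $s = 1$, $n = 1000$ and only 3,000 replicates (fewer than the 10,000 used for the figures), and it compares the simulated mean of $X$ with the mean of the asymptotic density (14). Comparing means instead of full histograms is a simplification made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, s = 1_000, 3_000, 1.0
m = lambda x: np.sqrt(x + 1.0)                      # regression function of Example 1
lam = lambda x: 2.0 * x * np.exp(-x**2)             # Weibull(2, 1) density on x > 0

pts = rng.weibull(2.0, size=n)                      # one fixed sample from Lambda
Y = m(pts) + rng.exponential(s, size=(reps, n))     # exponential offers
X = pts[np.argmax(Y, axis=1)]                       # argmax location per replicate

# Density (14) on a midpoint grid over (0, 6]; Weibull(2, 1) mass beyond 6 is negligible
dx = 6.0 / 60_000
grid = (np.arange(60_000) + 0.5) * dx
w = lam(grid) * np.exp(m(grid) / s)
t = w / (w.sum() * dx)                              # t(x) = C lam(x) exp(m(x)/s)
theory_mean = np.sum(grid * t) * dx
emp_mean = X.mean()
```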
Figure 2: Histogram of the argmax distribution when the sampling distribution $\Lambda$ is a standard lognormal distribution, $m(x) = -x^2$, $s = 1$ and $n = 1000$. The solid curve is the asymptotic density (14).
Example 2 (Commuting with exponential offers.) The second illustration is the commuting example that motivated this work, as discussed in the introduction. We sample from a uniform distribution over the disc $\Omega = B_{100}(0,0)\subset\mathbb{R}^2$ of radius 100, i.e.
$$\lambda(x) = \frac{1}{100^2\pi}\,1\{\|x\| < 100\},$$
with $r = \|x\| = \sqrt{x_1^2 + x_2^2}$ the Euclidean distance. We let
$$m(x) = -0.05\,\|x\|$$
be a function that describes travel costs, and record the distance $\|X\|$ from the origin to the best offer; see Figure 3. We note that the argmax density (14) of $X$ is
$$t_\mu^\Lambda(x) = C\lambda(x)\exp(-cr)\,1\{r < 100\}$$
for $c = 0.05$ and a normalizing constant $C$. By integrating over the angle, we get a truncated gamma density
$$f_{\|X\|}(r) = 2\pi r\,t_\mu^\Lambda(x) = C\,\frac{2r\exp(-cr)}{100^2}\,1\{0 < r < 100\} \tag{27}$$
for the distance to the best offer. We also plot the density of the best value $M$ in Figure 4, corresponding to the distribution of $M_\mu^\Lambda(\Omega)$ in (9), which simplifies to
$$\mathrm{Gumbel}\Big(\log\int_0^{100}\frac{2re^{-cr}}{100^2}\,dr,\ 1\Big), \tag{28}$$
using the fact that $\int_\Omega\lambda(x)e^{m(x)}\,d\nu(x) = C^{-1} = \int_0^{100}\frac{2r\exp(-cr)}{100^2}\,dr$. It is seen that the finite sample distributions of $\|X\|$ and $M$ are well approximated by their asymptotic limits.
□
Figure 3: Histogram of the distance $r$ to the origin for the argmax of a uniform sample on $B_{100}(0,0)$ when $m(x) = -0.05\,r$, $s = 1$ and $n = 1000$. The solid curve is the asymptotic truncated gamma density (27).
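The commuting example can be reproduced in a few lines, as a minimal NumPy sketch with 3,000 replicates (an illustrative reduction from 10,000). Because $m$ depends on $x$ only through $r = \|x\|$, it suffices to sample radii of uniform points on the disc, $r = 100\sqrt{U}$. The sketch compares the simulated mean distance with the mean of (27), and the simulated mean of $M - \log n$ with the mean $\log C^{-1} + \gamma_E$ of the Gumbel law (28); both reference values are computed by midpoint quadrature.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps, c, R = 1_000, 3_000, 0.05, 100.0
r_pts = R * np.sqrt(rng.uniform(size=n))                # radii of n uniform points on the disc
Y = -c * r_pts + rng.exponential(1.0, size=(reps, n))   # m(x) = -c ||x||, s = 1
dist = r_pts[np.argmax(Y, axis=1)]                      # ||X||, distance to the best offer
M_centred = Y.max(axis=1) - np.log(n)                   # best value, shifted by log(n)

# Truncated gamma density (27) and Gumbel location in (28), via midpoint quadrature
k = 20_000
dr = R / k
grid = (np.arange(k) + 0.5) * dr
w = 2.0 * grid * np.exp(-c * grid) / R**2
C_inv = w.sum() * dr                                    # int_0^100 2r e^{-cr} / 100^2 dr
theory_dist_mean = np.sum(grid * w) * dr / C_inv
theory_M_mean = np.log(C_inv) + np.euler_gamma          # mean of Gumbel(log C^{-1}, 1)
```

Here $C^{-1}$ also has the closed form $2\big(1 - e^{-cR}(1 + cR)\big)/(c^2R^2)$ with $R = 100$, which the quadrature reproduces.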
Figure 4: Histogram of the best value $M$ on $B_{100}(0,0)$ when $m(x) = -0.05\,\|x\|$, $s = 1$ and $n = 1000$. The solid curve is the asymptotic density corresponding to (28).
Example 3 (General offer distributions.) We now consider more general error distributions $H$. In particular, we contrast the behavior when $H$ is light-tailed and heavy-tailed, respectively. In both cases we let $\Omega = [0,1]$ and $\Lambda$ the uniform distribution on $[0,1]$. For the light-tailed case we consider a uniform $H\sim U(0,s)$ for various choices of $s$ and
$$m(x) = 1\{x\in[0.5,1]\}. \tag{29}$$
According to the theory for light-tailed distributions, the limiting argmax distribution (17) should be uniform on $[0.5,1]$.
For the heavy-tailed case we consider $H\sim\mathrm{Pareto}(\alpha,1)$ and $m(x) = 0.5x$. In this case the theory (22) predicts that, despite the varying $m$, the limiting argmax distribution should equal $\Lambda\sim U(0,1)$.
The results are displayed in Figures 5 and 6, with different panels for different $n$. The various line styles show different parameters of the $H$-distribution and illustrate that the rate of convergence depends negatively on the spread of the distribution $H$.
□
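The contrast in Example 3 can be sketched numerically, assuming Python 3 with NumPy, equispaced points on $[0,1]$, $n = 1000$ and 2,000 replicates (illustrative reductions). Classical Pareto$(\alpha,1)$ noise is drawn as NumPy's Lomax variable plus one, `rng.pareto(a) + 1`. The helper `argmax_locations` is a hypothetical name. In the light-tailed case the argmax should fall in $[0.5,1]$; in the heavy-tailed case its mean should be near $1/2$ for $\alpha = 0.5$ and $\alpha = 1$, while $\alpha = 5$ still shows a visible pull towards large $m$ at this $n$, consistent with the slower convergence seen in Figure 6.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps = 1_000, 2_000
pts = (np.arange(n) + 0.5) / n              # empirical law -> Lambda = U(0, 1)

def argmax_locations(m, draw_noise):
    """Argmax location for `reps` independent rows of offers m(x_i) + noise."""
    Y = m(pts) + draw_noise((reps, n))
    return pts[np.argmax(Y, axis=1)]

# Light tail: H = U(0, s), m(x) = 1{x >= 0.5}; the limit (17) is U(0.5, 1)
light = {s: argmax_locations(lambda x: (x >= 0.5).astype(float),
                             lambda size, s=s: rng.uniform(0.0, s, size))
         for s in (1.0, 2.0, 5.0)}

# Heavy tail: H = Pareto(alpha, 1), m(x) = 0.5 x; the limit (22) is Lambda = U(0, 1)
heavy = {a: argmax_locations(lambda x: 0.5 * x,
                             lambda size, a=a: rng.pareto(a, size) + 1.0)
         for a in (0.5, 1.0, 5.0)}
```

Sorting each array and plotting it against uniform quantiles gives QQ-plots of the kind shown in Figures 5 and 6.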
Example 4 (A counterexample.) Let $\Omega = \{0,1\}$, $\mu(0)\sim U(-1,0)$, $\mu(1)\sim U(0,1)$ and $\Lambda = \delta_0$, the point mass at 0. In this case an argmax distribution $T_\mu^\Lambda$ does not exist. Indeed, consider two different sequences of points $N^n$ and $\bar N^n$, with empirical distributions
$$P^{N^n} = \Big(1 - \frac{1}{\sqrt n}\Big)\delta_0 + \frac{1}{\sqrt n}\delta_1, \qquad P^{\bar N^n} = \delta_0,$$
respectively, that both converge weakly to $\Lambda$. However, it is easy to see, either directly or through max-fields, that $\tilde T_\mu^{N^n} = \delta_1$ and $\tilde T_\mu^{\bar N^n} = \delta_0$ for all $n$. Hence, according to Definition 5, $T_\mu^\Lambda$ does not exist. The problem arises since $\mu(0)$ and $\mu(1)$ have disjoint supports. More generally, it suffices that the supports of $\mu(0)$ and $\mu(1)$ have different upper endpoints for Definition 5 to fail.
□
Figure 5: QQ-plots (data quantile against theoretical quantile), one panel each for $n = 10$, $100$, $500$ and $5000$, when $\Lambda\sim U[0,1]$, $m(x)$ is given by (29) and $H\sim U[0,s]$, with $s = 1$ thick, $s = 2$ dashed, and $s = 5$ dotted. The asymptotic argmax distribution, given by (17), equals $U(0.5,1)$.
6
Discussion and Extensions
In this paper we set out to define, and prove limit results about, the concept of an argmax measure over a continuous index of probability distributions. A reasonable definition has been provided, and we have expanded the toolbox available for these types of problems by introducing the max-field concept. The usefulness of the developed method is shown when it is applied to a regression model with homoscedastic error terms. We found that the limiting argmax distribution is nontrivial for exponential white noise, which provides a borderline between lighter- and heavier-tailed distributions.
Figure 6: QQ-plots (data quantile against theoretical quantile), one panel each for $n = 10$, $100$, $500$ and $5000$, when $\Lambda\sim U[0,1]$, $m(x) = 0.5x$ and $H\sim\mathrm{Pareto}(\alpha,1)$, with $\alpha = 0.5$ thick, $\alpha = 1$ dashed, and $\alpha = 5$ dotted. The asymptotic argmax distribution (22) thus equals $U(0,1)$.
There are plenty of potential generalizations and extensions of the theory available on the basis of the work done in this paper, as discussed in Malmberg (2012). Firstly, it is possible to construct a theory where the locations $x_{ni}$ of offers are not deterministic, allowing for stochasticity in the selection of points, which leads to a doubly stochastic problem. When $\{x_{ni}\}_{i=1}^n$ is a point process, this yields an argmax theory of marked point processes as the intensity of the underlying point process tends to infinity. In particular, when $(x_{ni}, Y_{ni}) = (x_i, Y_i)$ is an i.i.d. sequence of pairs of random variables, the argmax distribution for a sample of size $n$ is the distribution of the concomitant of the extreme order statistic among $Y_1,\ldots,Y_n$.
Secondly, it is of interest to derive explicit argmax limits for measure indices $\mu$ other than homoscedastic regression models. Generally, if all $\{\mu(x)\}_{x\in\Omega}$ are similar enough, their differences will asymptotically have no impact, so that $T_\mu^\Lambda = \Lambda$, as in (22). On the other hand, if the $\{\mu(x)\}_{x\in\Omega}$ differ a lot and can be linearly stochastically ordered, only the stochastically largest distributions will contribute to the limiting argmax distribution, i.e. $T_\mu^\Lambda = \Lambda_{\max}$, a measure supported on the set
$$\Omega_{\max} = \{x\in\Omega;\,\mu(x)\text{ stochastically largest}\},$$
as in (17). The challenge is to find other non-trivial argmax distributions between these two extremes. One such argmax distribution is provided by (12). Another example is derived from the mixture class
$$\mu(x) = (1 - p(x))\,U(-1,0) + p(x)\,U(0,1) \tag{30}$$
of probability measures, where $p:\Omega\to[0,1]$ gives the location-dependent mixture between two uniform distributions. It can be seen, when $\{x_{ni}\}_{i=1}^n$ is an i.i.d. sample from $\Lambda$, that $b_n = 1$, $a_n = 1/n$ and $g_n(y) = n(y-1)$ give a max-field
$$M_\mu^\Lambda(A)\sim-\mathrm{Exp}\Big(\Big(\int_A p(x)\,\Lambda(dx)\Big)^{-1}\Big)$$
and argmax law
$$T_\mu^\Lambda(A) = C\int_A p(x)\,\Lambda(dx),$$
that is, a weighted distribution with weight function $p$; cf. Patil (2002).
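The mixture example (30) is easy to simulate, and a minimal NumPy sketch illustrates both limits under the hypothetical choices $p(x) = x$, $\Lambda = U(0,1)$, $n = 200$ and 5,000 replicates. With this $p$, the weighted argmax law has density $2x$ (mean $2/3$), and $n(M - 1)$ should be close to minus an exponential variable with mean $\big(\int_\Omega p\,d\Lambda\big)^{-1} = 2$. Here "Exp" is read, as elsewhere in the thesis, as parametrized by its mean.

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps = 200, 5_000
pts = rng.uniform(0.0, 1.0, size=n)       # i.i.d. sample from Lambda = U(0, 1)
p = lambda x: x                           # hypothetical mixing weight p: Omega -> [0, 1]

# Draw from (30): with probability p(x) an offer from U(0, 1), otherwise from U(-1, 0)
upper = rng.uniform(size=(reps, n)) < p(pts)
Y = rng.uniform(0.0, 1.0, size=(reps, n)) - 1.0 + upper
X = pts[np.argmax(Y, axis=1)]             # argmax location per replicate
scaled_max = n * (Y.max(axis=1) - 1.0)    # g_n(M) with g_n(y) = n (y - 1)

weighted_mean = np.sum(pts * p(pts)) / np.sum(p(pts))   # mean of C p(x) P^{N^n}(dx)
```

Only offers from the upper component can be the maximum (whenever at least one exists), and among those the argmax is uniform, which is why the location of the maximum is weighted by $p$.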
Thirdly, it would be interesting to generalize the point process approach described in Chapter 4 of Resnick (2008) for $d = 1$ and stationary mark distributions ($\mu(x)\equiv H$ for some $H$). This entails establishing weak convergence of the sequence of point processes $\xi_n = \sum_{i=1}^n\delta_{(x_{ni},\,g_n(Y_{ni}))}$ as $n\to\infty$ to an appropriate Poisson random measure $\xi = \sum_{i=1}^\infty\delta_{(x_i,Y_i)}$ on $\Omega\times\mathbb{R}$. Once this is done, the limiting max-field and argmax distribution are
$$M_\mu^\Lambda(A) = \max_{i;\,x_i\in A}Y_i$$
and
$$T_\mu^\Lambda(A) = P(X\in A),$$
respectively, with $X = \arg\max_{x_i}Y_i$.
Fourthly, the max fields are related to extremal processes, as described for instance in Chapter 4 of Resnick (2008). Indeed, when $d = 1$ and $\Omega = (0, 1]$, we may define

$$M_{\Lambda\mu}(t) = M_{\mu\Lambda}((0, t])$$

and, correspondingly, the normalized sample max field $\tilde M^n_\mu(t) = \max_{i;\, x_{ni} \in (0, t]} g_n(Y_{ni})$ for $0 < t \le 1$. Suppose offer locations are equispaced ($x_{ni} = i/n$), marks are stationary ($\mu(x) \equiv H$) and $g_n(y) = (y - b_n)/a_n$, with $a_n$ and $b_n$ the normalizing constants of Theorem 2 when $Y_i \sim H$. Then $M_{\Lambda\mu}$ is an extremal process generated by the extreme value limit of $H$, and functional weak convergence $\tilde M^n_\mu \Rightarrow M_{\Lambda\mu}$ can be established on $D(0, 1]$, the space of right-continuous functions on $(0, 1]$ with left-hand limits, endowed with the Skorohod topology. Theorem 1 can be viewed as an analogous (marginal) convergence result for max fields corresponding to more general sampling dimensions $d$, sampling distributions $\Lambda$ and possibly nonstationary measure indices with varying $\mu(x)$.
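A purely illustrative case of the extremal-process limit takes $H$ to be the standard exponential distribution, for which $a_n = 1$ and $b_n = \log n$ are valid normalizing constants. With equispaced locations, the normalized running maximum over $(0, t]$ is the maximum of $\lfloor nt \rfloor$ standard exponentials minus $\log n$. Its distribution function can be computed exactly and compared with the marginal $\exp(-t e^{-y})$ of the Gumbel extremal process. The function names below are hypothetical.

```python
import math

def running_max_cdf(y: float, t: float, n: int) -> float:
    """Exact P(max_{i <= floor(n t)} Y_i - log n <= y) for i.i.d. Y_i ~ Exp(1)."""
    k = math.floor(n * t)
    threshold = y + math.log(n)          # level on the original scale, b_n = log n
    if threshold <= 0.0:
        return 1.0 if k == 0 else 0.0    # the max over an empty set is -infinity
    return (1.0 - math.exp(-threshold)) ** k

def extremal_cdf(y: float, t: float) -> float:
    """Marginal of the Gumbel extremal process: G_0(y)^t = exp(-t e^{-y})."""
    return math.exp(-t * math.exp(-y))

y, t = 0.5, 0.3
errors = {n: abs(running_max_cdf(y, t, n) - extremal_cdf(y, t)) for n in (10, 1_000, 100_000)}
for n, err in errors.items():
    print(f"n = {n:>7}: |finite-n CDF - limit| = {err:.2e}")
```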
Fifthly, it is possible to allow for dependent offers. For instance, one may consider a triangular array

$$Y_{ni} = m(x_{ni}) + Z(x_{ni}) + \varepsilon_{ni}, \qquad i = 1, \dots, n,$$

of offers, with $m$ a deterministic mean function, $Z : \Omega \to \mathbb{R}$ a zero-mean random field and $\{\varepsilon_{ni}\}_{i=1}^n$ zero or constant mean white noise. Such models are frequently encountered in spatial statistics (Cressie, 1993). In this paper we have focused on models with $Z \equiv 0$ and homoscedastic error terms $\varepsilon_{ni} \sim H$, although (30) is another possible choice, with $m(x) = p(x) - 1/2$ and heteroscedastic errors, since the variance of (30) depends on $x$. Conversely, $Z \ne 0$ and $\varepsilon_{ni} \equiv 0$ lead to models with no nugget effect and an argmax theory of random fields, since, as $n \to \infty$, the distribution of the argmax $X_n$ should be close to that of

$$X_\infty := \arg\max_{x \in \Omega} \big( m(x) + Z(x) \big),$$

provided $\mathrm{supp}(\Lambda) = \Omega$ and $m + Z$ is sufficiently regular (for instance continuous), with a unique maximum almost surely.
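The last claim can be illustrated with a sketch in which the random field is replaced by one fixed realization: $m(x) + Z(x) = -x^2 + 0.3\cos(\pi x) + 0.5\sin(\pi x)$ on $\Omega = [-1, 1]$. Here $\Lambda$ is uniform and there is no nugget ($\varepsilon_{ni} \equiv 0$). The sample argmax $X_n$ is compared with $X_\infty$, which is approximated on a fine grid. The surface, the seed and the helper names are all illustrative assumptions.

```python
import math
import random

def surface(x: float) -> float:
    """Hypothetical realization of m(x) + Z(x) on [-1, 1]."""
    return -x * x + 0.3 * math.cos(math.pi * x) + 0.5 * math.sin(math.pi * x)

def sample_argmax(n: int, rng: random.Random) -> float:
    """Argmax of the surface over n i.i.d. uniform offer locations (no nugget effect)."""
    return max((rng.uniform(-1.0, 1.0) for _ in range(n)), key=surface)

# X_infinity approximated on a grid with spacing 1e-5
grid = [-1.0 + 2.0 * k / 200_000 for k in range(200_001)]
x_inf = max(grid, key=surface)

rng = random.Random(7)
gaps = {n: abs(sample_argmax(n, rng) - x_inf) for n in (10, 100_000)}
print(f"X_inf ~ {x_inf:.4f}, |X_n - X_inf|: {gaps}")
```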
Acknowledgements
The authors wish to thank Dmitrii Silvestrov for reading through several versions of this manuscript and for providing valuable comments. Ola Hössjer’s
research was financially supported by the Swedish Research Council, contract
nr. 621-2008-4946, and the Gustafsson Foundation for Research in Natural
Sciences and Medicine.
References
[1] Ben-Akiva, M., Litinas, N. and Tsunokawa, K. (1985). Continuous spatial choice: The continuous logit model and distributions of trips and urban densities. Transportation Research, A19, 119-145.
[2] Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed., Wiley, Hoboken, NJ.
[3] Cox, D.R. and Isham, V. (1980). Point Processes, Chapman and Hall, London.
[4] Cressie, N. (1993). Statistics for Spatial Data, revised edition, Wiley, New York.
[5] Dagsvik, J. K. (1994). Discrete and continuous choice, max-stable processes and Independence from Irrelevant Attributes. Econometrica, 62, 1179-1205.
[6] Diggle, P. (2003). Statistical Analysis of Spatial Point Patterns, 2nd ed., Arnold, London.
[7] Fisher, R.A. and Tippett, L.H.C. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Cambridge Philos. Soc., 24, 180-190.
[8] Gnedenko, B.V. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Ann. Math., 44, 423-453.
[9] Gumbel, E.J. (2004). Statistics of Extremes (new edition), Courier Dover Publications.
[10] Horowitz, J. (1980). Extreme values from a nonstationary stochastic process: an application to air quality analysis. Technometrics, 22(4), 469-482.
[11] Hüsler, J. (1986). Extreme values of non-stationary random sequences. J. Appl. Prob., 23, 937-950.
[12] Jacobsen, M. (2006). Point Process Theory and Applications. Marked Point and Piecewise Deterministic Processes. Birkhäuser, Boston.
[13] Khoshnevisan, D. (2002). Multiparameter Processes: An Introduction to Random Fields, Springer.
[14] Leadbetter, M. R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes, Springer-Verlag.
[15] Ledford, A.W. and Tawn, J.A. (1998). Concomitant tail behaviour for extremes. Adv. Appl. Prob., 30(1), 197-215.
[16] Malmberg, H. (2011). Spatial Choice Processes and the Gamma Distribution, Bachelor thesis 2011:3, Division of Mathematical Statistics, Stockholm University.
[17] Malmberg, H. (2012). Argmax over Continuous Indices of Random Variables - An Approach Using Random Fields, Master thesis 2012:2, Division of Mathematical Statistics, Stockholm University.
[18] Manski, C.F. and McFadden, D. (1981). Structural Analysis of Discrete Data with Econometric Applications, Section III, MIT Press, http://elsa.berkeley.edu/ mcfadden/discrete.html.
[19] Marley, A.A.J. and Colonius, H. (1992). The "horse race" random utility model for choice probabilities and reaction times, and its competing risks interpretation. J. Math. Psychology, 36, 1-20.
[20] Mattsson, L.-G., Weibull, J.W. and Lindberg, P.-O. (2011). Extreme values, invariance and choice probabilities. Manuscript.
[21] Patil, G. (2002). Weighted distributions. In Encyclopedia of Environmetrics, 4, 2369-2377.
[22] Resnick, S. and Roy, R. (1991). Random USC functions, max-stable processes and continuous choice. Ann. Appl. Prob., 1, 267-292.
[23] Resnick, S.I. (2008). Extreme Values, Regular Variation and Point Processes, Springer, New York.
[24] Robert, C.Y. (2013). Some new classes of stationary max-stable random fields. Statistics & Probability Letters, 86(6), 1496-1503.
[25] Weissman, I. (1975). Extremal processes generated by independent nonidentically distributed random variables. Ann. Prob., 3(1), 172-177.
Extremal Behaviour, Weak Convergence and
Argmax Theory for a Class of Non-Stationary
Marked Point Processes
Hannes Malmberg∗
Ola Hössjer†
May 13, 2013
Abstract
We formulate a random utility model where we choose from $n$ options $1, \dots, n$. The options have associated independent and identically distributed (i.i.d.) pairs of random variables $\{(X_i, U_i)\}_{i=1}^n$, where $X_i$ are the characteristics of option $i$ and $U_i$ is its associated utility.
We use the connection between point processes and extreme value theory to analyze the statistical properties of the choice characteristics $X$ of the object with the highest utility as $n \to \infty$. We derive analytic expressions for the asymptotic distribution of choice characteristics for a range of distributional assumptions on the utilities $U_i$.
In our discussion section, we suggest an extension of our method to
allow us to further relax our distributional assumptions. We also show
how our theoretical model can be used to explain empirical patterns
relating to commuting time distributions.
∗ Department of Mathematics, Div. of Mathematical Statistics, Stockholm University.
† Department of Mathematics, Div. of Mathematical Statistics, Stockholm University.

1 Introduction
This paper deals with statistical models of choice behavior. First, it shows that if we model choice characteristics as stochastic, random utility models from economics can be understood in terms of the statistical theory of concomitants of extreme order statistics. Secondly, the paper contributes to the analysis of concomitants of extremes by showing how point process theory can be used to derive tractable results for a range of distributional assumptions.
The economic and psychological theory of choice was initiated by Luce (1959), who posited a number of axioms for probabilistic choice, from which he derived the logit model for choice probabilities. The axiomatic approach was later partially subsumed under an approach based on utility maximization with unobservable characteristics/preferences (McFadden, 1980). In this literature, subjects are assumed to value choice options according to

$$U_i = h(x_i) + \varepsilon_i, \qquad (1)$$

where $x_i$ is a vector of (non-random) choice characteristics of option $i = 1, \dots, n_0$. It can be shown that in this model, the probability of selecting alternative $i$ is $e^{h(x_i)} / \sum_{j=1}^{n_0} e^{h(x_j)}$ if the $\varepsilon_i$'s are i.i.d. Gumbel distributed. This approach is called the random utility approach to probabilistic choice and has been extended to more functional forms, distributional assumptions and applications since McFadden's initial contribution (Ben-Akiva and Lerman, 1985, Anderson et al., 1992, Train, 2009).
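The Gumbel-to-logit statement can be checked numerically. The sketch below draws i.i.d. standard Gumbel errors as $-\log E$ with $E$ standard exponential. It then picks the alternative with the largest $h(x_i) + \varepsilon_i$ and compares the empirical choice frequencies with $e^{h(x_i)}/\sum_j e^{h(x_j)}$. The values of $h(x_i)$ and the seed are arbitrary assumptions.

```python
import math
import random

def logit_probabilities(h):
    """Closed-form choice probabilities e^{h_i} / sum_j e^{h_j}."""
    weights = [math.exp(v) for v in h]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_choices(h, draws, rng):
    """Empirical frequencies of argmax_i (h_i + eps_i) with i.i.d. standard Gumbel eps_i."""
    counts = [0] * len(h)
    for _ in range(draws):
        # -log(E) with E ~ Exp(1) is standard Gumbel: P(-log E <= y) = exp(-e^{-y})
        utilities = [v - math.log(rng.expovariate(1.0)) for v in h]
        counts[max(range(len(h)), key=utilities.__getitem__)] += 1
    return [c / draws for c in counts]

h = [0.0, 0.5, 1.0]          # hypothetical systematic utilities h(x_i)
rng = random.Random(1959)
empirical = simulate_choices(h, 20_000, rng)
exact = logit_probabilities(h)
print([round(e, 3) for e in empirical], [round(q, 3) for q in exact])
```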
Mathematically, random utility theory is closely related to the theory of concomitants of extreme order statistics (David and Galambos, 1974, Nagaraja and David, 1994, Ledford and Tawn, 1998). This theory deals with the asymptotic behavior of the object

$$X_{[n:n]} = X_{I_n},$$

where $(X_1, U_1), \dots, (X_n, U_n)$ is a sequence of i.i.d. pairs of random variables, the $U_i$'s are real-valued, the $X_i$'s belong to a general space, and $I_n = \arg\max_{1 \le i \le n} U_i$. The main difference from (1) is that not only $U_i$, but also $X_i$, is random.
In this paper we use the theory of point processes to analyze concomitants of extreme order statistics. For more information on point processes in general, see for example Cox and Isham (1980) and Jacobsen (2005). In particular, we modify the methodologies presented in Resnick (2007), where the general connection between point processes and extreme value theory is analyzed.
We focus on the problem for a range of specifications of the distribution of $U \mid X = x$. We treat the case when $X_{[n:n]}$ converges weakly to a specified non-degenerate distribution (not all mass at a single point or at infinity), but in the discussion section we also outline how the same theory could be used to analyze the convergence rates to different types of degenerate distributions.
By extending the theory of concomitants, the results in this paper provide a framework for studying random utility models in the limiting case when the number of alternatives tends to infinity.
The paper is similar in aim to Malmberg (2012) and Malmberg and Hössjer (2012). However, those papers used asymptotic properties of deterministic configurations of points and analyzed the problem using continuity properties of random fields. The novel approach in this paper is to instead use point process theory, and this method turns out to simplify the theory compared to our previous papers.

2 Model
Consider a sequence of independent and identically distributed pairs of random variables $\{(X_i, U_i)\}_{i=1}^\infty$, where $X_i \in \Omega \subseteq \mathbb{R}^d$ and $U_i \in \mathbb{R}$. We define $U_{n:i}$ as the $i$th order statistic of $\{U_1, \dots, U_n\}$. For each $n$, we define the location $X_{[n:i]}$ to be the $X$-value associated with $U_{n:i}$ for a sample of size $n$.
As mentioned in Section 1, we can think of $U_i$ as the utility of alternative $i$ and $X_i$ as its observable characteristics. We are interested in the limiting properties of the optimal choice, and thus we study the asymptotic behavior of the sequence of probability measures

$$C_n(\cdot) = P\big(X_{[n:n]} \in \cdot\big). \qquad (2)$$
We will represent the distribution of $(X, U)$ as

$$P\big((X, U) \in A \times B\big) = \int_A \mu(x; B)\, d\Lambda(x),$$

where $F_X = \Lambda$ is the marginal distribution of $X$ over $\Omega$, and $\mu(x; \cdot)$ is the conditional probability measure of $U_i$ given $X_i = x$. We make the following assumption on $\mu$:
Assumption 1 For the collection $\mu = \{\mu(x; \cdot);\ x \in \Omega\}$, there exist a function

$$p : \Omega \to (0, \infty)$$

and a one-parameter family of probability measures $\{Q(s; \cdot) : s \in \mathbb{R}, s > 0\}$ such that

$$\mu(x; \cdot) = Q(p(x); \cdot).$$

$Q(s; \cdot)$ is monotonic, i.e. $Q(t; \cdot)$ stochastically dominates $Q(s; \cdot)$ whenever $t > s$. Furthermore, there exist sequences $a_n > 0$ and $b_n$, independent of $x$, and a distribution function $G_\alpha$ with $\alpha \in \mathbb{R}$, such that

$$Q\big(s; (-\infty, a_n u + b_n)\big)^n \to G_\alpha(u)^s \qquad (3)$$

as $n \to \infty$, where $G_\alpha$ is a distribution function of one of the following three forms:

$$G_\alpha(u) = \begin{cases} \exp\big(-(-u)^{-\alpha}\big) \text{ for } u < 0, \text{ and } 1 \text{ for } u \ge 0, & \alpha < 0, \\ \exp\big(-\exp(-u)\big), & \alpha = 0, \\ \exp\big(-u^{-\alpha}\big) \text{ for } u > 0, \text{ and } 0 \text{ for } u \le 0, & \alpha > 0. \end{cases}$$

In effect, our assumption asserts that all $\mu(x; \cdot)$ belong to the domain of attraction of the same extreme value family $G_\alpha$, and that their relative sizes can be described by the one-dimensional parameter $p(x)$.
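Consider the shifted exponential family $Q(s; \cdot) = \mathrm{Exp}(\log s, 1)$, which is the $\alpha = 0$ case of Example 1 in Section 5. For it, (3) holds with $a_n = 1$ and $b_n = \log n$. Indeed, once $n \ge s e^{-u}$, we have $Q(s; (-\infty, u + \log n))^n = (1 - s e^{-u}/n)^n \to \exp(-s e^{-u}) = G_0(u)^s$. The short sketch below evaluates both sides numerically and also checks the monotonicity condition. The helper names and parameter values are ours.

```python
import math

def Q_cdf(s: float, x: float) -> float:
    """Q(s; (-inf, x)) for the shifted exponential Exp(log s, 1)."""
    return max(0.0, 1.0 - s * math.exp(-x))

def G0(u: float) -> float:
    """Gumbel distribution function G_0."""
    return math.exp(-math.exp(-u))

s, u = 2.5, 0.7
for n in (10, 1_000, 1_000_000):
    lhs = Q_cdf(s, u + math.log(n)) ** n      # a_n = 1, b_n = log n
    print(f"n = {n:>9}: Q^n = {lhs:.6f}, G_0(u)^s = {G0(u) ** s:.6f}")

# Monotonicity: a larger s gives a stochastically larger law, i.e. a pointwise smaller CDF.
xs = [k / 10 for k in range(-20, 60)]
monotone = all(Q_cdf(3.0, x) <= Q_cdf(2.0, x) for x in xs)
```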
2.1 Method
The sequence $\{(X_i, U_i)\}_{i=1}^n$ may be viewed as a random collection of points in $\Omega \times \mathbb{R}$, and described as a sequence of point processes $\xi_n$. We will show that after a suitable transformation, this sequence of point processes $\xi_n$ converges to a Poisson point process $\xi$ in a sense which will be specified later. As

$$C_n(A) = P\big(X_{[n:n]} \in A\big) = P\Big(\sup_{i:\, X_i \in A} U_i > \sup_{i:\, X_i \notin A} U_i\Big)$$

is a functional of our point process $\xi_n$, the problem reduces to determining whether this functional is continuous. If it is, we can use the limiting point process $\xi$ to calculate our results.
We will start with an introduction to point processes, in particular sufficient conditions for convergence. After this, we will apply the point process machinery to our setup and characterize the limit of our point process. Once this is done, we will define random fields taking point processes as inputs, and derive the asymptotic behavior of $C_n$ from continuity properties of these random fields.
3 Extremal Point Process Convergence
3.1 Background on Point Processes and Convergence Results
This section contains background results and notation for point processes. See Chapter 3 of Resnick (2007) for a more detailed treatment.
Throughout this discussion, the generic point process will take values in a set $E$ with an associated σ-algebra $\mathcal{E}$. For the purpose of our discussion, we will take $E$ to be a subset of $(d+1)$-dimensional Euclidean space with the associated Borel σ-algebra $\mathcal{E} = \mathcal{B}(E)$. A point mass is a set function defined by

$$\delta_z(F) = \begin{cases} 1, & z \in F, \\ 0, & z \notin F, \end{cases}$$

where $F \subseteq E$, $F \in \mathcal{E}$. A point measure is a measure $m(\cdot)$ such that there exist a countable collection of points $\{z_k\}$ and numbers $w_k \ge 0$ such that

$$m(\cdot) = \sum_k w_k\, \delta_{z_k}(\cdot).$$

We will confine our attention to the case $w_k \equiv 1$.
Let $M_P(E)$ be the set of point measures on $E$, equipped with the smallest σ-algebra which makes

$$\{m \in M_P(E) : m(F) \in B\}$$

measurable for all $F \in \mathcal{E}$ and $B \in \mathcal{B}(\mathbb{R})$, where $m(F)$ is the point measure $m$ evaluated at the set $F$ and $\mathcal{B}(\mathbb{R})$ is the Borel σ-algebra on $\mathbb{R}$. We define a point process to be a probability distribution over $M_P(E)$.
If $N$ is an arbitrary point process, we define the Laplace functional $\psi_N$ associated with $N$ as

$$\psi_N(f) = E\Big[\exp\Big(-\sum_{z \in N} f(z)\Big)\Big] = \int_{M_P(E)} \exp\Big(-\sum_{z \in N'} f(z)\Big)\, dP(N').$$

Here $P$ is the probability measure over the set $M_P(E)$ given by the law of $N$. Moreover, the class of functions $f$ for which we are interested in $\psi_N$ is usually the continuous non-negative functions on $E$ with compact support. We write $C_K^+(E)$ to denote this set.
Definition 1 If we have a sequence of point processes $N_n$, $n \ge 0$, we say that $N_n$ converges weakly to $N_0$, written $N_n \Rightarrow_p N_0$, if

$$\psi_{N_n}(f) \to \psi_{N_0}(f)$$

for all $f \in C_K^+(E)$.
We use the notation $\Rightarrow$ for weak convergence of vector-valued random variables in Euclidean space, in contrast to $\Rightarrow_p$ for point process convergence.
Definition 2 Let $X$ be a metric space. We call $F \subseteq X$ relatively compact if its closure $\bar F$ in $X$ is compact.
Definition 3 Let $\mu$ be a measure on a metric space $X$. We say that a sequence of measures $\mu_n$ converges vaguely to $\mu$, written $\mu_n \Rightarrow_v \mu$, if

$$\mu_n(F) \to \mu(F)$$

for all relatively compact $F$ with $\mu(\partial F) = 0$, where $\partial F$ is the boundary of the set $F$.
Definition 4 A Poisson random measure $N$ on $E$ with intensity measure $\mu$ is a point process with Laplace functional

$$\psi_N(f) = \exp\Big(-\int_E \big(1 - e^{-f(x)}\big)\, d\mu(x)\Big).$$
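As a numerical illustration of Definition 4 (our own example, not from the text), let $N$ be a homogeneous Poisson process on $E = [0, 1]$ with intensity measure $\lambda$ times Lebesgue measure, and take $f(z) = z$. Then $\int_0^1 (1 - e^{-z})\, dz = e^{-1}$, so $\psi_N(f) = \exp(-\lambda/e)$. The sketch simulates $N$ from exponential inter-arrival times and averages $\exp(-\sum_{z \in N} f(z))$. The rate, seed and function names are hypothetical.

```python
import math
import random

def poisson_points(rate: float, rng: random.Random) -> list:
    """Homogeneous Poisson process on [0, 1] built from i.i.d. Exp(rate) inter-arrival times."""
    points, t = [], rng.expovariate(rate)
    while t <= 1.0:
        points.append(t)
        t += rng.expovariate(rate)
    return points

def laplace_functional_mc(rate: float, f, reps: int, rng: random.Random) -> float:
    """Monte Carlo estimate of psi_N(f) = E exp(-sum_{z in N} f(z))."""
    total = 0.0
    for _ in range(reps):
        total += math.exp(-sum(f(z) for z in poisson_points(rate, rng)))
    return total / reps

rate = 3.0
rng = random.Random(42)
estimate = laplace_functional_mc(rate, lambda z: z, 20_000, rng)
exact = math.exp(-rate * math.exp(-1.0))   # exp(-int_0^1 (1 - e^{-z}) rate dz)
print(f"Monte Carlo {estimate:.4f} vs exact {exact:.4f}")
```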
The following two results are known from point process theory (see, for example, Resnick, 2007).
Proposition 1 Definition 4 uniquely defines a point process $N$. This point process has the property that for any $F \in \mathcal{E}$ and any non-negative integer $k$,

$$P(N(F) = k) = \begin{cases} e^{-\mu(F)} \mu(F)^k / k!, & \mu(F) < \infty, \\ 0, & \mu(F) = \infty, \end{cases}$$

and that for any $k \ge 1$, if $F_1, \dots, F_k$ are mutually disjoint sets in $\mathcal{E}$, then $N(F_1), \dots, N(F_k)$ are independent random variables.
Proposition 2 For each $n$, suppose $\{Z_{n,j} : 1 \le j \le n\}$ are i.i.d. random variables and that

$$n P(Z_{n,1} \in \cdot) \Rightarrow_v \mu.$$

Then

$$N_n = \sum_{j=1}^n \delta_{Z_{n,j}} \Rightarrow_p N,$$

where $N$ is a Poisson random measure on $E$ with intensity $\mu$.
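For a fixed set $F$, Proposition 2 says in particular that $N_n(F) \sim \mathrm{Binomial}(n, P(Z_{n,1} \in F))$ is approximately Poisson with mean $\mu(F)$. A minimal numerical check compares the two probability mass functions under the hypothetical choice $nP(Z_{n,1} \in F) = \mu(F) = 3$. By Le Cam's inequality, the total variation distance is then below $\mu(F)^2/n$.

```python
import math

def binomial_pmf(k: int, n: int, q: float) -> float:
    return math.comb(n, k) * q**k * (1.0 - q) ** (n - k)   # math.comb is 0 when k > n

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

def total_variation(n: int, lam: float, k_max: int = 60) -> float:
    """Half the L1 distance between Binomial(n, lam/n) and Poisson(lam), truncated at k_max."""
    q = lam / n
    return 0.5 * sum(abs(binomial_pmf(k, n, q) - poisson_pmf(k, lam)) for k in range(k_max + 1))

lam = 3.0
for n in (10, 100, 10_000):
    print(f"n = {n:>6}: TV = {total_variation(n, lam):.2e}, Le Cam bound = {lam**2 / n:.2e}")
```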
3.2 Point Process Convergence in our Setup
We will consider a sequence of transformations

$$g_n(u) = (u - b_n)/a_n$$

of offer values, where $g_n$ is chosen to ensure extreme value convergence for all $x$ as in Assumption 1.
Let $\delta_{(x,u)}$ denote a one-point distribution at $(x, u)$ and define the extremal marked point process (cf. Resnick, 2007)

$$\xi_n = \sum_{i=1}^n \delta_{(X_i, g_n(U_i))} \qquad (4)$$

for a sample of size $n$. This is a random measure on $(\Omega \times \mathbb{R}, \mathcal{B}(\Omega \times \mathbb{R}))$.
Before stating our theorem, we prove a preliminary lemma on boundary sets of product spaces.
Lemma 1 If $(X \times U, \Lambda \times \nu)$ is a product measure space of two metric spaces, and if $F \subseteq X \times U$ satisfies

$$(\Lambda \times \nu)(\partial F) = 0,$$

then

$$\nu(\partial F_x) = 0 \quad \Lambda\text{-a.e.},$$

where $F_x = \{u \in U : (x, u) \in F\}$ is the cross-section of $F$ at the point $x$.
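Proof. We note that if we write

$$X \times \partial F_X = \{(x, u) \in X \times U : u \in \partial F_x\},$$

we have $X \times \partial F_X \subseteq \partial F$ (as each ball around a point $(x, u) \in X \times \partial F_X$ contains points both inside and outside $F$). Thus, as

$$(\Lambda \times \nu)(X \times \partial F_X) = \int_X \nu(\partial F_x)\, d\Lambda(x) \le (\Lambda \times \nu)(\partial F) = 0,$$

we get that $\nu(\partial F_x) = 0$ $\Lambda$-almost everywhere. □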
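We can now formulate our main result:
Theorem 1 Let $G_\alpha$ and $p$ be as in Assumption 1. Suppose that the image of every compact set under $p : \Omega \to (0, \infty)$ is bounded. Then, as $n \to \infty$, it holds that

$$\xi_n \Rightarrow_p \xi,$$

where $\xi_n$ is given by (4), and $\xi$ is a Poisson random measure on $(\Omega \times \mathbb{R}, \mathcal{B}(\Omega \times \mathbb{R}))$ with mean intensity $\Lambda_p \times \nu_\alpha$, where

$$\Lambda_p(A) = \int_A p(x)\, \Lambda(dx)$$

for all $A \in \mathcal{B}(\Omega)$ and

$$\nu_\alpha([u, \infty)) = -\log G_\alpha(u) = \begin{cases} (-u)^{-\alpha} \text{ for } u < 0, \text{ and } 0 \text{ for } u \ge 0, & \alpha < 0, \\ \exp(-u), & \alpha = 0, \\ u^{-\alpha}, & \alpha > 0,\ u > 0. \end{cases}$$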
This theorem is similar to Proposition 3.21 in Resnick (2007). The difference is that he considers a sequence of point processes $\xi_n = \sum_{j \ge 1} \delta_{(j n^{-1}, g_n(X_j))}$, where $\{X_j\}$ is a sequence of independent and identically distributed random variables. Thus, the difference is that we model the first coordinate as a random variable, and let the distribution of the second coordinate depend on this first coordinate. This creates some technical issues, which however turn out not to affect the main result.
Proof. Before starting, we note that $G_\alpha(u) = 0$ for $\alpha > 0$ and $u \le 0$. Whenever $\alpha > 0$, it is implicit in the proof that $u > 0$. By Proposition 2, it suffices to show that

$$n P\big((X_1, g_n(U_1)) \in \cdot\big) \Rightarrow_v \Lambda_p \times \nu_\alpha,$$

i.e. that

$$n P\big((X_1, g_n(U_1)) \in F\big) \to (\Lambda_p \times \nu_\alpha)(F)$$

for all $F \subseteq \Omega \times \mathbb{R}$ which are relatively compact sets with respect to $\mathcal{B}(\Omega \times \mathbb{R})$ and satisfy

$$(\Lambda_p \times \nu_\alpha)(\partial F) = 0.$$

Henceforth, let $F$ be an arbitrary set with these properties. Now, we note that

$$n P\big((X_1, g_n(U_1)) \in F\big) = \int_\Omega n P\big(g_n(U_1) \in F_x \mid X_1 = x\big)\, d\Lambda(x),$$

where $F_x$ is the $x$-cross-section of $F$. Thus, our task is to show that

$$\int_\Omega n P\big(g_n(U_1) \in F_x \mid X_1 = x\big)\, d\Lambda(x) \to \int_\Omega p(x)\, \nu_\alpha(F_x)\, d\Lambda(x).$$

We do this by first showing that the integrand converges almost everywhere to the desired quantity, and then showing that the sequence of integrands satisfies regularity conditions allowing us to infer convergence of integrals from pointwise convergence.
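We observe that for every $x$,

$$n P\big(g_n(U_1) \in \cdot \mid X_1 = x\big) \Rightarrow_v p(x)\, \nu_\alpha(\cdot). \qquad (5)$$

Indeed, if $0 < a \le 1$ and

$$x_n^n \to a, \qquad (6)$$

then

$$n(1 - x_n) \to -\log(a). \qquad (7)$$

Applying (6) and (7) with $x_n = Q\big(p(x); (-\infty, a_n u + b_n)\big)$ and $a = G_\alpha(u)^{p(x)}$, Assumption 1 gives

$$n P\big(g_n(U_1) \ge u \mid X_1 = x\big) \to -p(x) \log G_\alpha(u) = p(x)\, \nu_\alpha([u, \infty)). \qquad (8)$$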
In order to deduce (5) from (8), note that if $\gamma$ is a measure with

$$\gamma([u, \infty)) < +\infty$$

for some $u$, then vague convergence of $\gamma_n$ to $\gamma$ is equivalent to

$$\gamma_n([u, \infty)) \to \gamma([u, \infty)) \qquad (9)$$

for all $u$ such that $\gamma(\{u\}) = 0$. This can be seen by noting that if (9) holds, then for every such $u$ with $\gamma([u, \infty)) > 0$, the sequence $P_n^u(\cdot) = \gamma_n(\cdot \cap [u, \infty)) / \gamma_n([u, \infty))$ of probability measures converges weakly to $P^u(\cdot) = \gamma(\cdot \cap [u, \infty)) / \gamma([u, \infty))$. Hence $P_n^u(F) \to P^u(F)$ for all $P^u$-continuity sets $F$, from which (5) follows.
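Now, using Lemma 1, we know that

$$\nu_\alpha(\partial F_x) = 0 \quad \Lambda_p\text{-a.e.},$$

which means that

$$p(x)\, \nu_\alpha(\partial F_x) = 0 \quad \Lambda_p\text{-a.e.},$$

as $p(x) > 0$ implies that $p(x)\nu_\alpha$ and $\nu_\alpha$ are equivalent for all $x \in \Omega$. Thus, we can use (5) to conclude that

$$n P\big(g_n(U_1) \in F_x \mid X_1 = x\big) \to p(x)\, \nu_\alpha(F_x) \quad \Lambda_p\text{-a.e.},$$

and hence also $\Lambda$-a.e., since $\Lambda_p$ and $\Lambda$ are equivalent when $p > 0$. Therefore, we have established pointwise convergence of the integrand almost everywhere.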
Now, we seek to show that $nP(g_n(U_1) \in F_x \mid X_1 = x)$ is uniformly bounded over $n$ and $x \in \Omega$. This ensures that pointwise convergence almost everywhere implies convergence of the integrals. To do so, we define a maximal random variable which stochastically dominates $U_1 \mid X_1 = x$ for all relevant $x$. This works because $p(x)$ indexes the distributions by stochastic dominance. We write

$$\pi_\Omega : (x, u) \mapsto x \quad\text{and}\quad \pi_U : (x, u) \mapsto u$$

for the projections on $\Omega$ and $\mathbb{R}$, respectively. Since $F$ is relatively compact, $\pi_\Omega(F)$ and $\pi_U(F)$ are relatively compact subsets of $\Omega$ and $\mathbb{R}$, respectively. By the boundedness assumption on $p$, we can therefore define the finite number

$$\bar p = \sup_{x \in \pi_\Omega(F)} p(x).$$

We can now define the maximal random variable as having the law

$$\bar U(F) \sim Q(\bar p; \cdot).$$

By the monotonicity assumption on $Q$ made in Assumption 1, $\bar U(F)$ stochastically dominates $U_1 \mid X_1 = x$ for all $x \in \pi_\Omega(F)$. Furthermore, we define $\underline{u} = \inf \pi_U(F)$, the smallest $u$-value attained on $F$, which again is finite since $F$ is relatively compact. For $x \notin \pi_\Omega(F)$ the cross-section $F_x$ is empty. For $x \in \pi_\Omega(F)$, combining these two definitions gives

$$\begin{aligned} n P\big(g_n(U_1) \in F_x \mid X_1 = x\big) &\le n P\big(g_n(U_1) \ge \underline{u} \mid X_1 = x\big) \\ &\le n P\big(g_n(\bar U(F)) \ge \underline{u}\big) \\ &\to \bar p\, \nu_\alpha([\underline{u}, \infty)) < +\infty, \end{aligned}$$

where the limit follows as in (8) with $p(x)$ replaced by $\bar p$. Since a convergent sequence is bounded, $nP(g_n(U_1) \in F_x \mid X_1 = x)$ is uniformly bounded in $n$ and $x$. Using the bounded convergence theorem, we get

$$\begin{aligned} n P\big((X_1, g_n(U_1)) \in F\big) &= \int_\Omega n P\big(g_n(U_1) \in F_x \mid X_1 = x\big)\, d\Lambda(x) \\ &\to \int_\Omega \nu_\alpha(F_x)\, p(x)\, d\Lambda(x) \\ &= (\Lambda_p \times \nu_\alpha)(F), \end{aligned}$$

which completes the proof. □
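Theorem 1 can be illustrated by counting the points of $\xi_n$ in a box $A \times [u, \infty)$. The sketch assumes, purely for illustration, $\Omega = (0, 1)$, uniform $\Lambda$ and $p(x) = 1 + x$. It uses the $\alpha = 0$ family $U \mid X = x \sim \mathrm{Exp}(\log p(x), 1)$ from Example 1 in Section 5, with $g_n(u) = u - \log n$. For $A = (0, 1/2]$ and $u = 0$, the limiting count is Poisson with mean $\Lambda_p(A)\nu_0([0, \infty)) = 5/8$. Its sample mean and variance should therefore both be close to $0.625$. The function name and seed are hypothetical.

```python
import math
import random

def count_in_box(n, a_right, u_level, rng):
    """Number of points of xi_n in (0, a_right] x [u_level, inf) for the hypothetical model."""
    count = 0
    for _ in range(n):
        x = rng.random()                                       # X ~ Lambda = U(0, 1)
        utility = math.log(1.0 + x) + rng.expovariate(1.0)    # U | X = x ~ Exp(log p(x), 1)
        if x <= a_right and utility - math.log(n) >= u_level:  # g_n(U) = U - log n
            count += 1
    return count

rng = random.Random(3)
counts = [count_in_box(500, 0.5, 0.0, rng) for _ in range(2_000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean {mean:.3f}, variance {var:.3f}, Poisson target 0.625")
```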
4 Convergence of Functionals of Random Fields
Recall that our task is to study the limiting behavior of $C_n$ as defined in (2). The key to connecting this limit to point processes is the observation that, as $g_n$ is strictly increasing for all $n$, we have

$$C_n(A) = P\big(X_{[n:n]} \in A\big) = P\big(M_{\xi_n}(A) > M_{\xi_n}(A^c)\big),$$

where $M_{\xi_n}$ is the random field defined as

$$M_{\xi_n}(A) = \max_{1 \le i \le n,\ X_i \in A} g_n(U_i), \qquad A \in \mathcal{B}(\Omega),$$

where $\mathcal{B}(\Omega)$ is the Borel σ-algebra over $\Omega$, and $\xi_n$ is the point process from (4). This formulation of the argmax measure $C_n$ in terms of random fields defined over point processes allows us to generalize the notion of argmax to the limiting case where the number of offers goes to infinity. We will study the limiting behavior of finite-dimensional distributions of $M_{\xi_n}$, and this will allow us to calculate the limit of $C_n$.
Write

$$\xi = \sum_{i=1}^\infty \delta_{(X_i^\infty, U_i^\infty)}$$

for a realization of the limiting Poisson point process $\xi$ derived in Theorem 1. We can then define

$$M_\xi(A) = \max_{i;\, X_i^\infty \in A} U_i^\infty$$

and

$$C(A) = P\big(M_\xi(A) > M_\xi(A^c)\big).$$
Proposition 3 If $\Lambda_p(\Omega) < \infty$, we have

$$C(A) = \Lambda_p(A)/\Lambda_p(\Omega).$$

Proof: Suppose first that $\Lambda_p(A^c) = 0$ or $\Lambda_p(A) = 0$. In this case, it is clear that $C(A) = 1$ or $C(A) = 0$, respectively, as required by the formula for $A \in \mathcal{B}(\Omega)$. Indeed, using the convention that the supremum of an empty set is minus infinity, if $\Lambda_p(A) = 0$, then $M_\xi(A) = -\infty$ almost surely. As $M_\xi(A^c) > -\infty$ almost surely, we get $C(A) = 0$. A similar reasoning applies to $A^c$.
Otherwise, since $\xi$ is a Poisson random measure with mean measure $\Lambda_p \times \nu_\alpha$ and $\Lambda_p(\Omega) < \infty$, $M_\xi(A)$ and $M_\xi(A^c)$ are two independent, proper random variables with

$$P(M_\xi(A) \le y) = P\big(\xi(A \times (y, \infty)) = 0\big) = e^{-\Lambda_p(A)\,\nu_\alpha((y, \infty))}, \qquad (10)$$

$$P(M_\xi(A^c) \le y) = P\big(\xi(A^c \times (y, \infty)) = 0\big) = e^{-\Lambda_p(A^c)\,\nu_\alpha((y, \infty))}. \qquad (11)$$

Using standard results from proportional hazards theory (Cox and Oakes, 1984, Fleming and Harrington, 1991), we get that

$$P\big(M_\xi(A) > M_\xi(A^c)\big) = \frac{\Lambda_p(A)}{\Lambda_p(A) + \Lambda_p(A^c)} = \Lambda_p(A)/\Lambda_p(\Omega),$$

and our proof is complete. □
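The proportional hazards step can be spelled out directly from (10) and (11). Write $\lambda_A = \Lambda_p(A)$, $\lambda_B = \Lambda_p(A^c)$ and $\varphi(y) = \nu_\alpha((y, \infty))$. The function $\varphi$ is continuous and non-increasing, tends to $+\infty$ at the left end of the support and to $0$ at the right end. Conditioning on $M_\xi(A)$ and substituting $t = \varphi(y)$ gives the following short derivation, sketched here for convenience:

```latex
P\big(M_\xi(A) > M_\xi(A^c)\big)
  = \int P\big(M_\xi(A^c) < y\big)\, dF_{M_\xi(A)}(y)
  = \int e^{-\lambda_B \varphi(y)}\, d\big(e^{-\lambda_A \varphi(y)}\big)
  = \int_0^\infty e^{-\lambda_B t}\, \lambda_A e^{-\lambda_A t}\, dt
  = \frac{\lambda_A}{\lambda_A + \lambda_B}.
```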
From this result, we automatically get that $C$ is a probability measure, as it is a normalized version of $\Lambda_p$, which is a finite measure.
In order to prove weak convergence of $C_n$, we need some additional results and notation. We will use that

$$\nu_1 \ll \mu_1 \text{ and } \nu_2 \ll \mu_2 \;\Rightarrow\; \nu_1 \times \nu_2 \ll \mu_1 \times \mu_2, \qquad (12)$$

where $\ll$ means "absolutely continuous with respect to".
We will also use that if $\xi_n$ are point processes, $\xi$ is a Poisson process, and

$$\xi_n \Rightarrow_p \xi,$$

then

$$P(\xi_n(F) = 0) \to P(\xi(F) = 0) \qquad (13)$$

for all $F \in \mathcal{E}$ with $\mu(\partial F) = 0$, where $\mu$ is the intensity measure of $\xi$.
Lastly, we recall that if $S_n$ is any sequence of random variables taking values in $\mathbb{R}^k$, then $S_n \Rightarrow S$ if and only if

$$G_{S_n}(s_1, \dots, s_k) \to G_S(s_1, \dots, s_k) \qquad (14)$$

for all points of continuity of $G_S$, where $G_S$ denotes the distribution function of the random variable $S$.
Theorem 2 If $\Lambda_p(\Omega) < \infty$, we have

$$C_n(\cdot) \Rightarrow C(\cdot) = \frac{\Lambda_p(\cdot)}{\Lambda_p(\Omega)}. \qquad (15)$$
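Proof: Assume we have $A$ with $C(\partial A) = 0$. We aim to prove that $C_n(A) \to C(A)$. By Proposition 3, $C$ and $\Lambda_p$ are equivalent, so $\Lambda_p(\partial A) = 0$. Noting that the result is clearly true whenever $\Lambda_p(A) = 0$ or $\Lambda_p(A^c) = 0$, we can assume that both are different from 0. By (10) and (11), this means that $(M_\xi(A), M_\xi(A^c))$ is a proper random vector on $\mathbb{R}^2$, and we will show that $(M_{\xi_n}(A), M_{\xi_n}(A^c))$ converges weakly to it. Indeed, consider

$$\begin{aligned} P\big(M_{\xi_n}(A) \le x_1, M_{\xi_n}(A^c) \le x_2\big) &= P\big(\xi_n(A \times (x_1, \infty) \cup A^c \times (x_2, \infty)) = 0\big) \\ &\to P\big(\xi(A \times (x_1, \infty) \cup A^c \times (x_2, \infty)) = 0\big) \\ &= P\big(M_\xi(A) \le x_1, M_\xi(A^c) \le x_2\big) \\ &= F_{M_\xi(A), M_\xi(A^c)}(x_1, x_2). \end{aligned}$$

The convergence step uses (13) and the inclusion

$$\partial\big(A \times (x_1, \infty) \cup A^c \times (x_2, \infty)\big) \subseteq \big(\partial A \times [\min(x_1, x_2), \infty)\big) \cup \big(\Omega \times \{x_1, x_2\}\big) = F.$$

We have $(\Lambda_p \times \nu_\alpha)(F) = 0$, where $\Lambda_p \times \nu_\alpha$ is the intensity measure of $\xi$, since $\Lambda_p(\partial A) = 0$ and $\nu_\alpha$ has no atoms.
Hence, it follows from (14) that

$$\big(M_{\xi_n}(A), M_{\xi_n}(A^c)\big) \Rightarrow \big(M_\xi(A), M_\xi(A^c)\big).$$

Define

$$D = \{(a, b) \in \mathbb{R}^2 : a > b\}.$$

By (10) and (11), the laws of $M_\xi(A)$ and $M_\xi(A^c)$ are absolutely continuous with respect to Lebesgue measure, and the two variables are independent. Using (12) with $\nu_1 \sim M_\xi(A)$, $\nu_2 \sim M_\xi(A^c)$, and $\mu_1$, $\mu_2$ Lebesgue measure on $\mathbb{R}$, we conclude that

$$P\big((M_\xi(A), M_\xi(A^c)) \in \partial D\big) = 0,$$

since $\partial D$ is the diagonal, which has Lebesgue measure zero. We therefore get

$$\begin{aligned} C_n(A) &= P\big(M_{\xi_n}(A) > M_{\xi_n}(A^c)\big) \\ &= P\big((M_{\xi_n}(A), M_{\xi_n}(A^c)) \in D\big) \\ &\to P\big((M_\xi(A), M_\xi(A^c)) \in D\big) \\ &= C(A), \end{aligned}$$

and the proof is complete. □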
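Theorem 2 can be checked by simulating $C_n(A)$ directly. The sketch assumes a hypothetical model: $\Omega = (0, 1)$, uniform $\Lambda$, $p(x) = 1 + x$ and $U \mid X = x \sim \mathrm{Exp}(\log p(x), 1)$ (the $\alpha = 0$ case of Example 1 in Section 5). For this model $\Lambda_p((0, 1/2]) / \Lambda_p(\Omega) = 0.625 / 1.5 = 5/12$. The function name and seed are ours.

```python
import math
import random

def best_choice_location(n, rng):
    """X_[n:n]: the location of the largest utility among n i.i.d. offers."""
    best_x, best_u = 0.0, float("-inf")
    for _ in range(n):
        x = rng.random()                                  # X ~ U(0, 1)
        u = math.log(1.0 + x) + rng.expovariate(1.0)     # U | X = x ~ Exp(log(1 + x), 1)
        if u > best_u:
            best_x, best_u = x, u
    return best_x

rng = random.Random(11)
reps, n = 5_000, 100
c_n = sum(best_choice_location(n, rng) <= 0.5 for _ in range(reps)) / reps
c_limit = 0.625 / 1.5          # Lambda_p(A) / Lambda_p(Omega) with p(x) = 1 + x
print(f"C_n(A) ~ {c_n:.3f}, C(A) = {c_limit:.3f}")
```

In this model the limit is reached very quickly. Above the level $\log 2$, the offers have an intensity of product form in $x$ and $u$, so the argmax location has density proportional to $p$ as soon as the best offer exceeds $\log 2$. That fails only with probability $0.25^n$.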
5 Examples
Here we provide some examples to illustrate our theory.
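Example 1 (Exponential and mixture models.) A class of distributions that satisfies Assumption 1 is

$$\mu_\alpha(x; \cdot) \sim \begin{cases} P\big( (2 \cdot 1_{\{V_1 < p(x)\}} - 1)(1 - V_2^{-1/\alpha}) \in \cdot \big), & \alpha < 0, \\ P\big( \log(p(x)/V_1) \in \cdot \big), & \alpha = 0, \\ P\big( (2 \cdot 1_{\{V_1 < p(x)\}} - 1)\, V_2^{-1/\alpha} \in \cdot \big), & \alpha > 0, \end{cases}$$

where $V_1, V_2 \sim U(0, 1)$ are two independent random variables, uniformly distributed on $(0, 1)$, and where $p(x) \le 1$ is required when $\alpha \ne 0$. Less formally, we may write

$$\mu_\alpha(x) \sim \begin{cases} -(1 - p(x))\,\mathrm{Beta}(1, -\alpha) + p(x)\,\mathrm{Beta}(1, -\alpha), & \alpha < 0, \\ \mathrm{Exp}(\log p(x), 1), & \alpha = 0, \\ -(1 - p(x))\,\mathrm{Pareto}(\alpha, 1) + p(x)\,\mathrm{Pareto}(\alpha, 1), & \alpha > 0, \end{cases}$$

that is, for $\alpha \ne 0$, a mixture in which the Beta or Pareto variable is reflected to the negative half-line with probability $1 - p(x)$. Here $\mathrm{Beta}(a, b)$ refers to a Beta distribution with density $C x^{a-1}(1 - x)^{b-1}$ on $(0, 1)$. $\mathrm{Exp}(a, b)$ is a shifted exponential distribution with location parameter $a$ and scale parameter $b$, having distribution function $1 - e^{-(x - a)/b}$ for $x \ge a$. $\mathrm{Pareto}(\alpha, b)$ is a Pareto distribution with shape parameter $\alpha$ and scale parameter $b$, corresponding to the distribution function $1 - (x/b)^{-\alpha}$ for $x \ge b$. We have chosen the parameter $\alpha$ for the distributions $\mu_\alpha$ so that they lie in the domain of attraction of $G_\alpha$ in (3).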
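The construction of $\mu_\alpha$ translates directly into a sampler. The sketch below draws from $\mu_\alpha(x; \cdot)$ for a given value $p = p(x)$; the function names and parameter values are hypothetical. It then checks the tail identities that drive Assumption 1:

- for $\alpha = 0$, $P(U > t) = p e^{-t}$ when $t \ge \log p$;
- for $\alpha > 0$, $P(U > y) = p y^{-\alpha}$ when $y \ge 1$;
- for $\alpha < 0$, $P(U > 1 - t) = p t^{-\alpha}$ when $0 < t \le 1$.

It uses `1.0 - random()` so that the uniforms lie in $(0, 1]$.

```python
import math
import random

def sample_mu(alpha: float, p: float, rng: random.Random) -> float:
    """One draw from mu_alpha(x; .) of Example 1, given p = p(x) (p <= 1 unless alpha == 0)."""
    v1 = 1.0 - rng.random()            # uniform on (0, 1]
    v2 = 1.0 - rng.random()
    if alpha == 0:
        return math.log(p / v1)        # Exp(log p, 1)
    sign = 1.0 if v1 < p else -1.0     # 2 * 1{V1 < p(x)} - 1
    if alpha < 0:
        return sign * (1.0 - v2 ** (-1.0 / alpha))   # +/- Beta(1, -alpha)
    return sign * v2 ** (-1.0 / alpha)               # +/- Pareto(alpha, 1)

def tail(alpha: float, p: float, level: float, draws: int, rng: random.Random) -> float:
    """Empirical P(U > level) under mu_alpha."""
    return sum(sample_mu(alpha, p, rng) > level for _ in range(draws)) / draws

rng = random.Random(5)
p = 0.5
checks = {
    0.0: (tail(0.0, p, 1.0, 20_000, rng), p * math.exp(-1.0)),
    2.0: (tail(2.0, p, 2.0, 20_000, rng), p * 2.0 ** -2.0),
    -2.0: (tail(-2.0, p, 0.5, 20_000, rng), p * 0.5 ** 2.0),
}
for alpha, (empirical, exact) in checks.items():
    print(f"alpha = {alpha:+.0f}: empirical {empirical:.4f}, exact {exact:.4f}")
```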
Example 2 (An example from the commuting literature) Focusing on $\alpha = 0$ in the previous example, we have an interesting special case. Suppose that the population is distributed uniformly on $B(0, R)$, a disk in $\mathbb{R}^2$. The utility associated with each point is

$$U \mid X = x \sim \mathrm{Exp}(-c\|x\|, 1),$$

where $\|x\|$ is the Euclidean distance from the origin. This is a good benchmark model for commuting choices. In this case, $\Lambda$ is the uniform distribution on $B(0, R)$, and $p(x) = \exp(-c\|x\|)$. Thus, we get

$$C(A) = \frac{\int_A e^{-c\|x\|}\, dx}{\int_{B(0,R)} e^{-c\|x\|}\, dx}.$$

The particular direction of commuting is often not as interesting as the distribution of distances. Passing to polar coordinates, the probability that we commute a distance of at most $r \le R$ is given by

$$C(\{x : \|x\| \le r\}) = \frac{\int_0^r s e^{-cs}\, ds}{\int_0^R s e^{-cs}\, ds},$$

which we recognize as a Gamma distribution with shape parameter 2 and rate parameter $c$, truncated to $[0, R]$.
There is suggestive evidence that commuting patterns follow a gamma distribution over short distances. We provide an example in Figure 1 with a histogram over commuting distances and a superimposed gamma distribution with parameters obtained by moment fitting. The moment-fitted density provides a reasonable fit for the left half of the data.
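The truncated gamma formula can be compared with a direct simulation of the finite-$n$ choice problem on the disk. The sketch assumes the hypothetical values $c = 1$ and $R = 3$ (in arbitrary distance units). It uses the closed form $\int_0^r s e^{-cs}\, ds = (1 - e^{-cr}(1 + cr))/c^2$. Only distances matter, so a uniform point in $B(0, R)$ is represented by its radius $R\sqrt{V}$ with $V \sim U(0, 1)$. The function names and seed are ours.

```python
import math
import random

def truncated_gamma_cdf(r: float, c: float, big_r: float) -> float:
    """C({x : ||x|| <= r}) for Example 2: Gamma(shape 2, rate c) truncated to [0, R]."""
    def partial(z: float) -> float:
        return (1.0 - math.exp(-c * z) * (1.0 + c * z)) / c**2   # int_0^z s e^{-cs} ds
    return partial(r) / partial(big_r)

def chosen_distance(n: int, c: float, big_r: float, rng: random.Random) -> float:
    """Distance of the best of n offers, with U | X = x ~ Exp(-c||x||, 1)."""
    best_d, best_u = 0.0, float("-inf")
    for _ in range(n):
        d = big_r * math.sqrt(rng.random())      # ||X|| for X uniform on the disk
        u = -c * d + rng.expovariate(1.0)
        if u > best_u:
            best_d, best_u = d, u
    return best_d

c, big_r = 1.0, 3.0
rng = random.Random(2012)
sims = [chosen_distance(200, c, big_r, rng) for _ in range(5_000)]
empirical = sum(d <= 1.0 for d in sims) / len(sims)
print(f"P(distance <= 1): simulated {empirical:.3f}, "
      f"truncated gamma {truncated_gamma_cdf(1.0, c, big_r):.3f}")
```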
Example 3 (The logit model: a special case) Let $\Lambda$ be uniformly distributed on the finite support $\{x_1, \dots, x_{n_0}\}$. Let utilities be given by

$$U_j \mid X_j \sim \mathrm{Exp}(-c\|X_j\|, 1). \qquad (16)$$
[Figure 1: Histogram over commuting distances in Kungsholmen, Stockholm ("Histogram of avstkort"). Horizontal axis: avstkort, from 0 to 10000; vertical axis: density.]
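This corresponds to $p(x_i) = e^{h(x_i)}$ with $h(x) = -c\|x\|$, and we get

$$C(\{x_i\}) = \frac{e^{h(x_i)}}{\sum_{j=1}^{n_0} e^{h(x_j)}},$$

just as in the logit model (1). We interpret the offers in (1) as standardized maximal offers

$$\max_{j:\, 1 \le j \le n,\ X_j = x_i} U_j - \log(n) \qquad (17)$$

derived from (16) as $n \to \infty$. From extreme value theory, we deduce that (17) is asymptotically distributed as $h(x_i)$ plus a Gumbel variable, up to the constant $-\log n_0$, which is common to all $i$ and does not affect the choice. This provides additional justification of (1).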
6 Discussion
6.1 Mathematical extensions
We have derived a way to calculate the asymptotic behavior of $C_n$, the distribution of $X_{[n:n]}$, and have done so for a number of assumptions on the joint distribution of $(X_i, U_i)$. However, in order to extend our results to a wider class of distributional assumptions, we must relax our requirement that $X_{[n:n]}$ should converge to a non-degenerate distribution. For example, when $X$ and $U$ are bivariate normally distributed with positive correlation, $X_{[n:n]} \to \infty$ almost surely.
In these cases, it may nevertheless be possible to find a sequence of functions $h_n$ such that

$$h_n(X_{[n:n]}) \Rightarrow S$$

for a non-degenerate random variable $S$. In this case, we would have

$$X_{[n:n]} \overset{d}{\approx} h_n^{-1}(S)$$

for large $n$, where $\overset{d}{\approx}$ means that the two random variables have approximately the same distribution.
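The bivariate normal case mentioned above can be made explicit. Suppose $(X, U)$ is standard bivariate normal with correlation $\rho$. Then $X = \rho U + \sqrt{1 - \rho^2}\,W$ with $W$ standard normal and independent of $U$, so $X_{[n:n]} - \rho U_{n:n}$ is exactly $N(0, 1 - \rho^2)$. Meanwhile $U_{n:n}$ grows like $\sqrt{2 \log n}$. This suggests, as an illustration only, the candidate $h_n(x) = x - \rho b_n$, with $b_n$ the usual normal centring constants. The sketch below, with the hypothetical value $\rho = 0.8$, shows two things: the upward drift of $X_{[n:n]}$, and the constant spread of $X_{[n:n]} - \rho U_{n:n}$.

```python
import math
import random

def concomitant_of_max(n, rho, rng):
    """Return (X_[n:n], U_{n:n}) for n i.i.d. standard bivariate normal pairs with correlation rho."""
    best_x, best_u = 0.0, float("-inf")
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        x = rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if u > best_u:
            best_x, best_u = x, u
    return best_x, best_u

def summarize(n, rho, reps, rng):
    """Mean of X_[n:n] and sample variance of X_[n:n] - rho * U_{n:n} over reps replications."""
    pairs = [concomitant_of_max(n, rho, rng) for _ in range(reps)]
    mean_x = sum(x for x, _ in pairs) / reps
    resid = [x - rho * u for x, u in pairs]
    m = sum(resid) / reps
    var_resid = sum((r - m) ** 2 for r in resid) / (reps - 1)
    return mean_x, var_resid

rho = 0.8
rng = random.Random(17)
small = summarize(10, rho, 200, rng)
large = summarize(2_000, rho, 200, rng)
print(f"n = 10:   mean X_[n:n] = {small[0]:.2f}, residual variance = {small[1]:.2f}")
print(f"n = 2000: mean X_[n:n] = {large[0]:.2f}, residual variance = {large[1]:.2f}")
```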
This would extend the empirical applicability of our results. Of course, we will not know the exact $n$ in practice. However, if $S$ belongs to a class of distributions that is invariant under the transformations $h_n^{-1}$, we know which distribution class $X_{[n:n]}$ can be expected to belong to. Furthermore, the asymptotic behavior of $h_n$ can be used to assess how different moments of $X_{[n:n]}$ develop as $n \to \infty$. This gives us a way of predicting the effect of, for example, increased population density on commuting choices.
We have done some exploratory studies on this extension, and there are indications that for a much larger class of distributions than studied in this paper, it is possible to find sequences $h_n$ and $g_n$ such that

$$\sum_{i=1}^n \delta_{(h_n(X_i),\, g_n(U_i))} \Rightarrow_p \xi$$

for some non-degenerate Poisson process $\xi$. With this result, it is possible to apply results analogous to those in this paper to analyze the asymptotic behavior of $X_{[n:n]}$ more generally.
6.2 Empirical applications
In Example 2, we showed that with linear transport costs and a uniform population distribution on a two-dimensional disk, the distance from the origin of the optimal choice is asymptotically truncated gamma distributed. This holds when utilities are exponentially distributed with a deterministic additive term.
This is in agreement with an observed empirical regularity: commuting distances seem to follow a gamma distribution for short distances. The extension outlined in Section 6.1 would aim to show that this result holds not only for exponentially distributed utilities, but whenever utilities belong to the Gumbel domain of attraction (i.e. their extreme values converge to a Gumbel distribution). If this can be shown, the empirical regularity of gamma distributed commuting distances will have a foundation in utility maximization and probabilistic choice.
References
Anderson, S. P., De Palma, A., and Thisse, J.-F. (1992). Discrete choice theory of product differentiation. MIT Press.
Ben-Akiva, M. and Lerman, S. (1985). Discrete choice analysis: theory and application to travel demand, volume 9. MIT Press.
Cox, D. R. and Isham, V. (1980). Point processes, volume 12. Chapman & Hall/CRC, London.
Cox, D. R. and Oakes, D. (1984). Analysis of survival data, volume 21. Chapman & Hall/CRC, London.
David, H. and Galambos, J. (1974). The asymptotic theory of concomitants of order statistics. Journal of Applied Probability, pages 762–770.
Fleming, T. R. and Harrington, D. P. (1991). Counting processes and survival analysis, volume 8. Wiley, New York.
Jacobsen, M. (2005). Point Process Theory and Applications: Marked Point and Piecewise Deterministic Processes. Birkhäuser, Boston.
Ledford, A. W. and Tawn, J. A. (1998). Concomitant tail behaviour for extremes. Advances in Applied Probability, 30(1):197–215.
Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. John Wiley and Sons, New York.
Malmberg, H. (2012). Argmax over continuous indices of random variables – an approach using random fields. Master's thesis, Stockholm University.
Malmberg, H. and Hössjer, O. (2012). Argmax over continuous indices of random variables – an approach using random fields. Technical report, Division of Mathematical Statistics, Department of Mathematics, Stockholm University. Submitted.
McFadden, D. (1980). Econometric models for probabilistic choice among products. Journal of Business, 53(3):13–29.
Nagaraja, H. N. and David, H. A. (1994). Distribution of the maximum of concomitants of selected order statistics. The Annals of Statistics, 22(1):478–494.
Resnick, S. I. (2007). Extreme values, regular variation, and point processes. Springer, New York.
Train, K. E. (2009). Discrete choice methods with simulation. Cambridge University Press, Cambridge, 2nd edition.
Mathematical statistics
June 2013
www.math.su.se
Department of Mathematics
Stockholm University
SE-106 91 Stockholm