Empirical distribution function and Percentiles
Q1. Given Data = {-1, 0, 1}. What proportion of the data do not exceed $y = 1/2$?
Ans. There are 3 observations in the data, out of which 2 do not exceed $1/2$ (we compare
each data value to $1/2$ and count). Therefore the required proportion is equal to $2/3$.
Let us now generalize this question.
Q2. Given the data $\{x_1, \dots, x_n\}$, what proportion of the data is less than or equal to $y$?
Ans. We have to count the observations in the data that are less than or equal to $y$.
To get this number we compare each $x_i$, $i = 1, \dots, n$, with $y$. If $x_i$ does not
exceed $y$, we count one, and zero otherwise. Hence the number of $x_i$'s, $i = 1, \dots, n$,
not exceeding $y$ equals $\sum_{i=1}^{n} I(x_i \le y)$,
where $I(x_i \le y) = 1$ if $x_i \le y$ and zero otherwise.
Therefore the proportion of the data not exceeding $y$ equals
$$\frac{1}{n} \sum_{i=1}^{n} I(x_i \le y).$$
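The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function names `indicator` and `proportion_not_exceeding` are made up for this note, and `Fraction` is used so that the result is an exact ratio rather than a rounded decimal.

```python
from fractions import Fraction


def indicator(x, y):
    """I(x <= y): 1 if x does not exceed y, 0 otherwise."""
    return 1 if x <= y else 0


def proportion_not_exceeding(data, y):
    """(1/n) * sum of I(x_i <= y) over all observations x_i."""
    n = len(data)
    count = sum(indicator(x, y) for x in data)
    return Fraction(count, n)


# The data of Q1 with y = 1/2: two of the three values do not exceed 1/2.
print(proportion_not_exceeding([-1, 0, 1], Fraction(1, 2)))  # 2/3
```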
Let us denote this proportion by $F_n(y)$. Given the data $\{x_1, \dots, x_n\}$, we have the
following function:
$$
F_n(y) =
\begin{cases}
0, & y < x_{(1)} \\
\frac{1}{n}, & x_{(1)} \le y < x_{(2)} \\
\frac{2}{n}, & x_{(2)} \le y < x_{(3)} \\
\vdots & \\
\frac{n-1}{n}, & x_{(n-1)} \le y < x_{(n)} \\
1, & x_{(n)} \le y
\end{cases}
$$
Note: $\{x_{(1)}, \dots, x_{(n)}\}$ are the sorted data. That is, $x_{(1)}$ denotes the minimum
and $x_{(n)}$ the maximum. When the observations are distinct, $x_{(k)}$ denotes the observation
such that exactly $k - 1$ of the $x_i$'s, $i = 1, \dots, n$, are less than $x_{(k)}$.
The function $F_n$ has a name: the empirical distribution function.
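The step-function form of $F_n$ suggests evaluating it from the sorted data rather than by comparing every observation with $y$. The sketch below assumes a hypothetical helper `make_ecdf`; it relies on the standard-library function `bisect_right`, which returns the number of sorted values that are less than or equal to `y`.

```python
from bisect import bisect_right
from fractions import Fraction


def make_ecdf(data):
    """Return F_n as a callable, built from the sorted data x_(1) <= ... <= x_(n)."""
    xs = sorted(data)
    n = len(xs)

    def F(y):
        # bisect_right counts the sorted values that are <= y, i.e. the
        # index k of the step x_(k) <= y < x_(k+1) on which y falls.
        return Fraction(bisect_right(xs, y), n)

    return F


F3 = make_ecdf([1, -1, 0])
for y in (-2, -1, -0.5, 0, 1, 5):
    print(y, F3(y))
```

Evaluating `F3` at points below the minimum, between consecutive sorted values, and above the maximum shows the jumps of size $1/n$ at each observation.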
Exercise: Plot the function $F_n$ and state its properties.
Q3. Given the data $\{x_1, \dots, x_n\}$ and $0 < p < 1$, can you find a number $y$ such that
$100p$ percent of the data do not exceed $y$?
Ans. Recall that $F_n(y)$ is the proportion of the data not exceeding $y$.
Therefore, to answer the above question, it is natural to solve the equation $F_n(y) = p$.
However, from the definition of $F_n(y)$ it is important to realize that there may not be any
$y$ for which $F_n(y) = p$. (Why is that so? $F_n(y)$ can only be equal to one of the $n + 1$
numbers $\{0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{n-1}{n}, 1\}$, and $p$ may not be equal to
any of these numbers.)
Moreover, if such a $y$ exists, it may not be unique. E.g., for $n = 4$, $F_4(y) = 0.5$ for
every $y$ with $x_{(2)} \le y < x_{(3)}$.
However, our purpose is served if we can get a $y$ such that
1. $F_n(y) \ge p$ and
2. for any number $z < y$, $F_n(z) < p$.
Since $F_n$ is a non-negative, monotonically non-decreasing function that equals 0 below
$x_{(1)}$ and increases to 1, the set $\{x : F_n(x) \ge p\}$ is non-empty and bounded below.
Therefore we can define
$$y = \inf\{x : F_n(x) \ge p\}.$$
Note: Such a $y$ satisfies 1 and 2. (Why? For 2: if $z < y$ and $F_n(z) \ge p$, then $z$ belongs
to the set $\{x : F_n(x) \ge p\}$, so $y$ is not even a lower bound of that set. Therefore
$z < y \implies F_n(z) < p$. For 1: $F_n$ is a right-continuous step function, so the infimum
is attained at one of the data points and $F_n(y) \ge p$.)
The $y$ in the above definition is the $100p$th percentile.
Percentile: Given the data $\{x_1, \dots, x_n\}$ and $0 < p < 1$, the $100p$th
percentile is denoted by $Q_p$, and is defined as
$$Q_p = \inf\{x : F_n(x) \ge p\}.$$
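The definition of $Q_p$ can be turned into a direct computation. Since $F_n$ only jumps at the data points, the infimum is attained at the smallest observation $x$ with $F_n(x) \ge p$, so it is enough to search among the data values. The following is a minimal sketch under that observation; the names `ecdf` and `percentile` are illustrative, and `p` is converted to a `Fraction` to avoid floating-point rounding in the comparison.

```python
from fractions import Fraction


def ecdf(data, y):
    """F_n(y): proportion of the data not exceeding y."""
    return Fraction(sum(1 for x in data if x <= y), len(data))


def percentile(data, p):
    """Q_p = inf{x : F_n(x) >= p}, searched over the data values only."""
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    # The maximum always has F_n = 1 >= p, so this list is never empty.
    candidates = [x for x in data if ecdf(data, x) >= p]
    return min(candidates)


data = [10, 20, 30, 40]
q = percentile(data, Fraction(1, 3))
print(q, ecdf(data, q))
```

Here no $y$ solves $F_4(y) = 1/3$ exactly, yet the returned value still satisfies conditions 1 and 2.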
The percentiles divide the data into 100 roughly equal parts.
Quartiles: We can divide the data into four parts using the 25th, 50th and 75th
percentiles, viz. $Q_p$, $p = 0.25, 0.50, 0.75$. These percentiles are called quartiles,
denoted by $Q_1, Q_2, Q_3$.
Therefore about 25 percent of the data do not exceed $Q_1$, about 50 percent of the
data do not exceed $Q_2$, and about 75 percent of the data do not exceed $Q_3$.
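As an illustration, the three quartiles can be computed from the sorted data. The sketch below uses a shortcut that is not stated in these notes but follows from the definition: $\inf\{x : F_n(x) \ge p\}$ is the order statistic $x_{(k)}$ with $k = \lceil np \rceil$. The function names are hypothetical, and other conventions for quartiles exist (for example, interpolating between neighbouring order statistics), so software packages may report different values.

```python
import math
from fractions import Fraction


def percentile_by_rank(data, p):
    """Return x_(k) with k = ceil(n * p), which equals inf{x : F_n(x) >= p}."""
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    xs = sorted(data)
    k = math.ceil(len(xs) * p)  # exact, because len(xs) * p is a Fraction
    return xs[k - 1]            # x_(k) in 1-based notation


def quartiles(data):
    """Q_1, Q_2, Q_3: the percentiles for p = 0.25, 0.50, 0.75."""
    return tuple(percentile_by_rank(data, Fraction(j, 4)) for j in (1, 2, 3))


print(quartiles([7, 1, 5, 3, 8, 2, 6, 4]))  # (2, 4, 6)
```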
Ex1. What percent of the data are between $Q_1$ and $Q_3$?
Ex2. What percent of the data are between $Q_2$ and $Q_3$?
Ex3. What is the relation between $Q_2$ and the median?