Empirical distribution function and Percentiles

Q1. Given Data = {-1, 0, 1}. What proportion of the data do not exceed $y = \frac{1}{2}$?
Ans. There are 3 observations in the data, out of which 2 do not exceed $\frac{1}{2}$ (we compare
each data value to $\frac{1}{2}$ and count). Therefore the required proportion is equal to $\frac{2}{3}$.
Let us now generalize this question.
Q2. Given the data $\{x_1, \dots, x_n\}$, what proportion of the data is less than or equal to $y$?
Ans. We have to compute the number of observations in the data that are less than or equal to
$y$. To get this number we compare each $x_i$ with $y$, $i = 1, \dots, n$. If $x_i$ does not exceed $y$, we
count it as one, and as zero otherwise. Hence the number of $x_i$'s, $i = 1, \dots, n$, not exceeding $y$
equals $\sum_{i=1}^{n} I(x_i \le y)$, where $I(x_i \le y) = 1$ if $x_i \le y$ and zero otherwise.
Therefore the proportion of the data not exceeding $y$ equals
$\frac{1}{n} \sum_{i=1}^{n} I(x_i \le y)$.
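The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function name `proportion_not_exceeding` is hypothetical (it does not come from the notes), and the call at the end uses the data {-1, 0, 1} from Q1 with the threshold 1/2.

```python
def proportion_not_exceeding(data, y):
    """Return (1/n) * sum over i of I(x_i <= y)."""
    n = len(data)
    if n == 0:
        raise ValueError("data must be non-empty")
    # I(x_i <= y) contributes 1 when x_i <= y and 0 otherwise.
    count = sum(1 for x in data if x <= y)
    return count / n


# Q1: two of the three values (-1 and 0) do not exceed 1/2.
print(proportion_not_exceeding([-1, 0, 1], 0.5))
```

The printed value is the floating-point representation of 2/3.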
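Let us denote this proportion by $F_n(y)$. Given the data $\{x_1, \dots, x_n\}$, we have a function as
follows:

$$
F_n(y) =
\begin{cases}
0, & y < x_{(1)} \\
\frac{1}{n}, & x_{(1)} \le y < x_{(2)} \\
\frac{2}{n}, & x_{(2)} \le y < x_{(3)} \\
\;\vdots & \\
\frac{n-1}{n}, & x_{(n-1)} \le y < x_{(n)} \\
1, & x_{(n)} \le y
\end{cases}
$$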
Note: $\{x_{(1)}, \dots, x_{(n)}\}$ are the sorted data. That is, $x_{(1)}$ denotes the minimum and $x_{(n)}$ is
the maximum. When the data values are distinct, $x_{(i)}$ denotes the observation such that exactly
$i - 1$ of the $x_j$'s, $j = 1, \dots, n$, are less than $x_{(i)}$.
The function $F_n$ has a name, viz. the empirical distribution function.
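As a rough sketch of how the empirical distribution function can be evaluated from the sorted data, the snippet below uses the standard-library `bisect_right` to count how many sorted values are less than or equal to `y`. The helper name `make_ecdf` and the sample data are assumptions made for illustration only.

```python
from bisect import bisect_right


def make_ecdf(data):
    """Return F_n as a Python function built from the sorted data."""
    xs = sorted(data)  # x_(1) <= ... <= x_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("data must be non-empty")

    def F_n(y):
        # bisect_right returns the number of sorted values that are <= y.
        return bisect_right(xs, y) / n

    return F_n


F = make_ecdf([3, 1, 4, 2])  # hypothetical data, n = 4
print([F(y) for y in (0.5, 1, 2.5, 4, 10)])  # [0.0, 0.25, 0.5, 1.0, 1.0]
```

Because it counts values rather than relying on the piecewise formula, this version also behaves sensibly when the data contain ties.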
Exercise: Plot the function $F_n$ and state its properties.
Q3. Given the data $\{x_1, \dots, x_n\}$ and $0 < p < 1$, can you find a number $y$ such that
$100p$ percent of the data do not exceed $y$?
Ans. Recall that $F_n(y)$ is the proportion of the data not exceeding $y$.
Therefore, to answer the above question, it is natural to solve the equation $F_n(y) = p$.
However, from the definition of $F_n(y)$ it is important to realize that there may not be any $y$
for which $F_n(y) = p$. (Why is that so? $F_n(y)$ can only be equal to one of the $n + 1$
numbers $\{0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{n-1}{n}, 1\}$, and $p$ may not be equal to any of these numbers.)
Moreover, if such a $y$ exists, it may not be unique. E.g., with $n = 4$, $F_4(y) = 0.5$ for every $y$ with $x_{(2)} \le y < x_{(3)}$.
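Both difficulties can be checked numerically. The sketch below uses exact fractions to avoid floating-point comparisons; the helper names and the data set {1, 2, 3, 4} are assumptions chosen only to illustrate the case n = 4.

```python
from fractions import Fraction


def attainable_values(n):
    """All values F_n can take: 0, 1/n, 2/n, ..., 1."""
    return [Fraction(k, n) for k in range(n + 1)]


def ecdf(data, y):
    """Proportion of the data not exceeding y, as an exact fraction."""
    return Fraction(sum(1 for x in data if x <= y), len(data))


data = [1, 2, 3, 4]  # hypothetical data with n = 4
values = attainable_values(len(data))

# p = 0.3 is not one of 0, 1/4, 1/2, 3/4, 1, so F_4(y) = 0.3 has no solution.
print(Fraction("0.3") in values)  # False

# p = 0.5 is attained by every y with x_(2) <= y < x_(3), so y is not unique.
print([y for y in (2, 2.5, 2.9) if ecdf(data, y) == Fraction(1, 2)])  # [2, 2.5, 2.9]
```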
However, our purpose is served if we can get a $y$ such that
1. $F_n(y) \ge p$ and
2. for any number $z < y$, $F_n(z) < p$.
Since $F_n$ is a non-negative, monotonically non-decreasing function that equals 0 for $y < x_{(1)}$ and
increases to 1, the set $\{x : F_n(x) \ge p\}$ is non-empty and bounded below. Therefore we can define
$y = \inf\{x : F_n(x) \ge p\}$.
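Note: Such a $y$ satisfies 1 and 2. (Why? For 2: if $z < y$ and $F_n(z) \ge p$, then $y$ is not even a lower
bound of the set $\{x : F_n(x) \ge p\}$. Therefore $z < y \implies F_n(z) < p$. For 1: $F_n$ is a
right-continuous step function, so the infimum itself belongs to the set.)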
The $y$ in the above definition is the $100p$-th percentile.
Percentile: Given the data $\{x_1, \dots, x_n\}$ and $0 < p < 1$, the $100p$-th
percentile is denoted by $Q_p$ and is defined as
$Q_p = \inf\{x : F_n(x) \ge p\}$.
The percentiles divide the data into 100 (approximately) equal parts.
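The infimum in this definition is always one of the data values, because $F_n$ only jumps at data values. A minimal sketch, assuming a hypothetical helper named `percentile` (not part of the notes), scans the sorted data for the first rank $k$ with $k/n \ge p$:

```python
def percentile(data, p):
    """Return Q_p = inf{x : F_n(x) >= p} for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    xs = sorted(data)  # x_(1) <= ... <= x_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("data must be non-empty")
    # The infimum is the first sorted value x_(k) whose rank satisfies k/n >= p.
    for k, x in enumerate(xs, start=1):
        if k / n >= p:
            return x
    # Not reached for p < 1, because n/n = 1 >= p; kept as a safeguard.
    return xs[-1]


data = [7, 1, 5, 3]  # hypothetical data, sorted: 1, 3, 5, 7
print(percentile(data, 0.25), percentile(data, 0.5), percentile(data, 0.6))  # 1 3 5
```

Note that this definition always returns an observed value. Many statistical packages interpolate between neighbouring observations instead, so their percentiles may differ from $Q_p$ as defined here.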
Quartiles: We can divide the data into four parts using the 25th, 50th and 75th
percentiles, viz. $Q_p$, $p = 0.25, 0.50, 0.75$. These percentiles are called quartiles,
denoted by $Q_1$, $Q_2$, $Q_3$.
Therefore, at least 25 percent of the data do not exceed $Q_1$, at least 50 percent of the
data do not exceed $Q_2$, and at least 75 percent of the data do not exceed $Q_3$, with equality
whenever $F_n$ attains the values 0.25, 0.50 and 0.75.
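For a concrete illustration, the sketch below computes the three quartiles directly from the definition $Q_p = \inf\{x : F_n(x) \ge p\}$ and then checks the proportion of the data not exceeding each one. The function name `quartiles` and the eight data values are assumptions made for this example.

```python
def quartiles(data):
    """Return (Q1, Q2, Q3) using Q_p = inf{x : F_n(x) >= p}."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must be non-empty")
    result = []
    for p in (0.25, 0.50, 0.75):
        # Smallest rank k with k/n >= p; Q_p is the k-th smallest value.
        k = next(k for k in range(1, n + 1) if k / n >= p)
        result.append(xs[k - 1])
    return tuple(result)


data = [12, 5, 9, 7, 3, 15, 10, 8]  # hypothetical data, n = 8
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 5 8 10

# Proportion of the data not exceeding each quartile.
print([sum(1 for x in data if x <= q) / len(data) for q in (q1, q2, q3)])  # [0.25, 0.5, 0.75]
```

Here $n = 8$ is divisible by 4 and the values are distinct, so the proportions are exactly 0.25, 0.50 and 0.75.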
Ex1. What percent of the data are between $Q_1$ and $Q_3$?
Ex2. What percent of the data are between $Q_2$ and $Q_3$?
Ex3. What is the relation between $Q_2$ and the median?
