IIMC Long Duration Executive Education
Executive Programme in Business Management
Statistics for Managerial
Decisions
Distributions & Modeling Data
Prof. Saibal Chattopadhyay
IIM Calcutta
A Brief Review
Uncertainty and Randomness: Theory of Probability
• Random Experiments, Events, Sample Space, Mutually
Exclusive and Exhaustive Events, Set-theoretic
operations with events (Union, Intersection, Difference,
Complement), Classical Definition, Total probability
Theorem, Bayes’ Theorem, Independence of two or
more events.
• Random Variables & Probability Distributions, Mean and
Variance
Decision Making Under Uncertainty: Utility Theory
• Decisions Based on Expected Utility
• Choice of Utility Function U(w): Risk-averse {U(w) = √w},
Risk-seeker {U(w) = w²} and Risk-neutral {U(w) = w}
• Preference Reversal in Decision Making
Some Probability Distributions
Two types of Random Variables:
(A) Discrete Random Variables
• X is discrete if it takes at most countably many
values (mass points) x1, x2, …, xn, … with
corresponding probabilities p1, p2, …, pn, … .
• The probability law specifying the probabilities
for the different values is called the probability
mass function (pmf):
f(x) = P(X = x), for x = x1, x2, …, xn,…
Necessary Conditions for a function to be a pmf
• f(x) ≥ 0, and
• Σ f(x) = 1, sum taken over all values of x
Example: The random variable X takes 10 values
1, 2, …, 10; the probability for X to take any value
is proportional to the square of the value.
Thus f(x) = C·x², where C is the constant of
proportionality.
From condition (i), C ≥ 0, and from (ii),
1 = f(1) + f(2) + … + f(10) = C{1² + 2² + … + 10²}
= C·(10·11·21)/6 = 385C, so C = 1/385.
f(x) = x²/385, for x = 1, 2, …, 10.
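As a quick numerical check, a minimal Python sketch of this pmf (the helper name pmf is illustrative):

    # Verify that f(x) = x^2/385 is a valid pmf on x = 1, ..., 10
    C = 1 / sum(x**2 for x in range(1, 11))        # normalization gives C = 1/385

    def pmf(x):
        return C * x**2                            # probability proportional to x^2

    assert all(pmf(x) >= 0 for x in range(1, 11))              # condition (i)
    assert abs(sum(pmf(x) for x in range(1, 11)) - 1) < 1e-12  # condition (ii)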
Probability Distribution of a Random Variable
Table giving the different values of the random
variable and the corresponding probabilities:
X = x      x1    x2    …    xn    Total
P(X = x)   p1    p2    …    pn    1
Characteristics of importance: Mean & SD
Mean = μ = Σ(value × probability)
SD = σ = √{Sum1 − (Mean)²}, where
Sum1 = Σ(value² × probability)
Use: Help in assessing the shape of the distribution and
the coverage probability (Chebyshev’s Inequality)
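A minimal Python sketch of these formulas, applied to the f(x) = x²/385 example above; k = 2 is an illustrative choice for the Chebyshev bound:

    import math

    # Discrete distribution as (value, probability) pairs
    dist = [(x, x**2 / 385) for x in range(1, 11)]

    mean = sum(x * p for x, p in dist)         # Sum(value * probability)
    sum1 = sum(x**2 * p for x, p in dist)      # Sum(value^2 * probability)
    sd = math.sqrt(sum1 - mean**2)             # SQRT{Sum1 - (Mean)^2}

    # Chebyshev: coverage within k SDs of the mean is at least 1 - 1/k^2
    k = 2
    coverage = sum(p for x, p in dist if abs(x - mean) < k * sd)
    print(mean, sd, coverage, 1 - 1 / k**2)    # actual coverage beats the bound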
Some Special Discrete Distributions
1 Binomial Distribution:
Applicable for the following types of
experimentations (called Binomial/
Bernoullian trials):
(a) Only two outcomes, called Success (S)
and Failure (F) for each trial;
(b) P(S) = p and P(F) = 1 – P(S) = 1 – p = q,
same for all trials;
(c) Trials are probabilistically independent.
When is such a set-up applicable in real life?
Condition (a) generally holds: Call Success as
any event for some experiment and Failure as
the complement of the event
Condition (b) holds in most situations unless the
definition of ‘Success’ changes mid-stream, or
the initial conditions vary from one trial to
another
Condition (c) holds for repetitive trials
Calculation of probabilities for random events
under such a set-up is easy !
Binomial Distribution
Consider n Binomial trials with P(Success) = p,
and P(Failure) = q = 1 – p.
Define X = Number of successes in ‘n’ trials
X is discrete random variable with values 0,1,…,n.
The probability law (p.m.f) of X is
f(x) = P(X=x) = nCx · p^x · q^(n−x), for x = 0, 1, …, n
Mean = n·p
SD = √(n·p·q)
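A minimal sketch of these formulas in Python, assuming scipy is available; the values n = 10, p = 0.3, x = 4 are illustrative:

    from scipy.stats import binom

    n, p = 10, 0.3
    x = 4
    print(binom.pmf(x, n, p))            # P(X = 4) = 10C4 * 0.3^4 * 0.7^6
    print(binom.cdf(x, n, p))            # P(X <= 4)
    print(n * p)                         # Mean = n.p
    print((n * p * (1 - p)) ** 0.5)      # SD = sqrt(n.p.q)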
2. Poisson Distribution
X is Poisson if the probability law is
f(x) = P(X = x) = e^(−m) · m^x / x!, for x = 0, 1, 2, …,
• m = Mean = Variance = (SD)²
• Distribution is positively skewed (longer tails
towards the higher values)
• Used to model count data for rare events
• Approximates binomial distribution when n (the
number of trials) is ‘large’, p (the probability of
success in a trial) is ‘small’ but n.p (the average
number of successes) is finite, equal to m
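A minimal sketch of the approximation in Python, assuming scipy is available; n = 1000 and p = 0.003 (so m = np = 3) are illustrative:

    from scipy.stats import binom, poisson

    n, p = 1000, 0.003                   # n 'large', p 'small'
    m = n * p                            # m = 3.0
    for x in range(6):
        # binomial and Poisson probabilities are nearly equal
        print(x, binom.pmf(x, n, p), poisson.pmf(x, m))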
Continuous Probability Distributions
Random Variable X is continuous in (a, b) if it can
take any value in (a, b).
Probability Law for X?
How many values? Uncountably many!
Can't assign positive probabilities to individual
values of the variable!
How to proceed? Use a continuous function f(x)
over (a, b) to describe the probability law:
P(c ≤ X ≤ d) = area under the function f(x) between
x = c and x = d
Continuous Distribution
The function f(x) is called the probability density
function (pdf) of the continuous random
variable
Necessary conditions:
1. f(x) ≥ 0, for all x in (a, b)
2. Total area under f(x) in (a, b) = 1
(Definite integral: ∫ₐᵇ f(x) dx = 1)
Continuous Distribution: Probability is SAME as
area under a curve
Naturally, P(X = any particular value) = 0, but
P(X taking values in any interval) > 0.
An Example
Consider random variable X over an interval(1, 10)
such that the pdf f(x) is a constant over the
interval, i.e.,
f(x) = C for 1 ≤ x ≤ 10,
= 0, otherwise.
Since total area = 1, C = 1/(10 − 1) = 1/9
Thus f(x) = 1/9 for 1 ≤ x ≤ 10, and
= 0 elsewhere
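A minimal Python sketch for this density (the helper uniform_prob is illustrative):

    # Uniform density on (1, 10): f(x) = 1/9, so P(c <= X <= d) is the
    # rectangle area (d - c) * 1/9 for any sub-interval of (1, 10)
    def uniform_prob(c, d, a=1.0, b=10.0):
        lo, hi = max(c, a), min(d, b)    # clip the interval to (a, b)
        return max(hi - lo, 0.0) / (b - a)

    print(uniform_prob(2, 5))            # (5 - 2)/9 = 0.333...
    print(uniform_prob(4, 4))            # 0: single points carry no probability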
Rectangular / Uniform Distribution
Density is uniform over the entire range of the
variable
Not true in general for any distribution!
Some Continuous Distributions
1. Normal Distribution
• Most widely used distribution in Statistical
Literature
• Unimodal, bell-shaped probability curve
• Ranges over the entire real line (−∞, ∞)
• Distribution is characterized by its mean μ and
SD σ (−∞ < μ < ∞, 0 < σ < ∞)
• Distribution is perfectly symmetrical about its
mean
• Mean = Median = Mode = μ
Normal Distribution Continued
Standard Normal Distribution (Z)
Mean and SD are two standard values: Mean =
μ = 0, and SD = σ = 1.
Result: If X is Normal with mean μ and SD σ, then
the standardized variable
Z = (X − Mean) / SD = (X − μ) / σ
is Standard Normal.
Probability Table for Standard Normal Distribution
is available, and this can be used to calculate
normal probabilities
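In software the printed table is replaced by a CDF routine. A minimal sketch, assuming scipy is available; μ = 100 and σ = 15 are illustrative:

    from scipy.stats import norm

    mu, sigma = 100.0, 15.0
    x = 120.0
    z = (x - mu) / sigma                       # standardize: Z = (X - mu)/sigma
    print(norm.cdf(z))                         # P(Z <= z) = P(X <= 120) ~ 0.909
    print(norm.cdf(x, loc=mu, scale=sigma))    # same value without standardizing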
Approximating a discrete probability
distribution by Normal Distribution
• Normal Approximation of Binomial
If X is Binomial with parameters (n, p), then the
binomial probability P(a ≤ X ≤ b), where ‘a’ and
‘b’ are integers, can be well approximated by the
normal area P(a − ½ ≤ X ≤ b + ½), where X now
follows a normal distribution with mean = np and
SD = √(npq).
Approximation works well unless the binomial
distribution is too skewed (p very close to 0 or 1)
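A minimal sketch comparing the exact binomial probability with the continuity-corrected normal area, assuming scipy is available; n, p, a, b below are illustrative:

    from scipy.stats import binom, norm

    n, p = 40, 0.4
    a, b = 12, 20
    mean, sd = n * p, (n * p * (1 - p)) ** 0.5

    exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)   # P(a <= X <= b)
    approx = norm.cdf(b + 0.5, mean, sd) - norm.cdf(a - 0.5, mean, sd)
    print(exact, approx)                 # the two values agree closely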
2. Exponential Distribution
• Another continuous distribution, which varies
over the positive part of the real line (0, ∞)
• Not symmetric; in fact the density curve is
positively skewed (longer tail is towards the
higher values of the variable)
• Used to model the life of complex electronic
equipment
• Has a “loss of memory” property: future life is
independent of the current age of the product
• Widely used in Reliability Analysis
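A minimal numerical check of the loss-of-memory property, assuming scipy is available; the mean life and the ages s, t are illustrative:

    from scipy.stats import expon

    mean_life = 5.0                      # scale parameter = mean of the exponential
    s, t = 2.0, 3.0
    # Memoryless: P(X > s + t | X > s) = P(X > t); sf(x) = P(X > x)
    lhs = expon.sf(s + t, scale=mean_life) / expon.sf(s, scale=mean_life)
    rhs = expon.sf(t, scale=mean_life)
    print(lhs, rhs)                      # identical: the product does not 'age'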
Reproductive Property of Distributions
Many distributions retain the same form
when two or more identical but
independent distributions are combined
1. Binomial: X1 is Binomial (n1, p) and X2 is
Binomial (n2, p) ⇒ X1 + X2 is also
Binomial (n1 + n2, p)
Note: Result not true if the success
probability p differs between the two
Reproductive Property
2. Poisson: X1 is Poisson with mean m1, X2 is
Poisson with mean m2 ⇒ X1 + X2 is Poisson
with mean m1 + m2
3. Normal: X1 is Normal (μ1, σ1) and X2 is Normal
(μ2, σ2) ⇒ X1 + X2 is also Normal (Mean = μ1 +
μ2, SD = √{σ1² + σ2²})
Notes: a) For discrete distributions, the property
does not hold for the difference X1 − X2
b) For the normal distribution, the property holds
for the difference as well: X1 − X2 is also Normal
(Mean = μ1 − μ2, SD = √{σ1² + σ2²})
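A minimal numerical check of the Poisson case, assuming numpy and scipy are available: the pmf of X1 + X2 is obtained by direct convolution of the two pmfs and compared with the Poisson(m1 + m2) pmf (the means and the truncation point are illustrative):

    import numpy as np
    from scipy.stats import poisson

    m1, m2 = 2.0, 3.5
    xs = np.arange(60)                          # truncate; tail mass is negligible
    p1, p2 = poisson.pmf(xs, m1), poisson.pmf(xs, m2)

    pmf_sum = np.convolve(p1, p2)[:60]          # pmf of X1 + X2 (independence)
    print(np.abs(pmf_sum - poisson.pmf(xs, m1 + m2)).max())   # ~0, as claimed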
Joint Distribution of Two Random Variables
Two random variables X and Y are studied
together for examining their possible
interdependence
Consider “both discrete” case:
X has k values x1, x2, …, xk
Y has l values y1, y2, …, yl
Joint probability law: P(X = xi, Y = yj) = Pij,
i = 1, 2, …, k; j = 1, 2, …, l.
An Example of a Joint Distribution
X and Y are (random) percentage returns from
two stocks in BSE;
X could take one of the values 5%, 10% or 20%
Y takes one of the values 10% or 20%.
From past data, the joint probabilities are
estimated as
P(X=5, Y=10) = 0.10; P(X=5, Y=20) = 0.25;
P(X=10, Y=10) = 0.08; P(X=10, Y=20) = 0.22;
P(X=20, Y=10) = 0.30; P(X=20, Y=20) = 0.05;
Joint Distribution Table
The joint distribution of X and Y can be
shown in the following table:
X \ Y             10      20      Row Total (X)
5                 0.10    0.25    0.35
10                0.08    0.22    0.30
20                0.30    0.05    0.35
Column Total (Y)  0.48    0.52    1.00
Some Concepts in a Joint Distribution
a) Marginal Probability Distributions – Mean
and SD
These are obtained as the Row and Column
totals:
of X along the rows: Pi0 = Σj Pij, and of Y along
the columns: P0j = Σi Pij
A marginal distribution describes only one variable,
with variation in the other variable ignored
Marginal Distributions of X and Y
Of X:
P(X=5) = 0.35, P(X=10) = 0.30 and P(X=20) =
0.35.
Mean & SD of X: As usual,
Mean of X = Average % return of stock X = μx =
5(0.35) + 10(0.30) + 20(0.35) = 11.75
SD of the % return for stock X = SD of X = σx =
√{Sum(value² × prob) − (Mean)²} =
√[{25(0.35) + 100(0.30) + 400(0.35)} − (11.75)²]
= √40.6875 = 6.38
Marginal Distribution of Y
P(Y=10) = 0.48; P(Y=20) = 0.52
Mean of Y = Average % return of stock Y =
μy = 10(0.48) + 20(0.52) = 15.2,
SD of the % return for stock Y = SD of Y =
σy = √(256 − 231.04) = √24.96 ≈ 5.00
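A minimal Python sketch that recovers these marginals, means and SDs from the joint table:

    import math

    # Joint distribution from the stock-returns example
    pij = {(5, 10): 0.10, (5, 20): 0.25, (10, 10): 0.08,
           (10, 20): 0.22, (20, 10): 0.30, (20, 20): 0.05}

    px = {x: sum(p for (xi, _), p in pij.items() if xi == x) for x in (5, 10, 20)}
    py = {y: sum(p for (_, yj), p in pij.items() if yj == y) for y in (10, 20)}

    def mean_sd(marginal):
        m = sum(v * p for v, p in marginal.items())
        return m, math.sqrt(sum(v**2 * p for v, p in marginal.items()) - m**2)

    print(mean_sd(px))                   # (11.75, 6.38) as above
    print(mean_sd(py))                   # (15.2, ~5.00)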
Is that all?
What about their possible interdependence?
Independence of the Random Variables
Recall: For two events A and B, they are
independent if P(A∩B) = P(A)·P(B)
X and Y will be independent random
variables if similar things hold:
(X=xi) and (Y=yj) must be independent
events for all choices of xi and yj, that is
P{(X=xi)∩(Y=yj)} = P(X=xi)·P(Y=yj)
Every cell probability = (Row total) × (Column total)
Independence of two random variables
For this example, P(X=5, Y=10) = 0.10, while
P(X=5) = 0.35 and P(Y=10) = 0.48, so that
P(X=5)·P(Y=10) = (0.35)(0.48) = 0.168 ≠ 0.10
⇒ X and Y are not independent
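The same check in a minimal Python sketch, over all six cells of the table:

    # Independence requires every cell probability = (row total) * (column total)
    pij = {(5, 10): 0.10, (5, 20): 0.25, (10, 10): 0.08,
           (10, 20): 0.22, (20, 10): 0.30, (20, 20): 0.05}
    px = {x: sum(p for (xi, _), p in pij.items() if xi == x) for x in (5, 10, 20)}
    py = {y: sum(p for (_, yj), p in pij.items() if yj == y) for y in (10, 20)}

    print(all(abs(pij[(x, y)] - px[x] * py[y]) < 1e-12
              for x in px for y in py))  # False: 0.10 != 0.35 * 0.48 = 0.168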
How to examine the extent of dependence?
• Correlation Approach: Examine if X and Y
are related, either exactly or at least
approximately, in a linear form
Correlation Coefficient (Pearson)
The Covariance between X and Y is
σxy = Σ(xi yj Pij) − (Mean of X)(Mean of Y)
The SDs of X and Y are, as before,
σx = √{Σ(xi² Pi0) − (Mean of X)²}
σy = √{Σ(yj² P0j) − (Mean of Y)²}
Correlation Coefficient = ρ = σxy / (σx σy)
For our example, σxy = 162 − 11.75 × 15.2 =
−16.6; σx = 6.38; σy = 5.00, so that ρ = −0.52
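A minimal Python sketch of these covariance and correlation formulas for the example:

    import math

    pij = {(5, 10): 0.10, (5, 20): 0.25, (10, 10): 0.08,
           (10, 20): 0.22, (20, 10): 0.30, (20, 20): 0.05}

    mx = sum(x * p for (x, _), p in pij.items())                        # 11.75
    my = sum(y * p for (_, y), p in pij.items())                        # 15.2
    sx = math.sqrt(sum(x**2 * p for (x, _), p in pij.items()) - mx**2)  # 6.38
    sy = math.sqrt(sum(y**2 * p for (_, y), p in pij.items()) - my**2)  # ~5.00
    cov = sum(x * y * p for (x, y), p in pij.items()) - mx * my         # -16.6
    print(cov / (sx * sy))                                              # rho = -0.52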
How does it help?
Result: For any joint distribution, −1 ≤ ρ ≤ 1.
Interpretation: The sign of ρ tells us how one variable
behaves with variation in the other:
• if both behave in the same direction (both
increase or both decrease), that is a case of
positive correlation; ρ will be positive here:
0 < ρ < 1.
• if they behave in opposite directions (one
increases as the other decreases, or conversely),
that is a case of negative correlation; ρ will be
negative here: −1 < ρ < 0.
Interpretation of Correlation Coefficient
• Case when ρ = 0: Here the two variables are
called uncorrelated. This means that there is no
linear relationship between the two variables.
• Case when ρ = 1: This is the case of perfect
positive correlation, in the sense that for all pairs
of values of (X, Y), Y = a + bX with certainty, i.e.,
P(Y = a + bX) = 1, with b > 0.
• Case when ρ = −1: This is the case of perfect
negative correlation, in the sense that for all pairs
of values of (X, Y), Y = a − bX with certainty, i.e.,
P(Y = a − bX) = 1, with b > 0.
Example Revisited
In our example, ρ = −0.52.
So there is a high negative correlation
between the two variables X and Y.
Since X and Y indicate the % return from the
two stocks, this means that if one stock
performs well (giving high return), the
other stock is likely to under-perform,
giving returns below its expected return.
Limitation of ρ
Examines only the linear relationship between
X and Y; if the true relationship is non-linear, or
if there is no relationship at all, ρ fails to capture it
Thus ρ = 0 does not mean that X and Y are
independent random variables !!
A serious drawback of ρ
How to capture other relationships, if any?
Regression Approach
Regression Equation
Emphasis is to examine how one variable
explains the variation in the other variable
Y = Study variable
X = auxiliary variable
To develop an equation that explains Y when
X is known
Regression Equation of Y on X
Regression Equation
Different types:
• Y = α + βX (linear regression)
• Y = α + βX + γX² (quadratic regression)
• log Y = α + βX (logarithmic regression)
α, β, γ etc. are equation parameters, usually
unknown
Need to estimate them from data
How to estimate the parameters?
Least Squares Principle
Data: n pairs of values on (X, Y): (x1,y1),
(x2, y2), … , (xn, yn)
Consider Linear Regression: Y = α + βX
For X = Xi,
• Observed value of Y = Yi, and
• Predicted value of Y = Value of Y obtained
from the model = α + βXi, i = 1, 2, …, n.
Least Squares Principle
Error is ei = Yi − (α + βXi), i = 1, 2, …, n
Minimize the sum of squares of the errors:
S = Σei² = Σ{Yi − (α + βXi)}², w.r.t. α and β
Equations for solving α and β (the normal equations):
ΣYi = nα + β ΣXi
ΣXiYi = α ΣXi + β ΣXi²
Two equations in two unknowns α and β
Solve for α and β, say, α̂ and β̂
Estimated Regression Equation
Ŷ = α̂ + β̂X
(α̂ and β̂ are the estimates of α and β)
For a given value of X = x*, the predicted
value of Y is
Y* = α̂ + β̂x*
Regression of X on Y: Similar
Equations are not interchangeable !
An Example of Fitting a Linear Regression
Data on two variables (X, Y):
X    1.0   1.1   1.3   1.8   2.0   2.4   3.0
Y   10.0  12.3  17.0  30.1  36.2  43.0  55.3
Calculations: n = 7; ΣX = 12.6; ΣY = 203.9;
ΣX² = 25.9; ΣXY = 441.31
β̂ = 23.07;
α̂ = −12.40
Fitted Regression: Ŷ = −12.4 + 23.07X
When X = 5, predicted value of Y = Ŷ = 102.95.
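A minimal Python sketch that solves the normal equations for this data, assuming numpy is available:

    import numpy as np

    X = np.array([1.0, 1.1, 1.3, 1.8, 2.0, 2.4, 3.0])
    Y = np.array([10.0, 12.3, 17.0, 30.1, 36.2, 43.0, 55.3])
    n = len(X)

    # Normal equations:  Sum(Y)  = n*alpha      + beta*Sum(X)
    #                    Sum(XY) = alpha*Sum(X) + beta*Sum(X^2)
    A = np.array([[n, X.sum()], [X.sum(), (X**2).sum()]])
    b = np.array([Y.sum(), (X * Y).sum()])
    alpha, beta = np.linalg.solve(A, b)
    print(alpha, beta)                   # about -12.40 and 23.07, as above
    print(alpha + beta * 5.0)            # predicted Y at X = 5: about 102.95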
Statistical Model Vs. Mathematical Model
Where is the difference?
In our approach:
• Mathematical Model: Deterministic, no
concept of an error component
• Statistical Model: Probabilistic, with a
provision for allowing random error to
operate (to account for uncertainties
associated with several market forces
acting together) – a greater scope for
application in real-life situations
How good is a Statistical Model?
• Given data on (X, Y), there may be several
competing models (Linear, Quadratic,
Logarithmic, etc.)
• Which one will give us the best fit?
• Need to examine the significance of any
fitted model - how much of the total
variation the model can explain
Statistical Inference/Hypothesis Testing
Reference:
Text Book for the Course
• Statistical Methods in Business and Social
Sciences: Shenoy, G.V. & Pant, M.
(Macmillan India Limited)
Suggested Reading
• Complete Business Statistics: Aczel, A.D.
& Sounderpandian, J. – Fifth Edition (Tata
McGraw-Hill)