significance of a correlation

Correlation Significance
Calculations using Numerical
Integration
Review: Correlation
Value is +1 in the case of a (perfectly)
increasing linear relationship

−1 in the case of a (perfectly)
decreasing linear relationship

Some value in-between in all other
cases

Indicates the degree of linear
dependence between the variables
- The closer the coefficient is to either
  −1 or 1, the stronger the correlation
  between the variables
- r > 0.7 is considered “good” for PSP
  planning purposes

Winter 2004
SE-280
Dr. Mark L. Hornick
In the PSP, definite integrals of the t-distribution are
used to calculate the significance of a correlation
and the prediction interval of an estimate.
Requirement:

Integrate an arbitrary f(x)
from a to b
$$F(x) = \int f(x)\,dx$$
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
The problem is that there is no (simple) closed-form solution for
the integral of the t-distribution function.
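Because no closed form is available, the integral has to be approximated numerically. The slides do not name a method at this point, so the sketch below assumes Simpson's rule with a fixed, even number of segments. The names `simpson` and `num_segments` are illustrative, not taken from the course material.

```python
import math

def simpson(f, a, b, num_segments=100):
    """Approximate the definite integral of f from a to b with Simpson's rule."""
    if num_segments <= 0 or num_segments % 2 != 0:
        raise ValueError("num_segments must be a positive even integer")
    width = (b - a) / num_segments
    total = f(a) + f(b)
    for i in range(1, num_segments):
        # Interior points alternate between weight 4 (odd i) and weight 2 (even i).
        weight = 4 if i % 2 == 1 else 2
        total += weight * f(a + i * width)
    return total * width / 3

# Sanity checks against integrals with known closed forms.
print(simpson(lambda x: x * x, 0.0, 3.0))   # exact value is 9
print(simpson(math.sin, 0.0, math.pi))      # exact value is 2
```

Checking the routine on functions with known antiderivatives first makes it easier to trust the result when it is later applied to the t distribution.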
Probability density
Distributions are important statistical
functions that we often need to
integrate.
Normal Distribution:
The probability density
function for a large
sample size
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$
Its integral represents a
cumulative probability
over some range (more
on that in a moment).
The t distribution is another type of probability
density function we often need to integrate.
[Figure: t probability density function for d = 1, 5, and 25 degrees of freedom, plotted for x from −6 to 6]
As d increases, the t-distribution approaches the normal distribution.
In the PSP, the t distribution is used to
calculate the significance of a correlation
and the prediction interval of an estimate.
The t-distribution function
$$f(x) = \frac{\Gamma\left(\frac{d+1}{2}\right)}{\sqrt{d\pi}\;\Gamma\left(\frac{d}{2}\right)} \left(1 + \frac{x^2}{d}\right)^{-(d+1)/2}$$
d = number of degrees of freedom
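As a rough illustration, the density can be evaluated directly from this formula. The sketch below uses Python's built-in `math.gamma` as a stand-in for a hand-written gamma function. The function names `t_density` and `normal_density` are assumptions made for this example.

```python
import math

def t_density(x, dof):
    """t-distribution density at x for dof degrees of freedom."""
    # math.gamma overflows for arguments above roughly 171, so this
    # direct form only works for a few hundred degrees of freedom.
    coefficient = math.gamma((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coefficient * (1 + x * x / dof) ** (-(dof + 1) / 2)

def normal_density(x):
    """Standard normal density, used here only for comparison."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# The peak of the t density moves toward the normal peak as dof grows.
for dof in (1, 5, 25):
    print(dof, t_density(0.0, dof), normal_density(0.0))
```

With one degree of freedom the value at x = 0 reduces to 1/π, which is a convenient spot check.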
The gamma function
For integer values, $\Gamma(x) = (x-1)!$, for example $\Gamma(5) = 4! = 24$.
The gamma function is defined recursively:
$$\Gamma(x) = (x-1)\,\Gamma(x-1)$$
Base cases to terminate recursion: $\Gamma(1) = 1$ and $\Gamma(1/2) = \sqrt{\pi}$.
In the t distribution, some gamma arguments are multiples of one-half!
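A minimal sketch of the recursion with both base cases follows. It handles only positive multiples of one-half, which is all the t distribution needs. The name `gamma_half` and the input check are illustrative choices, not part of the course material.

```python
import math

def gamma_half(x):
    """Gamma function for x a positive multiple of 1/2, computed recursively."""
    doubled = round(2 * x)
    if doubled <= 0 or abs(2 * x - doubled) > 1e-9:
        raise ValueError("x must be a positive multiple of 1/2")
    if doubled == 2:          # base case: gamma(1) = 1
        return 1.0
    if doubled == 1:          # base case: gamma(1/2) = sqrt(pi)
        return math.sqrt(math.pi)
    return (x - 1) * gamma_half(x - 1)

print(gamma_half(5))      # integer argument: 4! = 24
print(gamma_half(2.5))    # half-integer argument: 1.5 * 0.5 * sqrt(pi)
```

Each call subtracts 1, so integer arguments end at the first base case and half-integer arguments end at the second.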
We often calculate the definite
integral of the t-distribution.
[Figure: t-distribution density versus x, with the area from 0 to t shaded; integral value = p]
In cycle 6, you will be required to
calculate the significance of a
correlation.
First, calculate an integration limit (t) for
use with the t distribution.
r_{x,y} = correlation
n = number of historical data points
m = number of independent (x) variables
$$t = \frac{|r_{x,y}|\,\sqrt{n-(m+1)}}{\sqrt{1-r_{x,y}^{2}}}$$
Next, calculate the t-distribution area in the "tails"
outside (-t,t) with n-(m+1) degrees of freedom.
$$\text{tail area} = 1 - 2p$$
where "p" is the area (integral) from 0 to +t.
A tail area of < 0.05 indicates high significance, while a
value > 0.2 suggests the relationship is due to chance.
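The two steps can be sketched as small helper functions. The names `t_limit`, `tail_area`, and `interpret` are hypothetical, and the "intermediate" label for tail areas between 0.05 and 0.2 is an assumption, since the slides only describe the two outer ranges. The absolute value of r is taken so that the integration limit is never negative.

```python
import math

def t_limit(r, n, m=1):
    """Integration limit t for correlation r, n data points, m independent variables."""
    dof = n - (m + 1)
    if dof < 1:
        raise ValueError("need at least m + 2 data points")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return abs(r) * math.sqrt(dof) / math.sqrt(1 - r * r)

def tail_area(p):
    """Area in both tails outside (-t, t), given p = integral from 0 to t."""
    return 1 - 2 * p

def interpret(tail):
    """Map a tail area onto the ranges quoted above."""
    if tail < 0.05:
        return "high significance"
    if tail > 0.2:
        return "likely due to chance"
    return "intermediate"   # label assumed for this sketch

print(t_limit(0.8, 10))          # r = 0.8, 10 points, 1 independent variable
print(interpret(tail_area(0.49)))
```

The value p still has to come from numerically integrating the t distribution from 0 to t with n − (m+1) degrees of freedom.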
Integration issues
Problem: how do we integrate from −∞?
[Figure: probability density versus x, with t marked on the x axis]
Integrating to (+∞)
$$F(t) = \int_t^{\infty} f(x)\,dx \approx \int_0^{N} f(x)\,dx - \int_0^{t} f(x)\,dx$$
N is some large value such that $f(N) \approx 0$.
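One way to try out the truncation idea is to replace the infinite limit with a large finite N, integrate from 0 to N, and subtract the integral from 0 to t. The sketch below assumes Simpson's rule and the built-in `math.gamma`. The choices N = 50 and 10000 segments are arbitrary values that happen to work for 8 degrees of freedom. Heavier-tailed cases (small d) would need a larger N.

```python
import math

def t_density(x, dof):
    """t-distribution density; math.gamma stands in for a hand-written gamma."""
    coefficient = math.gamma((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coefficient * (1 + x * x / dof) ** (-(dof + 1) / 2)

def simpson(f, a, b, num_segments):
    """Simpson's rule with an even number of equal-width segments."""
    width = (b - a) / num_segments
    total = f(a) + f(b)
    for i in range(1, num_segments):
        total += (4 if i % 2 == 1 else 2) * f(a + i * width)
    return total * width / 3

def upper_tail(t, dof, big_n=50.0, num_segments=10000):
    """Approximate the area from t to infinity by truncating at big_n."""
    density = lambda x: t_density(x, dof)
    zero_to_n = simpson(density, 0.0, big_n, num_segments)
    zero_to_t = simpson(density, 0.0, t, num_segments)
    return zero_to_n - zero_to_t

print(t_density(50.0, 8))     # check that f(N) is negligible for the chosen N
print(upper_tail(2.306, 8))   # a t table lists 0.025 for t = 2.306, d = 8
```

Since the density is symmetric about zero, the integral from 0 to N should come out very close to 0.5, which gives a quick check on the choice of N.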
Integrating to (−∞)
$$F(t) = \int_{-\infty}^{t} f(x)\,dx \approx \int_{-N}^{0} f(x)\,dx + \int_0^{t} f(x)\,dx$$
Summary of significance calculation
$$\text{significance} = 1 - 2\int_0^{t} f(x)\,dx$$
$$t = \frac{|r_{x,y}|\,\sqrt{n-(m+1)}}{\sqrt{1-r_{x,y}^{2}}}$$
where n = number of data values and m = number of independent variables.
$$f(x) = \frac{\Gamma\left(\frac{d+1}{2}\right)}{\sqrt{d\pi}\;\Gamma\left(\frac{d}{2}\right)} \left(1 + \frac{x^2}{d}\right)^{-(d+1)/2}$$
where d = number of degrees of freedom, and d = n − (m+1).
$$\Gamma(a) = (a-1)\,\Gamma(a-1), \quad \text{where } \Gamma(1) = 1 \text{ and } \Gamma(1/2) = \sqrt{\pi}$$
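Putting the pieces of the summary together, the whole calculation might look like the sketch below. It assumes Simpson's rule for the integration, which the summary does not specify. All function names and the default of 1000 segments are illustrative choices.

```python
import math

def gamma_half(x):
    """Recursive gamma for positive multiples of 1/2."""
    doubled = round(2 * x)
    if doubled == 2:
        return 1.0
    if doubled == 1:
        return math.sqrt(math.pi)
    return (x - 1) * gamma_half(x - 1)

def t_density(x, dof):
    """t-distribution density for dof degrees of freedom."""
    coefficient = gamma_half((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * gamma_half(dof / 2))
    return coefficient * (1 + x * x / dof) ** (-(dof + 1) / 2)

def simpson(f, a, b, num_segments):
    """Simpson's rule with an even number of equal-width segments."""
    width = (b - a) / num_segments
    total = f(a) + f(b)
    for i in range(1, num_segments):
        total += (4 if i % 2 == 1 else 2) * f(a + i * width)
    return total * width / 3

def significance(r, n, m=1, num_segments=1000):
    """Tail area 1 - 2p for correlation r, n data values, m independent variables."""
    dof = n - (m + 1)
    if dof < 1 or not -1.0 < r < 1.0:
        raise ValueError("need n >= m + 2 and -1 < r < 1")
    t = abs(r) * math.sqrt(dof) / math.sqrt(1 - r * r)
    p = simpson(lambda x: t_density(x, dof), 0.0, t, num_segments)
    return 1 - 2 * p

print(significance(0.8, 10))   # strong correlation with 10 data points
print(significance(0.2, 10))   # weak correlation with the same sample size
```

A t table gives a useful cross-check: with 8 degrees of freedom, t = 2.306 corresponds to a two-tailed area of 0.05.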
Here are some additional
notes on Cycle 6.
To calculate significance, you need to integrate
only the t distribution

Evaluating the t distribution requires you to evaluate the
gamma function, which is a recursive function.
Some defects (e.g., off-by-one loop errors) can result in very small
discrepancies in the calculated values – don't be fooled!