Correlation Significance Calculations using Numerical Integration

Winter 2004, SE-280, Dr. Mark L. Hornick

Review: Correlation

The correlation value is +1 in the case of a (perfectly) increasing linear relationship, −1 in the case of a (perfectly) decreasing linear relationship, and some value in between in all other cases. It indicates the degree of linear dependence between the variables: the closer the coefficient is to either −1 or 1, the stronger the correlation between the variables. r > 0.7 is considered “good” for PSP planning purposes.

In the PSP, definite integrals of the t-distribution are used to calculate the significance of a correlation and the prediction interval of an estimate.

Requirement: integrate an arbitrary f(x) from a to b.

$$F(x) = \int f(x)\,dx, \qquad \int_a^b f(x)\,dx = F(b) - F(a)$$

The problem is that there is no (simple) closed-form solution for the integral of the t-distribution function.

Distributions are important statistical functions that we often need to integrate.

Normal distribution: the probability density function for a large sample size is

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$

Its integral represents a cumulative probability over some range (more on that in a moment).

The t distribution is another type of probability density function we often need to integrate.

[Figure: t probability density function for d = 1, 5, and 25 degrees of freedom, plotted for x from −6 to 6.]

As d increases, the t-distribution approaches the normal distribution.

The t-distribution function

$$f(x) = \frac{\Gamma\!\left(\frac{d+1}{2}\right)}{\sqrt{d\pi}\;\Gamma\!\left(\frac{d}{2}\right)} \left(1 + \frac{x^2}{d}\right)^{-(d+1)/2}$$

where d is the number of degrees of freedom and Γ is the gamma function. For integer values, Γ(x) = (x − 1)!, for example Γ(5) = 4! = 24.

The gamma function is defined recursively:

$$\Gamma(x) = (x-1)\,\Gamma(x-1), \qquad \text{where } \Gamma(1) = 1 \text{ and } \Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}$$

These two values are the base cases that terminate the recursion. In the t distribution, some gamma arguments are multiples of one-half!
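The recursive gamma definition and the t-distribution function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `gamma_half` and `t_density` are hypothetical, and rejecting arguments that are not positive multiples of one-half is a choice made here, not something the slides specify.

```python
import math


def gamma_half(x):
    """Gamma function for positive multiples of 1/2 (assumed input domain).

    Uses the recursion gamma(x) = (x - 1) * gamma(x - 1) with the two
    base cases gamma(1) = 1 and gamma(1/2) = sqrt(pi).
    """
    if x <= 0 or (2 * x) != int(2 * x):
        raise ValueError("x must be a positive multiple of 1/2")
    if x == 1:
        return 1.0
    if x == 0.5:
        return math.sqrt(math.pi)
    return (x - 1) * gamma_half(x - 1)


def t_density(x, d):
    """t-distribution probability density with d degrees of freedom."""
    coeff = gamma_half((d + 1) / 2) / (math.sqrt(d * math.pi) * gamma_half(d / 2))
    return coeff * (1 + x * x / d) ** (-(d + 1) / 2)
```

Multiples of one-half are represented exactly in floating point, so subtracting 1 repeatedly always lands on one of the two base cases. Because the recursion depth grows with d, this sketch is only suited to modest degrees of freedom.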
We often calculate the definite integral of the t-distribution.

[Figure: t probability density with the area from 0 to t marked; integral value = p.]

In cycle 6, you will be required to calculate the significance of a correlation.

First, calculate an integration limit (t) for use with the t distribution, where r_{x,y} is the correlation, n is the number of historical data points, and m is the number of independent (x) variables:

$$t = \frac{|r_{x,y}|\,\sqrt{n-(m+1)}}{\sqrt{1-r_{x,y}^{2}}}$$

Next, calculate the t-distribution area in the "tails" outside (−t, t) with n − (m+1) degrees of freedom:

$$\text{tail area} = 1 - 2p$$

where p is the area (integral) from 0 to +t. A tail area of < 0.05 indicates high significance, while a value > 0.2 suggests the relationship is due to chance.

Integration issues

Problem: how do we integrate from −∞?

Integrating to +∞

$$\int_t^{\infty} f(x)\,dx \approx \int_0^{N} f(x)\,dx - \int_0^{t} f(x)\,dx$$

N is some large value such that f(N) ≈ 0.

Integrating from −∞

$$F(t) = \int_{-\infty}^{t} f(x)\,dx \approx \int_0^{N} f(x)\,dx + \int_0^{t} f(x)\,dx$$

Because f is symmetric about 0, the area from −∞ to 0 equals the area from 0 to +∞, so the same integral from 0 to N can be reused.

Summary of significance calculation

$$\text{significance} = 1 - 2\int_0^{t} f(x)\,dx$$

$$t = \frac{|r_{x,y}|\,\sqrt{n-(m+1)}}{\sqrt{1-r_{x,y}^{2}}}$$

where n is the number of data values and m is the number of independent variables.

$$f(x) = \frac{\Gamma\!\left(\frac{d+1}{2}\right)}{\sqrt{d\pi}\;\Gamma\!\left(\frac{d}{2}\right)} \left(1 + \frac{x^2}{d}\right)^{-(d+1)/2}$$

where d is the number of degrees of freedom, and d = n − (m+1).

$$\Gamma(a) = (a-1)\,\Gamma(a-1), \qquad \text{where } \Gamma(1) = 1 \text{ and } \Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}$$

Here are some additional notes on Cycle 6.

To calculate significance, you need to integrate only the t distribution.

Evaluating the t distribution requires you to evaluate the gamma function, which is a recursive function.

Some defects (e.g., off-by-one loop errors) can result in very small discrepancies in the calculated values. Don't be fooled!
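The significance calculation summarized above can be sketched end to end in Python. Several things here are assumptions rather than part of the slides: the integration method (composite Simpson's rule with the segment count doubled until two estimates agree), the tolerance and starting segment count, the use of |r| so that t is nonnegative, and all function names, which are hypothetical.

```python
import math


def gamma_half(x):
    """Gamma for positive multiples of 1/2 via gamma(x) = (x - 1) * gamma(x - 1)."""
    if x <= 0 or (2 * x) != int(2 * x):
        raise ValueError("x must be a positive multiple of 1/2")
    if x == 1:
        return 1.0
    if x == 0.5:
        return math.sqrt(math.pi)
    return (x - 1) * gamma_half(x - 1)


def t_density(x, d):
    """t-distribution probability density with d degrees of freedom."""
    coeff = gamma_half((d + 1) / 2) / (math.sqrt(d * math.pi) * gamma_half(d / 2))
    return coeff * (1 + x * x / d) ** (-(d + 1) / 2)


def simpson(f, a, b, segments):
    """Composite Simpson's rule (assumed method); segments must be even."""
    h = (b - a) / segments
    total = f(a) + f(b)
    for i in range(1, segments):  # interior points only: 1 .. segments - 1
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrate(f, a, b, tol=1e-7, segments=20, max_doublings=15):
    """Double the segment count until two successive estimates agree within tol."""
    previous = simpson(f, a, b, segments)
    for _ in range(max_doublings):
        segments *= 2
        current = simpson(f, a, b, segments)
        if abs(current - previous) < tol:
            return current
        previous = current
    raise RuntimeError("integration did not converge")


def correlation_significance(r, n, m=1):
    """Tail area 1 - 2p for correlation r, n data points, m independent variables."""
    d = n - (m + 1)  # degrees of freedom
    if d < 1:
        raise ValueError("need n > m + 1 data points")
    if not -1 < r < 1:
        raise ValueError("t is undefined unless -1 < r < 1")
    t = abs(r) * math.sqrt(d) / math.sqrt(1 - r * r)
    p = integrate(lambda x: t_density(x, d), 0.0, t)  # area from 0 to +t
    return 1 - 2 * p
```

A call such as `correlation_significance(0.9, 10)` evaluates the tail area for r = 0.9 over ten data points with one independent variable. The interior loop in `simpson` is exactly the kind of place where an off-by-one error shifts the result only slightly, so comparing against a case with a known closed form (for d = 1 the area from 0 to t is arctan(t)/π) is a useful check.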