
Chapter 6
INFERENTIAL STATISTICS I:
Foundations and sampling distribution
Introduction
INFERENTIAL STATISTICS
STATISTICAL
PARAMETERS

Statistical estimation theory:

Having found the value of an index in the sample, the goal is to infer the value of that index in the population.
Point estimation: the inference provides a single value.
Interval estimation: the inference provides a range of values within whose limits we expect the population parameter (for example, the population mean) to lie.
Statistical decision theory:

Procedure for making decisions in the field of statistical inference.
We will see it in Chapter 8.
Phases of the inferential process

1. A sample is obtained at random and the corresponding statistics are calculated: $\bar{X}$, Mdn, Mo, $S^2$, $S$, $P$.

2. We ask: what would have happened if we had worked with the entire population?

3. Identify the probability model of the possible outcomes:
* normal, binomial,
* chi-square,
* Student's t,
* Snedecor's F,
etc.

The model is known either because we have information about similar situations or because it can be deduced.

4. Construct the SAMPLING DISTRIBUTION OF THE STATISTIC (means or proportions), that is, the distribution of all its possible outcomes.

5. Knowing the sampling distribution and the underlying probability model, we can make probability statements about the statistic.
Sampling error theory

The larger the sample and the better the sampling procedure, the more likely it is that the value of the statistic lies satisfactorily close to the value of the parameter.

Even so, some discrepancy between them is to be expected (sampling error).

Solution:

We give up on knowing the exact sampling error.
Instead, we can have some confidence that this error does not exceed a given limit. Thanks to inference, this limit is known with a certain level of confidence.
1st) We select a sample by a random procedure (MAS = simple random sampling) and obtain the main statistics.

Sample: statistics (Latin letters), such as $\bar{X}$, Mdn, $S^2$, $S$, $P$.
Population: parameters (Greek letters), such as $\mu$, $\sigma^2$, $\sigma$, $\pi$.
2nd) Calculate the SAMPLING ERROR ($E_m$): "the difference between a statistic and its corresponding parameter".

$E_m = \bar{X} - \mu$
$E_m = P - \pi$

Ex.: the sample mean is 13.2, and there is a 0.05 probability of being wrong when asserting that the population mean lies within 2 units of 13.2, above or below.
How can we measure the sampling error?
It cannot be known exactly, but we can have some confidence that it does not exceed a certain limit.

Thanks to inference, this limit is known with a certain confidence (expressed as a specific probability value).
SAMPLE ERROR: described through ACCURACY and RELIABILITY
a) Accuracy

"The precision with which a statistic represents the parameter."
$\mu = 50$
$\bar{X}_1 = 47$: $E_{m1} = 47 - 50 = -3$
$\bar{X}_2 = 54$: $E_{m2} = 54 - 50 = +4$
$|E_{m1}| < |E_{m2}|$, so $\bar{X}_1$ represents the parameter more accurately.
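As a minimal sketch in Python, the accuracy comparison above can be reproduced by computing each sampling error and comparing absolute values. The function name `sampling_error` is an illustrative choice, not something defined in this chapter.

```python
def sampling_error(statistic: float, parameter: float) -> float:
    """Sampling error: statistic minus its corresponding parameter."""
    return statistic - parameter

mu = 50                         # population mean from the example
em1 = sampling_error(47, mu)    # first sample mean
em2 = sampling_error(54, mu)    # second sample mean

# The smaller the absolute error, the more accurate the statistic.
more_accurate = "sample 1" if abs(em1) < abs(em2) else "sample 2"
print(em1, em2, more_accurate)  # -3 4 sample 1
```

In practice the parameter is unknown, so this comparison is only possible in a teaching example where the population mean is given.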
b) Reliability

"A measure of how constant a statistic remains across several samples of the same type and size."
Example

We select large samples (n > 30) and obtain their means:
$\bar{X}_1 = 76$, $\bar{X}_2 = 78$, $\bar{X}_3 = 75$, $\bar{X}_4 = 77$

If the means vary little among themselves, as in this example, we can say they are very reliable; otherwise they are unreliable and we will not trust them.

This is an indirect indication of accuracy.
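One way to put a number on "vary little among themselves" is the standard deviation of the sample means. The sketch below uses the four means from the example. Treating this spread as the reliability measure is an illustrative assumption that anticipates the standard error introduced later in the chapter.

```python
import statistics

sample_means = [76, 78, 75, 77]   # means of the four samples in the example

# Spread of the means around their own average (divisor n).
spread = statistics.pstdev(sample_means)
print(round(spread, 3))  # 1.118
```

A spread of about 1.1 points around an average of 76.5 is small, which matches the reading that these means are reliable.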

Knowledge of the context in which the inference is made allows us to draw conclusions at the population level with a certain degree of certainty.

To obtain this probability measure, we need to know how rare or how expected our finding is.

For that we need a sampling distribution and the probability model associated with it.
SAMPLING DISTRIBUTION

Sampling distributions are built through a process in 3 phases:
1st PHASE

Obtaining the sample areas of the population, i.e., the collection of all samples of the same size "n" drawn at random from the population under study.
Population → samples M1, M2, M3, M..., Mn
If we calculate the mean in each of the samples, we see that it does not always take the same value but varies from sample to sample.
2nd PHASE

Obtain the mean of each of these samples.
M1 → $\bar{X}_1$
M2 → $\bar{X}_2$
M3 → $\bar{X}_3$
M... → $\bar{X}_{...}$
Mn → $\bar{X}_n$
3rd PHASE

Group these means ($\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$, $\bar{X}_{...}$, $\bar{X}_n$) into a new distribution called the sampling distribution of means.
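The three phases can be sketched in Python for a population small enough to list every sample. The population values and the sample size below are hypothetical, and samples are taken without replacement via `itertools.combinations`, which is one possible reading of "all samples of the same size".

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical population scores
n = 2                           # sample size

# Phase 1: every possible sample of size n (without replacement).
samples = list(combinations(population, n))

# Phase 2: the mean of each sample.
sample_means = [mean(sample) for sample in samples]

# Phase 3: group the means into the sampling distribution of means.
sampling_distribution = sorted(sample_means)

print(len(samples))
print(sampling_distribution)
print(mean(sampling_distribution), mean(population))
```

The last line prints the same value twice: the mean of all the sample means coincides with the population mean.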
Parameters of the sampling distribution of means
Mathematical expectation or expected value: $\mu_{\bar{X}} = \mu$
Standard error: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{S}{\sqrt{n-1}}$
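The two forms of the standard error can be sketched as below. This assumes that $S$ is the sample standard deviation computed with divisor $n$, which would explain the $n-1$ under the root. The chapter does not state this convention explicitly, and the scores are hypothetical.

```python
import math
import statistics

def standard_error_from_sigma(sigma: float, n: int) -> float:
    """Standard error of the mean when the population sigma is known."""
    return sigma / math.sqrt(n)

def standard_error_from_sample(scores: list) -> float:
    """Standard error estimated as S / sqrt(n - 1), with S using divisor n (assumption)."""
    n = len(scores)
    s = statistics.pstdev(scores)
    return s / math.sqrt(n - 1)

scores = [12, 15, 11, 18, 14, 16, 13, 17]   # hypothetical sample
se = standard_error_from_sample(scores)

# Same quantity written with the divisor-(n - 1) standard deviation.
same_se = statistics.stdev(scores) / math.sqrt(len(scores))

print(standard_error_from_sigma(3, 225))  # 0.2
print(math.isclose(se, same_se))          # True
```

Under that assumption, $S/\sqrt{n-1}$ is the same number as the divisor-$(n-1)$ standard deviation divided by $\sqrt{n}$, the form most statistical software reports.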
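Population distribution and sampling distribution of means
[Figure: population distribution of $X$ alongside the sampling distribution of means $\bar{X}$, whose standard error is $\sigma_{\bar{X}} = \sigma/\sqrt{n} = S/\sqrt{n-1}$]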
CHARACTERISTICS of sampling distributions of means

1st) The statistics obtained in the samples cluster around the population parameter.

2nd) As n increases, the statistics cluster more tightly around the parameter.

3rd) If the samples are large, the graph of the sampling distribution shows that:

a) It is SYMMETRICAL about the central vertical axis, which is the parameter ($\mu$).

b) It is bell-shaped, and narrower the larger "n" is.
c) It takes the form of the normal curve.
d) The mean of the sampling distribution of means coincides with the true population mean:

$\mu_{\bar{X}} = \mu$

e) The distribution is more or less variable. If the sampling distribution varies little, i.e., has a very small sigma, the means differ little among themselves and the statistic is very reliable.
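These characteristics can be checked with a small simulation. The sketch below draws repeated random samples from a hypothetical normal population (mean 50, standard deviation 10, values chosen only for illustration) and compares the centre and spread of the sample means for two sample sizes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n, replications=2000, mu=50, sigma=10):
    """Return (mean, standard deviation) of many simulated sample means."""
    means = []
    for _ in range(replications):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.pstdev(means)

center_small, spread_small = simulate_sample_means(n=10)
center_large, spread_large = simulate_sample_means(n=100)

print(center_small, spread_small)
print(center_large, spread_large)
```

Both centres should land close to 50, and the spread for n = 100 should be clearly smaller than for n = 10, in line with points 1st), 2nd) and d).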
The standard error of the mean = $\sigma_{\bar{X}}$

The reliability of the mean depends on the value taken by the standard deviation of the sampling distribution of means.

This value is known as the standard error (or typical error) of the mean.

Symbolically:

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{S}{\sqrt{n-1}}$ = standard error of the mean
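STANDARDIZATION
Sample: $Z = \dfrac{X - \bar{X}}{S}$
Population: $Z = \dfrac{X - \mu}{\sigma}$
Sampling distribution of means: $Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{X} - \mu}{S/\sqrt{n-1}}$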

In the sampling distribution (S.D.) we do not work directly with raw scores but with standard (typical) scores.

Standardizing the S.D. allows us to calculate probabilities (provided we also know the probability model the distribution follows). We can assume a normal distribution if:

$n \ge 30$ in the sampling distribution of means
$\pi n \ge 5$ and $(1 - \pi) n \ge 5$ in the sampling distribution of proportions
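The two rules of thumb above translate directly into checks. This is a minimal sketch; the function names are illustrative, and the third call uses a hypothetical proportion.

```python
def normal_ok_for_means(n: int) -> bool:
    """Rule of thumb for means: sample size of at least 30."""
    return n >= 30

def normal_ok_for_proportions(pi: float, n: int) -> bool:
    """Rule of thumb for proportions: both expected counts at least 5."""
    return n * pi >= 5 and n * (1 - pi) >= 5

print(normal_ok_for_means(49))                # True
print(normal_ok_for_proportions(0.60, 200))   # True
print(normal_ok_for_proportions(0.01, 100))   # False
```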
Characteristics of the sampling distribution of means in terms of population size and population and sample variances

Means: standard error $\sigma_{\bar{X}}$

              N = ∞                  N ≠ ∞
Based on σ    $\sigma/\sqrt{n}$      $\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$
Based on S    $S/\sqrt{n-1}$         $S \sqrt{\dfrac{N-n}{N(n-1)}}$
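The σ-based row of the table can be sketched as a single function in which the finite-population factor is applied only when a population size is supplied. The population size `N = 1000` in the usage lines is a hypothetical value chosen for illustration.

```python
import math
from typing import Optional

def standard_error_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard error of the mean based on sigma.

    N=None is treated as an infinite population; otherwise the
    finite-population factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_infinite = standard_error_mean(3, 225)          # infinite population
se_finite = standard_error_mean(3, 225, N=1000)    # hypothetical finite population
print(se_infinite, se_finite)
```

The finite-population value is always the smaller of the two, because the correction factor is below 1 whenever n > 1.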
Sampling Distribution of Means
EXAMPLE 1 (suppose we know the variance)
We have applied a test to a population and obtained a mean ($\mu$) of 18 points, with a standard deviation ($\sigma$) of 3 points. Assuming that the variable is normally distributed in the population, calculate:
A) Between what values do the central 95% of the subjects of that population lie?
B) Between what values do the central 99% of the mean scores lie in samples of size n = 225 drawn at random from this population?
A) Between what values do the central 95% of the subjects of that population lie?
The central 95% of a normal distribution lies between Z = -1.96 and Z = 1.96.
Calculations
$-1.96 = \dfrac{X - \mu}{\sigma} = \dfrac{X - 18}{3} \Rightarrow X = 18 - 5.88 = 12.12$
$1.96 = \dfrac{X - \mu}{\sigma} = \dfrac{X - 18}{3} \Rightarrow X = 18 + 5.88 = 23.88$
The central 95% of the subjects score between 12.12 and 23.88 points.
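B) Between what values do the central 99% of the mean scores lie in samples of size n = 225 drawn at random from this population?
The central 99% of a normal distribution lies between Z = -2.58 and Z = 2.58.
Calculations
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{225}} = 0.2$
$-2.58 = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - 18}{3/\sqrt{225}} \Rightarrow \bar{X} = 18 - 0.516 = 17.484$
$2.58 = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - 18}{3/\sqrt{225}} \Rightarrow \bar{X} = 18 + 0.516 = 18.516$
The central 99% of the mean scores lie between 17.484 and 18.516 points.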
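Both parts of Example 1 can be cross-checked with Python's standard library. The sketch uses `statistics.NormalDist`, whose `inv_cdf` method returns the score below which a given proportion of the distribution lies. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18, 3, 225

# A) Central 95% of individual scores: 2.5% is left in each tail.
subjects = NormalDist(mu, sigma)
low_a, high_a = subjects.inv_cdf(0.025), subjects.inv_cdf(0.975)

# B) Central 99% of sample means: same mean, standard error sigma / sqrt(n).
means = NormalDist(mu, sigma / sqrt(n))
low_b, high_b = means.inv_cdf(0.005), means.inv_cdf(0.995)

print(round(low_a, 2), round(high_a, 2))   # 12.12 23.88
print(low_b, high_b)
```

Part B comes out within about 0.001 of the hand calculation. The small gap arises because the table value 2.58 is a rounding of the exact quantile, which is close to 2.576.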
EXAMPLE 2
Calculate the probability of drawing, from a population whose mean ($\mu$) is 40 and whose standard deviation ($\sigma$) is 9, a sample of size n = 81 whose mean is equal to or less than 42 points.
$\mu = 40$, $\sigma = 9$, $P(\bar{X} \le 42)$
Calculations
$Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{42 - 40}{9/\sqrt{81}} = \dfrac{2}{1} = 2$
$Z = 2 \Rightarrow P = 0.4772$ (area between the mean and Z = 2)
$P(\bar{X} \le 42) = P(Z \le 2) = 0.50 + 0.4772 = 0.9772$
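The same result can be obtained without a table. The sketch below standardizes the sample mean and evaluates the standard normal cumulative distribution with `statistics.NormalDist().cdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 40, 9, 81
sample_mean = 42

standard_error = sigma / sqrt(n)           # 9 / 9 = 1
z = (sample_mean - mu) / standard_error    # 2.0
probability = NormalDist().cdf(z)          # P(Z <= 2)

print(z, round(probability, 4))  # 2.0 0.9772
```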
EXAMPLE 3
In a sampling distribution of means with samples of 49 subjects, the central 90% of the sample means lie between 47 and 53 points.
A) Which scores delimit the central 95% of the means?
B) What is the σ of the population from which the samples were drawn?
C) Which scores delimit the central 95% of the means if n is 81 subjects?
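A) Which scores delimit the central 95% of the means?
Because n = 49 is greater than 30, the sampling distribution can be treated as normal.
Mean = (53 + 47) / 2 = 50
The central 90% lies between Z = -1.64 (47 points) and Z = 1.64 (53 points):
$1.64 = \dfrac{53 - 50}{\sigma_{\bar{X}}} \Rightarrow \sigma_{\bar{X}} = 1.83$
The central 95% lies between Z = -1.96 and Z = 1.96:
$-1.96 = \dfrac{\bar{X} - 50}{1.83} \Rightarrow \bar{X} = 46.41$
$1.96 = \dfrac{\bar{X} - 50}{1.83} \Rightarrow \bar{X} = 53.59$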
B) What is the σ of the population from which the samples were drawn?
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
$1.83 = \dfrac{\sigma}{\sqrt{49}} \Rightarrow \sigma = 1.83 \times 7 = 12.81$
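C) Which scores delimit the central 95% of the means if n is 81 subjects?
$\sigma_{\bar{X}} = \dfrac{12.81}{\sqrt{81}} = 1.42$
The central 95% lies between Z = -1.96 and Z = 1.96:
$-1.96 = \dfrac{\bar{X} - 50}{1.42} \Rightarrow \bar{X} = 47.22$
$1.96 = \dfrac{\bar{X} - 50}{1.42} \Rightarrow \bar{X} = 52.78$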
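The chain of reasoning in Example 3 (recover the standard error from the 90% limits, then σ, then the new limits) can be sketched as follows. The code uses exact normal quantiles from `statistics.NormalDist` rather than the rounded table values 1.64 and 1.96, so the intermediate figures differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()                      # standard normal distribution
mean_of_means = (47 + 53) / 2         # centre of the symmetric interval

# A) Standard error from the central 90% limits, then the 95% limits.
z90 = z.inv_cdf(0.95)                 # about 1.645
se_49 = (53 - mean_of_means) / z90
z95 = z.inv_cdf(0.975)                # about 1.96
limits_49 = (mean_of_means - z95 * se_49, mean_of_means + z95 * se_49)

# B) Population sigma from the standard error with n = 49.
sigma = se_49 * sqrt(49)

# C) Central 95% of means when n = 81.
se_81 = sigma / sqrt(81)
limits_81 = (mean_of_means - z95 * se_81, mean_of_means + z95 * se_81)

print(se_49, sigma)
print(limits_49)
print(limits_81)
```

With exact quantiles σ comes out near 12.77 instead of 12.81, while the limits for n = 81 still round to 47.22 and 52.78.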
Sampling Distribution of Proportions
EXAMPLE 1
In a given population, the proportion of smokers is 0.60. If we draw from this population a sample of n = 200 subjects, what is the probability of finding 130 or fewer smokers in that sample?
$\pi = 0.60$
$(1 - \pi) = 0.40$
$p = \dfrac{130}{200} = 0.65$
CONDITIONS OF APPLICATION
$n \cdot \pi \ge 5$: $200 \times 0.60 = 120$
$n \cdot (1 - \pi) \ge 5$: $200 \times 0.40 = 80$
$n \cdot P \ge 5$: $200 \times 0.65 = 130$
$n \cdot (1 - P) \ge 5$: $200 \times 0.35 = 70$
Parameters of the sampling distribution of proportions
A) Mean: $\mu_P = \pi$
B) STANDARD (TYPICAL) ERROR: $\sigma_P = \sqrt{\dfrac{\pi(1-\pi)}{n}}$, or, based on the sample proportion, $\sqrt{\dfrac{P(1-P)}{n}}$
Standardization process
P → Z: $Z = \dfrac{P - \pi}{\sigma_P} = \dfrac{P - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$
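CALCULATIONS
$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}} = \dfrac{0.65 - 0.60}{\sqrt{\dfrac{0.60 \times 0.40}{200}}} = \dfrac{0.05}{0.035} = 1.43$
(The standard error is rounded to 0.035 here; with the unrounded value 0.0346, Z = 1.44.)
$Z = 1.43 \Rightarrow P = 0.4236$
$P(P \le 0.65) = P(Z \le 1.43) = 0.50 + 0.4236 = 0.9236$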
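As a cross-check of the calculation above, the sketch below standardizes the sample proportion using the π-based standard error and evaluates the normal cumulative distribution directly. No rounding of intermediate values is applied, so the result differs from the table-based figure in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

pi, n, smokers = 0.60, 200, 130

p = smokers / n                              # sample proportion, 0.65
standard_error = sqrt(pi * (1 - pi) / n)     # about 0.0346
z = (p - pi) / standard_error                # about 1.44
probability = NormalDist().cdf(z)            # P(P <= 0.65)

print(z, probability)
```

The probability comes out near 0.925, against 0.9236 from the hand calculation, which uses a standard error rounded to 0.035.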
Characteristics of the sampling distribution of proportions in terms of population size and population and sample variances

Proportions: standard error $\sigma_p$

                     N = ∞                            N ≠ ∞
Based on σ (π)       $\sqrt{\dfrac{\pi(1-\pi)}{n}}$   $\sqrt{\dfrac{\pi(1-\pi)}{n}} \sqrt{\dfrac{N-n}{N-1}}$
Based on S (p)       $\sqrt{\dfrac{p(1-p)}{n}}$       $\sqrt{p(1-p) \dfrac{N-n}{N(n-1)}}$
EXAMPLE 2

In the elections for president at a particular university, a candidate obtained 45% of the vote.

If we choose randomly and independently a sample of 100 voters, what is the probability that the candidate receives more than 50% of the vote in that sample?
$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}} = \dfrac{0.50 - 0.45}{\sqrt{\dfrac{0.45 \times 0.55}{100}}} \approx 1$
$P(P > 0.50) = P(Z > 1) = 0.50 - 0.3413 = 0.1587$
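Because the number of votes in the sample follows a binomial model, the normal approximation used above can be compared with an exact binomial calculation. This goes a step beyond the method in the example and is offered only as a sanity check. It assumes "more than 50%" means 51 or more of the 100 voters.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 100, 0.45

# Normal approximation, as in the example (no continuity correction).
z = (0.50 - pi) / sqrt(pi * (1 - pi) / n)
approx = 1 - NormalDist().cdf(z)

# Exact binomial probability of 51 or more votes out of 100.
exact = sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(51, n + 1))

print(approx, exact)
```

The exact value is somewhat smaller than the approximation, mainly because the approximation treats the proportion as continuous and starts the tail at exactly 0.50.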
EXAMPLE 3
We know that 30% of Seville students pass a particular test. Drawing samples of 100 students from this population:
A) Which values delimit the central 99% of the proportions in these samples?
B) What percentage of samples will have a proportion of students passing the test equal to or higher than 0.35?
A) Which values delimit the central 99% of the proportions in these samples?
Conditions of application:
$n \cdot \pi \ge 5$: $100 \times 0.3 = 30$
$n \cdot (1 - \pi) \ge 5$: $100 \times 0.7 = 70$
The central 99% lies between Z = -2.58 and Z = 2.58.
$Z = \dfrac{P - \pi}{\sigma_P} = \dfrac{P - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$
$\pm 2.58 = \dfrac{P - 0.3}{\sqrt{\dfrac{0.3(1 - 0.3)}{100}}} = \dfrac{P - 0.3}{0.046}$
$P = 0.3 + 2.58 \times 0.046 = 0.419$
$P = 0.3 - 2.58 \times 0.046 = 0.181$
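B) What percentage of samples will have a proportion of students passing the test equal to or higher than 0.35?
$P = 0.35$
$Z = \dfrac{0.35 - 0.3}{0.046} = 1.0869$
$Z = 1.08 \Rightarrow P = 0.3599$ (table area between the mean and Z, with Z truncated to 1.08)
$P(P \ge 0.35) = 0.5 - 0.3599 = 0.1401$, i.e., about 14% of the samples.
(Rounding Z to 1.09 instead gives $0.5 - 0.3621 = 0.1379$.)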
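Both questions in this example can be answered from a single normal model of the sample proportion. The sketch below builds that model with `statistics.NormalDist`, using the π-based standard error. Exact quantiles are used instead of the table value 2.58, so the figures differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.30, 100
standard_error = sqrt(pi * (1 - pi) / n)      # about 0.0458

proportions = NormalDist(pi, standard_error)  # sampling distribution of P

# A) Limits of the central 99%: 0.5% is left in each tail.
low, high = proportions.inv_cdf(0.005), proportions.inv_cdf(0.995)

# B) Share of samples with a proportion of 0.35 or more.
upper_tail = 1 - proportions.cdf(0.35)

print(low, high)
print(upper_tail)
```

The limits land within about 0.001 of 0.181 and 0.419, and the upper tail is close to 0.14, in line with the result above.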