Quantitative Training (course two)

As a data user, it is imperative that you understand how the data has been generated and processed…
This will help you understand the limitations of the data and the uses to which it can be put (and the confidence with which you can put it to those uses).
Ipsos MORI’s technical note
Available on the survey website
Contents:
– Sampling methodology
– Data collection
– Data processing
– Data weighting
– Statistical reliability…
An intro to sampling
 Not feasible to question all of the people in the population we
are interested in i.e. all residents in a local authority area (this
would be a ‘census’)
Sampling is making an inference about a population from a sample.
We use sampling in our everyday lives!
Do you need to eat the whole pot to see if it is correctly seasoned?
Or will just tasting it be enough?
TASTING IT, PROVIDED IT’S WELL STIRRED!
NHTS uses probability sampling…
 That is, every sampling unit (an address) has a known and non-zero probability of selection
– we know the size of the population we are drawing from (the total number of addresses)
– we know the chance of selection that applies to every individual unit (an address) within the population (“1 in n”).
 This does not apply to quota sample surveys which are the
most commonly used approach in market research.
 Random probability sampling is theoretically purer
– it is less prone to non-response bias
– all statistical reliability tests assume random sampling
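To make the idea of a known, non-zero selection probability concrete, here is a minimal sketch of a simple random sample of addresses. The frame of 20,000 addresses, the sample of 400 and the function name are all hypothetical, and the actual NHTS design (for example, any stratification) is described in the technical note rather than here.

```python
import random

def draw_address_sample(addresses, sample_size, seed=None):
    """Draw a simple random sample of addresses without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(addresses, sample_size)
    # Every address has the same known, non-zero chance of selection.
    selection_probability = sample_size / len(addresses)
    return sample, selection_probability

# Hypothetical sampling frame of 20,000 addresses (illustration only).
frame = [f"address-{i}" for i in range(20_000)]
sample, prob = draw_address_sample(frame, 400, seed=1)

print(f"Selected {len(sample)} addresses")
print(f"Chance of selection: 1 in {round(1 / prob)}")
```

Because the frame size and the sample size are both known, the "1 in n" chance can be stated for every address, which is what a quota sample cannot offer.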
Statistical reliability

A sample survey produces figures which are estimates of the ‘truth’:
– that is, we are drawing inferences about the entire population based on a
sample (the ‘true’ figure would be based on a survey of the entire
population i.e. a census)

Statistical reliability is a statement about those estimates and the confidence
we have about them in relation to the ‘truth’

Reliability is sometimes referred to as “confidence intervals”, “margins of
error” or “sampling tolerances”

It is determined by:
– the percentage
– the sample size on which the percentage is based
– the level of confidence we want to apply – it is usual to test at the 95%
confidence level
– the effect of any weights applied
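The four factors above can be sketched in a few lines. This is an illustration under stated assumptions, not the method used in the technical note: it uses the standard normal approximation for a percentage, and it represents the effect of weights with Kish's effective sample size, which is one common approximation. The weights themselves are made up.

```python
import math

Z_95 = 1.96  # normal critical value for the 95% confidence level

def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / sum of squared weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def margin_of_error(p, n, z=Z_95):
    """Half-width of the confidence interval for a proportion p based on n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical weights: 300 respondents weighted 0.5 and 100 weighted 2.5.
weights = [0.5] * 300 + [2.5] * 100
n_eff = effective_sample_size(weights)

unweighted = margin_of_error(0.5, len(weights))
weighted = margin_of_error(0.5, n_eff)

print(f"Actual sample size: {len(weights)}, effective sample size: {n_eff:.0f}")
print(f"Margin at 50%, ignoring weights: {unweighted:.1%}")
print(f"Margin at 50%, allowing for weights: {weighted:.1%}")
print(f"Margin at 10%, ignoring weights: {margin_of_error(0.1, len(weights)):.1%}")
```

Uneven weights reduce the effective sample size and so widen the interval, while percentages further from 50% have narrower intervals for the same sample size.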
Confidence intervals and sample size
 Larger samples provide more precise estimates
 But to achieve double the reliability (that is, to halve the confidence interval) we need to quadruple the sample size
 For example, the following figures are for a 50% finding:
Sample size    Confidence interval (±)
50             14%
100            10%
200            7%
400            5%
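The figures for a 50% finding can be reproduced with the standard normal-approximation formula at the 95% confidence level. This sketch assumes simple random sampling and no weighting, so it will not necessarily match the technical note's tables for weighted data.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the 95% confidence interval for a proportion p based on n."""
    return z * math.sqrt(p * (1 - p) / n)

sample_sizes = (50, 100, 200, 400)
table = [round(100 * margin_of_error(0.5, n)) for n in sample_sizes]

for n, margin in zip(sample_sizes, table):
    print(f"n = {n:>3}: ±{margin}%")
```

Since the sample size sits under a square root, moving from 100 to 400 responses halves the interval, which is the "quadruple the sample size" rule.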
Some examples

For an NHTS survey based on 400 responses and in relation to a finding that
50% are fairly satisfied with a service:
– “We are 95% confident that our sample percentage is reliable to plus or
minus 5%.”

Put another way:
– “Out of every 100 surveys we conduct where we see these figures, in 95 of
them we would be right and the true figure would lie within that range; in 5 of
them we would be wrong and the true figure would not lie in that range.”
– “The chances are 95 in 100 that this result would not vary by more than 5
percentage points from the ‘true’ result, that which would be found had
the entire population responded i.e. the ‘true’ result would be between
45% and 55%.”

Another example, based on 100 responses:
– “We are 95% confident that our sample percentage is reliable to plus or
minus 10%. The ‘true’ result would be between 40% and 60%.”
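The "95 in 100" reading can be checked with a small simulation. This is only a sketch: it assumes a true figure of 50%, simple random sampling of 400 responses per survey, and 2,000 repeated surveys, all of which are illustrative choices rather than anything taken from NHTS.

```python
import math
import random

def coverage_rate(true_p=0.5, n=400, surveys=2000, seed=0):
    """Share of simulated surveys whose 95% interval contains the true figure."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(surveys):
        # One simulated survey: n independent responses.
        satisfied = sum(rng.random() < true_p for _ in range(n))
        p_hat = satisfied / n
        margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
        if abs(p_hat - true_p) <= margin:
            hits += 1
    return hits / surveys

rate = coverage_rate()
print(f"{rate:.1%} of simulated surveys captured the true figure")
```

The share should land close to 95%, with some run-to-run variation if the seed is changed.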
Statistical tests should not be used for sample sizes less than 50 – in fact, any estimates derived from small samples should be considered, at best, indicative.
A full list of confidence intervals for a range of sample sizes and percentages is provided in Ipsos MORI’s Technical note.
Statistical significance between %s
 When we extend this to look at two figures (survey estimates),
we are usually interested in the likelihood of the difference
between the two figures being ‘real’:
– In other words, how confident are we that the difference reflects what
we would have found if we had surveyed the entire population?
 The issue may apply either to two figures from one sample or to
a comparison of figures in two different samples, for example:
– comparing the views of men and women
– comparing the views of one authority’s residents with another's
– comparing attitudes among one authority’s residents and how they
have changed between 2009 and 2010
 The answer is obtained using a test of statistical significance
Statistical significance: some examples
 For two samples, each based on 1,000 responses, and with a
difference of 5 percentage points (50% vs 55%)
– The confidence interval is ±4, which is smaller than the 5-point difference.
– “We are 95% confident that this is a statistically significant finding.”
 With the same %s based on 500 vs 500 responses
– The confidence interval is ±6, which is larger than the 5-point difference.
– “This is not a statistically significant finding at the 95% confidence level.”
 With the same %s based on 100 vs 100 responses
– The confidence interval is ±14, which is much larger than the 5-point difference.
– “This is not a statistically significant finding at the 95% confidence level.”
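The three comparisons above can be reproduced with a simple test of the difference between two percentages. This sketch assumes two independent simple random samples, an unpooled standard error and no weighting. The function names are illustrative, and the technical note may use a slightly different formula.

```python
import math

def difference_margin(p1, n1, p2, n2, z=1.96):
    """95% confidence interval half-width for the difference p1 - p2."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z * se

def is_significant(p1, n1, p2, n2, z=1.96):
    """True when the observed difference exceeds its confidence interval."""
    return abs(p1 - p2) > difference_margin(p1, n1, p2, n2, z)

results = []
for n in (1000, 500, 100):
    margin = round(100 * difference_margin(0.50, n, 0.55, n))
    significant = is_significant(0.50, n, 0.55, n)
    results.append((n, margin, significant))
    print(f"{n} vs {n}: ±{margin} points, significant: {significant}")
```

The same 5-point difference is significant only when both samples are large enough for the interval to fall below 5 points.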
This means that there will be different confidence intervals involved in comparisons of survey %s for local authority A vs B, and comparisons of C vs D (assuming that A-D received different numbers of responses).
There can still be merit in reporting on, and using, findings which are not statistically significant, but more caution should be exercised the smaller the sample size. We cannot use these findings with the same degree of confidence.
A full list of confidence intervals for different sample size comparisons and percentages is provided in Ipsos MORI’s Technical note.