Sampling

Quantitative approaches
Plan
Lesson 3: Sampling
1. Introduction to quantitative sampling
2. Sampling error and sampling bias
3. Response rate
4. Types of "probability samples"
5. The size of the sample
6. Types of "non-probability samples"
1. Introduction to quantitative sampling
Sampling: Definition
Sampling = choosing the units (e.g. individuals, families, countries, texts, activities) to be investigated
Sampling: quantitative and qualitative
"First, the term 'sampling' is problematic for qualitative research, because it implies the purpose of 'representing' the population sampled. Quantitative methods texts typically recognize only two main types of sampling: probability sampling (such as random sampling) and convenience sampling. (...) any nonprobability sampling strategy is seen as 'convenience sampling' and is strongly discouraged.
This view ignores the fact that, in qualitative research, the typical way of selecting settings and individuals is neither probability sampling nor convenience sampling. It falls into a third category, which I will call purposeful selection; other terms are purposeful sampling and criterion-based selection. This is a strategy in which particular settings, persons, or activities are selected deliberately in order to provide information that can't be gotten as well from other choices."
Maxwell, Joseph A., Qualitative Research Design..., 2005, 88

Population and Sample
[Figure: Population → Sampling → Sample (= «miniature population»)]
Population, Sample, Sampling frame
Population = the set of units from which the sample is taken
Sample = the part of the population that is chosen for investigation. The choice may or may not be based on randomness.
Sampling frame = the list of all the units from which the choice is made

Representative sample, probability sample
Representative sample = a sample that reflects the population reliably: the sample is a «miniature population»
Probability sample = a sample that has been chosen randomly. Therefore, every unit has a known probability of being chosen.
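Whether a sample really reflects the population on a given characteristic can be checked empirically. A minimal sketch of one common check, a two-sided z-test of a sample proportion against a known population proportion (normal approximation), assuming hypothetical figures of 51% women in the population and 532 women among 1,000 respondents; the function name proportion_z_test is illustrative only, and a chi-square goodness-of-fit test would be an equally valid choice.

```python
import math


def proportion_z_test(successes: int, n: int, p_population: float) -> tuple[float, float]:
    """Two-sided z-test of a sample proportion against a known population proportion.

    Uses the normal approximation, which is reasonable when n * p and
    n * (1 - p) are both comfortably large. Returns (z, p_value).
    """
    p_hat = successes / n
    # Standard error if the sample really mirrors the population (null hypothesis)
    se = math.sqrt(p_population * (1 - p_population) / n)
    z = (p_hat - p_population) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical figures: 51% women in the population, 532 women among 1000 respondents
z, p = proportion_z_test(532, 1000, 0.51)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these figures the p-value is about 0.16, so the share of women in the sample does not differ significantly from the population share. A non-significant result only means that no difference was detected on this one variable; it does not prove that the sample is representative in every respect.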
Representativity: an empirical question
The representativity of the sample cannot be guaranteed by following a given method. If we use the correct methods (random selection, stratification, etc.), we can only maximize the probability of producing a representative sample.
Whether the sample is really representative of the population is an empirical question (and should be tested).
For example, we would check whether the percentage of women in the sample differs significantly from that in the population (if it does not ==> the sample is representative with respect to gender).

2. Sampling error, sampling bias
Errors: different types
1. Sampling error: due to chance and to the size of the sample
2. Sampling bias: not due to chance or to the size of the sample, e.g. non-response linked to the specific theme of the research
3. Data collection error: e.g. bad question wording; bad interviewing
4. Data processing error: e.g. wrong coding
5. Data analysis error: e.g. wrong statistical model; erroneous data analysis
6. Data interpretation error: e.g. wrong interpretation of results

Sampling error, sampling bias
Sampling error = differences between the sample and the population that are due to sampling (randomness). Sampling error can be reduced by increasing the size of the sample.
Sampling bias = differences between the sample and the population that are not due to sampling (randomness); sampling bias does not diminish as the sample size increases.
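The difference between the two can be made visible with a small simulation. This is a sketch under stated assumptions: a hypothetical population of 100 smokers and 100 non-smokers (P = 0.5), drawn from with replacement, and invented response probabilities of 0.5 for smokers and 0.9 for non-smokers to create non-response bias; smoker_share is an illustrative name.

```python
import random


def smoker_share(population, n, rng, response_prob=None):
    """Draw people at random (with replacement) until n have answered.

    response_prob maps a group label to its probability of answering;
    None means everyone answers, so only sampling error remains.
    Returns the share of smokers among the n respondents.
    """
    respondents = []
    while len(respondents) < n:
        person = rng.choice(population)
        if response_prob is None or rng.random() < response_prob[person]:
            respondents.append(person)
    return respondents.count("smoker") / n


population = ["smoker"] * 100 + ["non-smoker"] * 100  # P(smoker) = 0.5
# Hypothetical non-response: smokers answer less often than non-smokers
response_prob = {"smoker": 0.5, "non-smoker": 0.9}

rng = random.Random(42)  # fixed seed for a reproducible run
for n in (32, 320, 3200):
    random_only = smoker_share(population, n, rng)
    with_bias = smoker_share(population, n, rng, response_prob)
    print(f"n = {n:4d}   random only: {random_only:.3f}   with non-response: {with_bias:.3f}")
```

As n grows, the share without non-response settles around 0.5, whereas with the assumed response probabilities it settles around 0.25 / (0.25 + 0.45) ≈ 0.36, however large the sample: more observations reduce the sampling error but leave the bias untouched.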
Sampling error/bias: examples (I) and (II)
[Figure: a population of N = 200 smokers and non-smokers and three samples of N = 32 drawn from it]
- no error/bias: P(s) = 0.5; p(s) = 0.5
- a bit of error/bias: P(s) = 0.5; p(s) = 0.47
- a lot of error/bias: P(s) = 0.5; p(s) = 0.33
(P(s) = share of smokers in the population; p(s) = share of smokers in the sample)
Sampling error: decreases with increasing sample size
Experiment with a coin
- Probability of throwing «heads»?
- P «in reality» = 0.5
- We do 5 tries each for N = 1, 2, 5, 20:
  N = 1 -> p = 0, 1, 0, 1, 1
  N = 2 -> p = 0, 0.5, 0.5, 1, 0
  N = 5 -> p = 0.6, 0.2, 0.4, 0.8, 0.1
  N = 20 -> p = 0.4, 0.35, 0.45, 0.35, 0.55
- With growing N, p approaches P.

Possible reasons for sampling bias
- The sampling frame does not include all the elements of the population (example: telephone directory)
- The choice is not really random (example: open a telephone directory at a random page and choose the next 600 names)
- Certain groups of respondents have a higher (or lower) response rate (example: the very poor, the very rich, the very active, people with an active interest in the question, people critical of surveys)
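The coin experiment on the sampling-error slide can be repeated by computer. A minimal sketch using Python's random module (the seed and the extra values N = 100 and N = 1000 are arbitrary choices): for each N it prints five tries, as on the slide, plus the standard deviation of p over 1,000 tries.

```python
import random
import statistics


def share_of_heads(n, rng):
    """Throw a fair coin n times and return the observed share of heads (p)."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n


rng = random.Random(7)  # fixed seed for a reproducible run
for n in (1, 2, 5, 20, 100, 1000):
    five_tries = [share_of_heads(n, rng) for _ in range(5)]  # as on the slide
    # Many more tries to measure how widely p scatters around P = 0.5
    spread = statistics.pstdev([share_of_heads(n, rng) for _ in range(1000)])
    print(f"N = {n:4d}  five tries: {five_tries}  spread of p: {spread:.3f}")
```

For a fair coin the spread should come out close to 0.5/√N, so it halves each time N is quadrupled.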
Sampling error vs. sampling bias: Citation
"Sampling error is random. Every time you select an individual, a text, a situation, or any 'unit of observation,' that unit of observation will be different from the population of such units. Hence you always have an error (we hope a small one) in generalizing to the population of units.
Unlike sampling error, 'sampling bias' is systematic (nonrandom). For example, if for a focus group study you 'randomly' select one of every five students who happen to be in the library on a Friday afternoon, you might have a biased sample that does not represent the views of 'average' college students.
Unlike sampling error, increasing the size of the sample does not decrease the degree of bias in your sample.
Obviously, the results of a biased sample cannot be considered to be representative of the population (i.e., the findings have low transferability or external validity)."
Tashakkori/Teddlie, Mixed Methodology. Combining Qualitative and Quantitative ..., pp. 72-73

3. Response rate
Response rate
Response rate = percentage of individuals in the sample who have responded to the questionnaire

$$\text{Response rate} = \frac{\text{N of returned interviews} - \text{N of returned interviews not usable}}{\text{sample size} - \text{N of individuals who were not able to answer or could not be reached}}$$

Example:
$$\frac{652 - 8}{1212 - 66} = \frac{644}{1146} \approx 0.56$$

Response rate: example RLS
Table 1: Response rate and number of interviews used in this study

                                                       N        %
Gross sample                                        4800
Sample-neutral losses                               1712
  of which stage 1                                  1291
  of which stage 2                                   141
  of which stage 3                                   280
Net sample (gross sample - sample-neutral losses)   3088   100.0%
Refusals                                            1424    46.1%
  of which stage 1                                  1062
  of which stage 2                                   183
  of which stage 3                                   179
Completed interviews (= net response rate)          1664    53.9%
  of which German-speaking Switzerland              1054
    (of which canton of Zurich)                      330
  of which French-speaking Switzerland               409
  of which Italian-speaking Switzerland              201
Adherents of non-Christian religions                  28
Interviews used in this study                       1636
  of which canton of Zurich                          325
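The response-rate formula above can be written as a small function. This is a sketch; the name response_rate and its parameters are illustrative, and for Table 1 it assumes that all completed interviews are usable (unusable = 0), which is how the 53.9% net response rate is obtained.

```python
def response_rate(returned, unusable, sample, unreachable):
    """Response rate:
    (returned interviews - unusable returns) /
    (sample size - individuals unable to answer or not reachable).
    """
    return (returned - unusable) / (sample - unreachable)


# Worked example from the response-rate slide
print(f"{response_rate(652, 8, 1212, 66):.2f}")
# Table 1: 1664 completed interviews; gross sample 4800 minus 1712 sample-neutral losses
print(f"{response_rate(1664, 0, 4800, 1712):.1%}")
```

The Christliches Zeugnis figures fit the same function: response_rate(594, 0, 942, 0) gives about 0.63.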
Response rate: example
Christliches Zeugnis
- The actual return was better than expected. Of 942 persons contacted by letter, 469 responded to the first letter (49.8%); after a reminder, a further 125 persons (13.3%) sent in valid questionnaires. The overall response rate thus amounts to about 63% (594 persons).
- This is after deduction of the invalid answers and of respondents who could no longer be found, were ill, or had died.

4. Types of probability sample
Types of probability sample
4.1. Simple random sample
4.2. Systematic random sample
4.3. Stratified random sampling
4.4. Multi-stage cluster sampling

4.1 Simple random sample
Simple random sample = randomly choose a predetermined number of units from the population (sampling frame)
1. Decide what population to use
2. Choose the sampling frame
3. Decide the sample size
4. Use random numbers (e.g. generated by a computer) to choose the units
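Step 4 can be done with Python's random module. A minimal sketch, assuming a hypothetical frame of 10,000 numbered units; random.Random.sample draws without replacement, which is what a simple random sample requires, and simple_random_sample is an illustrative name.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame (step 4).

    Sampling without replacement gives every unit the same known
    inclusion probability n / len(frame).
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame of 10,000 numbered units
frame = [f"unit-{i:05d}" for i in range(1, 10_001)]
sample = simple_random_sample(frame, 1000, seed=2024)
print(len(sample), sample[:3])
```

Passing a seed makes the draw reproducible, which is useful for documenting how a sample was drawn.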
4.2 Systematic random sample
Systematic sample = choose a predetermined number of units of the population (sampling frame) systematically, starting from a random point
1. Decide what population to use
2. Choose the sampling frame
3. Decide the sample size
4. Begin with a random number between 1 and i, then choose every i-th unit in the sampling frame, where i = population size / sample size

Systematic random sample: Christliches Zeugnis (I)
"The aim was to carry out a survey representative of evangelicalism in German-speaking Switzerland. A written survey was chosen as the method. In the next step, a suitable address file of all evangelicals had to be found in order to draw the representative sample. No such file exists, and it is difficult, indeed almost impossible, to construct a meaningful sample oneself. (...)
In looking for a way out of this difficulty, we came across Campus für Christus, an evangelically oriented organization."
Systematic random sample: Christliches Zeugnis (II)
"It publishes a magazine, the 'Christliches Zeugnis', which is quite widely distributed within evangelicalism and has a circulation of about 20,000. One can hope that the file of this magazine's recipients gives an undistorted picture of evangelicalism in German-speaking Switzerland. The random sample was drawn as follows: the first address was chosen at random by a number between 1 and 20; from there, the further addresses were selected in steps of 20. 942 addresses proved to be valid."

Systematic random sample: Study on islamophobia
The data used for this study stem from a closed-question face-to-face survey, each interview taking 45 to 60 minutes. The population consisted of inhabitants of the city of Zurich aged 18 to 65 with Swiss nationality. The survey was conducted between October 1994 and March 1995 by the Sociological Institute of the University of Zurich. The people were chosen randomly from the official files of the state (Einwohnerkontrolle). In all, 1,138 interviews were conducted. The response rate was 72%. The survey can be regarded as representative of the Swiss population of the city of Zurich (Stolz, 2000, 226).
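The systematic selection of step 4, as applied to the Christliches Zeugnis file (a random start between 1 and 20, then every 20th address), can be sketched as follows. The frame of 20,000 numbered addresses is a stand-in for the magazine's address file, and systematic_sample is an illustrative name.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic random sample: random start between 1 and i, then every i-th unit.

    The interval i = population size / sample size is rounded down here; if the
    division leaves a remainder, the list is cut at n units.
    """
    i = len(frame) // n                          # sampling interval
    start = random.Random(seed).randint(1, i)    # random start, 1-based as on the slide
    positions = range(start, len(frame) + 1, i)  # start, start + i, start + 2i, ...
    return [frame[pos - 1] for pos in positions][:n]  # convert to 0-based indices


# Hypothetical frame of 20,000 addresses and n = 1000, which gives an interval of 20
addresses = list(range(1, 20_001))
sample = systematic_sample(addresses, 1000, seed=1)
print(len(sample), sample[:3])
```

One design caveat: if the frame is ordered in a pattern that repeats with the same period as the interval, a systematic sample can be biased even though the start is random.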
4.3 Stratified random sampling
Stratified random sampling = create strata in your sampling frame corresponding to central cleavages in your population. Inside every stratum, choose predetermined numbers of units randomly.

Stratified random sampling: example (1)
We know that in our population of 7,000,000 we have 72% German speakers, 20% French speakers and 8% Italian speakers. Our sample size is 1000.
So we decide to choose randomly:
- from the population of German speakers: 720
- from the population of French speakers: 200
- from the population of Italian speakers: 80
-> With respect to language, our sample is perfectly representative.
-> If we had drawn a simple random sample, sampling error might have produced, e.g., a sample with German: 742, French: 195, Italian: 63.
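Proportional allocation as in example (1) can be sketched like this. Assumptions: a scaled-down hypothetical frame of 100,000 units with the slide's 72/20/8 shares (to keep memory small), and simple random sampling within each stratum; stratified_sample is an illustrative name.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Proportional stratified random sample.

    strata maps each stratum label to the list of units in that stratum.
    Each stratum gets round(n * stratum size / population size) units,
    drawn by simple random sampling inside the stratum.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for label, units in strata.items():
        k = round(n * len(units) / total)  # proportional allocation
        sample[label] = rng.sample(units, k)
    return sample


# Scaled-down hypothetical frame (100,000 instead of 7,000,000) with the slide's shares
strata = {
    "German": [f"de-{i}" for i in range(72_000)],
    "French": [f"fr-{i}" for i in range(20_000)],
    "Italian": [f"it-{i}" for i in range(8_000)],
}
sample = stratified_sample(strata, 1000, seed=5)
print({label: len(units) for label, units in sample.items()})
```

With these shares the allocation comes out exactly 720, 200 and 80. With other shares the rounded numbers may not add up to n, so one stratum then has to be adjusted by a unit or two. Disproportional allocation (oversampling small strata) only requires replacing the proportional k with chosen numbers and weighting the results afterwards.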
Stratified random sampling: example (2)
In the NCS-CH study, we stratified by religious tradition. Furthermore, we oversampled smaller religious traditions.

4.4 Multi-stage cluster sampling
Multi-stage cluster sampling = we first randomly choose groups of units (clusters); then we randomly choose units within these groups.
-> Often cheaper
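A two-stage version, modelled loosely on the study of evangelical free churches (churches first, then members), might look like this. The list of 200 churches with 30 to 300 members each is invented for illustration, and two_stage_cluster_sample is a hypothetical name.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly choose clusters. Stage 2: randomly choose units inside each.

    clusters maps a cluster label (e.g. a church) to the list of its members.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: clusters
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))  # stage 2: members
        for c in chosen
    }


# Hypothetical list of 200 churches with 30 to 300 members each
gen = random.Random(0)
churches = {
    f"church-{j:03d}": [f"church-{j:03d}/member-{m}" for m in range(gen.randint(30, 300))]
    for j in range(200)
}
sample = two_stage_cluster_sample(churches, n_clusters=40, n_per_cluster=25, seed=9)
print(len(sample), sum(len(members) for members in sample.values()))
```

Because every chosen church contributes the same number of members here, members of large churches have a lower chance of selection than members of small ones; selecting clusters with probability proportional to size, or weighting, is the usual way to correct for this.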
Multi-stage cluster sampling: Study on evangelicals (Milieu) (I-III)
Our data stem from two representative surveys, one conducted in 1999 covering the whole population of Switzerland, and a second survey from 2003 among the members of the evangelical free churches in Switzerland. The first data set (1999) was produced by conducting 1,562 computer-aided telephone interviews (CATI), based on a random sample of the inhabitants of Switzerland aged 16 to 75. The response rate was 54%.

The second data set (2003) was produced by a mail survey of 1,100 evangelicals from evangelical free churches in Switzerland, based on a stratified cluster sample. Cluster sampling was carried out by randomly choosing evangelical free churches from a list and then randomly selecting members from these churches. Stratification was achieved by dividing the sample into three groups: charismatic, moderate and fundamentalist. Since the fundamentalist group amounts to only about 11% of our population, the fundamentalist stratum was overrepresented in the sample in order to allow a better comparison between the three groups.

Some 1,850 questionnaires were given out and 1,100 were returned, giving a response rate of 59.4%. The response rate was 57.9% (N = 359) for the charismatic group, 54.6% (N = 377) for the moderate group and 66.9% (N = 361) for the fundamentalist group. For a mail survey, these response rates can be seen as very satisfactory. The data were collected between June 2003 and September 2003. This sample can be said to be representative of the members of evangelical free churches in Switzerland. For a number of analyses we aggregated the data sets from 1999 and 2003. One of the central features of the design of our study on evangelical free churches was to include a large number of questions that had already been used in the 1999 survey of the Swiss population, in order to be able to compare the evangelical milieu to the "societal environment".

5. The size of the sample
Size matters!
The larger the sample, the better you fare! With larger samples:
- your estimates of the parameters gain in precision (confidence intervals become smaller)
- the differences you find become statistically significant more easily
- you will be able to carry out analyses at a more detailed level (comparing various subgroups, etc.)

Size: absolute and relative
It is not the relative but the absolute size that matters (as long as the sample is only a small fraction of the population).
-> A random sample of 1000 has the same «value» whether the population is Switzerland or China.
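Why absolute size matters can be seen from the standard error of a proportion with the finite population correction, sqrt((N - n)/(N - 1)). A sketch with rough, illustrative population sizes (assumptions, not official statistics); se_proportion is an illustrative name.

```python
import math


def se_proportion(p, n, population_size):
    """Standard error of a sample proportion with the finite population correction."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc


n = 1000
# Rough illustrative population sizes (assumptions, not official statistics)
populations = {"Switzerland": 8_000_000, "China": 1_400_000_000, "small town": 2_000}
for name, size in populations.items():
    print(f"{name:12s} SE of a 50% share with n = {n}: {se_proportion(0.5, n, size):.5f}")
```

For Switzerland and China the standard errors agree to five decimal places; only when the sample is a large fraction of the population (the small town) does the relative size start to matter.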
Example: increasing the sample size decreases the confidence interval

Formulas
Arithmetic mean: $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Variance: $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Standard deviation: $$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Standard error: $$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
95% confidence interval: $$\bar{x} \pm z_{0.025}\, s_{\bar{x}}, \qquad z_{0.025} = 1.96$$

What is the true mean in the population?
Mean in the sample (n = 105): 4.8
Standard deviation (sample) = 1.2
Standard error (mean) = 1.2/√105 = 0.117
Confidence interval: true mean = 4.8 ± 1.96 × 0.117
-> between 4.571 and 5.029

Mean in the sample (n = 1000): 4.8
Standard deviation (sample) = 1.2
Standard error (mean) = 1.2/√1000 = 0.038
Confidence interval: true mean = 4.8 ± 1.96 × 0.038
-> between 4.726 and 4.874
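The two worked examples can be checked with a few lines of Python. A sketch; mean_confidence_interval is an illustrative name, and z = 1.96 is the normal value used on the slide (for very small samples a t value would be more appropriate). If the mean and standard deviation have to be computed from raw data first, statistics.mean and statistics.stdev (which divides by n - 1, like s) can be used.

```python
import math


def mean_confidence_interval(mean, sd, n, z=1.96):
    """95% confidence interval for a mean: mean +/- z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se


# The two worked examples: sample mean 4.8, sample standard deviation 1.2
for n in (105, 1000):
    low, high = mean_confidence_interval(4.8, 1.2, n)
    print(f"n = {n:4d}: true mean between {low:.3f} and {high:.3f}")
```

With the unrounded standard error the first interval comes out as 4.570 to 5.030; the slide rounds the standard error to 0.117 before multiplying, which gives 4.571 to 5.029.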
Factors influencing the size of the sample
- Costs: from about n = 1000 on, each additional respondent adds less and less precision
- Non-response: a certain percentage of individuals will refuse to participate; we therefore have to start out with a larger sample
- Heterogeneity: if the heterogeneity of the population (and hence of the sample) is large, we need a larger sample
- Type of analysis: if we want to analyze the relationship between many variables at the same time (multivariate analysis), we need a larger sample (e.g. sex × age × political preference)

The example of the dwarfs
In this example, we imagine an infinite population of dwarfs. We would like to know their mean height and the variance of their height in the population.
The question: how many dwarfs do we have to draw randomly from the population and measure in order to estimate the population's mean height and variance?

Sampling error: decreases with growing N
Simulation with R
In the following simulation we draw 30 samples for each N (N = 5, 10, 15, 20, ..., 100).
The «real» mean in the population is 10 cm. The «real» variance in the population is 4 (standard deviation = 2).

# Empty plot: sample size on the x-axis, estimated variance on the y-axis
plot(c(0, 100), c(0, 15), type = "n", xlab = "Sample size", ylab = "Variance", cex.lab = 1.2)
# For each sample size N = 5, 10, ..., 100, draw 30 samples and plot each sample variance
for (n in seq(5, 100, 5)) {
  for (i in 1:30) {
    x <- rnorm(n, mean = 10, sd = 2)  # sample from a normal population: mean 10, sd 2
    points(n, var(x))                 # var() uses the n - 1 denominator
  }
}
abline(h = 4, lty = 2)  # dashed line at the true population variance

The simulation shows that for samples smaller than about N = 40, the estimates of the mean and variance are very unreliable.
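For readers who do not use R, roughly the same simulation can be run in Python with the standard library. A sketch, not a line-by-line translation: instead of plotting, it prints for each N the range of the 30 sample means and 30 sample variances, using random.gauss for the normal population and statistics.variance, which, like R's var(), divides by n - 1.

```python
import random
import statistics

TRUE_MEAN, TRUE_SD = 10, 2  # population values from the slide (variance 4)
REPS = 30                   # 30 samples per N, as in the R code

rng = random.Random(10)     # fixed seed for a reproducible run
for n in range(5, 101, 5):
    samples = [[rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)] for _ in range(REPS)]
    means = [statistics.mean(s) for s in samples]
    variances = [statistics.variance(s) for s in samples]  # n - 1 denominator
    print(f"N = {n:3d}  means {min(means):5.2f}-{max(means):5.2f}"
          f"  variances {min(variances):5.2f}-{max(variances):5.2f}")
```

Both ranges narrow as N grows, the variance estimates much more slowly than the means.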
How the estimate of the variance becomes more reliable with growing N
[Figure: simulated variance estimates by sample size; variance in population: 4]

Weighting, change in sample size and their effect on standard errors: example NCS-CH
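The effect of weighting on standard errors can be illustrated with Kish's approximate effective sample size, n_eff = (Σw)² / Σw², a common rule of thumb swapped in here; it is not necessarily the method used in NCS-CH, and all numbers below are hypothetical (a small tradition forming 5% of the population but 20% of the sample); kish_effective_n is an illustrative name.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum of w)^2 / sum of w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical design: a small tradition is 5% of the population but 20% of the sample
population_share = {"large tradition": 0.95, "small tradition": 0.05}
sample_count = {"large tradition": 800, "small tradition": 200}
n = sum(sample_count.values())

weights = []
for group, count in sample_count.items():
    weight = population_share[group] / (count / n)  # population share / sample share
    weights.extend([weight] * count)

n_eff = kish_effective_n(weights)
print(f"n = {n}, effective n = {n_eff:.0f}, SE inflation factor = {(n / n_eff) ** 0.5:.3f}")
```

In this invented design, oversampling the small tradition costs about 12% of the nominal sample size and inflates the standard errors of overall estimates by roughly 7%, the price paid for being able to analyse the small group with 200 cases.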
6. Types of non-probability samples
Types of non-probability samples
6.1 Convenience sampling
6.2 Snowball sampling
6.3 Quota sampling
6.1 Convenience sampling
Convenience sampling = we choose the people who are most easily available / approachable.
Problem: we do not know for what population these people are representative / whom they stand for.

Convenience sampling: example
"We placed the questionnaire, a note explaining the content of our research, and an envelope with our address in the mailboxes of the university teachers (which exist in most universities), so that they could send us back the completed questionnaire.
Most Parisian universities, as well as a good number of the most important research centres, are included in our survey. We placed questionnaires at Paris I, Paris II, Paris III, Paris V, Paris VI, Paris VII, Dauphine, Paris X-Nanterre, Paris VIII, the Institut de Sciences Politiques, the Maison des Sciences de l'Homme, and the Ecole Normale Supérieure.
271 university teachers sent us their answers to the questionnaire. However, the 271 answers do not constitute a representative sample that would allow us to describe the general characteristics of the population of university teachers. For example, it does not allow us to determine the percentage of individuals who are attracted to left-wing positions. The sample is therefore constructed only to provide a test, and not to describe the population of Parisian university teachers."
(Magniberton/Rios, 2003)
6.2 Snowball sampling
Snowball sampling = we ask the first participants for the addresses of other individuals who have the same characteristics. Every new participant is in turn asked for still other participants.
Problem: no representativity

Snowball sampling: example
"I conducted fifty interviews with marijuana users. I had been a professional dance musician for some years when I conducted this study and my first interviews were with people I had met in the music business. I asked them to put me in contact with other users who would be willing to discuss their experiences with me... Although in the end half of the fifty interviews were conducted with musicians, the other half covered a wide range of people, including laborers, machinists, and people in the professions."
(Becker 1963: 45-6)
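The recruitment process is essentially a breadth-first walk through a contact network. A minimal sketch with an invented network loosely inspired by Becker's example; the names, links and the function name snowball_sample are hypothetical.

```python
from collections import deque


def snowball_sample(contacts, seeds, target):
    """Recruit wave by wave: every participant names contacts, who are then
    recruited and asked in turn, until `target` people have been reached."""
    recruited = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < target:
        person = queue.popleft()
        for contact in contacts.get(person, []):
            if contact not in seen and len(recruited) < target:
                seen.add(contact)
                recruited.append(contact)
                queue.append(contact)
    return recruited


# Hypothetical network: musicians know each other and a few others; farmers form a separate circle
contacts = {
    "musician1": ["musician2", "musician3", "laborer1"],
    "musician2": ["musician1", "machinist1"],
    "musician3": ["musician1", "lawyer1"],
    "laborer1": ["laborer2"],
    "farmer1": ["farmer2"],
    "farmer2": ["farmer1"],
}
print(snowball_sample(contacts, ["musician1"], target=50))
```

People who are not connected to the starting points (here the farmers) can never enter the sample, however many waves are run, which is why snowball samples cannot claim representativity.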
6.3 Quota sampling
Quota sampling = starting from knowledge of the population (e.g. 50% women, 20% aged between 18 and 30, etc.), we decide how many individuals from certain groups (quotas) the sample should contain.
Example: we need 3 elderly women living in a rural area of the canton of Appenzell Innerrhoden. The interviewers are then responsible for finding individuals with these characteristics.
Problems:
- Not really representative; bias arises from the interviewers' choices and networks
- We cannot calculate standard errors. Statistical inference from the sample to the population is not permitted.
Advantages:
- faster
- cheaper
Often used in market research
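The quota logic itself is simple bookkeeping, shown in the sketch below with hypothetical quotas and candidates; fill_quotas is an illustrative name.

```python
def fill_quotas(candidates, quotas):
    """Accept candidates in the order interviewers happen to meet them,
    as long as the quota for their group is not yet full."""
    remaining = dict(quotas)
    accepted = []
    for name, group in candidates:
        if remaining.get(group, 0) > 0:
            accepted.append(name)
            remaining[group] -= 1
    return accepted, remaining


# Hypothetical quotas and candidates (group = sex, age band, area)
quotas = {("woman", "65+", "rural"): 3, ("man", "18-30", "urban"): 2}
candidates = [
    ("Anna", ("woman", "65+", "rural")),
    ("Beat", ("man", "18-30", "urban")),
    ("Clara", ("woman", "65+", "rural")),
    ("Dario", ("man", "31-64", "urban")),   # no quota for this group
    ("Eva", ("woman", "65+", "rural")),
    ("Frieda", ("woman", "65+", "rural")),  # quota already full
]
accepted, still_open = fill_quotas(candidates, quotas)
print(accepted, still_open)
```

Who ends up in the sample depends on the order in which interviewers happen to meet people (Frieda is turned away only because Eva came first), and no selection probabilities are known, which is why standard errors cannot be computed.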