
RESEARCH METHODS IN LINGUISTICS
1302740
Lecture (3)
Population Samples
Learn the reasons for sampling

Develop an understanding about different sampling methods

Distinguish between probability & non-probability sampling

Discuss the relative advantages & disadvantages of each
sampling method


Population
• The group whose linguistic behaviour you are ultimately
interested in knowing more about.
• On the basis of a sample study we can predict and generalize
the behaviour of mass phenomena.
• The “entire aggregation of cases that meets a designated set of
criteria”.

A sample is “a smaller (but hopefully representative)
collection of units from a population used to determine truths
about that population” (Field, 2005)
Sample vs. Census



Census: an accounting of the complete population
A census study occurs if the entire population is very small or it is
reasonable to include the entire population (for other reasons).
It is called a census sample because data is gathered on every
member of the population.

Why sample?
• The population of interest is usually too large to attempt
to survey all of its members.

Resources (time, money) and workload
So…
• A carefully chosen sample can be used to represent the
population.

The sample reflects the characteristics of the population
from which it is drawn.

Gives results with known accuracy that can be calculated
mathematically

If all members of a population are identical, the
population is considered to be homogeneous.

That is, the characteristics of any one individual in the
population would be the same as the characteristics of any
other individual (little or no variation among individuals).

So, if the human population on Earth were homogeneous in
characteristics, how many people would an alien need to
abduct in order to understand what humans were like?

When individual members of a population are different from
each other, the population is considered to be heterogeneous
(having significant variation among individuals).

How does this change an alien’s abduction scheme to find out
more about humans?

In order to describe a heterogeneous population, observations
of multiple individuals are needed to account for all possible
characteristics that may exist.
[Diagram: Population (what you want to talk about) → Sampling Frame → Sampling Process → Sample (what you actually observe in the data)]
Inference
Using data to say something (make an inference), with confidence, about
a whole (population) based on the study of only a few (sample).

If a sample of a population is to provide useful (linguistic) information
about that population, then the sample must contain essentially the same
(linguistic) variation as the population.

The more heterogeneous a population is…
• The greater the chance is that a sample may not adequately describe the
population, so we could be wrong in the inferences we make about the
population.

And…

• The larger the sample needs to be to adequately describe the
population: we need more observations to be able to make accurate
inferences.

Sampling is the process of selecting observations (a
sample) to provide an adequate description and robust
inferences of the population.
• The sample is representative of the population.

The deviation between an estimate from an ideal
sample and the true population value is the sampling
error.
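
A small simulation can make the idea of sampling error concrete. This is only a sketch: the population of 10,000 scores, its mean of 50 and spread of 10, and the name `sampling_error` are all invented for illustration and do not come from the lecture.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: one score for each of 10,000 speakers
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = mean(population)

def sampling_error(n):
    """Difference between the mean of a random sample of size n and the true mean."""
    sample = rng.sample(population, n)
    return mean(sample) - true_mean

for n in (10, 100, 1000):
    print(n, round(sampling_error(n), 3))
```

Drawing the whole population (a census) gives an error of zero, because the estimate and the true value are then the same number.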

Almost always, the sampling frame does not match up
perfectly with the target population, leading to errors of
coverage.

Non-response is probably the most serious of these
errors.

Arises in three ways:
1. Inability of the person responding to come up with the answer
2. Refusal to answer
3. Inability to contact the sampled elements

These errors can be classified as due to the interviewer,
respondent, instrument, or method of data collection.

Interviewers have a direct and dramatic effect on the way
a person responds to a question.

Most people tend to side with the view apparently
favored by the interviewer, especially if the respondents
themselves are neutral.

Friendly interviewers are more successful.

In general, interviewers of the same gender, racial,
and ethnic groups as those being interviewed are
slightly more successful.

Respondents differ greatly in motivation to answer
correctly and in ability to do so.

Obtaining an honest response to sensitive questions is
difficult.

Basic errors
• Recall bias: simply does not remember
• Prestige bias: exaggerates to ‘look’ better
• Intentional deception: lying
• Incorrect measurement: does not understand the units
or definition

There are 2 types of sampling:
• Non-probability sampling
• Probability sampling

Probability Samples: each member of the population has a
known non-zero probability of being selected


Methods include random sampling, systematic sampling,
and stratified sampling.
Nonprobability Samples: members are selected from the
population in some nonrandom manner

Methods include convenience sampling, judgment
sampling, quota sampling, and snowball sampling

Probability Samples: each member of the population has a
known non-zero probability of being selected

Methods include
1. (simple) random sampling
2. systematic sampling
3. stratified sampling
Random sampling is the purest form of probability
sampling.

Each member of the population has an equal and known
chance of being selected.

When there are very large populations, it is often
‘difficult’ to identify every member of the population, so
the pool of available subjects becomes biased.

You can use software to generate random numbers, or
draw directly from the columns of a random number table.
Lottery method
Random number tables
• Define the population
• Determine the percentage to be interviewed or studied
• Each individual has an equal chance of selection
• The random sample becomes representative of the larger whole
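
Using software for the random draw could look like the following sketch. The sampling frame of 100 speaker IDs and the function name `simple_random_sample` are hypothetical; the only library call used is the standard `random.Random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member has the same chance, n/N."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(population, n)

# Hypothetical sampling frame: a list of 100 speaker IDs
frame = [f"speaker_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Note that the draw needs a complete list of the population, which is exactly the practical difficulty with large populations.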

[Diagram: list of population → random subsample]
advantages…
• …easy to conduct
• …strategy requires
minimum knowledge
of the population to be
sampled
disadvantages…
• …need names of all
population members
• …may over-represent
or under-represent
sample members
• …there is difficulty in
reaching all selected
in the sample

Systematic sampling is often used instead of random
sampling. It is also called an Nth name selection
technique.

After the required sample size has been calculated, every
Nth record is selected from a list of population members.

As long as the list does not contain any hidden order, this
sampling method is as good as the random sampling
method.

Its only advantage over the random sampling technique is
simplicity (and possibly cost effectiveness).
Procedure
• Number the units in the population from 1 to N.
• Decide on the n that you want or need.
• k = N/n is the interval size.
• Randomly select a number from 1 to k.
• Take every kth unit.

Example (units numbered 1 to 100):
N (population) = 100
n (sample) = 20
N/n (interval) = 5
Select a random number from 1-5: chose 4
Start with #4 and take every 5th unit: 4, 9, 14, 19, ..., 99
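
The procedure and the N = 100 example can be sketched in a few lines. The function name `systematic_sample` and its parameters are illustrative choices, and the sketch assumes N divides evenly by n, as in the example.

```python
import random

def systematic_sample(units, n, start=None, seed=None):
    """Take every k-th unit after a random start, with k = N // n."""
    N = len(units)
    k = N // n  # interval size
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    # Units are numbered from 1, list indices from 0
    return [units[i - 1] for i in range(start, N + 1, k)][:n]

units = list(range(1, 101))                      # N = 100
picked = systematic_sample(units, 20, start=4)   # n = 20, start chosen as 4
print(picked)
```

Passing `start=4` reproduces the example; leaving it out lets the function pick the random start itself.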
advantages…
• …sample selection
is simple
• …may be more precise
than a simple random
sample.
disadvantages…
• …not every possible
combination of population
members has an equal
chance of being selected:
once the random start is
chosen, the rest of the
sample is fixed
• …the kth person may
coincide with a
periodic order in the
population list,
producing an
unrepresentative sample

Stratified sampling is a commonly used probability method that can be
superior to random sampling because it reduces sampling error.

Sometimes called "proportional" or "quota" random sampling.

A stratum is a subset of the population that shares at least one
common characteristic, such as males and females.

Identify the relevant strata and their actual representation in the
population.

Random sampling is then used to select a sufficient number of subjects
from each stratum.

Stratified sampling is often used when one or more of the strata in
the population have a low incidence relative to the other strata.
• Objective: a population of N units is divided into non-overlapping
strata N1, N2, N3, ... Ni such that N1 + N2 + ... + Ni = N; then
take a simple random sample with sampling fraction n/N in each stratum.
• To ensure representation of each stratum, oversample smaller
population groups.
• Sampling problems may differ in each stratum.
• Precision increases (lower variance) if the strata are homogeneous
within.
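
A minimal sketch of proportional stratified sampling follows. The population of 60 female and 40 male speakers, the sampling fraction of 0.1, and the function name `stratified_sample` are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, fraction, seed=None):
    """Split members into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for m in members:
        strata[stratum_of(m)].append(m)
    sample = {}
    for name, group in strata.items():
        size = max(1, round(len(group) * fraction))  # at least one per stratum
        sample[name] = rng.sample(group, size)
    return sample

# Hypothetical population: 60 female and 40 male speakers
population = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
result = stratified_sample(population, lambda m: m[0], fraction=0.1, seed=7)
print({name: len(group) for name, group in result.items()})
```

To oversample a small stratum, one could pass a larger fraction for that stratum instead of a single shared value; the sketch keeps one fraction for simplicity.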
[Diagram: a list of clients is divided into strata (African-American, Hispanic-American, Others); random subsamples of n/N are drawn from each stratum]
advantages…
• …more precise sample
• …can be used for both
proportions and
stratification sampling
• …sample represents
the desired strata
disadvantages
• …need names of all
population
members
• …there is difficulty
in reaching all
selected in the
sample

Nonprobability Samples: “Members are selected from the
population in some nonrandom manner” (Barreiro, 2009)
Methods include
1. convenience sampling
2. judgment sampling
3. quota sampling
4. snowball sampling

Convenience sampling is used in
exploratory research where the
researcher is interested in getting
an inexpensive approximation.

Sample members are selected because they are convenient (to the
researcher).

It is a nonprobability method.
• Often used during preliminary research (pilot studies) to get an
estimate without incurring the cost or time required to select a
random sample
Exploratory research
• Inexpensive approximation
• Ex: preliminary research efforts to obtain the number
of L1, L2, …, Ln speakers at a university
• Saves time and money
• Selected because they are willing and available


Convenience samples: samples drawn at the
convenience of the interviewer. People tend to make
the selection at familiar locations and to choose
respondents who are like themselves.

Error occurs in the form of members of the population who are:
1. infrequent or nonusers of that location
2. not typical in the population
advantages…
• useful in pilot studies.
disadvantages…
• …difficulty in
determining how
much of the effect
(dependent variable)
results from the
cause (independent
variable)

Judgment (Purposive) sampling is a common
nonprobability method.

The sample is selected based upon judgment.

An extension of convenience sampling

Researcher's knowledge is used to hand pick the cases
to be included in the sample

When using this method, the researcher must be
confident that the chosen sample is truly
representative of the entire population.

Subjective judgment

“The person who is selecting the sample is who tries to
make the sample representative, depending on his
opinion or purpose, thus being the representation
subject” (Barreiro, 2009)

Requires researcher confidence that the sample truly
represents an entire population
advantages…
• Small number of sampling
units
• Study of unknown
traits/case sampling
disadvantages…
• …potential for
inaccuracy in the
researcher’s criteria
and resulting sample
selections
• Personal prejudice &
bias
• No objective way of
evaluating reliability of
results

Quota sampling is the nonprobability equivalent of
stratified sampling.

First identify the strata and their proportions as
they are represented in the population

Then convenience or judgment sampling is used to
select the required number of subjects from each
stratum.

Convenience or judgment sampling to fill quota from
specific sub-groups of a population
 Ex: Interviewer is instructed to interview 50 males
between the ages of 18-25
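
Filling a quota from whoever happens to be available can be sketched as follows. The stream of passers-by, the group label `male_18_25`, and the function names are invented for illustration; the point is that people are accepted in arrival order, not at random.

```python
def fill_quotas(stream, quotas, group_of):
    """Accept people in arrival order until each group's quota is full."""
    remaining = dict(quotas)
    chosen = []
    for person in stream:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota is filled
    return chosen

# Hypothetical passers-by: (id, sex, age)
passers_by = [(i, "M" if i % 2 else "F", 18 + i % 20) for i in range(1000)]

def group_of(person):
    _, sex, age = person
    return "male_18_25" if sex == "M" and 18 <= age <= 25 else "other"

chosen = fill_quotas(passers_by, {"male_18_25": 50}, group_of)
print(len(chosen))
```

Because the first 50 matching people are taken, anyone who is harder to reach never enters the sample.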

Useful when:
• Time is limited
• Money is limited
• Detailed accuracy
is not important
disadvantages…
• …people who are less
accessible (more
difficult to contact,
more reluctant to
participate) are
under-represented

Snowball sampling is a special
nonprobability method used when the
desired sample characteristic is rare.

It may be extremely difficult or cost
prohibitive to locate respondents in these
situations.

This technique relies on referrals from
initial subjects to generate additional
subjects (friend-of-friend).

It lowers search costs; however, it
introduces bias because the technique itself
reduces the likelihood that the sample will
represent a good cross section from the
population.
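
The referral chain can be pictured as a walk through a friendship network. In this sketch the `referrals` dictionary, the speaker labels A to G, and the function name `snowball_sample` are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Follow referrals breadth-first from the initial subjects."""
    sampled = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        for friend in referrals.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return sampled

# Hypothetical referral network: who names whom
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "F": ["G"]}
sample = snowball_sample(["A"], referrals, max_size=10)
print(sample)
```

Speakers F and G are never reached from the initial subject A, which illustrates how the technique can miss whole parts of the population.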
advantages…
• access to difficult-to-reach populations (other
methods may not yield any results).
disadvantages…
• not representative of the population and will result in
a biased sample as it is self-selecting.
• Convenient
• Economical
• Rarely representative of the researcher's target population: not every element in the population has a chance of
being included in the sample
• Must be cautious about inferences and conclusions
drawn from the data


The more heterogeneous a population is, the larger the
sample needs to be.

Depends on the topic: how frequently does it occur?

For probability sampling, the larger the sample size, the
better.

With nonprobability samples, results are not generalizable
regardless of sample size; still, consider the stability of the results

About 20 – 30% usually return a questionnaire

Follow up techniques could bring it up to about 50%

Still, response rates under 60 – 70% challenge the
integrity of the random sample
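
The return rates above translate directly into how many questionnaires to send out. The target of 200 completed questionnaires and the function name `questionnaires_to_send` are illustrative assumptions.

```python
import math

def questionnaires_to_send(target_responses, response_rate):
    """Number of questionnaires needed to expect the target number back."""
    return math.ceil(target_responses / response_rate)

# Hypothetical target: 200 completed questionnaires
print(questionnaires_to_send(200, 0.25))  # return rate without follow-up
print(questionnaires_to_send(200, 0.50))  # return rate with follow-up
```

Sending more questionnaires raises the number returned but not the response rate, so it does not by itself remove the non-response problem.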

How the survey is distributed can affect the quality of
sampling

Sample size depends on:
• How much sampling error can be tolerated (levels of
precision)
• Size of the population (sample size matters with small
populations)
• Variation within the population with respect to the
characteristic of interest (what you are investigating)
• Smallest subgroup within the sample for which estimates
are needed
• Sample needs to be big enough to properly estimate the
smallest subgroup

Rule of thumb: “the larger the sample size, the more
closely your sample data will match that from the
population” (Birchall, 2009)

Key factors to consider:
• How accurate you wish to be
• How confident you are in the results
• What budget you have available


http://www.surveysystem.com/sscalc.htm

http://www.ezsurvey.com/samplesize.html

http://www.macorr.com/ss_calculator.htm
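
The lecture does not give a formula for these calculators. One common textbook approach, assumed here, is Cochran's formula for a proportion with a finite population correction. The defaults (95% confidence, z = 1.96, and p = 0.5 as the most cautious guess about variation) and the function name `sample_size` are assumptions, not the documented behaviour of the sites listed.

```python
import math

def sample_size(margin_of_error, population_size=None, z=1.96, p=0.5):
    """Cochran's sample size for a proportion, with optional finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        # Correction: small populations need fewer observations
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(sample_size(0.05))                         # very large population
print(sample_size(0.05, population_size=1000))   # population of 1,000
```

The two calls show why the size of the population matters mainly when the population is small.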

List the research goals (usually some combination of
accuracy, precision, and/or cost).

Identify potential sampling methods that might effectively
achieve those goals.

Test the ability of each method to achieve each goal.

Choose the method that does the best job of achieving the
goals.

Power:
• Power analysis is a statistical method used to determine sample size
• “Statistical power is the ability to detect a true difference
when, in fact, a true difference exists in the population of
interest” (McNamara, 1994, p. 56).

The larger the sample the more representative of the population
it is likely to be.

When expected differences between groups are large, a
large sample is not needed to ensure that differences will
be revealed in statistical analysis

When expected differences are small, a large sample is
needed to show differences in statistical analysis
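
The link between the size of the expected difference and the sample size can be illustrated with a standard normal-approximation formula for comparing two group means. This formula is an assumption brought in for illustration and is not taken from the lecture; `effect_size` is the expected difference in standard deviation units, and the defaults of alpha = 0.05 and power = 0.80 are conventional choices.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group(0.8))  # large expected difference
print(n_per_group(0.2))  # small expected difference
```

A small expected difference requires far more participants per group than a large one.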

"A large sample cannot correct for a faulty sampling
design".

Must assess both the size of the sample & the method by
which the sample is selected.
• Sample plan: definite sequence of steps that the researcher
goes through in order to draw and ultimately arrive at the
final sample
• Step 1:
Define the relevant population.
• Specify the descriptors, geographic
locations, and time for the sampling units.
• Step 2:
Obtain a population list, if possible;
may only be some type of sample
frame
• List brokers, government units, customer
lists, competitors’ lists, association lists,
directories, etc.
• Incidence rate (occurrence of certain types in
the population; the lower the incidence, the
larger the list needed to draw the
sample from)
• Step 3:
Design the sample method (size and
method).
• Determine specific sampling method to be
used. All necessary steps must be specified
(sample frame, n, … recontacts, and
replacements)
• Step 4:
Draw the sample.
• Select the sample unit and gain the
information
• Drop-down substitution
• Oversampling
• Resampling
• Step 5:
Assess the sample.
• Sample validation – compare sample profile
with population profile; check nonresponders
• Step 6:
Resample if necessary.