RESEARCH METHODS IN LINGUISTICS 1302740
Lecture (3): Population Samples

• Learn the reasons for sampling
• Develop an understanding of different sampling methods
• Distinguish between probability and non-probability sampling
• Discuss the relative advantages and disadvantages of each sampling method

Population
The group you are ultimately interested in knowing more about (here, its linguistic behaviour): the “entire aggregation of cases that meets a designated set of criteria”. On the basis of a sample study we can predict and generalize the behaviour of mass phenomena.

Sample
A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005).

Sample vs. Census
• Census: an accounting of the complete population.
• A census study occurs if the entire population is very small or it is reasonable to include the entire population (for other reasons).
• It is called a census sample because data is gathered on every member of the population.

Why sample?
• The population of interest is usually too large to attempt to survey all of its members.
• Resources (time, money) and workload are limited.
So a carefully chosen sample can be used to represent the population:
• The sample reflects the characteristics of the population from which it is drawn.
• It gives results with known accuracy that can be calculated mathematically.

If all members of a population were identical, the population is considered to be homogeneous. That is, the characteristics of any one individual in the population would be the same as the characteristics of any other individual (little or no variation among individuals). So, if the human population on Earth were homogeneous in characteristics, how many people would an alien need to abduct in order to understand what humans were like? (One would be enough.)

When individual members of a population are different from each other, the population is considered to be heterogeneous (having significant variation among individuals).
How does this change an alien’s abduction scheme to find out more about humans? In order to describe a heterogeneous population, observations of multiple individuals are needed to account for all possible characteristics that may exist.

[Diagram labels: Population (what you want to talk about); Sampling Frame; Sampling Process; Sample (what you actually observe in the data); Inference]

Inference
Using data to say something (make an inference) with confidence about a whole (population) based on the study of only a few (sample). If a sample of a population is to provide useful (linguistic) information about that population, then the sample must contain essentially the same (linguistic) variation as the population.

The more heterogeneous a population is…
• …the greater the chance that a sample may not adequately describe the population: we could be wrong in the inferences we make about the population.
• …the larger the sample needs to be to adequately describe the population: we need more observations to be able to make accurate inferences.

Sampling is the process of selecting observations (a sample) to provide an adequate description of, and robust inferences about, the population. The aim is a sample that is representative of the population.

Sources of error
• The deviation between an estimate from an ideal sample and the true population value is the sampling error.
• Almost always, the sampling frame does not match up perfectly with the target population, leading to errors of coverage.
• Non-response is probably the most serious of these errors. It arises in three ways:
1. Inability of the person responding to come up with the answer
2. Refusal to answer
3. Inability to contact the sampled elements

These errors can be classified as due to the interviewer, respondent, instrument, or method of data collection.

Interviewers have a direct and dramatic effect on the way a person responds to a question. Most people tend to side with the view apparently favored by the interviewer, especially if they are neutral. Friendly interviewers are more successful.
In general, interviewers of the same gender, racial, and ethnic groups as those being interviewed are slightly more successful.

Respondents differ greatly in motivation to answer correctly and in ability to do so. Obtaining an honest response to sensitive questions is difficult.

Basic errors
• Recall bias: the respondent simply does not remember
• Prestige bias: the respondent exaggerates to ‘look’ better
• Intentional deception: lying
• Incorrect measurement: the respondent does not understand the units or definition

There are 2 types of sampling:
• Probability sampling: each member of the population has a known non-zero probability of being selected. Methods include random sampling, systematic sampling, and stratified sampling.
• Non-probability sampling: members are selected from the population in some nonrandom manner. Methods include convenience sampling, judgment sampling, quota sampling, and snowball sampling.

Probability Samples
Methods include
1. (simple) random sampling
2. systematic sampling
3. stratified sampling

Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected. With very large populations it is often difficult to identify every member of the population, so the pool of available subjects becomes biased.
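Simple random sampling as described above can be sketched in a few lines of Python. This is only an illustration, assuming a complete list of population members (a sampling frame) is available; the function name `simple_random_sample`, the speaker IDs, and the fixed seed are hypothetical choices made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from the sampling frame, each with an equal chance."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)    # private generator so the draw can be repeated
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame: 100 speaker IDs
frame = [f"speaker_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, n=20, seed=42)
print(sample[:5])
```

The draw is made without replacement, so no speaker can appear twice in the sample.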
You can use software to generate random numbers, or draw directly from the columns of a random number table.
• Lottery method
• Random number tables

Steps
• Define the population
• Determine the percentage to be interviewed or studied
• Each individual has an equal chance of selection
• The random sample becomes representative of the larger whole

[Diagram: List of population, Random subsample]

advantages…
• …easy to conduct
• …strategy requires minimum knowledge of the population to be sampled
disadvantages…
• …need names of all population members
• …may over-represent or under-represent some members of the population
• …there is difficulty in reaching all selected in the sample

Systematic sampling is often used instead of random sampling. It is also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members. As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity (and possibly cost effectiveness).

Procedure
1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you want or need.
3. Compute the interval size k = N/n.
4. Randomly select a number from 1 to k.
5. Starting from that number, take every kth unit.
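The procedure above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `systematic_sample` and its `start` and `seed` parameters are hypothetical, and using integer division for k and truncating to n units when N is not a multiple of n is an assumption the lecture does not address.

```python
import random

def systematic_sample(units, n, start=None, seed=None):
    """Take every k-th unit after a random start in 1..k, where k = N // n."""
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n  # interval size (assumption: integer division)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start, 1..k inclusive
    elif not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    positions = range(start, N + 1, k)  # 1-based positions: start, start+k, ...
    return [units[p - 1] for p in positions][:n]

# N = 100, n = 20, so k = 5; fixing the start at 4 for illustration
population = list(range(1, 101))
selected = systematic_sample(population, n=20, start=4)
print(selected)
```

Leaving `start` unset lets the function pick the random starting point itself, which is what the procedure calls for in practice.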
Example
• N (population) = 100
• Want n (sample) = 20
• N/n (interval) = 5
• Select a random number from 1-5: chose 4
• Start with #4 and take every 5th unit: 4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, 99

advantages…
• …sample selection is simple
• …may be more precise than a simple random sample
disadvantages…
• …not every combination of population members has an equal chance of being selected (only k different samples are possible)
• …the interval k may coincide with a periodic order in the population list, producing an unrepresentative sample

Stratified sampling is a commonly used probability method that is superior to random sampling because it reduces sampling error. It is sometimes called "proportional" or "quota" random sampling. A stratum is a subset of the population that shares at least one common characteristic, such as males and females. Identify the relevant strata and their actual representation in the population. Random sampling is then used to select a sufficient number of subjects from each stratum. Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.

• Objective: a population of N units is divided into non-overlapping strata N1, N2, N3, ... Ni such that N1 + N2 + ... + Ni = N; then draw a simple random sample with sampling fraction n/N in each stratum.
• To ensure representation of each stratum, oversample smaller population groups.
• Sampling problems may differ in each stratum.
• Precision increases (variance is lower) if strata are homogeneous within.

[Diagram: List of clients; Strata (African-American, Hispanic-American, Others); Random subsamples of n/N]

advantages…
• …more precise sample
• …can be used for both proportions and stratification sampling
• …sample represents the desired strata
disadvantages…
• …need names of all population members
• …there is difficulty in reaching all selected in the sample

Nonprobability Samples
“Members are selected from the population in some nonrandom manner” (Barreiro, 2009)
Methods include
1. convenience sampling
2. judgment sampling
3. quota sampling
4. snowball sampling

Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation.
The sample is selected because its members are convenient (to the researcher). It is a nonprobability method. It is often used during preliminary research efforts (pilot studies) to get an estimate without incurring the cost or time required to select a random sample.
• Exploratory research
• Inexpensive approximation
• Ex: preliminary research efforts to attain the number of L1, L2, …., Ln speakers at university
• Saves time and money
• Members are selected because they are willing and available

Convenience samples: samples drawn at the convenience of the interviewer. People tend to make the selection at familiar locations and to choose respondents who are like themselves. Error occurs in the form of members of the population
1. who are infrequent users or nonusers of that location
2. who are not typical in the population

advantages…
• …useful in pilot studies
disadvantages…
• …difficulty in determining how much of the effect (dependent variable) results from the cause (independent variable)

Judgment (Purposive) sampling is a common nonprobability method. The sample is selected based upon judgment.
• An extension of convenience sampling
• The researcher's knowledge is used to hand pick the cases to be included in the sample
• Subjective judgment
• When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population
“The person who is selecting the sample is who tries to make the sample representative, depending on his opinion or purpose, thus being the representation subject” (Barreiro, 2009)

advantages…
• …small no. of sampling units
• …study unknown traits/case sampling
disadvantages…
• …potential for inaccuracy in the researcher’s criteria and resulting sample selections
• …personal prejudice & bias
• …no objective way of evaluating reliability of results

Quota sampling is the nonprobability equivalent of stratified sampling.
First identify the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum, filling a quota from specific sub-groups of the population.
Ex: Interviewer is instructed to interview 50 males between the ages of 18-25
Useful when:
• Time is limited
• Money is restricted
• Detailed accuracy is not important
disadvantages…
• …people who are less accessible (more difficult to contact, more reluctant to participate) are under-represented

Snowball sampling is a special nonprobability method used when the desired sample characteristic is rare. It may be extremely difficult or cost prohibitive to locate respondents in these situations. This technique relies on referrals from initial subjects to generate additional subjects (friend-of-friend). It lowers search costs; however, it introduces bias because the technique itself reduces the likelihood that the sample will represent a good cross section of the population.

advantages…
• …access to difficult-to-reach populations (other methods may not yield any results)
disadvantages…
• …not representative of the population and will result in a biased sample as it is self-selecting

• Convenient
• Economical
• Rarely representative of the researcher's target population
• Not every element in the population has a chance of being included in the sample
• Must be cautious about inferences and conclusions drawn from the data

The more heterogeneous a population is, the larger the sample needs to be. It also depends on the topic: how frequently does it occur? For probability sampling, the larger the sample size, the better.
With nonprobability samples, results are not generalizable regardless of sample size; still, consider the stability of results.

• About 20-30% usually return a questionnaire
• Follow-up techniques could bring it up to about 50%
• Still, response rates under 60-70% challenge the integrity of the random sample
• How the survey is distributed can affect the quality of sampling

Sample size depends on:
• How much sampling error can be tolerated (levels of precision)
• Size of the population (sample size matters with small populations)
• Variation within the population with respect to the characteristic of interest (what you are investigating)
• Smallest subgroup within the sample for which estimates are needed (the sample needs to be big enough to properly estimate the smallest subgroup)

Rule of thumb: “the larger the sample size, the more closely your sample data will match that from the population” (Birchall, 2009)
Key factors to consider:
• How accurate you wish to be
• How confident you need to be in the results
• What budget you have available

Online sample size calculators:
http://www.surveysystem.com/sscalc.htm
http://www.ezsurvey.com/samplesize.html
http://www.macorr.com/ss_calculator.htm

Choosing a sampling method
1. List the research goals (usually some combination of accuracy, precision, and/or cost).
2. Identify potential sampling methods that might effectively achieve those goals.
3. Test the ability of each method to achieve each goal.
4. Choose the method that does the best job of achieving the goals.

Power: statistical method used to determine sample size
“Statistical power is the ability to detect a true difference when, in fact, a true difference exists in the population of interest.” McNamara (1994), p. 56
The larger the sample, the more representative of the population it is likely to be.
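To make the factors above concrete (tolerated error, confidence, variation, population size), here is a rough Python sketch of one common textbook approach: Cochran's formula for estimating a proportion, with a finite population correction. The choice of formula is an assumption on my part; the lecture does not name one, and the calculators linked above may work differently. The function name `sample_size` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size(margin_of_error, confidence=0.95, proportion=0.5, population=None):
    """Estimate the sample size needed to estimate a proportion.

    Assumed formula (Cochran): n0 = z^2 * p * (1 - p) / e^2,
    optionally corrected for a finite population of size N:
    n = n0 / (1 + (n0 - 1) / N).
    """
    # z-score for a two-sided confidence level, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n0)  # round up to a whole respondent

print(sample_size(0.05))                   # very large population
print(sample_size(0.05, population=1000))  # small population of 1000
```

Setting `proportion=0.5` assumes maximum variation in the characteristic of interest, which gives the most cautious (largest) estimate; a more homogeneous population would need fewer observations.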
• When expected differences between groups are large, a large sample is not needed to ensure that differences will be revealed in statistical analysis.
• When expected differences are small, a large sample is needed to show differences in statistical analysis.
• "A large sample cannot correct for a faulty sampling design".
• Must assess both the size of the sample and the method by which the sample is selected.

Sample plan: a definite sequence of steps that the researcher goes through in order to draw and ultimately arrive at the final sample.
• Step 1: Define the relevant population.
  • Specify the descriptors, geographic locations, and time for the sampling units.
• Step 2: Obtain a population list, if possible; may only be some type of sample frame.
  • List brokers, government units, customer lists, competitors’ lists, association lists, directories, etc.
  • Incidence rate (occurrence of certain types in the population; the lower the incidence, the larger the list needed to draw the sample from)
• Step 3: Design the sample method (size and method).
  • Determine the specific sampling method to be used. All necessary steps must be specified (sample frame, n, … recontacts, and replacements)
• Step 4: Draw the sample.
  • Select the sample unit and gain the information
  • Drop-down substitution
  • Oversampling
  • Resampling
• Step 5: Assess the sample.
  • Sample validation: compare sample profile with population profile; check nonresponders
• Step 6: Resample if necessary.
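The sample validation in Step 5 (comparing the sample profile with the population profile) could be sketched as below. This is only one simple way to do it, assuming the population shares for a characteristic are already known; the function names, the language categories, the figures, and the 5-point tolerance are all hypothetical.

```python
from collections import Counter

def profile(values):
    """Return the share of each category in a list of observed values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

def compare_profiles(sample_values, population_shares, tolerance=0.05):
    """Flag categories whose sample share differs from the population share
    by more than the tolerance (an arbitrary threshold chosen for illustration)."""
    sample_shares = profile(sample_values)
    flagged = {}
    for category, pop_share in population_shares.items():
        diff = sample_shares.get(category, 0.0) - pop_share
        if abs(diff) > tolerance:
            flagged[category] = round(diff, 3)  # positive = over-represented
    return flagged

# Hypothetical population profile by first language, and a sample of 50 speakers
population_shares = {"Language A": 0.70, "Language B": 0.20, "Other": 0.10}
sample_values = ["Language A"] * 30 + ["Language B"] * 15 + ["Other"] * 5
flagged = compare_profiles(sample_values, population_shares)
print(flagged)
```

Any flagged category suggests the sample under- or over-represents that group, which is the kind of finding that would lead to Step 6.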