Statistics 101 for Market Research
September 2014
[email protected]

Statistics 101
• Sampling
• Margin of Error Statements
• Weighting
• Statistical Significance Testing: t-Test

Sampling

Sampling: Source
• River
• Panel

River Sample Selection
River sample is a blend of whatever sample is available at the moment you dip in to find respondents for your study. While quotas can make river sample look good, we cannot control who is included. It is typically not the highest-quality sample.

Panel Sample Selection
• Gen Pop Sample
• Targeted Sample
• Pre-screening through self-identification

Statistics: Probability Sample
A probability sample is one where every element in the population has a known, non-zero probability of being selected. It requires:
• A frame that includes every element in the population
• A known probability of selection: a random selection process in which each element's chance of selection equals a predetermined probability
Statistical inference (drawing conclusions about the population based on sample data) can only be done with probability samples.

Sample Selection
[Diagram labels: Population, Panel, Probability Sample, Panel Sample]

Statistics: Stratified Sample
Often the population can be divided into strata, and the sampling process can be repeated within each stratum separately. This is called a stratified sample.

Panel Sample Balancing
Panel balancing is essentially a stratified random sample, with strata made up of combinations of many demographic variables.
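As a rough illustration of stratified balancing, the sketch below splits a fixed number of completes across strata in proportion to population size. The stratum names and population counts are hypothetical, and largest-remainder rounding is just one reasonable way to make the targets add up to n; it is not necessarily how the panel balancing matrices are produced.

```python
from math import floor


def proportional_allocation(stratum_sizes, n):
    """Split n completes across strata in proportion to population size."""
    total = sum(stratum_sizes.values())
    exact = {k: n * size / total for k, size in stratum_sizes.items()}
    alloc = {k: floor(v) for k, v in exact.items()}
    # Rounding down loses a few completes; hand them back to the strata
    # with the largest fractional remainders so the targets sum to n.
    shortfall = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:shortfall]:
        alloc[k] += 1
    return alloc


# Hypothetical population counts per stratum (illustration only).
population = {
    "Male 18-34": 3_600,
    "Male 35+": 7_900,
    "Female 18-34": 3_500,
    "Female 35+": 8_500,
}
targets = proportional_allocation(population, n=1000)
for stratum, target in targets.items():
    print(f"{stratum}: n={target}")
```

In a real balancing matrix the strata would be combinations of several demographic variables, but the allocation step has the same shape.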
US Gen Pop – Age/Gender/Region
Universe: AGEP between 18 and 99
Weight used: PWGTP
Dataset(s) selected: 2009 population
Source: ACS Public Use Microdata Sample
Total number of completes: n=1,000

                Northeast  Midwest  South  West  Total
Male 18-34             27       34     58    39    158
Male 35-54             34       40     67    43    185
Male 55+               27       32     53    32    144
Female 18-34           26       33     56    36    151
Female 35-54           35       41     69    42    187
Female 55+             34       39     65    37    175
Total                 184      219    367   230  1,000

Response rate adjusted sample balancing
Within the national panels, we have a pretty good idea of how likely any given stratum in the balancing target matrix is to respond to the survey invite. A sampling analyst may choose to use this information, adjusting the sample pull for each stratum based on its differential response rate. This brings the ending sample (i.e. the number of completed surveys) within each stratum closer to the census information.
The commonly used national sample balancing matrices are stored at:
\\Netapp01\projects\NATIONAL_PANELS\01_NATIONAL_PANELS\NatPanel_PROFILING\Matrices

We are rarely interested in true 'gen pop'
Not really 'gen pop' anymore. It's 'Gen Pop who…'
[Funnel labels: Gen Pop, Soup buyers, …aged 24-54, …moms]
• Sample pull is Gen Pop
• In-study screening, lots of DQ
• Keep quotas to a minimum and "real"
• Good weighting options
Example: Do you really need this? Is it right? Probably NOT.

Quotas          Kids 13-18 @ home
Mom's Age       Yes       No
24-39           n=100     n=100
40-54           n=100     n=100

Targeted Sample
Or we can sample only among self-identified soup buyers.
[Funnel labels: Soup buyers, …aged 24-54, …moms]
• Sample pull is from self-identified soup buyers
• In-study screening, less DQ
• Use quotas to make the sample "look" good
• Tricky to weight
• "Keeners" effect
Example: You have to have quotas.

Quotas          Kids 13-18 @ home
Mom's Age       Yes       No
24-39           n=100     n=100
40-54           n=100     n=100
[The slide marks this quota grid with a "?"]

What is a Router Sample?
[Flow diagram; recoverable labels: Origin Study 1, Origin Study 2, … Origin Study K; Complete; DQ; "Want to be Routed?" (Y/N)]
[Flow diagram, continued; recoverable labels: DQ; Router Study 1, Router Study 2, … Router Study #P]

Margin of Error Statement
With panels, the panel is both the population and the frame. We can draw probability samples from the panel, but we can make inferences from the sample to the panel only.

Example statement:
From June 18th to June 19th, 2014, an online survey was conducted among 1,510 randomly selected Canadian adults who are Angus Reid Forum panelists. For comparison purposes, a probability sample of this size has a margin of error of +/- 2.5%, 19 times out of 20.

Inference from a panel sample to the general population is still a dance step away. However, through sample balancing and weighting, we can create a sample that matches the general population's demographic characteristics as closely as possible.

Weighting

Weighting
• Weighting is what we turn to when we run into issues in sampling
• The effect of weighting is cosmetic:
  • Makes the sample "look" good
  • Does not fix structural problems

It's all about Proportions
[Figure labels: Ideal Sample, Panel Sample, Weighted Sample]

Weighting Efficiency
[Figure labels: Weighted Sample, n=1,000, weighting efficiency = 20%; Unweighted Ideal Sample, n=200, weighting efficiency = 100%]

Two Principles
1. Use good information
2. Weight with a light hand

Good Information
Relevant, Credible, Independent
• Information matches what's in the sample
• Trusted sources, in order of preference:
  1. Census
  2. Large-scale studies from national agencies
  3. Industry (databases, publications, etc.)
  4. Validated historical data
  5. When all else fails, VC Omni studies: our best effort at a Gen Pop sample
• The National Panels Mosaic study is not a good source. At a minimum, run Omni to validate.

Weight with a light hand
• Only weight where you need to
• Take the "goodness" of the source into account
• Use as few variables as you can
• Use as broad a classification as you can
  • E.g. 5 regions vs. 10 provinces
• Use RIM weighting if possible
  • RIM weighting creates the least amount of distortion to the data, and results in the best weighting efficiency.
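The margin of error statement and the weighting efficiency slide can be tied together with a little arithmetic. The sketch below assumes the usual formula for a proportion at p = 0.5 with z = 1.96 ("19 times out of 20"), and assumes that weighting efficiency means the common Kish ratio of effective base to unweighted base; the deck does not spell out either formula, and the weights shown are hypothetical.

```python
from math import sqrt


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion (simple random sample)."""
    return z * sqrt(p * (1 - p) / n)


def weighting_efficiency(weights):
    """Kish approximation: (sum of w)^2 / (n * sum of w^2), a value in (0, 1]."""
    n = len(weights)
    return sum(weights) ** 2 / (n * sum(w * w for w in weights))


def effective_sample_size(weights):
    """Unweighted base multiplied by the weighting efficiency."""
    return len(weights) * weighting_efficiency(weights)


# The sample size quoted in the margin of error statement.
moe_1510 = margin_of_error(1510)
print(f"n=1,510: +/- {moe_1510:.1%}")

# Hypothetical weights: half the sample weighted down, half weighted up.
weights = [0.5] * 500 + [1.5] * 500
eff = weighting_efficiency(weights)
n_eff = effective_sample_size(weights)
print(f"efficiency={eff:.0%}, effective n={n_eff:.0f}, "
      f"+/- {margin_of_error(n_eff):.1%}")
```

Under this reading, a weighted sample of n=1,000 at 20% efficiency carries about the same precision as an unweighted sample of n=200, which is why weighting with a light hand matters.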
Statistical Significance Testing

Statistics: Significance Testing
• A form of hypothesis testing:
  • You need to form a hypothesis before you can test it
  • E.g. Consumers are more likely to purchase Concept A than our current product.
  • Concept A has a higher PI (T2B) than the Control Concept
• Can only be applied to a probability sample

t-Test
• Applied to any sample
• We can only test for differences in our sample, not the population
• Applied to all data tables: proportions, means
• Applied to any 2 (non-overlapping) subgroups
• High degree of data fishing

Data Fishing
http://xkcd.com/

t-Test
BDC Client: "Are you a BDC client?"

                                     Total   Atlantic (A)  Quebec (B)  Ontario (C)  Prairies (D)  BC (E)
Base                                 732     51            156         257          152           116
                                     100.0%  100.0%        100.0%      100.0%       100.0%        100.0%
Yes                                  294     33            79          82           62            38
                                     40.2%   63.9% CE      50.5% C     32.0%        40.9%         33.0%
No, but I used to be                 64      2             15          17           15            15
                                     8.7%    4.3%          9.9%        6.5%         9.7%          12.8%
No, I have never been a BDC client   374     16            62          158          75            63
                                     51.1%   31.8%         39.6%       61.5% AB     49.5%         54.2%

[Letters after a percentage mark the columns it is significantly higher than. The slide also shows two further base figures per column whose row labels did not survive extraction (Total 595 / 732; Atlantic 50 / 58; Quebec 151 / 190; Ontario 208 / 248; Prairies 101 / 124; BC 95 / 112), and two further Total-column counts per answer row (Yes 266 / 335; No, but I used to be 61 / 76; No, I have never been a BDC client 275 / 321).]

Bonferroni Correction
• Available on Quick Report data tables (by default)
• Applies a more stringent criterion for declaring significance
• Family-wise confidence level
• Helps reduce the risk of data fishing
• Report all significant differences flagged

Wincross Data Tables
• No Bonferroni correction
• t-Test options for non-independent samples
• Useful for sequential monadic concept testing
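To make the data fishing and Bonferroni points concrete, the sketch below runs every pairwise comparison between regions on the "Yes" row of the BDC table, with and without the correction. It is a simplified stand-in: it uses a pooled two-proportion z-test on the first count and base shown for each region, and divides alpha by the number of comparisons. The tabulation software's actual test, weighting, and effective bases are not recoverable from the slide, so the flags are not expected to match the table exactly.

```python
from itertools import combinations
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value


def pairwise_tests(counts, alpha=0.05, bonferroni=True):
    """Return (higher, lower) column pairs whose difference is flagged as significant."""
    pairs = list(combinations(counts, 2))
    # Bonferroni: share the family-wise alpha across all comparisons.
    threshold = alpha / len(pairs) if bonferroni else alpha
    flagged = []
    for a, b in pairs:
        z, p = two_proportion_z_test(*counts[a], *counts[b])
        if p < threshold:
            flagged.append((a, b) if z > 0 else (b, a))
    return flagged


# (count answering "Yes", base) per region, read from the table.
yes = {"A": (33, 51), "B": (79, 156), "C": (82, 257), "D": (62, 152), "E": (38, 116)}

uncorrected = pairwise_tests(yes, bonferroni=False)
corrected = pairwise_tests(yes, bonferroni=True)
print("Flagged without correction:", uncorrected)
print("Flagged with Bonferroni:   ", corrected)
```

With five columns there are ten pairwise comparisons per row, so the corrected threshold here is 0.005 rather than 0.05; the corrected list can only be the same as or shorter than the uncorrected one.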