Sampling for Variety
Matti Miestamo, Dik Bakker, Antti Arppe

Overview
1. Introduction: Probability vs Variety Sampling
2. Looking for Variety
3. The Experiment: DV versus GM
4. Conclusions

1. Probability vs Variety Sampling

Linguistic data analysis
[Diagram: Analysis, Report]
All languages of the world: the only guarantee that no existing possibility is missed (cf. extinct & future languages)
- Impossible: gaps in description, time considerations etc.

Tendencies & Correlations
All languages of the world:
- Impossible
- Not desired

Exploring Existing Variety
All languages of the world:
- Impossible
- Not desired
- Not necessary
2. Looking for Variety

Variety Sampling
Some methods that could be applied for finding variety:
1. Tomlin (1986)
2. Dryer (1989)
3. Nichols (1992)
4. Bybee, Perkins & Pagliuca (1994)
…

Some methods designed for finding variety:
A. Rijkhoff & al. (1993); Rijkhoff & Bakker (1998) – DV method
- All independent families represented; the extra number of languages taken from each grouping is determined by its Diversity Value
- No areal stratification
- Problem: dependent on the details of classifications, but these are often uncertain and incommensurable across the world
B. Miestamo (2003, 2005) – GM method
- All Genera (Dryer 1989) are represented
- Areal stratification > Macro-areas (Dryer 1989)

Comparison DV versus GM method:
- specifically designed for variety sampling
- most explicit general methods available
- implemented computationally
- replicable

A. Diversity Value method
Basis: Tree-shaped genealogical classification (e.g. Ethnologue-16, Ethnologue-15, WALS, Ruhlen 1991, Voegelin & Voegelin, …)
Basic Sample (BS): one language per family (= highest node)
Minimum sample size: Ethn-15 147, WALS 209, Ruhlen 22
Small Sample (< minimum): Random (1 per family)
Extended Sample (> minimum): 1 + DV value
DV value: weight of a family tree based on recursively calculated complexity of all subtrees
[Figure: example trees Fam_i and Fam_k compared by DV value: Fam_i (DV=3) > Fam_k (DV=2); Fam_i (DV=3) = Fam_k (DV=3)]
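The "1 + DV value" rule for extended samples can be read as: every family contributes one language, and the remaining slots are shared out according to DV. The following is a minimal Python sketch under that reading. The proportional (largest-remainder) split, the function name `allocate_extended_sample`, the family `Fam_m` and the DV figures in the usage line are illustrative assumptions, not the published Rijkhoff & Bakker procedure.

```python
from math import floor

def allocate_extended_sample(dv_by_family, sample_size):
    """Give each family 1 language, then share the rest in proportion to DV.

    Assumption: extras are split by the largest-remainder method; the
    slides only say "1 + DV value", so this split rule is hypothetical.
    """
    n_families = len(dv_by_family)
    if sample_size < n_families:
        raise ValueError("sample smaller than the basic sample (1 per family)")
    extra = sample_size - n_families
    total_dv = sum(dv_by_family.values())
    if extra > 0 and total_dv <= 0:
        raise ValueError("cannot distribute extra languages without positive DV")
    # Ideal (fractional) share of the extra slots for every family.
    shares = {f: (extra * dv / total_dv if extra else 0.0)
              for f, dv in dv_by_family.items()}
    quota = {f: 1 + floor(s) for f, s in shares.items()}
    # Hand out the slots lost to rounding, largest fractional part first.
    leftover = sample_size - sum(quota.values())
    by_remainder = sorted(shares, key=lambda f: shares[f] - floor(shares[f]),
                          reverse=True)
    for f in by_remainder[:leftover]:
        quota[f] += 1
    return quota

# Hypothetical DV values for three families (not taken from the slides' data).
quota = allocate_extended_sample({"Fam_i": 7.5, "Fam_k": 6.0, "Fam_m": 1.5}, 13)
```

With these made-up values each family keeps its one basic-sample language and the ten extra slots follow the DV ratio; a different rounding rule would shift single languages between families without changing the principle.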
[Figure, continued: values 9, 6, 4.5 and 3.0 shown with the trees; Fam_i DV=7.5 > Fam_k DV=6; etcetera …]

B. Genus Macro-area method
Basis: Genealogical Genera (Dryer 1989, 2005, 2008)
- Time depth comparable
- Widely accepted, relatively uncontroversial
- Relatively large size
Genus Sample (GS): 1 language per Genus (n=475)
How to establish a GM sample of a specific size?
Basic Sample (BS): n < 475
- (RANDOM MOD FAMILY)
- Africa: 7 Families: 70 > 15, deleted iteratively from largest family
Basic Sample (BS): n > 475
- (RANDOM MOD FAMILY)
- Iteratively assigned to proportionally least represented family with respect to genera
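The two iterative rules above (delete from the largest family when n < 475, assign to the proportionally least represented family when n > 475) can be sketched in Python as follows. This is a sketch under stated assumptions rather than the GM implementation: the function names, the alphabetical tie-breaking, the reading of "represented" as languages selected per genus, and the family counts in the usage lines are all hypothetical.

```python
def shrink_area(genera_per_family, target):
    """Drop genera one at a time from the currently largest family.

    genera_per_family: {family: number of genera} for one macro-area.
    Assumption: ties go to the alphabetically first family; the slides
    do not say how ties are broken.
    """
    counts = dict(genera_per_family)
    if not 0 <= target <= sum(counts.values()):
        raise ValueError("target must lie between 0 and the number of genera")
    while sum(counts.values()) > target:
        largest = max(sorted(counts), key=lambda f: counts[f])
        counts[largest] -= 1
    return counts

def grow_sample(genera_per_family, extra):
    """Add languages one at a time to the proportionally least represented family.

    Assumption: representation = languages selected / genera, and every
    family starts with 1 language per genus (the Genus Sample).
    """
    selected = dict(genera_per_family)
    for _ in range(extra):
        least = min(sorted(selected),
                    key=lambda f: selected[f] / genera_per_family[f])
        selected[least] += 1
    return selected

# Hypothetical macro-area with three families (counts are made up).
smaller = shrink_area({"A": 5, "B": 3, "C": 1}, target=5)
larger = grow_sample({"A": 10, "B": 2}, extra=3)
```

Deleting from the largest family first keeps small families and isolates in the sample for as long as possible, which is the point of a variety sample; the random choice of the actual genus or language within a family is left out here.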
Variety Sampling: which method is the best?

3. The Experiment: DV versus GM

Procedure:
1. Generate a sample of size S1, S2, S3, … for both DV and GM
2. Compare variety for each sample Sn for DV and GM
3. Determine which of the two has the highest average variety

Points:
a. Which classification(s) to use?
b. Which typological variables to measure variety on?
c. How to test?
a. Which classification(s)?
DV: must have ‘depth’: Ethnologue (#15); n of lgs = 7299
GM: must have genera: WALS; n of lgs = 2561

b. Which typological variables?
WALS database:
- 138 variables (phon/morph/synt/lex/…): representative
- 2 – 9 different values
- value distribution: from frequent to rare
- total 2511 languages with 1 or more values
- value for 415 languages on average per variable

c. How to test?
1. For each variable Vi (i = 1 - 138) in the WALS database:
2. For a series of 18 sample sizes Sn (n = 50, 100, 150, …, 900):
3. For each method M (DV, GM):
4. Generate sample Vi.Sn on the basis of the M-specific classification C
5. This gives n nodes (family, group, genus) in C
6. Randomly select a language for each node from C
7. For each language determine its value (1-9) for Vi in WALS, or 0
8. Add value to value set Vi.Sn
9. Compare completeness of Vi.Sn for DV and GM
Repetitions: 100 draws x 18 sample sizes x 138 variables; the slide's total of 250,200 samples per method equals 18 x 139 x 100, matching the 139 features named in the results.

Key factors (0.0 = LOW, 1.0 = HIGH)
Saturation (SAT): proportion of values for a variable found in a sample (0.0 – 1.0)
- 0.0: no values found
- 1.0: all values (2-9) found
Completeness (COMP): number of draws necessary to find all values (0.0 – 1.0)
- 0.0: maximum (= 100 draws) reached
- 1.0: all values found in first draw
Several more, not discussed here …
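The two key factors and the size of the experiment can be made concrete with a short Python sketch. It is only one possible reading of the slide definitions: pooling values across successive draws, the linear scale between the two COMP end points, and the names `saturation` and `completeness` are assumptions; the value 0 stands for "no value in WALS", as in step 7.

```python
def saturation(found_values, all_values):
    """SAT: share of the variable's values that occur in one sample (0.0-1.0)."""
    all_values = set(all_values)
    # Value 0 (no WALS value) is ignored because it is not in all_values.
    return len(set(found_values) & all_values) / len(all_values)

def completeness(draws, all_values, max_draws=100):
    """COMP: 1.0 if the first draw holds every value, 0.0 at the maximum.

    Assumptions: values are pooled over successive draws, and the score
    falls linearly between the two end points named on the slide.
    """
    if max_draws < 2:
        raise ValueError("max_draws must be at least 2")
    all_values = set(all_values)
    seen = set()
    for k, sample_values in enumerate(draws[:max_draws], start=1):
        seen |= set(sample_values) & all_values
        if seen == all_values:
            return 1.0 - (k - 1) / (max_draws - 1)
    return 0.0  # never complete within max_draws

# Size of the experiment: 18 sample sizes, 100 draws each.
sample_sizes = list(range(50, 901, 50))
samples_138 = 138 * len(sample_sizes) * 100   # 248,400
samples_139 = 139 * len(sample_sizes) * 100   # 250,200, the slide's total

# Toy variable with three values; the draws are invented for illustration.
values_of_vi = {1, 2, 3}
sat = saturation([1, 1, 0, 2], values_of_vi)               # 2 of 3 values found
comp = completeness([[1, 0], [2, 1], [3]], values_of_vi)   # complete at draw 3
```

Under this reading SAT is computed per sample, while COMP summarises a whole series of draws, so the two measures can rank DV and GM differently at the same sample size.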
DV versus GM: results
[Charts: MEAN over FEATURES (139) * SAMPLE SIZES (50, 100, … , 900); N of draws per Feature * SampleSize: 100]
- GM > DV
- But: 5028 languages with no value!
- Only languages with a value: DV > GM

DV slightly better overall than GM, but:
A. Differences per sample size?
B. Differences per feature?

A. Sample size:
1. SATuration [chart annotations: DV ≈ GM; DV > GM]
2. COMPleteness [chart annotations: GM > DV; DV > GM]
GM improves around sample size 450 - 500
GM: number of genera = 475 > support for genera?

B. Feature:
[Charts; annotation: Factor?]

How GOOD are DV and GM?
Compare with RANDOM sample of same size:

DV vs GM vs RANDOM
[Chart: MEAN over FEATURES (138) * SAMPLE SIZES (50, 100, … , 900); N of draws per Feature * SampleSize: 100]
DV & GM > RAN
[Chart: MEAN over FEATURES (139) * SAMPLE SIZES (50, 100, … , 900); N of draws per Feature * SampleSize: 100]
DV & GM > RAN

4. Conclusions

1. As an explorative device, Variety Sampling works better than Random Sampling at any common sample size (50-900):
a. to find maximum variety
b. to establish it relatively easily
(c. all other measures …)

2. A purely genealogically based method (DV) works slightly better for smaller samples (< 500) than a method which combines a genealogical basis with areal stratification (GM). For larger samples both methods are equally good, but areal stratification may make it easier to find the optimal sample.

3. Unclear why areal stratification does not simply improve genealogical sampling
- Macro-Areas too crude; AutoTyp areas?
- Areal under/overrepresentation of Genera
- Try other diversity criteria (e.g. Dahl 2008)
GOAL: find optimal balance Areal vs Genealogical stratification

?