
Applying Statistical Methodologies Originally
Developed for Household Surveys to the Design
and Analysis of Establishment Studies
Angela Pitts, Marcus Berzofsky, and Michael Witt
6/19/07
RTI International is a trade name of Research Triangle Institute
3040 Cornwallis Road ■ P.O. Box 12194 ■ Research Triangle Park, North Carolina, USA 27709
Topics of Discussion
 Using Multistage Sampling Techniques
 Using Area Probability Sampling Techniques
 Selecting Units Proportional to a Composite Size
Measure
 Selecting Units by Sequential PPS w/ Minimal
Replacement
 Computing Weight Adjustments Using a Model-Based
Approach
 Large Table Production Tasks
Multistage Sampling
 Refers to the process of sampling within a previously
selected sample
 Used to decrease data collection cost
 Used when a sampling frame is not available
 Often one would stratify within each stage, depending on
known information
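As a rough illustration of the idea above, the following sketch draws a two-stage sample: establishments first, then employees within the selected establishments. It assumes simple random sampling at both stages, and the frame, identifiers, and function name are hypothetical. Note that employee lists are only needed for the establishments selected at the first stage, which is why the approach helps when no complete frame exists.

```python
import random

def two_stage_sample(frame, n_psu, n_per_psu, seed=0):
    """frame: dict mapping establishment id -> list of employee ids (hypothetical)."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of establishments (PSUs).
    psus = rng.sample(sorted(frame), n_psu)
    p_stage1 = n_psu / len(frame)
    sample = []
    for psu in psus:
        employees = frame[psu]
        k = min(n_per_psu, len(employees))
        # Stage 2: simple random sample of employees within the selected PSU.
        p_stage2 = k / len(employees)
        # Design weight is the inverse of the overall selection probability.
        weight = 1.0 / (p_stage1 * p_stage2)
        for emp in rng.sample(employees, k):
            sample.append((psu, emp, weight))
    return sample

# Hypothetical frame: 6 establishments with 10, 15, ..., 35 employees.
frame = {f"estab{i}": [f"e{i}_{j}" for j in range(10 + 5 * i)] for i in range(6)}
sample = two_stage_sample(frame, n_psu=3, n_per_psu=4)
```

In practice the first stage would often be stratified or drawn with probability proportional to size rather than by simple random sampling.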
Multistage Sampling (continued)
 Establishment study examples:
 O*NET
 Establishment is PSU; sampling unit is employee
 National survey of employees within occupations
 Selected establishments, then occupations, and then
employees
 School-based studies
 National School Radon Survey – selected districts, then
schools, and then classes
 Educational Longitudinal Study (ELS) – select schools
and then students
Area Probability Sampling
 Closely related to multistage sampling
 Refers to the multistage design process of selecting
geographic areas at the early stages of design
 Used to minimize data collection costs and address
incomplete sampling frames
 May be useful for non-response follow-up or
response-analysis studies
 Sometimes clustering can decrease the precision of
estimates
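The precision loss from clustering is often summarized with Kish's approximate design effect, 1 + (b - 1) * rho, where b is the average number of sampled units per cluster and rho is the intraclass correlation. The sketch below assumes equal cluster sizes; the function names and the numbers in the example are hypothetical.

```python
def clustering_design_effect(cluster_size, icc):
    """Kish's approximation: variance inflation relative to simple random sampling."""
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Sample size of a simple random sample with roughly the same precision."""
    return n / clustering_design_effect(cluster_size, icc)

# Hypothetical design: 1,000 interviews taken 20 per geographic cluster.
deff = clustering_design_effect(cluster_size=20, icc=0.05)
n_eff = effective_sample_size(1000, cluster_size=20, icc=0.05)
```

With these assumed values the design effect is about 1.95, so the clustered sample of 1,000 carries roughly the information of a simple random sample of about 513.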
Area Probability Sampling (continued)
 Establishment study examples:
 National Study of Assisted Living for the Frail Elderly
 Estab is SSU; sampling unit is resident
 Select geographic areas
 Used outside sources to create a list of assisted living
facilities in each area, because a suitable sampling frame
did not exist
 Select facilities then residents
Area Probability Sampling (continued)
 Establishment study examples (continued):
 National Postsecondary Student Aid Study (NPSAS
pre-1994)
 Estab is SSU; sampling unit is student
 Select 3-digit zip code areas, then higher education
institutions within areas, and then students
 Cash Payment Study (currently being designed)
 Estab is SSU; sampling unit is establishment
 Considering oversampling establishments in a sample of
geographic areas for in-person non-response follow-up
Selecting Units Proportional to CSM
 A common size measure for establishments is the
number of employees; however, this may not be
suitable for every target population of interest
 Composite size measure accounts for varying
subsampling fractions across subdomains (Folsom et
al., 1980)
 In a sense, the composite size measure represents
the “value” of a PSU relative to other PSUs on the
frame and relative to the desired ultimate sample
Selecting Units Proportional to CSM
(continued)
 Benefits of CSM methodology:
 Can help equalize workload across PSUs (cost
efficiency)
 Can help equalize final probabilities of selection within
domains (variance reduction)
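Both benefits can be checked numerically. The sketch below assumes one standard way of using a composite size measure: PSU i is selected with probability m*Si/S+, and subdomain d is then subsampled within the PSU at rate fd divided by that probability. The frame, the sampling rates, and all names are hypothetical, and exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical desired overall sampling rates by subdomain.
RATES = {"part_time": Fraction(1, 20), "full_time": Fraction(1, 5)}
# Hypothetical frame: employee counts by subdomain for four establishments.
FRAME = {
    "A": {"part_time": 70, "full_time": 30},
    "B": {"part_time": 40, "full_time": 60},
    "C": {"part_time": 100, "full_time": 20},
    "D": {"part_time": 20, "full_time": 50},
}
M = 2  # number of PSUs to select

def composite_size(counts, rates):
    """Si = sum over subdomains of (desired rate) * (frame count)."""
    return sum(rates[d] * counts[d] for d in rates)

sizes = {psu: composite_size(c, RATES) for psu, c in FRAME.items()}
total = sum(sizes.values())
# First-stage selection probability, proportional to the composite size.
psu_prob = {psu: M * s / total for psu, s in sizes.items()}
# Within-PSU rate chosen so that the overall rate equals the desired rate.
within_rate = {psu: {d: RATES[d] / psu_prob[psu] for d in RATES} for psu in FRAME}
# Expected number of employees sampled if the PSU is selected.
workload = {psu: sum(within_rate[psu][d] * FRAME[psu][d] for d in RATES) for psu in FRAME}
# Overall selection probability of an employee in each subdomain.
overall = {psu: {d: psu_prob[psu] * within_rate[psu][d] for d in RATES} for psu in FRAME}
```

Under these assumptions every selected establishment has the same expected workload (S+/m, here 21.75 employees), and every employee in a given subdomain has the same overall selection probability regardless of establishment.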
Selecting Units Proportional to CSM
(continued)
For example, suppose the subdomains are part-time and
full-time workers
 Let Ni = 100, with Npi = 70 part-time and Nfi = 30 full-time
 Let fp = np/Np = 0.25 for all i
 Let ff = nf/Nf = 0.80 for all i
 Si = fp * Npi + ff * Nfi = 17.5 + 24 = 41.5
 Note that a commonly used size measure would be
100 (total number of employees in establishment)
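The arithmetic in this example can be reproduced with a few lines of code. This is only a sketch of the calculation on the slide; the function and variable names are hypothetical.

```python
def composite_size_measure(domain_counts, sampling_rates):
    """Si = sum over subdomains of (overall sampling rate) * (count in establishment i)."""
    return sum(sampling_rates[d] * n for d, n in domain_counts.items())

counts = {"part_time": 70, "full_time": 30}     # Npi and Nfi from the example
rates = {"part_time": 0.25, "full_time": 0.80}  # fp and ff from the example

s_i = composite_size_measure(counts, rates)
print(round(s_i, 2))  # 41.5, versus 100 for the simple employee count
```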
Selecting Units Proportional to CSM
(continued)
 Establishment study examples:
 O*NET
 Subdomain is occupation within establishment
 Could consider using this for hospital studies
 Subdomain could be medical personnel in different
specialties within hospital
Selecting Units by Sequential PPS w/
Min. Replacement
 Application of sequential, with replacement sampling
to the PPS environment, developed by Dr. Chromy
(1979)
 The number of times each PSU is selected is guaranteed
to be within 1 of its expected value, m*Si/S+ (m = number
of selections, S+ = sum of the size measures on the frame)
 If the expected value is less than 1, then PSUi would be
selected either 0 or 1 times
 If the expected value is greater than 1, for example 3.2,
then PSUi would be selected 3 or 4 times
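The within-1 guarantee can be seen in a small simulation. The following is a minimal sketch of the sequential selection rule as it is commonly described for Chromy's method, not a production implementation: it omits the random starting point and any sorting of the frame, it assumes integer size measures so that exact fractions can be used, and the function name is hypothetical.

```python
import math
import random
from fractions import Fraction

def chromy_pmr_hits(sizes, m, rng=None):
    """Return (expected hits, realized hits) for each unit, taken in frame order."""
    rng = rng or random.Random()
    total = sum(sizes)
    expected = [Fraction(m * s, total) for s in sizes]
    hits = []
    cum = Fraction(0)        # cumulative expected hits through the current unit
    prev_int, prev_frac = 0, Fraction(0)
    prev_t = 0               # cumulative realized hits through the previous unit
    for e in expected:
        cum += e
        cur_int = math.floor(cum)
        cur_frac = cum - cur_int
        if prev_t == prev_int:
            # Running total is at the integer part: step up only if the fraction grew.
            p_up = (cur_frac - prev_frac) / (1 - prev_frac) if cur_frac > prev_frac else Fraction(0)
        else:
            # Running total is one above the integer part.
            p_up = Fraction(1) if cur_frac >= prev_frac else cur_frac / prev_frac
        cur_t = cur_int + (1 if rng.random() < p_up else 0)
        hits.append(cur_t - prev_t)
        prev_t, prev_int, prev_frac = cur_t, cur_int, cur_frac
    return expected, hits

# Hypothetical frame: with m = 5 the first unit has an expected value of 3.2.
expected, hits = chromy_pmr_hits([64, 10, 10, 16], m=5, rng=random.Random(1))
```

In this example the first unit is always selected 3 or 4 times, each of the others 0 or 1 times, and the hits always total 5.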
Selecting Units by Sequential PPS w/
Min. Replacement (continued)
 Sorting the file prior to selection can yield variance
reduction and/or better control over domain sample
sizes via implicit stratification
 SAS offers Chromy’s method in its SURVEYSELECT
procedure (METHOD=PPS_SEQ)
Selecting Units by Sequential PPS w/
Min. Replacement (continued)
 Establishment study examples:
 National Inmate Survey
 Establishment is a PSU; sampling unit is inmate
 Survey of correctional facilities
 Size measure was number of inmates in facility
 Implicitly stratified by region and state to ensure that at
least one facility was selected in each state
 O*NET
 Composite size measure used based on occupations
within establishment
 Implicitly stratified by industry grouping and establishment
size
Weight Adjustments
 Use model-based approach, such as the generalized
exponential model (GEM), developed by RTI for
household surveys (Folsom and Singh, 2000)
 Can be used with nonresponse, post-stratification,
and extreme weight adjustments
 Allows for a larger number of statistically significant
main effects and lower-order interaction terms in the
adjustment compared to a weighting class adjustment
 Potential to decrease bias
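For comparison, the weighting class adjustment mentioned above can be sketched in a few lines. This is the simpler baseline, not GEM: within each class, respondent weights are inflated so that they sum to the weight total of all sampled units in that class. The record layout and names are hypothetical.

```python
from collections import defaultdict

def weighting_class_adjust(units):
    """units: list of dicts with 'weight', 'cls', and 'responded' keys (hypothetical layout)."""
    sampled = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        sampled[u["cls"]] += u["weight"]
        if u["responded"]:
            responding[u["cls"]] += u["weight"]
    adjusted = []
    for u in units:
        if u["responded"]:
            # Adjustment factor: sampled weight total / respondent weight total in the class.
            factor = sampled[u["cls"]] / responding[u["cls"]]
            adjusted.append({**u, "weight": u["weight"] * factor})
    return adjusted

# Hypothetical sample of establishments in two size classes.
units = (
    [{"weight": 10.0, "cls": "small", "responded": r} for r in (True, True, False, False)]
    + [{"weight": 5.0, "cls": "large", "responded": r} for r in (True, True, False)]
)
adjusted = weighting_class_adjust(units)
```

Because each class needs enough respondents to support its own factor, only a handful of variables can be crossed this way, which is the limitation a model-based adjustment is meant to relax.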
Weight Adjustments (continued)
 Preset bounds on the resulting adjustment can be applied
– this limits unequal weighting, thereby increasing
precision
 Maintains marginal control totals unlike logistic
regression (propensity scores)
 May need to produce multiple weights to perform
analyses on different units
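The link between unequal weighting and precision is commonly summarized with Kish's unequal weighting effect, n * sum(w^2) / (sum(w))^2, which equals 1 when all weights are equal and grows as the weights become more variable. The sketch below only computes that measure for two hypothetical weight sets with the same total; it does not implement GEM or its bounds.

```python
def unequal_weighting_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2, equivalently 1 + CV^2 of the weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

# Hypothetical final weights with the same total (12) but different spread.
unbounded = [1.0, 1.0, 1.0, 9.0]
bounded = [2.0, 2.0, 2.0, 6.0]

uwe_unbounded = unequal_weighting_effect(unbounded)  # about 2.33
uwe_bounded = unequal_weighting_effect(bounded)      # about 1.33
```

In this made-up example, narrowing the range of the weights while keeping the same total cuts the approximate variance inflation from about 2.33 to about 1.33.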
Weight Adjustments (continued)
 Establishment study examples:
 Cyanide Survey
 Estab is PSU; sampling unit is advanced life support
providers
 Survey of ALS providers regarding emergency services
related to cyanide
 Used GEM for both nonresponse and post-stratification
adjustments
 NIOSH Fire Fighter Fatality Investigation and
Prevention Program Evaluation
 Estab is PSU; sampling unit is also the establishment
 Survey of fire departments
 Used GEM for nonresponse adjustments
Table Production
 Technique developed for household (HH) surveys where a
large number of tables were required
 Uses a collection of SAS and Visual Basic programs
to create analysis tables
 Programs produce the analysis data and populate the
tables
 Tables can be of varying sizes and formats
 Tables are formatted and printer-ready
Table Production (continued)
 Establishment study examples:
 National Inmate Survey
 Generated response rate tables, analysis data tables, and
nonresponse tables
 Cash Payments Study
 Establishment survey designed to collect information on
the use of cash versus credit/debit cards, etc.
 Generated response rate tables, analysis data tables, and
nonresponse tables
Conclusion
 HH methods shown have not been commonly used in
establishment studies
 In some instances, these methods are beneficial for
establishment studies
 When a list frame is not available
 When only a portion of the establishment is the domain
of interest
 When data collection is done in the field (as opposed to
telephone)