Standard Errors for Indexes From Complex Samples

Author(s): Leslie Kish
Source: Journal of the American Statistical Association, Vol. 63, No. 322 (Jun., 1968), pp. 512-529
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2284022
STANDARD ERRORS FOR INDEXES FROM COMPLEX SAMPLES*
LESLIE KISH
University of Michigan
Methods of computing standard errors were developed and applied for several statistics of importance in economic and social surveys. These statistics are based on ratio means $r = y/x$; $x$ is often the variable sample size of complex samples. Some statistics involve weighted sums $\sum_j w_j r_j$; others concern the relatives $R_1 = r_1/r_0 = (y_1/x_1)/(y_0/x_0)$, the ratio of current (1) to base (0) mean. From these relatives the indexes $I = \frac{1}{J}\sum_j^J R_j$ are constructed, and changes, $I_2 - I_1$, in the index. We develop computing formulas and helpful approximations for standard errors of these statistics, and apply them to a series of data from our Center's Surveys of Consumer Attitudes & Expectations. A large group of empirical results yield evidence on the behavior of these statistics, with wide implications for the design of surveys to measure economic and social indicators.
1. STANDARD ERRORS FOR MEANS AND THEIR CHANGES
The use of sample surveys for collecting social and economic data is increasing in scope. Important indicators are more and more based on sample surveys. Many of these resemble in the essentials of sample design the Surveys of Consumer Expectations [Katona and Mueller, 1957, Katona 1964, Mueller 1963] that provided both stimulation and data for our investigations. The directors of the survey designed the methods and scales used for collecting the data. They also designed the forms of the indicators derived from the data: the scores (means), the relatives, and the indexes. The methods of measurement, estimation, and index formation are not the subject of this paper; they are the subject of continuing and lively discussion in economic and statistical journals.¹
* Supported by grant GS-777 for "Analytical Statistics for Complex Samples" from the National Science Foundation, and presented at the 1962 Annual Meetings of the American Statistical Association. I am grateful for the collaboration of R. K. Pillai who took complete charge of the mass of complex computations; also for helpful suggestions from W. Scott Maynes and from one of the editors.
¹ Professor Katona kindly added these remarks:
Survey questions on buying intentions and on consumer attitudes or expectations have been asked by the Survey Research Center of the University of Michigan since 1946. Studies of decision making and of psychological antecedents of overt behavior (e.g., purchases) were called for because, with increasing affluence, millions of consumers acquired discretion of action and therefore their major expenditures became dependent not only on their ability to buy, but also on their willingness to buy. The purpose of these studies is prediction of future trends of purchases, as well as understanding of the factors responsible for change in the rate of purchases [Katona, 1964, and earlier publications quoted there].
The emphasis in the Center's studies shifted from buying intentions to attitudes and expectations, partly because of considerations of sampling errors and sampling variability presented in this paper, and partly because of a search for an early intercept of the decision-making process: changes in optimism or pessimism, confidence or uncertainty, etc., are assumed to occur prior to the formulation of specific buying plans. The Center's Index of Consumer Sentiment is available since 1952. Since 1963 it is constructed on a quarterly basis from items 1 to 5 inclusive (see Table 1, above). Items 6 to 15, and many other questions, are continued because of specific problems on which they shed light.
The predictive value of the Index was studied by means of regression equations in which, for instance, the Commerce Department series of durable goods purchases, or the number of cars sold, or incurrence of installment debt, served as dependent variables. As independent variables the Index (or change in the Index) was used in conjunction with several other variables. The simplest regression equation, with the Index representing willingness to buy and income representing ability to buy as independent variables, both measured 6 to 9 months earlier than
Our first aim was empirical: to learn about the magnitudes and sources of the sampling variations affecting these important economic indicators. Our second aim is to present methods and results that should prove useful for similar indicators in other complex samples, which are not uncommon (though presentation of their sampling errors is).
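The surveys are based on complex multistage samples, from which are computed means $r$, each the ratio of two random variables $y$ and $x$:

$$r = \frac{y}{x} = \frac{\sum_{h}^{H} y_h}{\sum_{h}^{H} x_h} = \frac{\sum_{h}^{H} \sum_{\alpha}^{a_h} y_{h\alpha}}{\sum_{h}^{H} \sum_{\alpha}^{a_h} x_{h\alpha}}. \tag{1}$$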
Typically $y$ is the sum of some characteristic in the sample and an estimate of the population value $Y$, except for a constant factor such as $f$, the sampling fraction. The random variable $x$ in the denominator is here (and often) the sample size; in general it estimates the population value $X$ in a form similar to the numerator.
When $y/x$ is merely the sample mean it could be written simply $\bar{y} = y/n$; the more complex notation conforms to methods for computing variances, described in section 5, which approximate the complexities of the sample design. The sample is selected in $H$ primary strata and $a_h$ denotes the number of replicate (primary) selections from the $h$-th stratum. The quantities $y_{h\alpha}$ and $x_{h\alpha}$ represent sample totals for the $\alpha$-th primary selection in the $h$-th stratum; $y_{h\alpha}$ is the total score for an attitude and $x_{h\alpha}$ the sample size; $y_h$ and $x_h$ are stratum totals. These formulas can be, and are, used generally for combined ratio estimators; the values $y_{h\alpha}$ and $x_{h\alpha}$ are understood to include any needed unequal weighting [Kish and Hess, 1959].
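As a minimal sketch of formula (1), the ratio mean can be accumulated from the totals of the primary selections. The function name `ratio_mean` and the figures below are hypothetical illustrations, not the survey's data; the totals are assumed to carry any needed weighting already.

```python
def ratio_mean(y_totals, x_totals):
    """Combined ratio mean r = y/x as in formula (1).

    y_totals[h][a] and x_totals[h][a] hold the sample totals for primary
    selection a in stratum h (hypothetical layout chosen for this sketch).
    """
    y = sum(sum(stratum) for stratum in y_totals)
    x = sum(sum(stratum) for stratum in x_totals)
    return y / x


# Hypothetical data: H = 3 strata with two primary selections each.
y_totals = [[1650.0, 1540.0], [2200.0, 2310.0], [1100.0, 1320.0]]  # score totals
x_totals = [[15.0, 14.0], [20.0, 21.0], [10.0, 12.0]]              # sample sizes

r = ratio_mean(y_totals, x_totals)
print(r)
```

Because both numerator and denominator are sums over the same strata and selections, the denominator remains a random variable, which is why the variance methods of section 5 are needed rather than the formula for a simple mean.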
Data from six surveys are represented, each of about n = 1370 interviews, and two from each of three years: the base year 1956, and two current (at the time of this research) years, 1959 and 1960. Essentially the same methods of multistage probability area sampling were used in each survey to select dwellings with equal probabilities. In dwellings families were identified (1.04 per dwelling), and a single interview was taken from head or wife, alternately designated (or from head, if without a spouse).
About 1320 dwellings came from about 400 segments and these from 66 primary sampling areas, either single counties or Standard Metropolitan Statistical Areas. These are widespread samples, with an average of 20 cases coming from an average of seven segments per area. Dwellings and segments are changed between surveys, but primary areas are not. Of the 66 areas 54 were selected with less than certainty and these contribute 27 strata, each containing two primary selections, to the computation of the variance. The other 12 are the largest metropolitan areas, included with certainty; here segments were the primary sampling units and these were paired into 18 computing strata. Altogether then the computation of the variance is based on 27 + 18 = 45 = H strata, with two primary selections in each [Kish and Hess, 1959 and Kish 1965, 7.3].
The 15 variates of Table 1 represent answers obtained in essentially the same form over the years; the scores in column 2 stand for the means of 6 scores obtained on 6 surveys. A score denotes the ratio mean of scales of attitudes of individuals of a survey sample. All 15 variates, as they appear here, denote trichotomies. For example, on the first item the response "Better off" has a value of 200, the negative response "Worse off" has a value of 0, and the neutral response has a value of 100; the neutral responses are mostly "Same," with fewer "Uncertain" and "Don't know," and very few "Unascertained." The score 110 expresses results from about 30 per cent "Better off" against 20 per cent "Worse off," with 50 per cent in the middle group; the deviation from 100 expresses the difference between positive and negative percentages. Of the 8 items (1-8) of attitudes and expectations, seven average more than 100, showing more positive than negative response. The 7 buying variates (9-15) denote buying intentions. On all 15 items positive directions are used to denote optimism, improvement, or intentions to buy.

TABLE 1. STANDARD ERRORS FOR SCORES (r) AND THEIR CHANGES (r2 - r1)

Item  Scores  se(r)         100 se(r)/r  se(r2-r1)  Fluctuations in 17 Surveys  Item wording
(1)   (2)     (3)           (4)          (5)        (6)
1     110     2.32 (.14)    2.11         2.95       6.26                        Evaluation of financial situation as compared with a year earlier
2     128     1.73 (.10)    1.35         2.37       3.78                        Expected change in financial situation
3     156     2.29 (.31)    1.47         2.91       12.93                       Business condition expected over the next 12 months
4     122     2.35 (.35)    1.93         2.54       10.83                       Business condition expected for the next five years
5     129     2.74 (.31)    2.12         3.10       8.50                        Good or bad time to buy large household goods
6     92      2.37 (.33)    2.58         3.15       9.68                        Changing prices expected for next year is to the good or bad
7     109     2.76 (.25)    2.53         3.72       27.37                       Evaluation of current business conditions compared to those a year ago
8     120     1.68 (.35)    1.40         2.26       11.96                       Expected business condition a year from now as compared with the present
9     14      1.39 (.27)    9.93         1.92       1.76                        Plan to buy house during the next year
10    33      1.87 (.20)    5.67         2.42       2.94                        Intention to buy automobile during the next 12 months
11    42      2.72 (.32)    6.48         3.70       4.25                        Evaluation of chances of home repair
12    14      1.48 (.13)    10.57        2.06       1.60                        Evaluation of chances of buying refrigerator
13    14      1.23 (.22)    8.78         1.61       1.29                        Evaluation of chances of buying T.V.
14    12      1.17 (.17)    9.75         1.64       1.55                        Evaluation of chances of buying cooking range
15    17      1.38 (.18)    8.12         1.92       1.69                        Evaluation of chances of buying washing machine

¹ (continued) the expenditures on durables, yielded an $R^2$ of .91 for the period 1952-66 [Mueller, 1963, Katona 1967, and Maynes 1968].
The distributions of the 8 attitudinal variables (1-8) differ greatly from those of the 7 rare buying intentions (9-15); this difference runs through the entire analysis. The attitudinal items have fairly large middle groups, comprising about one-third to one-half; and they tend to be balanced, with both sides of moderate sizes. On the contrary, the seven buying items represent J-shaped distributions. The "certain or probable" buyers are rather rare, the "slightly" inclined or "uncertain" are even fewer, but nonbuyers are numerous. For example, a score of 14 might typically consist of these proportions: 0.06 probable buyers with scale value 200 (in other words, 6 per cent with a score of 2); 0.02 possible buyers with scale value 100; and 0.92 nonbuyers with scale value 0; thus 12 + 2 + 0 = 14. Note that to estimate the proportion of buyers one should divide these scores and their standard deviations by two; but coefficients and relatives remain unaffected thereby. Thus the five variates (numbered 9, 12-15) with scores near 14 denote buying intentions of about 7 per cent; variate 10 with scores of about 32 represents expected car buying by 16 per cent; and variate 11 with scores around 42 represents expected home repairs by 21 per cent. Because the middle groups are small, the statistical properties of these scores resemble the simple proportions of buyers; the latter has often been presented instead of the former in survey results.
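The scoring of the trichotomies can be illustrated with a short sketch that reproduces the two worked examples in the text (the score of 110 for an attitude and the score of 14 for a rare buying intention). The function name `trichotomy_score` is a hypothetical label chosen for this illustration.

```python
def trichotomy_score(p_positive, p_neutral, p_negative):
    """Mean score on the 200 / 100 / 0 scale for given response proportions."""
    total = p_positive + p_neutral + p_negative
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return 200.0 * p_positive + 100.0 * p_neutral + 0.0 * p_negative


# Attitude example from the text: 30% "Better off", 50% neutral, 20% "Worse off".
attitude = trichotomy_score(0.30, 0.50, 0.20)

# Rare intention example from the text: 6% probable, 2% possible, 92% nonbuyers.
intention = trichotomy_score(0.06, 0.02, 0.92)

# Dividing the score by two gives the rough percentage of buyers.
buyers_per_cent = intention / 2.0

print(attitude, intention, buyers_per_cent)
```

The deviation of the attitude score from 100 equals the difference between the positive and negative percentages (30 - 20 = 10), which is the interpretation given in the text.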
In column 3 we present for each item the mean of 6 standard errors computed for the six surveys. Thus we note for item 1 a score of 110, subject to a standard error of 2.32; both of these figures have increased stability because they are means of 6 separate computations. The standard errors are fairly uniform, mostly in the range of 1.5 to 2.5 points. Nevertheless some differences between them are appreciably large compared to their own reliability. To measure the reliability of the standard errors we also computed standard deviations within the sets of 6 values, and we present these in parentheses. The 6 values of standard errors are not independent entirely but largely so, because most sampling units are changed; the value of $1/\sqrt{6} \approx 0.4$ can serve as a rough approximation for the coefficient of variation for the standard errors; thus the standard error of 2.32 is subject roughly to a standard error of 0.4 × 0.14 = 0.056.
Column 4 contains the coefficients of variation: the mean standard errors of column 3 divided by the mean scores of column 2. The uniformity we noted before in column 3 disappears here in column 4: the seven (9-15) buying intentions have much larger coefficients of variation than the attitudes (1-8).
For the main source of this difference we may look to the coefficient of variation under unrestricted random sampling. For binomials this would be $\sqrt{(1-P)/(Pn)}$, and this becomes large for small $P$. For trinomials it can be written, with $P_2$, $P_1$, and $P_0$ denoting the proportions having scores of 2, 1, and 0 respectively, as:

$$\frac{\sqrt{\mathrm{Var}(2p_2 + p_1)}}{2P_2 + P_1} = \sqrt{\frac{4P_2 + P_1 - (2P_2 + P_1)^2}{(2P_2 + P_1)^2\, n}} = \sqrt{\frac{P_2 + P_0 - (P_2 - P_0)^2}{(1 + P_2 - P_0)^2\, n}}. \tag{2}$$
When $P_2 = P_0$ the coefficient of variation becomes $\sqrt{(P_2 + P_0)/n}$. The attitudes (1-8) approximate this condition; furthermore, they tend to have values of $P_1 = P_2 + P_0 = 0.5$ very roughly. Under these conditions the coefficient of variation is $0.7/\sqrt{n}$; and for n = 1370 this comes to .019. However, for rare items the situation is different: $P_2$ is small, $P_1$ is even smaller, the coefficient of variation approaches the binomial model $\sqrt{P_0/[(1 - P_0)n]}$. For example, for the score of 14 with $P_2 = 0.06$, $P_1 = 0.02$, $P_0 = 0.92$, we get $\sqrt{(.98 - .86^2)/[(1 + .06 - .92)^2 n]} = 3.50/\sqrt{n}$. (Note that for $P_0 = 0.92$ the binomial standard error comes to $3.39/\sqrt{n}$; and for $P_0 = 0.93$ (for a score of 14/2) it comes to $3.64/\sqrt{n}$.) For n = 1370 the value of $3.50/\sqrt{n}$ is .095; this is 5 times as high as the typical value of .019 for attitudes. These facts explain the high coefficients of variation of the five intention items 9, 12-15; items 10 and 11 fare slightly better.
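The arithmetic of formula (2) can be checked with a small sketch. The function names are hypothetical; the inputs are the proportions quoted in the text, and the balanced case uses $P_2 = P_0 = 0.25$ as one instance of the rough condition $P_2 + P_0 = 0.5$.

```python
import math


def trinomial_cv(p2, p0, n):
    """Coefficient of variation of the mean score under unrestricted random
    sampling, using the last form of formula (2)."""
    numerator = p2 + p0 - (p2 - p0) ** 2
    denominator = (1.0 + p2 - p0) ** 2 * n
    return math.sqrt(numerator / denominator)


def binomial_cv(p, n):
    """Coefficient of variation of a binomial proportion p with sample size n."""
    return math.sqrt((1.0 - p) / (p * n))


n = 1370

# Balanced attitude: P2 = P0 = 0.25, so P1 = 0.5 and the CV is about 0.7 / sqrt(n).
cv_attitude = trinomial_cv(0.25, 0.25, n)

# Rare intention from the text: P2 = 0.06, P1 = 0.02, P0 = 0.92.
cv_rare = trinomial_cv(0.06, 0.92, n)

# Binomial comparison with 8 per cent buyers (P0 = 0.92), per unit sample size.
binomial_factor = binomial_cv(0.08, 1)

print(round(cv_attitude, 3), round(cv_rare, 3), round(binomial_factor, 2))
```

The two coefficients come out near .019 and .095, the values quoted in the text, and their ratio of about five is what separates the buying intentions from the attitudes in column 4 of Table 1.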
Thus the 7 intentions tend to resemble asymmetrical dichotomies, whereas the 8 attitudes are fairly symmetrical trichotomies. The coefficients of variation permit ready comparisons of the errors of the diverse items; they also point to the wide difference of behavior between the 8 unimodal attitudes and the 7 bimodal intentions that we shall note later in the errors of relatives and indexes.²
Column 5 contains the standard errors of changes of scores $(r_2 - r_1)$ between two periods. Each entry is the mean of three computations obtained from the pairs of surveys for each of three years. These standard errors are somewhat higher than for the corresponding means (column 3), but not $\sqrt{2}$ times as high, which they would be for independent surveys. We computed the ratios of standard errors of $(r_2 - r_1)$ to $\sqrt{\mathrm{var}(r_2) + \mathrm{var}(r_1)}$; the 15 computed ratios lay between 0.77 and 0.99, and average 0.916 rather than 1; that is, $\mathrm{se}(r_2 - r_1)/\mathrm{se}(r)$ averages not 1.414 but 0.916 × 1.414 = 1.295. We may consider roughly that the average ratio $0.916^2 = 0.84$ expresses the effect of an average correlation of $\rho = 0.16$. This correlation comes from using the same counties for the pairs of surveys, not often the same blocks, and never the same segments. Hence the correlations are only little higher than for pairs of surveys from different years. Larger correlations could result from panel studies using the same segments and dwellings; this would be true for most attitudes, though not all buying intentions [Kish 1965, 12.5].
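The step from the ratio 0.916 to the correlation 0.16 can be sketched as follows, assuming (as a simplification made for this illustration) that the two surveys have equal variances, so that $\mathrm{var}(r_2 - r_1) = 2\,\mathrm{var}(r)(1 - \rho)$ and the squared ratio equals $1 - \rho$.

```python
import math


def implied_correlation(ratio):
    """Correlation implied by ratio = se(r2 - r1) / sqrt(var(r2) + var(r1)),
    under the assumption that var(r1) == var(r2)."""
    return 1.0 - ratio ** 2


ratio = 0.916                              # average ratio reported in the text
rho = implied_correlation(ratio)           # about 0.16
se_multiplier = ratio * math.sqrt(2.0)     # se(r2 - r1) / se(r), about 1.295

print(round(rho, 2), round(se_multiplier, 3))
```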
Economists may utilize the magnitudes of changes over a variety of time spans: last survey, last year, etc. The values of $\mathrm{se}(r_2 - r_1)$ in column 5 can be accepted as reasonably good measures of the standard error between any 2 surveys not far apart. (They would naturally tend to increase slowly and slightly to 1.414 times se(r) in column 3.) This measure of the sampling variability of each item should be compared to some measure of the magnitude of the item's fluctuation between periodic surveys. A good treatment, involving multiple purposes and models, is beyond the scope of this (already complex) paper. But interesting contrasts emerge from the most basic approach: from the scores of the 17 surveys available in the years 1955-62 we computed standard deviations between the 17 survey scores (column 6).³ The attitudes (1-8) all had fluctuations much greater than the standard error of differences; though variate 2 fluctuates less than the others, and variate 7 much more. In contrast, the seven buying expectations (9-15) had fluctuations only about as great as their standard errors of differences; hence mild fluctuations could not individually contribute much new information. However, to conclude that these 7 items are useless would be too hasty; even these 7 scores can detect larger fluctuations, which perhaps are more important though rarer than smaller fluctuations.

² I am not surprised at the poor statistical performance of these dichotomies for rare items. Personally I have long been opposed (often in speech, occasionally in writing) to the preponderance of dichotomies in social research, and in favor of using more scales with 3, 5 and more points. I welcome the investigation of 10 point intention scales by the Census Bureau [Juster, 1966].
³ The range of fluctuation of all 15 items was, with some variation, about 3.5 as great as the standard deviations presented here. These results for the range/s.d. are about what one could expect for samples of 17 from normal distributions.
Because the sample was clustered (and multistage), the standard errors actually computed in conformity with the sample design were greater than would appear from formulas proper only for unrestricted random sampling. The ratios of the actual standard errors to standard errors for the unrestricted random model we denote as $\sqrt{\mathrm{Deff}}$, referring to the "design effect"; its values were computed and their means assessed for each of the 15 scores as 1.21, 1.16, 1.42, 1.24, 1.33, 1.07, 1.46, 1.28, 1.08, 0.99, 1.32, 1.14, 0.97, 0.97, 0.96. The average among the 15 values is 1.17; this represents an increase of $1.17^2 = 1.37$ in the variance. Its chief probable sources are the clusters of about three interviews in somewhat homogeneous segments; clusters of about 20 in less homogeneous counties; and perhaps the clusters of about 10 interviews per interviewer [Kish, 1962].
We had also computed average values of $\sqrt{\mathrm{Deff}}$ for the score changes, by taking the ratio of the actual standard errors of $(r_2 - r_1)$ to the standard errors of two independent unrestricted random samples of the same numbers of interviews; for the 15 items they were 1.07, 1.11, 1.28, 0.96, 1.08, 0.99, 1.36, 1.15, 1.04, 0.90, 1.25, 1.10, 0.88, 0.95, 0.95. Effects of clustering remain, but are not as great as for the single ratio means: averaging 1.07 for the standard error, and $1.07^2 = 1.15$ for the variance. The reduction of the effect (1.15/1.37 = 0.84) is due to the positive correlation between pairs of sample means, which reduces the standard errors of differences; but an effect, though reduced, does not disappear. Our confidence in the magnitudes and relationships of these design effects is enhanced by similarities to hundreds of computations on similar surveys [Kish, 1965, 14.1 and 8.2].
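The summary figures for the design effects can be recomputed from the two lists of $\sqrt{\mathrm{Deff}}$ values quoted above. This is only a sketch of the averaging; it squares the unrounded means, so the results agree with the text's 1.37, 1.15, and 0.84 to within rounding rather than exactly.

```python
# sqrt(Deff) values quoted in the text for the 15 scores and for their changes.
root_deff_scores = [1.21, 1.16, 1.42, 1.24, 1.33, 1.07, 1.46, 1.28,
                    1.08, 0.99, 1.32, 1.14, 0.97, 0.97, 0.96]
root_deff_changes = [1.07, 1.11, 1.28, 0.96, 1.08, 0.99, 1.36, 1.15,
                     1.04, 0.90, 1.25, 1.10, 0.88, 0.95, 0.95]

mean_scores = sum(root_deff_scores) / len(root_deff_scores)      # about 1.17
mean_changes = sum(root_deff_changes) / len(root_deff_changes)   # about 1.07

# Squaring converts the standard-error ratios into variance ratios (Deff).
deff_scores = mean_scores ** 2
deff_changes = mean_changes ** 2

# Relative reduction of the design effect for changes versus single means.
reduction = deff_changes / deff_scores

print(round(mean_scores, 2), round(mean_changes, 2), round(reduction, 2))
```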
2. STANDARD ERRORS FOR RELATIVES AND THEIR CHANGES
The relative for any current year is the ratio of the score for that year to the score for the base year, $R_1 = r_1/r_0$:

$$R_1 = \frac{r_1}{r_0} = \frac{y_1/x_1}{y_0/x_0} = \frac{y_1 x_0}{y_0 x_1}. \tag{3}$$
The results concerning relatives in this section were computed for four surveys, 2 in 1959 and 2 in 1960; 2 surveys in 1956 serve combined as the base period. The situation is similar for the sums of relatives, the indexes, presented in the next section.
Standard errors for relatives are computed with formula (9"), for each of the four surveys; the means of the four standard errors appear in column 1 of Table 2. The chief lesson again concerns the striking contrast between the 8 attitudinal items (1-8) on the one hand, and the 7 rare items (9-15) on the other. The standard errors for the former range from 1.6 to 3.1; for the latter they range from 6.0 to over 13. The standard errors of the relatives resemble closely the coefficients of variation for the means (column 4 of Table 1), for reasons discussed later.

TABLE 2. STANDARD ERRORS OF RELATIVES (R1 = r1/r0) AND CHANGES OF RELATIVES (R2 - R1)

Item  se(R)   se(R2-R1)  Fluctuations in 17 Surveys  se(R2-R1)/sqrt[Var(R2)+Var(R1)]  C_R/sqrt(C_r^2+C_r0^2)
      (1)     (2)        (3)                         (4)                              (5)
1     2.30    2.82       5.40                        .87                              .88
2     1.65    1.78       2.97                        .76                              .96
3     1.59    1.86       7.89                        .82                              .94
4     1.79    2.01       7.94                        .79                              .88
5     2.42    2.23       6.29                        .65                              .95
6     2.89    3.62       7.45                        .89                              .96
7     3.11    3.66       24.64                       .84                              .97
8     1.90    2.21       17.26                       .81                              .90
9     11.93   14.24      12.69                       .83                              .95
10    6.02    6.83       8.52                        .80                              .90
11    7.30    8.21       9.93                        .79                              .95
12    13.37   14.71      11.18                       .78                              .98
13    10.00   11.29      8.57                        .79                              .98
14    13.14   14.95      14.17                       .80                              1.00
15    11.44   12.40      11.10                       .77                              .98
Mean                                                 .80                              .944
In column 2 of Table 2 we present standard errors for the changes $(R_2 - R_1)$ of two relatives. Each entry is the mean of six standard errors computed (with formula 11) for the differences between the six possible pairs of the four periods investigated. Note that these entries are only little greater than the corresponding entries in column 1 for the standard errors of separate relatives; the entries of column 2 divided by column 1 average 1.13, rather than $\sqrt{2}$. The variances of the differences are reduced by strong correlations between the pairs of relatives $R_2$ and $R_1$; these effects may be seen in the entries of column 4. Their average is 0.80, which denotes a variance ratio of 0.64, the effect of a correlation of 0.36 (stronger than 0.16 in Table 1); $\sqrt{1 + 1 - 2(0.36)} = \sqrt{1.28} = 1.13$ on the standard error of the change.
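The same chain of figures can be recomputed from the fifteen entries of column 4 of Table 2. The sketch below assumes equal variances for $R_2$ and $R_1$, which is a simplification introduced here so that the squared ratio equals $1 - \rho$.

```python
import math

# Column 4 of Table 2: se(R2 - R1) / sqrt(Var(R2) + Var(R1)) for items 1-15.
column_4 = [0.87, 0.76, 0.82, 0.79, 0.65, 0.89, 0.84, 0.81,
            0.83, 0.80, 0.79, 0.78, 0.79, 0.80, 0.77]

mean_ratio = sum(column_4) / len(column_4)        # about 0.80
variance_ratio = mean_ratio ** 2                  # about 0.64
rho = 1.0 - variance_ratio                        # about 0.36
se_multiplier = math.sqrt(2.0 * (1.0 - rho))      # se(R2 - R1) / se(R), about 1.13

print(round(mean_ratio, 2), round(variance_ratio, 2),
      round(rho, 2), round(se_multiplier, 2))
```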
We also obtained most reassuring confirmations for our conjecture that the variance of the change in the relatives $(R_2 - R_1)$ may usually be computed more simply in terms of the variance of change in the scores $(r_2 - r_1)$, taking advantage of the approximation that $r_0^{-2}\,\mathrm{var}(r_2 - r_1) \approx \mathrm{var}[(r_2 - r_1)/r_0] = \mathrm{var}(R_2 - R_1)$; see formula (11'). We computed the ratios $\mathrm{se}(R_2 - R_1)/[r_0^{-1}\,\mathrm{se}(r_2 - r_1)]$, and averaged these over the six possible pairs for each of the 15 items; all 15 averages were found within .014 of 1.00; there was little variation, and the mean of the 15 was 1.00. Thus we can safely use this approximation when $(r_2 - r_1)/r_0$ is moderately small. This holds usually, especially when the standard error of the change is most needed. Then we may compute simply the standard error of the change in means, instead of the more difficult standard error of the change in relatives.
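A small simulation gives a feel for why the ratio stays so close to 1.00. It is only a sketch under assumptions that are not the paper's: the three scores are drawn as independent normal variables with hypothetical means and standard errors, whereas the actual computations used the design-based formulas (11) and (11').

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical true scores and standard errors (not the survey's figures).
BASE, CURRENT_1, CURRENT_2 = 100.0, 110.0, 112.0
SE_CURRENT = 2.0
SE_BASE = SE_CURRENT / 2 ** 0.5  # base taken as the mean of two surveys

DRAWS = 20000
r0 = [random.gauss(BASE, SE_BASE) for _ in range(DRAWS)]
r1 = [random.gauss(CURRENT_1, SE_CURRENT) for _ in range(DRAWS)]
r2 = [random.gauss(CURRENT_2, SE_CURRENT) for _ in range(DRAWS)]

# se(R2 - R1), with each relative R = r / r0 formed draw by draw.
se_relative_change = statistics.pstdev(
    [(b - a) / c for a, b, c in zip(r1, r2, r0)]
)

# Approximation: se(r2 - r1) / r0, with r0 treated as a constant.
se_score_change = statistics.pstdev([b - a for a, b in zip(r1, r2)])
approximation = se_score_change / statistics.mean(r0)

ratio = se_relative_change / approximation
print(round(ratio, 3))
```

With a small change relative to the base and a base score whose coefficient of variation is only a per cent or two, the variability of $r_0$ contributes almost nothing to the variance of the change, so the ratio stays very near one.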
Searching also for a simpler approximation for the variance of $R$, we wondered how important the correlation $\rho_{r r_0}$ was in $C_R^2 \approx C_r^2 + C_{r_0}^2 - 2\rho_{r r_0} C_r C_{r_0}$. We found that the ratios $[C_R^2/(C_r^2 + C_{r_0}^2)]^{1/2}$ varied only between 0.88 and 1.00, averaging 0.944 (column 5). It seems that using the more convenient values of $(C_r^2 + C_{r_0}^2)^{1/2}$ results in overestimating $C_R$, the coefficient of variation of the relative, on the average by about 5.5 per cent. (This effect of a correlation of 0.12 agrees with the slightly greater correlation of 0.16 found between successive surveys.) This is an empirical relationship which depends on the stability of small correlations $\rho_{r r_0}$ between the base year and a series of periodic surveys, and for various items.
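The link between the ratio of about 0.944 and a correlation of about 0.12 can be sketched numerically. Two inputs below are assumptions made for the illustration: the value 0.02 for $C_r$ is hypothetical (the ratio does not depend on it), and $C_{r_0}^2 = 0.5\,C_r^2$ reflects a base formed as the mean of two surveys.

```python
import math


def cv_relative(c_r, c_r0, rho):
    """Approximate coefficient of variation of R = r / r0."""
    return math.sqrt(c_r ** 2 + c_r0 ** 2 - 2.0 * rho * c_r * c_r0)


c_r = 0.02                      # hypothetical CV of a current score
c_r0 = c_r / math.sqrt(2.0)     # assumed: base is the mean of two surveys
rho = 0.12                      # correlation between current and base scores

with_correlation = cv_relative(c_r, c_r0, rho)
without_correlation = cv_relative(c_r, c_r0, 0.0)   # the convenient approximation

ratio = with_correlation / without_correlation       # about 0.94
overestimate_per_cent = 100.0 * (without_correlation / with_correlation - 1.0)

print(round(ratio, 3), round(overestimate_per_cent, 1))
```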
In addition to providing convenient approximations for the standard errors of $R = r/r_0$ and $(R_2 - R_1)$, the above relationships also have relevance for the design of the base $r_0$. It is useful to know that ordinarily neither the variance of $r_0$ nor the correlation $\rho_{r r_0}$ has important effects on the variance of $(R_2 - R_1)$, although the correlation $\rho_{r_2 r_1}$ does. On the other hand, decreasing the variance of $r_0$ and increasing $\rho_{r r_0}$ would reduce the variance of $R$. Here I may add two practical and personal considerations regarding relatives in periodic surveys. First, the change $(R_2 - R_1)$ is probably of greater interest than the relative itself. Secondly, although $\rho_{r_2 r_1}$ may be increased with judicious overlaps between successive surveys, it may be difficult to do this for $\rho_{r r_0}$ when the base period is far removed from the current year.*
3. STANDARD ERRORS FOR INDEXES AND THEIR CHANGES
Table 3 presents results regarding five indexes in separate columns, each index being the mean of several relatives; they take the form

$$I = \frac{1}{J} \sum_{j}^{J} R_j$$

and fluctuate around 100 as the relatives do. The first index is the mean of the relatives for the 6 attitudes (1-6); the second index contains only items 9 plus 10 (home and car buying); the third is the mean of these 8 items. The fourth is the mean of the 4 intentions to buy appliances (12-15); the fifth adds items 9 and 10 to them; these two indexes have not been actually used.
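The construction of an index from item scores can be sketched as below. The scores are hypothetical, and the factor of 100 is an assumption made so that relatives and indexes fluctuate around 100 as described; formula (3) itself defines the relative as the plain ratio $r_1/r_0$.

```python
def relative(current_score, base_score, scale=100.0):
    """Relative of a current score to its base score, scaled to center on 100."""
    return scale * current_score / base_score


def index(current_scores, base_scores):
    """Index I: the mean of the J relatives R_j."""
    relatives = [relative(c, b) for c, b in zip(current_scores, base_scores)]
    return sum(relatives) / len(relatives)


# Hypothetical scores for J = 3 items in the base and current periods.
base_scores = [100.0, 125.0, 150.0]
current_scores = [110.0, 120.0, 156.0]   # relatives: 110, 96, 104

index_value = index(current_scores, base_scores)
print(round(index_value, 2))
```

Each item enters with equal weight after division by its own base score, so an item with a low base score (such as a rare buying intention) can move the index as much as an attitude with a score near 100.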
The standard errors are on line 2, each the mean for the four periods, computed with formula (13). The results here conform to the results presented in Table 2 for the separate relatives: the index for items 1-6 has low standard error; that for items 9 plus 10 is much higher, so much higher that adding them to the first 6 items increases substantially their standard error. Indexes based on the highly variable items 12-15 would also have high standard errors.

* The large average correlation of 0.36 found in column 4 is narrowly empirical; we must be interested in its source. The base $r_0$ is the mean of two samples; and $C_{r_0}^2$ is about $0.5\,C_r^2$. (An increase to 0.58 by correlation of 0.16 between two samples happens to be counteracted, because the $r_0$ in $\mathrm{var}(r_0)/r_0^2$ are a little higher than the average $r$ in the other 4 surveys.) Thus approximately $C_r^2 + C_{r_0}^2 = 1.5\,C_r^2 = 3\,C_{r_0}^2$; and we found above $C_R^2 \approx 0.89\,(C_r^2 + C_{r_0}^2)$, where $0.89 = 0.944^2$ from column 5. But we also found that $\mathrm{Var}(R_2 - R_1) \approx r_0^{-2}\,\mathrm{Var}(r_2 - r_1) \approx r_0^{-2} \cdot 2\,\mathrm{Var}(r)(0.84) \approx (0.84)\,2\,C_r^2$ on the average, because $\rho_{r_2 r_1} = 0.16$ on the average, and inasmuch as $R = r/r_0 \approx 1$ on the average. Hence, $\mathrm{Var}(R) \approx C_R^2 = 0.89\,(1.5\,C_r^2) = 1.33\,C_r^2$ on the average; then $\mathrm{Var}(R_2 - R_1)/[\mathrm{Var}(R_2) + \mathrm{Var}(R_1)] \approx 0.84/1.33 = 0.63 = 0.80^2$. This explains the sources of the average value of 0.80 found in column 4. The variance of the base $r_0$ contributes to the denominator but not to the numerator.

TABLE 3. FACTORS RELATING TO THE INDEXES (I = Σ R_j / J) AND INDEX CHANGES: I2 - I1 = Σ R_2j / J - Σ R_1j / J

Items in Index                              1-6     9, 10   1-6, 9, 10   12-15   9, 10, 12-15
1. Fluctuations (s.d.) of I in 17 Surveys   4.16    8.33    4.63         7.62    5.010
2. Standard Errors of the I                 1.165   7.171   2.110        7.714   5.505
3. Effects of Correlations on Var(I)        1.754   1.129   1.372        1.626   1.796
4. Mean Correlations among the R_j          0.161   0.174   0.119        0.213   0.152
5. Standard Errors of Change (I2-I1)        1.295   8.199   2.381        8.922   6.217
6. Var(I2-I1)/[Var(I2)+Var(I1)]             0.626   0.643   0.628        0.661   0.635
7. Ratio of Var to Simple Approximation     0.992   1.035   1.029        1.007   0.996
8. Effects of Correlations on Var(I2-I1)    1.657   1.080   1.269        1.747   1.816
On line 3 we investigate the increase of variance due to the correlations between the relatives composing an index, computed as

$$\mathrm{Var}(I) \Big/ J^{-2} \sum_{j}^{J} \mathrm{Var}(R_j).$$

In the first column we have $\mathrm{Var}\big(\tfrac{1}{6}\sum_{1}^{6} R_j\big)$ for items 1-6 divided by $\tfrac{1}{36}\sum_{1}^{6} \mathrm{Var}(R_j)$, and find this ratio to be 1.754 (actually the mean of such ratios computed for the four periods). In the absence of correlation between the R's, this ratio would be unity; the index would have a variance 1/6 as large as a single relative on the average. Actually since the ratio is 1.754, the variance of the index is smaller in the proportion 1.754/6 = 1/3.42; the correlations reduce the information in the 6 items from 6 to the equivalent of 3.42 items.
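The "equivalent number of items" can be computed for all five indexes from line 3 of Table 3. This sketch simply divides the number of items $J$ by the variance ratio; the dictionary layout is a hypothetical convenience.

```python
# Line 3 of Table 3: Var(I) divided by J**-2 * sum of Var(R_j), per index.
variance_ratios = {
    "1-6": (6, 1.754),
    "9, 10": (2, 1.129),
    "1-6, 9, 10": (8, 1.372),
    "12-15": (4, 1.626),
    "9, 10, 12-15": (6, 1.796),
}

# With uncorrelated relatives the ratio would be 1 and all J items would count
# in full; a ratio above 1 shrinks the index to J / ratio equivalent items.
effective_items = {name: j / ratio for name, (j, ratio) in variance_ratios.items()}

for name, value in effective_items.items():
    print(name, round(value, 2))
```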
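These considerations led us to compute an approximate and synthetic correlation coefficient $\bar{\rho}$ as:

$$\frac{\mathrm{Var}\big(\sum_j R_j\big)}{\sum_j \mathrm{Var}(R_j)} = 1 + \frac{\sum_{j \neq k} \rho_{jk} \sqrt{\mathrm{Var}(R_j)\,\mathrm{Var}(R_k)}}{\sum_j \mathrm{Var}(R_j)} \approx 1 + (J - 1)\,\bar{\rho}.$$

The results, the means of four computations, appear in row 4. The averaging in the last term is justified only to the degree that the variances are about equal.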
This condition obtains fairly well for the first index; also for the fourth and fifth indexes; it clearly fails for the third index which mixes the low variance items 1-6 with the high variance items 9 and 10. The increase of the variance due to positive correlations can be put in the form $1 + \rho(J - 1)$; and the mean of $J$ correlated items will have a variance $1/J + \rho(J - 1)/J$, relative to that of a single item. It is encouraging to note that the average correlation between the 6 attitudes (1-6), and also between their changes, is only about 0.16. This should overcome any suspicion that the several questions merely yield the individual's same general feeling of optimism or pessimism. It is also interesting that the buying behaviors (12-15) are positively correlated with an average of 0.213; this may be due to general financial ability, or to buying situations, influenced by positions in the life cycles of individual consumers. These positive tendencies appear to overcome negative tendencies we may suppose result from items competing within budgets. These points could be better studied by noting the correlations for individual consumers.
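The form $1 + \rho(J - 1)$ can be verified directly by summing a covariance matrix. The sketch assumes $J$ items with equal variances and a common correlation, which is the idealized case the text describes; the variance of 4.0 is hypothetical.

```python
def variance_of_mean(sigma2, rho, j):
    """Variance of the mean of j items with common variance sigma2 and common
    correlation rho, obtained by summing the full covariance matrix."""
    covariance = [[sigma2 if a == b else rho * sigma2 for b in range(j)]
                  for a in range(j)]
    return sum(sum(row) for row in covariance) / j ** 2


sigma2 = 4.0   # hypothetical variance of a single relative
rho = 0.16     # average correlation among the 6 attitudes, from the text
j = 6

direct = variance_of_mean(sigma2, rho, j)
by_formula = sigma2 * (1.0 + rho * (j - 1)) / j
inflation = 1.0 + rho * (j - 1)   # compare with 1.754 on line 3 of Table 3

print(round(direct, 4), round(by_formula, 4), round(inflation, 2))
```

A common correlation of 0.16 among six items gives an inflation of 1.80, close to the 1.754 observed for the first index.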
Rows 5-8 of Table 3 contain results about changes in the index $(I_2 - I_1)$ between two periods. Each of the entries again represents the mean of six computations for the six possible differences between four periods. The outstanding fact to note in this table is how well preserved are the several useful relationships we had noted before, either for the single index or for the changes $(R_2 - R_1)$ of single relatives. Note on line 5 that the standard errors of differences are only slightly greater than the standard errors of individual indexes on line 2. This relationship is similar to that found between the standard errors for the relative $R$ and for the changes $(R_2 - R_1)$. Again this phenomenon is due to the high correlation between $I_2$ and $I_1$. Its effect is measured by the ratio of $\mathrm{Var}(I_2 - I_1)$ to the sum of the variances of $I_2$ and $I_1$; note on line 6 that these come close to $0.64 = 0.80^2$, the value we found as the average in column 4 of Table 2.
We find again, as we found for the difference $(R_2 - R_1)$ of relatives, that the variance of $(I_2 - I_1)$ can be approximated very well with the variance of the mean of score changes over the base $r_{0j}$; the entries on line 7 give the ratios of the variance of $(I_2 - I_1)$ to the approximate variance of

$$\sum_{j}^{J} \frac{r_{2j} - r_{1j}}{r_{0j}\, J},$$

where the values of $r_{0j}$ are treated as constants. These values of $r_{0j}$ are known in advance; used as weights in formula (14") this approximation leads to easier computations for the complicated variances of $(I_2 - I_1)$, which are probably more needed than those for the indexes $I$. Furthermore, the easy approximations for the variance of $I_2 - I_1$ can also lead to easy approximations for the variance of the indexes $I$, to the degree that the factor 0.64 of line 6 can be considered stable.
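Treating the base scores as constants turns the index change into a weighted sum of score changes, whose variance is a quadratic form in the weights. The sketch below illustrates that idea only; it is not formula (14"), which lies outside this section, and the covariance matrix, base scores, and scale factor of 100 are all hypothetical.

```python
def approx_var_index_change(cov_changes, base_scores, scale=100.0):
    """Approximate Var(I2 - I1) as the variance of sum_j w_j * (r2j - r1j),
    with fixed weights w_j = scale / (J * r0j).

    cov_changes[j][k] is the covariance between the score changes of items
    j and k (hypothetical input for this sketch).
    """
    j_items = len(base_scores)
    weights = [scale / (j_items * r0) for r0 in base_scores]
    return sum(weights[j] * weights[k] * cov_changes[j][k]
               for j in range(j_items) for k in range(j_items))


# Hypothetical two-item index: base scores and covariances of score changes.
base_scores = [100.0, 125.0]
cov_changes = [[9.0, 1.0],
               [1.0, 4.0]]

variance = approx_var_index_change(cov_changes, base_scores)
print(round(variance, 2), round(variance ** 0.5, 2))
```

Because the weights are fixed in advance, only the variances and covariances of the score changes need to be estimated from the sample, which is the computational saving the text points to.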
Finally, on line 8 we investigate the increase of the variance due to correlations among the $J$ different changes of $(R_{2j} - R_{1j})$, the sum of which is the index change $(I_2 - I_1)$, computed as $\mathrm{Var}(I_2 - I_1) \big/ J^{-2} \sum_j \mathrm{Var}(R_{2j} - R_{1j})$. The increases seen here for the variances of index changes are similar to the increases noted on line 3 for the separate indexes.
4. SOME CONCLUSIONS, LIMITATIONS, AND SPECULATIONS
This was not a theoretical investigation with an empirical illustration; rather it focused on a connected set of practical problems of fair size and generality
with whatever tools we could find, invent, and improvise. It explored some analytical uses of complex samples, and devised methods for dealing with them. It expands methods of survey sampling beyond the estimation of means and totals, with which its literature has been preoccupied. By concentrating on the sampling errors of economic indicators and indexes derived from periodic sample surveys, we could explore at reasonable depth yet with brevity their nature and sources. The results have direct implications and provide a base for further investigations.
Most important are the results showing the good performances of the 8 attitudinal variables. They were obtained with precisions adequate for such small samples (1370 interviews), and probably for their present purposes. Their performances are particularly good for measuring changes. Furthermore, the cumulative information they provide when summed into an index is particularly gratifying; the correlation among the 6 items in the index is low enough to equal 3.4 independent variables. This raises questions about the possible incorporation of more variables.
Much of the value of the 8 attitudinal variables is due to the apparent success of their simple scales in yielding reasonably low coefficients of variation. They are essentially trichotomies with the middle groups somewhat larger than the sides; they appear to be considerably better than dichotomies, so common in research. But a scale with 5 or more points may be found with investigation to be better still.
The 7 buying intentions are shown to possess large coefficients of variation, because these approximate the form $\sqrt{(1-p)/pn}$ for dichotomies with low $p$ values. They are useful only for detecting large changes. How could their sensitivity be improved? A drastic increase in sample size may be feasible for us only with different field methods, perhaps with auxiliary mail or telephone responses to reduce expenses. It is intriguing to speculate about possibilities of increased sensitivity by obtaining scaled responses, which have improved correlations with underlying distributions of actual buying probabilities. I believe that response research could uncover practical means for getting more information, especially to uncover distinct buying classes within the large group of declared nonintenders.4
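The dependence of the coefficient of variation on $p$ can be seen in a minimal sketch of the form $\sqrt{(1-p)/pn}$. It assumes simple random sampling, so it ignores the design effects of the complex sample; the function name `cv_dichotomy` and the proportions tried are illustrative choices, and only the sample size of 1370 comes from the text.

```python
import math


def cv_dichotomy(p, n):
    """Coefficient of variation of a sample proportion p based on n elements,
    sqrt((1 - p) / (p * n)), under simple random sampling (an assumption)."""
    if not 0.0 < p < 1.0 or n <= 0:
        raise ValueError("need 0 < p < 1 and n > 0")
    return math.sqrt((1.0 - p) / (p * n))


n = 1370  # number of interviews quoted in the text
for p in (0.05, 0.10, 0.50):  # hypothetical proportions, not survey results
    print(f"p = {p:.2f}: CV = {cv_dichotomy(p, n):.3f}")
```

Because the coefficient falls only with the square root of $n$, halving it requires four times the sample, which is why a "drastic increase in sample size" is at issue.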
Concentrating on sampling errors, we cannot deal with the related basic issues of the validity and the predictive value of the economic indicators. First come the choice of variables and methods for obtaining answers from respondents, discussed briefly above.
Second, the construction of the relatives and indexes may be scrutinized. Could the indexes be improved by optimal weights, utilizing cumulated information about the predictive value as well as the sampling precision of the several variables? Furthermore, the model for constructing the index can be the subject of further inquiry. I believe that composite scores for individuals, derived from their several answers, could have analytic use.

4 The U. S. Census Bureau has conducted from 1960 to 1967 much larger quarterly surveys using the expectations questions 9 to 15 [1967]. Investigations of 10 point scales for intentions are now being put into effect by [Juster, 1966]. I believe that empirical panel investigations can also disclose different variables that would contribute significantly to the predictive value of intention variables and of indexes. Some of these may be items of past consumer behavior, others of socio-economic and life-cycle status. The variables used for predicting voting behavior provide illustrations of similar methods [Campbell et al., 1960, Stokes 1966].
Third, the nonsampling errors of response and nonresponse can be investigated. It has been shown that interviewer variance for attitudinal variables can be held to reasonable limits [Kish, 1962]; the low "design effects" in this study tend to support those results. Moreover, measurement biases that may seriously affect the absolute validity of the variables may still have negligible effect on relatives and changes, to the extent that those biases are held constant over surveys with good process control. This is probably the primary reason for the use of relatives and indexes; they have meaning and comparability beyond the scores on which they are based.
Fourth, should greater use be made of reinterviews (panels of households), and of alternatives to household interviews? The answers depend not only on sampling variability, but also on field problems and procedures. Reinterviews involve problems of response and nonresponse, due to the multiple uses of the same survey respondents. It seems likely that the standard errors of changes could be reduced drastically with judicious overlaps of the compared successive samples. On the other hand, overlaps with the base year appear neither feasible nor needed.
5. VARIANCES FOR COMBINATIONS OF RATIO MEANS
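The ratio mean used in these surveys and in many others has the form (1) explained in section 1; its variance is estimated by

$$\operatorname{var}(r) = x^{-2}\left[\operatorname{var}(y) + r^2 \operatorname{var}(x) - 2r \operatorname{cov}(y, x)\right] = x^{-2} \sum_h dz_h^2 = \sum_h \left[dz_h / x\right]^2, \tag{4}$$

where

$$dz_h^2 = \frac{a_h}{a_h - 1} \sum_{\alpha}^{a_h} \left[(y_{h\alpha} - y_h/a_h) - r(x_{h\alpha} - x_h/a_h)\right]^2. \tag{5}$$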
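Similar to the variance, the covariance of two scores $r_j$ and $r_k$, based on samples from the same primary selections, is:

$$\operatorname{cov}(r_j, r_k) = \sum_h \left(\frac{dz_{jh}}{x_j}\right)\left(\frac{dz_{kh}}{x_k}\right) = (x_j x_k)^{-1} \sum_h dz_{jh}\, dz_{kh}. \tag{4'}$$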
Derivations and justifications of these simplified single-stage computations for multi-stage samples may be found in several places [Hansen, Hurwitz, and Madow, 1953; Kish and Hess, 1959; Kish, 1965]; finite population corrections for sampling without replacement are trivial and neglected. Often, as in our sample, two primary selections ($a$ and $b$) are drawn from each stratum; with $a_h = 2$ the basic computing unit takes the form

$$dz_h = \left[(y_{ha} - y_{hb}) - r(x_{ha} - x_{hb})\right]. \tag{5'}$$
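Formulas (4), (5), and (5') can be sketched in a few lines. The function names and the toy data below are illustrative assumptions, not part of the paper; each stratum is represented as a list of $(y, x)$ totals, one per primary selection.

```python
def dz_squared(selections, r):
    """Formula (5): squared computing unit for one stratum.

    selections: list of (y, x) totals, one per primary selection (a_h >= 2).
    """
    a = len(selections)
    y_h = sum(y for y, _ in selections)
    x_h = sum(x for _, x in selections)
    deviations = ((y - y_h / a) - r * (x - x_h / a) for y, x in selections)
    return a / (a - 1) * sum(d * d for d in deviations)


def ratio_mean_variance(strata):
    """Formula (4): r = y/x and var(r) = sum_h dz_h^2 / x^2."""
    y = sum(y for stratum in strata for y, _ in stratum)
    x = sum(x for stratum in strata for _, x in stratum)
    r = y / x
    return r, sum(dz_squared(s, r) for s in strata) / x ** 2


def paired_variance(strata):
    """Formula (5') shortcut, valid only when every stratum has a_h = 2."""
    y = sum(y for stratum in strata for y, _ in stratum)
    x = sum(x for stratum in strata for _, x in stratum)
    r = y / x
    total = 0.0
    for (y_a, x_a), (y_b, x_b) in strata:
        dz = (y_a - y_b) - r * (x_a - x_b)
        total += dz * dz
    return total / x ** 2


# Hypothetical data: three strata, two primary selections each, as (y, x).
strata = [
    [(12.0, 10.0), (15.0, 11.0)],
    [(9.0, 8.0), (7.0, 9.0)],
    [(20.0, 14.0), (18.0, 15.0)],
]
r, var_r = ratio_mean_variance(strata)
print(f"r = {r:.4f}, se(r) = {var_r ** 0.5:.4f}")
```

With two selections per stratum the general unit (5) reduces algebraically to the square of (5'), so `ratio_mean_variance` and `paired_variance` return the same value on these data.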
Two replications per stratum has long been a basic design in the practice of survey sampling, because it permits utmost stratification for randomized models of simple variance estimates. It has been exploited by Deming [1960], Kish and Hess [1959], Kish [1965]; its simplicity was emphasized and generalized by Keyfitz [1957], including (6) below. However, contrary to some current opinion, paired selections are neither necessary nor sufficient for simple variance computations. First, the computing advantage of $a_h = 2$ over $a_h > 2$ consists of the type of advantage (5') has over (5); furthermore for $a_h > 2$, it is possible to form $a_h(a_h - 1)/2$ paired contrasts.
Second, whether $a_h = 2$ or not, simple computing forms must be based on quadratic forms consisting of simple additive terms of replication aggregates. Their neglect may explain partly, I believe, the lack of variance computations for linear combinations, indexes, etc. in survey literature. The computing forms below have the desired simplicity, and are mutually adaptable for linear combinations, relatives, indexes, and their comparisons; either for the desk computers we used in 1961 or the electronic computers we would use in 1968. For example, either (5) or (5') permits readily the computation of variances for linear combinations $\sum_j W_j r_j$ of several scores, where the $W_j$ are constants:
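$$\operatorname{var}\Bigl(\sum_j W_j r_j\Bigr) = \sum_h \Bigl[\sum_j \Bigl(\frac{W_j\, dz_{jh}}{x_j}\Bigr)^2 + 2 \sum_{j<k} \Bigl(\frac{W_j\, dz_{jh}}{x_j}\Bigr)\Bigl(\frac{W_k\, dz_{kh}}{x_k}\Bigr)\Bigr] = \sum_h \Bigl[\sum_j \frac{W_j}{x_j}\, dz_{jh}\Bigr]^2. \tag{6}$$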
The final form incorporates all variances and covariances of combined variables within strata. An example of (6) is the change of score $(r_2 - r_1)$ from period 1 to period 2; here $J = 2$ with $W_2 = 1$ and $W_1 = -1$:
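$$\operatorname{var}(r_2 - r_1) = \sum_h \Bigl(\frac{dz_{2h}}{x_2} - \frac{dz_{1h}}{x_1}\Bigr)^2 = x_2^{-2} \sum_h dz_{2h}^2 + x_1^{-2} \sum_h dz_{1h}^2 - 2 (x_2 x_1)^{-1} \sum_h dz_{2h}\, dz_{1h}. \tag{7}$$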
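Another example would be the simple summed score, when all $W_j = 1$ and all $x_j = x$, so that $J\bar{r} = \sum_j r_j = \sum_j y_j / x$ and $s_{h\alpha} = \sum_j y_{jh\alpha}$:

$$\operatorname{var}\Bigl(\sum_j y_j / x\Bigr) = x^{-2} \sum_h \Bigl(\sum_j dz_{jh}\Bigr)^2 = x^{-2} \sum_h \bigl(ds_h - J\bar{r}\, dx_h\bigr)^2. \tag{8}$$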
The relative for any current year is the ratio of the score for that year to the score for the base year, $R_1 = r_1/r_0$:

$$R_1 = r_1/r_0 = (y_1/x_1)/(y_0/x_0) = (y_1 x_0)/(y_0 x_1). \tag{3}$$
The bases, $r_0$ and $y_0$, are generally set safely far from zero, sometimes by setting $r_0 = 100$ arbitrarily. The values of $x$ in the denominators of all scores must be such that, compared to sampling variability, we can disregard the possibilities of values near zero. This requirement is met readily in our sample, where the $x$ values denote sample sizes under good control [Kish, 1965, 7.1]. We treat $R_1 = r_1/r_0$ simply as the ratio of two random variables, the variance of which may be estimated with the same formula (4) that we use for $r = y/x$; then we have

$$\operatorname{var}(R_1) = r_0^{-2}\left[\operatorname{var}(r_1) + R_1^2 \operatorname{var}(r_0) - 2R_1 \operatorname{cov}(r_1, r_0)\right]. \tag{9}$$
This brings our only serious new theoretical problems. One approach is through methods called "propagation of errors" or "delta method." Cramer [1946] and C. R. Rao [1952] give classical presentations; the $\sigma_{ij}/\sqrt{n}$ terms of the covariance matrix need to be generalized to accommodate the covariance terms of complex sampling. The treatment of Kendall and Stuart [1958] is more general. Note the contributions of Hajek [1958], and Brillinger and Tukey [1964], or descriptions in Deming [1960, 390-396], and Kish [1965, 12.11 and 14.3].
We must and can be satisfied with a large sample approximation; most surveys are based on large enough samples. The derivations used to obtain (9) need few additional restrictions. Mean square errors can be used instead of variances for $r_1$ and $r_0$, and small biases will not destroy these relationships. We investigated and found these biases to be trivial or absent. The bias of $R_1$ was found to be real but negligible; most important, the coefficient of variation of the denominator, $C_{r_0}$, is small enough (Section 6).
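When $y_1 \neq 0$ can be safely assumed, the variance of the "double ratio" can be expressed in the symmetrical form:

$$\operatorname{var}(R_1) = R_1^2 \sum_h \Bigl[\Bigl(\frac{dy_{1h}}{y_1} - \frac{dx_{1h}}{x_1}\Bigr) - \Bigl(\frac{dy_{0h}}{y_0} - \frac{dx_{0h}}{x_0}\Bigr)\Bigr]^2. \tag{9'}$$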
This form, without stratification, is attributed by Yates [1953 and 1960] to Keyfitz, and derived in detail by Rao [1957], who has been reinvestigating it. It was mentioned briefly by Hansen, Hurwitz, and Madow [1953, I, 514], with a simple extension to $\prod u_j / \prod v_j$, but the validity of the approximation for products of several variables would require appropriate checks.
The following form of the variance is preferable for computing convenience, and because it avoids division by $y_1$, which may be small and unstable. Its first expression links (9') to (9):

$$\operatorname{var}(R_1) = r_0^{-2} \sum_h \Bigl(\frac{dz_{1h}}{x_1} - R_1 \frac{dz_{0h}}{x_0}\Bigr)^2 = \sum_h (e_{1h} - R_1 e_{0h})^2, \tag{9''}$$

where

$$e_{1h} = \frac{dz_{1h}}{r_0 x_1} = \frac{dy_{1h} - r_1\, dx_{1h}}{r_0 x_1} \quad\text{and}\quad e_{0h} = \frac{dz_{0h}}{y_0} = \frac{dy_{0h} - r_0\, dx_{0h}}{y_0}. \tag{10}$$
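The equivalence of the computing form (9'') with the symmetrical form (9') can be checked numerically. The sketch below assumes paired selections, with $dy_h$ and $dx_h$ the signed differences between the two selections of a stratum; the function names and all numbers are invented for illustration.

```python
def relative_variance(y1, x1, dy1, dx1, y0, x0, dy0, dx0):
    """Formulas (9'') and (10): R1 = r1/r0 and var(R1) = sum_h (e_1h - R1 e_0h)^2."""
    r1, r0 = y1 / x1, y0 / x0
    R1 = r1 / r0
    e1 = [(dy - r1 * dx) / (r0 * x1) for dy, dx in zip(dy1, dx1)]
    e0 = [(dy - r0 * dx) / y0 for dy, dx in zip(dy0, dx0)]
    return R1, sum((a - R1 * b) ** 2 for a, b in zip(e1, e0))


def relative_variance_symmetric(y1, x1, dy1, dx1, y0, x0, dy0, dx0):
    """Formula (9'): the 'double ratio' form, which divides by y1."""
    R1 = (y1 / x1) / (y0 / x0)
    total = 0.0
    for a, b, c, d in zip(dy1, dx1, dy0, dx0):
        total += ((a / y1 - b / x1) - (c / y0 - d / x0)) ** 2
    return R1 * R1 * total


# Hypothetical totals and per-stratum paired differences (current = 1, base = 0).
current = dict(y1=820.0, x1=1000.0, dy1=[14.0, -9.0, 6.0], dx1=[10.0, -12.0, 4.0])
base = dict(y0=760.0, x0=980.0, dy0=[11.0, -6.0, 8.0], dx0=[9.0, -10.0, 5.0])

R1, var_R1 = relative_variance(**current, **base)
print(f"R1 = {R1:.4f}, se(R1) = {var_R1 ** 0.5:.4f}")
```

The first function never divides by $y_1$, which is the stability advantage the text attributes to (9'').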
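The computing unit $(dz_h/x)$ is common to preceding formulas (5-8), and the simple derived unit $e_h$ is basic to (9-14) for indexes. The difference of two relatives measures its change between two periods and its variance appears term by term as:

$$\operatorname{var}(R_2 - R_1) = \sum_h \left[(e_{2h} - R_2 e_{0h}) - (e_{1h} - R_1 e_{0h})\right]^2$$

$$= \sum_h (e_{2h} - e_{1h})^2 + (R_2 - R_1)^2 \sum_h e_{0h}^2 - 2(R_2 - R_1) \sum_h e_{0h}(e_{2h} - e_{1h}) \tag{11}$$

$$= r_0^{-2} \operatorname{var}(r_2 - r_1) + (R_2 - R_1)^2 r_0^{-2} \operatorname{var}(r_0) - 2(R_2 - R_1)\, r_0^{-2} \left[\operatorname{cov}(r_0, r_2) - \operatorname{cov}(r_0, r_1)\right]. \tag{11'}$$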
The last expression can also be obtained immediately by considering $(R_2 - R_1) = (r_2 - r_1)/r_0$ as the ratio of two random variables, and applying (4) directly to that ratio. The expression (11') leads to a useful approximation: the first term, which can be computed with simpler methods (7) and with data only from the current surveys, proves to be an excellent approximation for the entire expression. Thus the easily computed variance for a change in scores can be used to estimate the more difficult variance of a change in relatives.
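The three terms of (11) can be computed separately to see how much the first term carries. This is a sketch with invented units $e_h$ and relatives, so the relative sizes printed say nothing about the survey results; the function name `change_in_relative_terms` is hypothetical.

```python
def change_in_relative_terms(e2, e1, e0, R2, R1):
    """Three terms of (11) for var(R2 - R1), from the units e_h of (10)."""
    d = R2 - R1
    first = sum((a - b) ** 2 for a, b in zip(e2, e1))
    second = d * d * sum(c * c for c in e0)
    third = -2.0 * d * sum(c * (a - b) for a, b, c in zip(e2, e1, e0))
    return first, second, third


# Hypothetical units over four strata and hypothetical relatives.
e2 = [0.012, -0.020, 0.015, -0.006]
e1 = [0.010, -0.014, 0.009, -0.011]
e0 = [0.008, -0.012, 0.010, -0.004]
R2, R1 = 1.01, 0.99

first, second, third = change_in_relative_terms(e2, e1, e0, R2, R1)
exact = sum(((a - R2 * c) - (b - R1 * c)) ** 2 for a, b, c in zip(e2, e1, e0))
print(f"exact = {exact:.3e}, first term = {first:.3e}")
```

The second and third terms carry the factors $(R_2 - R_1)^2$ and $(R_2 - R_1)$, so they shrink as the change in the relative becomes small, leaving the first term as the approximation.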
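The variance for $(R_2 - R_1)$ is but a specific example ($J = 2$, $W_2 = 1$, $W_1 = -1$) of variances for linear combinations $\sum_j W_j R_j$ of relatives:

$$\operatorname{var}\Bigl(\sum_j W_j R_j\Bigr) = \sum_h \Bigl[\sum_j W_j (e_{jh} - R_j e_{0jh})\Bigr]^2. \tag{12}$$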
This result follows readily, as (6) did, from the simple additive quadratic forms of the computing units $(e_{jh} - R_j e_{0jh})$. Squaring the brackets yields all variance and covariance terms within strata. Our present interest is in the index $I = \sum_j R_j / J$, whose variance is (12) with all $W_j = 1$, apart from the constant factor $J^{-2}$. This is the easier computing form, but we also wanted to separate the sum of variances from the covariances to estimate the average correlation between relatives (3'); we used:

$$\operatorname{var}\Bigl(\sum_j R_j\Bigr) = \sum_j \operatorname{var}(R_j) + 2 \sum_{j<k} \operatorname{cov}(R_j, R_k) = \sum_h \sum_j (e_{jh} - R_j e_{0jh})^2 + 2 \sum_h \sum_{j<k} (e_{jh} - R_j e_{0jh})(e_{kh} - R_k e_{0kh}). \tag{13}$$
Finally, we want the variance of the difference of two indexes $[I_2 - I_1]$; this is (12) with all $W_{2j} = 1$ and all $W_{1j} = -1$:

$$\operatorname{var}\Bigl(\sum_j R_{2j} - \sum_j R_{1j}\Bigr) = \sum_h \Bigl[\sum_j \bigl\{(e_{2jh} - R_{2j} e_{0jh}) - (e_{1jh} - R_{1j} e_{0jh})\bigr\}\Bigr]^2 = \sum_h \Bigl[\sum_j \bigl\{(e_{2jh} - e_{1jh}) - (R_{2j} - R_{1j}) e_{0jh}\bigr\}\Bigr]^2 \tag{14}$$

$$= \sum_h \Bigl[\sum_j (e_{2jh} - e_{1jh})\Bigr]^2 + \sum_h \Bigl[\sum_j (R_{2j} - R_{1j}) e_{0jh}\Bigr]^2 - 2 \sum_h \Bigl[\sum_j (e_{2jh} - e_{1jh})\Bigr]\Bigl[\sum_j (R_{2j} - R_{1j}) e_{0jh}\Bigr]. \tag{14'}$$
The three terms correspond to the analogous terms in equations (11) for the variance of the difference of relatives, but comprise the summations for all ($j = 1, 2, \ldots, J$) relatives. The first term consists of the variances and covariances of the score differences $(r_{2j} - r_{1j})$ for the different relatives ($j = 1, 2, \ldots, J$), and including the factors $r_{0j}^{-2}$ without sampling variation, i.e., as a constant for each $j$:

$$\sum_h \Bigl[\sum_j (e_{2jh} - e_{1jh})\Bigr]^2 = \sum_j r_{0j}^{-2} \operatorname{var}(r_{2j} - r_{1j}) + 2 \sum_{j<k} r_{0j}^{-1} r_{0k}^{-1} \operatorname{cov}\bigl[(r_{2j} - r_{1j}), (r_{2k} - r_{1k})\bigr]. \tag{14''}$$
We find (line 8, Table 3) that the covariances among the changes of relatives are important and that they are in proportion to the covariances of the relatives, the second term of (13). But we also find (line 7, Table 3) that this entire first term (14'') is an excellent approximation for all the three terms of (14') for the variance of index changes, just as the first term of (11') was for the variance of changes of individual relatives. Thus the second and third terms of (14') can often be neglected, when the factors $(r_{2j} - r_{1j})/r_{0j}$ are uniformly small enough.
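The exact form (14) and its first term (14'') can be sketched for several relatives at once. The layout `e[j][h]` (relative $j$, stratum $h$), the function name, and every number below are assumptions made for illustration only.

```python
def index_change_variance(e2, e1, e0, R2, R1):
    """Exact form (14) and first term (14'') for var(sum_j R_2j - sum_j R_1j).

    e2[j][h], e1[j][h], e0[j][h] are the units of (10) for relative j, stratum h.
    """
    n_strata = len(e0[0])
    exact = first = 0.0
    for h in range(n_strata):
        score_part = sum(a[h] - b[h] for a, b in zip(e2, e1))
        base_part = sum((p - q) * c[h] for p, q, c in zip(R2, R1, e0))
        exact += (score_part - base_part) ** 2
        first += score_part ** 2
    return exact, first


# Hypothetical units: J = 2 relatives over three strata.
e2 = [[0.012, -0.020, 0.015], [0.009, -0.011, 0.007]]
e1 = [[0.010, -0.014, 0.009], [0.006, -0.013, 0.010]]
e0 = [[0.008, -0.012, 0.010], [0.005, -0.007, 0.006]]
R2, R1 = [1.01, 1.03], [0.99, 1.02]

exact, first = index_change_variance(e2, e1, e0, R2, R1)
J = len(R2)
# Dividing by J^2 converts the variance of the summed relatives to that of the index.
print(f"var(I2 - I1) = {exact / J**2:.3e}, first-term approximation = {first / J**2:.3e}")
```

Only the first term can be computed from the current surveys alone, since the base-period units `e0` enter the other two terms.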
6. TECHNICAL NOTES
When using standard errors for the ratio mean $r = y/x$, one may be interested in its bias. The biases were computed and all found to be negligible, much less than 1 per cent of the values of the standard errors, and appeared, when tested, to vary haphazardly around zero. This was expected in light of previous research, and in light of the computed coefficient of variation of about .04 for the sample size $x$ [Kish, Namboodiri, and Pillai, 1962, and Kish, 1965, 7.1]. We expect the relative $R = r/r_0$ to have a ratio bias, because the bias is known to have this ratio to the standard error: $\mathrm{Bias}(R)/\mathrm{S.E.}(R) = -\rho_{R,r_0} C_{r_0}$. Since we expect the correlation $\rho_{R,r_0}$ to be negative, we should get a positive bias. We can also expect its ratio to the standard error to be less than $C_{r_0}$, the coefficient of variation of the base score $r_0$.
Bias effects were computed, and presented [Kish, 1962b]; we shall merely summarize them. 1) The correlation $\rho_{R,r_0}$ was large; it averaged -0.52, ranging from -0.37 to -0.61 for the 15 variables. Because of this the existence of a positive bias could be measured. 2) Nevertheless the bias was negligible; the bias ratios ranged from 0.49 to 1.06 of 1 per cent for the 8 attitude items, and 1.92 to 5.00 of 1 per cent for the 7 expectation items. 3) The bias ratios were kept low by low values of $C_{r_0}$. These ranged from 1.04 to 1.88 of 1 per cent for the 8 attitude items, and from 4.12 to 8.32 of 1 per cent for the 7 expectation items. The bias ratio of $r/r_0$ can be kept low by keeping $C_{r_0}$ low. 4) The two biases tend to cancel from the difference $(R_2 - R_1)$ of two relatives. 5) When summed in indexes the bias ratios tend to grow, because the biases sum up, whereas the standard errors do not increase as fast. But the worst case we could find is for the six-item index in column 5 of Table 3; the ratio of relative bias to the coefficient of variation is 0.438/5.505 = .080; this would increase the mean square error by only the factor $1 + .080^2 = 1.0064$. 6) The predictability of relations permits adjustment when it becomes desirable.
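The step from the bias ratio to the mean square error in item 5 rests on the usual decomposition of the mean square error, restated here as a worked line; the symbol $B$ for the bias is our shorthand, not the paper's.

```latex
\mathrm{MSE}(I) = \operatorname{var}(I) + B^2
  = \operatorname{var}(I)\left[1 + \left(\frac{B}{\mathrm{S.E.}(I)}\right)^2\right],
\qquad
1 + (0.080)^2 = 1.0064 .
```

On the scale of the standard error itself the effect is smaller still: the root mean square error exceeds the standard error by a factor of about $\sqrt{1.0064} \approx 1.003$.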
In each situation to estimate an average standard error we computed the mean of the standard errors rather than the square root of the mean of the variances. Thus we accept a small bias in order to avoid a larger random error in the estimated average.
In column 3 of Table 1 we not only gave the means of the six replicate computations of standard errors, but also in parentheses the standard deviations between the six values for each. We took advantage of the information from the repeated computations performed on what are essentially six replications to obtain estimates of the coefficient of variation for the standard error. (That the replications are not independent had no important effect on these computations.) The computation of the standard errors should be subject theoretically and approximately to a coefficient of variation of about $1/\sqrt{2 \times 45} = .106$ because the 45 pairs of comparisons result in about 45 degrees of freedom. We should actually expect somewhat more, because of probable skewness, of $\beta_2 > 3$, and because the 90 primary selections vary somewhat in size. The 15 computed values vary from .06 to .20 and they average .128. We should expect this to be an overestimate because of variations between surveys in measurements and in sample design. A similar coefficient of variation applied to the values of $\sqrt{\mathrm{Deff}}$ averages .120, also varying from .06 to .20; the difference is fortuitous, though in the expected direction; we expect $\sqrt{\mathrm{Deff}}$ to vary somewhat less than the sampling error because here the variations in sample size and in the level of the variates between surveys are largely eliminated. But we can use .12 as a working estimate of the coefficient of variation of the standard errors.
We also took advantage of the four replicated computations of the standard error of each relative to estimate their variability. The coefficients of variation of the standard errors averaged .118, resembling closely the results for the scores.
REFERENCES
[1] Brillinger, D. R. and Tukey, J. W. [1964], Asymptotic Variances, Moments, Cumulants, and Other Average Values (Princeton: Memorandum).
[2] Campbell, A., Converse, P. E., Miller, W. E., and Stokes, D. E. [1960], The American Voter, New York: John Wiley and Sons, pp. 72-4.
[3] Cramer, H. [1946], Mathematical Methods of Statistics, Princeton: Princeton University Press, 28.4.
[4] Deming, W. E. [1960], Sample Design in Business Research, New York: John Wiley and Sons.
[5] Hajek, J. [1958], "On the theory of ratio estimates," Bulletin of the International Statistical Institute, 219-26.
[6] Hansen, M. H., Hurwitz, W. N. and Madow, W. G. [1953], Sample Survey Methods and Theory, New York: Wiley and Sons, Volumes I and II.
[7] Juster, F. T. [1966], "Consumer Buying Intentions and Purchase Probability," Journal of the American Statistical Association, 61, 658-98.
[8] Katona, G. and Mueller, E. [1957], Consumer Expectations, Ann Arbor: Institute for Social Research.
[9] Katona, G. [1964], The Mass Consumption Society, New York: McGraw-Hill.
[10] Katona, G., Mueller, E., Schmiedeskamp, J., Sonquist, J. A. [1966], 1966 Survey of Consumer Finances, Ann Arbor: Institute for Social Research.
[11] Katona, G. [1967], "Anticipation statistics and consumer behavior," American Statistician, 21, 12-13.
[12] Kendall, M. G. and Stuart, A. [1958], The Advanced Theory of Statistics, London: Griffin and Company, Volume I, 10.6.
[13] Keyfitz, N. [1957], "Estimates of Sampling Variance Where Two Units Are Selected from Each Stratum," Journal of the American Statistical Association, 52, 503-10.
[14] Kish, L. and Hess, I. [1959], "On Variances of Ratios and Their Differences in Multi-Stage Samples," Journal of the American Statistical Association, 54, 416-46.
[15] Kish, L. [1962], "Studies of Interviewer Variance for Attitudinal Variables," Journal of the American Statistical Association, 57, 92-115.
[16] Kish, L. [1962b], "Variances for indexes from complex samples," Proceedings of the Social Statistics Section, American Statistical Association, 190-99.
[17] Kish, L., Namboodiri, N. K. and Pillai, R. K. [1962], "The Ratio Bias in Surveys," Journal of the American Statistical Association, 57, 863-76.
[18] Kish, L. [1965], Survey Sampling, New York: John Wiley and Sons.
[19] Maynes, E. S. [1968], Consumer Attitudes and Buying Intentions, Manuscript to be published by the Institute for Social Research, Ann Arbor, Michigan.
[20] Mueller, E. [1963], "Ten years of consumer attitude surveys: Their forecasting record," Journal of the American Statistical Association, 58, 899-917.
[21] Rao, C. R. [1952], Advanced Statistical Methods in Biometric Research, New York: Wiley and Sons, 58.1.
[22] Rao, J. N. K. [1957], "Double ratio estimates in forest surveys," Journal of Indian Society of Agricultural Statistics, 9, 191-204.
[23] Stokes, D. E. [1966], "Some dynamic elements of contests for the presidency," The American Political Science Review, 60, 19-28.
[24] U. S. Bureau of the Census [1967], Consumer Buying Indicators, Current Population Reports, Series P-65.
[25] Yates, F. [1960], Sampling Methods for Censuses and Surveys, Third Edition, London: Charles Griffin and Company, Ltd., pp. 343.