Current events

Randomization procedures and sexual dimorphism in Australopithecus afarensis

Charles A. Lockwood & Brian G. Richmond
Doctoral Program in Anthropological Sciences, State University of New York, Stony Brook, New York 11794-4364, U.S.A.

William L. Jungers
Department of Anatomical Sciences, State University of New York, Stony Brook, New York 11794-8081, U.S.A.

William H. Kimbel
Institute of Human Origins, 1288 Ninth St, Berkeley, California 94710, U.S.A.

Journal of Human Evolution (1996) 31, 537–548

Randomization procedures have recently been used in a variety of applications in the field
of paleoanthropology, centered on the assessment of taxonomic composition and sexual
dimorphism of fossil samples (e.g., Richmond & Jungers, 1995; Kramer et al., 1995; Cope &
Lacy, 1995; Grine et al., 1996). In particular, Richmond & Jungers (1995) used the exact
randomization method to assess the probability that the range of size and shape variation in
specimens attributed to Australopithecus afarensis could be found in comparative samples of
extant hominoids. To compare size, the exact randomization method involves computing all
possible pairwise ratios of size variables in the comparative sample and determining what
percentage of these ratios exceed the maximum ratio found in the fossil sample.
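As an illustration only, the pairwise comparison described above can be sketched in Python. The function name and the reference size values are hypothetical and are not taken from Richmond & Jungers (1995); the fossil ratio of 1·42 is the Hadar mandibular max/min ratio reported in this paper.

```python
from itertools import combinations

def exact_randomization_percent(reference_sizes, fossil_ratio):
    """Percentage of all pairwise max/min ratios in the reference
    sample that exceed the max/min ratio of the fossil sample."""
    ratios = [max(a, b) / min(a, b)
              for a, b in combinations(reference_sizes, 2)]
    exceed = sum(1 for r in ratios if r > fossil_ratio)
    return 100.0 * exceed / len(ratios)

# Hypothetical size variables (mm); NOT data from the study.
reference = [20.0, 22.0, 25.0, 30.0]
fossil_ratio = 1.42  # Hadar mandibular max/min ratio given in the text

# Only one of the six pairs (30.0 / 20.0 = 1.5) exceeds 1.42.
percent = exact_randomization_percent(reference, fossil_ratio)
```

With real data, `reference` would hold the size variable of every individual in one extant species, and a small percentage would indicate that the fossil extremes are rarely matched by same-species pairs.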
Based on the available sample of mandibles, proximal femora and humeri attributed to
A. afarensis, Richmond & Jungers (1995) concluded that the size range of variation for each
element was so rarely exceeded by ratios between same-species pairs of extant hominoids that,
if the Hadar remains are accepted to represent a single species, a degree of sexual dimorphism
at least as great as that of the most dimorphic living apes (gorillas and orang-utans) is probable.
While exact randomization is an appropriate procedure to determine the relative affinities
between two fossil specimens, there are potential problems with focusing exclusively on the
extremes of a sample without taking into account the presence of intermediate specimens. One
alternative is to compare the total distribution of all pairwise comparisons among fossils with
those among reference samples. However, because fossil specimens are differentially preserved,
there is clearly value in finding a method with which the extremes of size in a given sample can
be used to test the null hypothesis of a single species. Bootstrapping is a technique for estimating
standard errors and confidence limits that involves resampling with replacement from the
observed sample (Efron & Tibshirani, 1993). Resampling from reference samples at sample sizes
equal to that of the fossil sample is a form of bootstrapping that may be suitable for comparing
extremes or, in some cases, statistics of variation such as the coefficient of variation (CV). Given
the assumptions of independent and random sampling in the fossil record, the random samples
created using this method approximate potential “fossil samples” for each extant hominoid
species.
0047–2484/96/120537+12 $25.00/0
© 1996 Academic Press Limited
Using this more conservative, and probably more appropriate, resampling procedure, the
present study compares size variation in A. afarensis with that of the same reference samples
used by Richmond & Jungers (1995). We ask the question, what is the probability of sampling
a set of n individuals from an extant hominoid species whose size variation is greater than that
present in a sample of n individuals of A. afarensis? In other words, we acknowledge that more
than two individuals contribute to the pattern of variation in the fossil assemblage (cf. Kramer,
1993; Cope & Lacy, 1995).
The corresponding null hypotheses are that for each hominoid reference taxon and for each
skeletal element, the size variation in A. afarensis does not exceed that of the reference group.
Rejection of the null hypothesis for each group would support an alternative hypothesis that
multiple species are included in the Hadar sample. Rejection of the null hypothesis for some
but not all groups would support the hypothesis of a high degree of sexual dimorphism in
A. afarensis relative to those groups whose size variation is less.
Methods
The mandibular corpus, the proximal femur and the humerus are used here to examine sexual
dimorphism. The measurements of these bones and the reference samples of modern
hominoids are described in Table 1 of Richmond & Jungers (1995). In brief, the reference
samples are from multiple populations or subspecies of Homo sapiens, Gorilla gorilla, Pan
troglodytes, and Pongo pygmaeus. Total sample sizes for each element in each hominoid species
range from 48 to 50 for the mandible and humerus, and from 31 to 41 for the proximal femur.
Corpus breadth and height at P4/M1 are measured for mandibles. Eight measurements are
used to describe the proximal femur. The size variable for each individual mandible or
proximal femur is the geometric mean, which equals the nth root of the product of n
measurements. The geometric mean, therefore, condenses the size information from a number
of variables into a single dimension (Mosimann, 1970; Jungers et al., 1995). For comparisons
of humeri, the size variable is simply total humeral length.
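The geometric mean defined above can be computed as follows. This is a minimal sketch; the function name and the two example measurements are hypothetical and are not taken from the study.

```python
import math

def geometric_mean(measurements):
    """nth root of the product of n positive measurements,
    computed through logarithms for numerical stability."""
    if not measurements or any(m <= 0 for m in measurements):
        raise ValueError("measurements must be a non-empty list of positive values")
    return math.exp(sum(math.log(m) for m in measurements) / len(measurements))

# Hypothetical mandibular corpus breadth and height (mm); NOT study data.
size = geometric_mean([18.0, 32.0])  # sqrt(18 * 32) = 24.0
```

For the proximal femur the same function would be applied to the eight measurements of each specimen.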
Before we address questions of probability, it is useful and appropriate to establish some
estimate of average sexual dimorphism in A. afarensis for the elements being considered. This
is done herein only for the mandibles of A. afarensis, as the sample size and attendant accuracy
are higher than for the other elements.
To estimate the average degree of sexual dimorphism in the mandible, indices of sexual
dimorphism (ISDs) are first calculated for the extant hominoids. The ISD is the ratio of
male to female mean values for geometric means. If the CVs of geometric means for the
mandibular corpora are correlated with the ISDs for our reference samples of extant
hominoids, it may be postulated that the CV for the Hadar sample can be directly
compared with those of extant hominoids in order to ascertain the degree of sexual
dimorphism in A. afarensis. In addition, a mean index of sexual dimorphism may be
estimated from the CV by a reduced major axis regression analysis analogous to those used
by Fleagle et al. (1980), Kay (1982) and Leutenegger & Shell (1987) to estimate degrees of
sexual dimorphism in the canines of extinct anthropoid species without a priori knowledge
of specimen gender. Plavcan (1994) suggests that this method may overestimate the degree
of sexual dimorphism present in a species with low sexual dimorphism. This is a fault
common to most estimators of sexual dimorphism (see also Josephson et al., 1996), and we
regard the resampling procedures described below as another opportunity to test the validity
of the CV-based estimate of sexual dimorphism.
  
The method of hypothesis testing used here is a resampling method commonly referred to
as bootstrapping. Bootstrapping was developed to calculate standard errors for unconventional statistics, although the term is also applied to general methods of resampling to
determine the validity of a result from a single sample (Efron, 1979; Efron & Tibshirani, 1993;
Manly, 1991; Sokal & Rohlf, 1995). We use bootstrapping to simulate random samples of
extant taxa comparable in sample size with those of the fossil record. By assessing the
probability of finding a certain degree of size variation in a modern taxon, it is possible to
determine the validity of directly comparing statistics of size variation.
The assumption that these random and independent samples represent a legitimate set of
comparisons for the fossil record may be criticized on various grounds. For reasons related to
the socioecological structure of the group being sampled and the type of fossil deposition, a
fossil sample may be biased toward one sex or the other or toward one end of the body size
range. However, because there is no a priori reason to suspect a specific bias at Hadar, we have
chosen not to explore these alternatives here. Another consideration is that the time depth
represented in some fossil assemblages could artificially increase the apparent range of
variation. As suggested by Richmond & Jungers (1995), the inclusion of multiple subspecies or
populations in the extant reference samples may address this problem in part.
For each skeletal element, 1000 random samples of geometric means are selected with
replacement from each of the extant hominoid groups. This is a sufficient number of
replications to detect significance at the P=0·05 level for all but borderline cases (Manly, 1991;
Efron & Tibshirani, 1993). Two statistics of variation are used: the max/min ratio and the CV.
The bootstrap procedure is performed for pairs of size variables (for max/min ratios) and for
sample sizes equal to the number of specimens in the hypodigm of A. afarensis for each element
(for max/min ratios and CVs). Pairwise sampling of max/min ratios should produce results
similar to the exact randomization procedure used by Richmond & Jungers (1995). Using
larger sample sizes in the bootstrapping analysis should lead to larger ranges of variation in the
extant hominoids (Cope & Lacy, 1995).
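The resampling procedure described in this paragraph can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' program: the function names, the random seed and the evenly spaced reference values are hypothetical, and the CV here is left uncorrected for brevity.

```python
import random
import statistics

def max_min_ratio(values):
    """Ratio of the largest to the smallest value in a sample."""
    return max(values) / min(values)

def percent_cv(values):
    """Uncorrected coefficient of variation, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def bootstrap_exceedance(reference, n, statistic, fossil_value,
                         replicates=1000, seed=0):
    """Percentage of samples of size n, drawn with replacement from the
    reference sample, whose statistic exceeds the fossil value."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(replicates):
        sample = rng.choices(reference, k=n)  # resampling with replacement
        if statistic(sample) > fossil_value:
            exceed += 1
    return 100.0 * exceed / replicates

# Hypothetical reference sample of 50 geometric means (mm); NOT study data.
reference = [20.0 + 0.3 * i for i in range(50)]

# Pairs (comparable with exact randomization) versus the fossil sample size.
p_pairs = bootstrap_exceedance(reference, 2, max_min_ratio, 1.42)
p_full = bootstrap_exceedance(reference, 17, max_min_ratio, 1.42)
p_cv = bootstrap_exceedance(reference, 17, percent_cv, 11.7)
```

Because a sample of 17 is far more likely than a sample of two to include individuals from both ends of the size range, `p_full` is expected to be larger than `p_pairs` for any reference sample with appreciable variation.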
The maximum sample size for fossil specimens is 17 for the mandible, five for the proximal
femur, and three for the humerus. The sample of 17 adult mandibles whose corpus height and
breadth can be measured at P4/M1 includes eight undescribed specimens. The measurements
for two of these, A.L. 444-2b and A.L. 417-1a, were also used by Richmond & Jungers (1995),
but six other undescribed specimens have been added for the current study. The largest
mandibular corpus is that of A.L. 438-1g (undescribed), and the smallest is that of A.L. 207-13.
For the proximal femur, the sample size of five contains those specimens of A. afarensis for
which absolute size in the dimensions used here can be compared, although the full suite of
measurements may not be available. These are A.L. 211-1, 288-1ap, 333-3, 333w-40
(Johanson et al., 1982; Lovejoy et al., 1982), and A.L. 333-123 (undescribed). A recently
discovered proximal femur from Hadar, A.L. 600-1, is larger than all of the others
(Johanson et al., 1994), and Richmond & Jungers (1995) discussed the possible implications
of this specimen for sexual dimorphism in A. afarensis. However, A.L. 600-1 is now
recognized as belonging to a large felid of the genus Homotherium. Therefore, of hominid
specimens, A.L. 288-1ap and A.L. 333-3 represent the smallest and largest specimens,
respectively, based on rank order determinations of overall size using original fossils or casts.
These two specimens are also the most complete and can be measured for each variable
described by Richmond & Jungers (1995). Therefore, the ratio of the geometric means
between these specimens is used in comparison with random samples of two and five from
extant hominoids.
. . 
540
Table 1
ET AL.
Summary statistics for mandibular geometric means
Mean (mm)
n
Min
Max
Standard deviation
CV
Standard error of CV
Male mean
Female mean
ISD*
Humans
Gorillas
Orang-utans
Chimpanzees
Australopithecus
afarensis
21·9
50
18·7
25·4
1·45
6·67
0·68
22·3
21·5
1·038
27·9
50
21·3
36·0
3·41
12·3
1·27
30·1
25·7
1·174
25·9
48
20·5
32·1
3·05
11·8
1·24
28·0
23·9
1·179
20·5
50
15·6
24·1
1·59
7·78
0·79
21·2
19·9
1·065
26·0
17
22·0
31·3
2·99
11·7
2·06
N/A
N/A
1·167
CVs are calculated using Sokal & Braumann’s (1980) correction. ISD=male mean/female mean.
*ISDs are reported to three decimal places for correspondence with Figure 1; that for A. afarensis is an estimate based
on CV comparisons.
For the humerus, three specimens are preserved well enough to assess overall size in the
sense of total humeral length, and of these A.L. 288-1m and MAK-VP-1/3 are the smallest
and largest, respectively (Richmond & Jungers, 1995).
A second method used to assess the degree of size variation is the coefficient of variation
(CV), which is less affected by sample size than is the max/min ratio. Cope & Lacy (1992,
1995) have suggested with simulated single- and multiple-species samples that CV comparisons
are more powerful than range-based statistics for assessing the taxonomic diversity within a
sample. On the other hand, the primary assumption made in calculating a CV (sampling
from a normal distribution) holds less and less in samples with increasing levels of sexual
dimorphism (Sokal & Braumann, 1980).
The chief logistical problem with using the CV is that to assess size variation using the
geometric mean, data for each measurement are required for each specimen in the hypodigm.
For the present purposes, the CV is only deemed applicable for the mandibles and perhaps the
humeri. Making inferences from a CV based on a sample size of three humeri may be
questioned. In an absolute sense, inferences would probably be unwarranted; relatively
speaking, the humeri serve as a good test for the applicability of the bootstrapping procedures.
CVs are calculated for each random sample of extant hominoids using Sokal & Braumann’s
(1980) correction for small sample sizes, and the frequencies with which these exceed the
CV of the geometric means for specimens of A. afarensis are determined.
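The corrected CV can be sketched as follows. The text does not spell out the formula, so this sketch assumes the small-sample correction takes the commonly cited form CV* = (1 + 1/(4n)) × CV; the function names are hypothetical. Under that assumption, the summary values in Table 1 reproduce the tabulated CVs for A. afarensis and gorillas to one decimal place.

```python
import statistics

def corrected_cv(values):
    """Percent CV with an assumed small-sample correction:
    CV* = (1 + 1/(4n)) * CV."""
    n = len(values)
    cv = 100.0 * statistics.stdev(values) / statistics.mean(values)
    return (1.0 + 1.0 / (4.0 * n)) * cv

def corrected_cv_from_summary(mean, sd, n):
    """Same assumed correction, starting from summary statistics."""
    return (1.0 + 1.0 / (4.0 * n)) * 100.0 * sd / mean

# Mean, standard deviation and n taken from Table 1.
hadar_cv = corrected_cv_from_summary(26.0, 2.99, 17)    # about 11.7
gorilla_cv = corrected_cv_from_summary(27.9, 3.41, 50)  # about 12.3
```

Small discrepancies for other taxa in Table 1 are to be expected, because the tabulated means and standard deviations are themselves rounded.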
Results
Estimating average sexual dimorphism in the mandible
Summary statistics for the mandibular samples are presented in Table 1. For extant
hominoids, the species ISD values are highly correlated with the species CVs (r=0·996). The
CV of A. afarensis mandibles is 11·7, which is similar to those of orang-utans and gorillas. A
reduced major axis regression of ISD values on CVs yields an ISD estimate of 1·167 in
mandibular size (Figure 1). Note that this estimate is reported as 1·167 for correspondence to
Figure 1; it is 1·17 to the correct number of significant figures.
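The regression can be checked against the published summary values. The sketch below fits a reduced major axis line (slope equal to the ratio of the standard deviations, carrying the sign of r) to the four extant CV and ISD pairs from Table 1; the function name is hypothetical. Because the inputs are rounded table values, the fitted coefficients agree with ISD = (0·0258 × CV) + 0·865 only to the precision shown, and the correlation comes out close to, rather than exactly at, the reported 0·996.

```python
import math

# CV and ISD for humans, gorillas, orang-utans, chimpanzees (Table 1).
cvs = [6.67, 12.3, 11.8, 7.78]
isds = [1.038, 1.174, 1.179, 1.065]

def reduced_major_axis(x, y):
    """Return (slope, intercept, r) of the reduced major axis line of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = math.copysign(math.sqrt(syy / sxx), r)
    intercept = my - slope * mx
    return slope, intercept, r

slope, intercept, r = reduced_major_axis(cvs, isds)

# Estimated ISD for the Hadar mandibles (CV = 11.7).
hadar_isd = intercept + slope * 11.7
```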
  
Figure 1. Reduced major axis regression of ISD on CV for extant hominoid mandibular geometric means
(ISD = (0·0258 × CV) + 0·865; r = 0·996). Plotted taxa: humans, chimpanzees, gorillas and orang-utans; the
estimated ISD for A. afarensis (1·167) is marked with an asterisk. Horizontal axis: CV; vertical axis: ISD.
ISD=male mean/female mean.
Max/min ratios
It is clear from Table 2 that the bootstrap method using n=2 produces similar results
to the exact randomization procedure for max/min ratios. However, for some reference
groups, the results from the bootstrap diverge dramatically from those of the exact
randomization when sample sizes larger than two are considered. The probability of finding
a max/min ratio of geometric means greater than 1·42 (the ratio for A. afarensis) in
random samples of 17 gorilla mandibles is 83·3%, and for orang-utans 73·7%. These
proportions are expressed graphically in Figure 2. At the other extreme, a change in the
sample size used to randomly sample human mandibles does not affect the results,
because no two humans in the comparative sample of 50 individuals display a size
difference as great as that of A. afarensis. Chimpanzees are intermediate, as there is a
25·6% probability of sampling a max/min ratio of mandibular size greater than that of
A. afarensis.
The range of size among five proximal femora of A. afarensis is relatively great no
matter what comparative group is used (Table 2; Figure 3). Again, the results for max/min
ratios using 1000 samples of two individuals closely approximate those from exact
randomization. Those for samples of five individuals suggest that the probabilities of
randomly obtaining max/min ratios greater than 1·37 in gorillas and orang-utans are
35·6% and 14·7%, respectively. Some caution should be applied to the result for
orang-utans, as the sample size of the reference sample is n=31, the smallest for any
reference sample in the current study. As the max/min ratios for the entire reference
samples of either chimpanzees or humans are exceeded by that of five specimens of
A. afarensis, the probability of duplicating the size range of A. afarensis from samples of these
extant hominoids remains zero.
Although only three humeri of A. afarensis are preserved well enough to assess their
overall length, it is important to recognize the differences between repeated sampling of two
individuals from reference samples (comparable with exact randomization) and repeated
sampling of three individuals (Table 2). The probabilities of sampling a ratio greater than
1·24 more than double for each reference sample of extant hominoids when three
individuals are sampled, and the results in this case are similar to those obtained for the
proximal femur.
. . 
542
16%
14%
12%
10%
8%
6%
4%
2%
0%
ET AL.
Gorillas
20%
18%
16%
14%
12%
10%
8%
6%
4%
2%
0%
Orang-utans
18%
16%
14%
12%
10%
8%
6%
4%
2%
0%
Chimpanzees
30%
Humans
= 1.42
25%
20%
15%
10%
5%
0%
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
Figure 2. Frequency histograms of geometric mean max/min ratios calculated from 1000 random samples
of 17 extant hominoid mandibles. The vertical line marks the max/min ratio of the Hadar sample (n=17).
Coefficient of variation
There is a 59·5% probability of obtaining a CV as high as that of A. afarensis (11·7) in 1000
samples of 17 gorilla mandibles (Table 2). For orang-utans, this probability is 49·8%, while in
chimpanzees and humans, the same result is highly improbable (1·4% and 0%, respectively).
Figure 4 illustrates the distinction between chimpanzees and humans on the one hand, and
gorillas and orang-utans on the other, in the distribution of randomly sampled CVs. The value
for A. afarensis falls in the more dimorphic group. Whether A. afarensis is compared with
chimpanzees, orang-utans or gorillas, there is less likelihood of sampling the CV of mandibular
size than of sampling the max/min ratio (Table 2). In addition, a comparison between Figures
2 and 4 suggests that the CVs of 1000 samples from each hominoid species are more normally
distributed than are the max/min ratios.
Table 2   Probabilities (%) of sampling statistics of size variation in extant hominoid samples
greater than those that describe Australopithecus afarensis

                                Sample size   Gorillas   Orang-utans   Chimpanzees   Humans
Mandibles
  Max/min                       17            83·3       73·7          25·6          0·0**
  CV                            17            59·5       49·8          1·4*          0·0**
  Max/min (pairs)               2             3·2*       2·4*          0·8**         0·0**
  Exact randomization (pairs)   2             3·8*       2·7*          0·7**         0·0**
Proximal femora
  Max/min                       5             35·6       14·7          0·0**         0·0**
  Max/min (pairs)               2             7·0        2·3*          0·0**         0·0**
  Exact randomization (pairs)   2             6·5        2·6*          0·0**         0·0**
Humeri
  Max/min                       3             33·7       19·5          5·3           2·7*
  CV                            3             29·2       15·4          3·2*          1·5*
  Max/min (pairs)               2             15·5       8·6           2·0*          0·5**
  Exact randomization (pairs)   2             17·5       7·6           1·9*          0·9**

Based on 1000 random samples. All probabilities (%) for mandibles and proximal femora are reported for
comparisons of geometric means; those for humeri are based on total humeral length. Exact randomization results for
proximal femora and humeri are from Richmond & Jungers (1995); those for mandibles have been recalculated using
identical methods for the expanded sample. CVs are reported using Sokal & Braumann’s (1980) correction for small
samples. See text for explanation of “Sample size”.
*<5·0% **<1·0%.
For total humeral length, the probabilities of obtaining a CV greater than that of A. afarensis
(12·9) are consistently slightly less than when comparing max/min ratios. Nonetheless, the
results from the two techniques are compatible with each other and with those from max/min
comparisons using the proximal femur. Size differences comparable with those of A. afarensis in
the proximal femur and the humerus are less frequently found in extant hominoid samples
than are those in the mandible.
Discussion
The estimate of average sexual dimorphism for mandibular size in A. afarensis, based on the
CV of geometric means, gives a result (ISD=1·167) that is compatible with a single species
that possesses a relatively high degree of sexual dimorphism. This result naturally leads to
the testing of several hypotheses. The primary concern when we initiated this study was that
pairwise, exact randomization could lead to an unacceptably high rate of type I error—that
is, rejecting a null hypothesis that actually is true—in cases where the fossil sample size is
greater than two. A method of repeated sampling from the comparative samples, setting n
equal to that of the same element for A. afarensis, may be a more relevant comparison of like
with like.
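The concern about type I error can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the study: the reference values are simulated from a normal distribution whose mean and standard deviation merely echo the gorilla values in Table 1, and the function name, seeds and number of trials are hypothetical. Pseudo-fossil samples are drawn from the same single-species reference sample, so every rejection of the single-species null hypothesis is a false rejection.

```python
import random
from itertools import combinations

def false_rejection_rate(reference, n_fossil, trials=500, alpha=0.05, seed=1):
    """Share of single-species pseudo-fossil samples of size n_fossil whose
    max/min ratio is exceeded by fewer than alpha of all pairwise ratios."""
    pair_ratios = [max(a, b) / min(a, b)
                   for a, b in combinations(reference, 2)]
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        pseudo_fossil = rng.choices(reference, k=n_fossil)
        ratio = max(pseudo_fossil) / min(pseudo_fossil)
        p = sum(1 for r in pair_ratios if r > ratio) / len(pair_ratios)
        if p < alpha:
            rejections += 1
    return rejections / trials

# Simulated single-species reference sample (mm); NOT study data.
rng = random.Random(0)
reference = [rng.gauss(27.9, 3.41) for _ in range(50)]

rate_pairs = false_rejection_rate(reference, 2)   # fossil sample of two
rate_full = false_rejection_rate(reference, 17)   # fossil sample of 17
```

In this setting `rate_pairs` should stay near the nominal 5% level, whereas `rate_full` is expected to be many times larger, because the extremes of 17 specimens are being judged against a distribution built from pairs.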
This application of bootstrapping effectively documents the previously recognized
dichotomy between humans and chimpanzees on the one hand, and the strongly sexually
dimorphic orang-utans and gorillas on the other (e.g., O’Higgins et al., 1990; Wood et al.,
1991). While the probabilities using the most sexually dimorphic reference samples are much
higher than those from exact randomization, the probabilities using less dimorphic taxa are
more similar to the previous results (Richmond & Jungers, 1995). This is especially clear using
the coefficient of variation for mandibular size (Figure 4).
Figure 3. Frequency histograms of geometric mean max/min ratios calculated from 1000 random samples
of five proximal femora for each extant species. The vertical line marks the max/min ratio of the Hadar
sample (n=5; ratio=1·37). Panels: gorillas, orang-utans, chimpanzees, humans. Horizontal axis: max/min
ratio; vertical axis: percentage of samples.
The results from the bootstrapping analyses further indicate that the null hypothesis can be
rejected for chimpanzees and humans; the size range of A. afarensis is not equivalent to that
displayed by either of these taxa. The max/min results for the mandible are somewhat more
conservative than those for CV-based comparisons. In all cases, however, the probabilities
indicate that a similar range of size can be sampled from gorillas or orang-utans with
frequencies that do not approach the probability level of 0·05. For comparisons of mandibles
(the element with the greatest sample size), the probabilities achieved with CVs or max/min
ratios are greater than 50% for gorillas and orang-utans. This implies that the size variation in
A. afarensis is consistent with that of a single species if extant hominoids are used as comparative
baselines (cf. Kimbel et al., 1994; Aiello, 1994).
Figure 4. Frequency histograms of CVs calculated for geometric means of the mandibular corpus. One
thousand random samples of 17 extant hominoid mandibles are included. The CV of the Hadar sample
(n=17; CV=11·7) is indicated by the vertical line. Panels: gorillas, orang-utans, chimpanzees, humans.
Horizontal axis: CV; vertical axis: percentage of samples.
That the relative range of size present in A. afarensis is greater than in chimpanzees and
humans warrants the interpretation that sexual dimorphism in A. afarensis is almost certainly
greater than in these taxa (cf. McHenry, 1991). The use of multiple hominoid species for
comparative purposes enables us to bracket the degree of sexual dimorphism present in
A. afarensis between that of strongly dimorphic hominoids (gorillas and orang-utans) and those
that show reduced levels of sexual dimorphism (humans and chimpanzees). This may be an
overly conservative view, for if A. afarensis shows a tendency towards one group or the other,
it is to the more highly dimorphic hominoids.
The estimate of average dimorphism in mandibular size based on the correlation between
the ISD and the CV appears to be a legitimate estimate based on the other lines of evidence
presented here. However, Plavcan (1994) points out that the CV-based method of estimating
the degree of sexual dimorphism may provide an overestimate of sexual dimorphism in fossil
populations that exhibit low levels of sexual dimorphism. This caution is only applicable for
samples whose ISDs range between 1·0 and 1·1, unless intrasexual variation is thought to be
unusually high (Tables 2–6 in Plavcan, 1994). Because all of our results suggest a much higher
level of sexual dimorphism in A. afarensis, it is doubtful that the CV furnishes an overestimate
in this case.
Another issue raised here is that the results vary for different skeletal elements; that is, the
levels of size variation for proximal femora and humeri of A. afarensis are relatively greater for
their sample sizes than is that for the mandible. Although there is no a priori reason to expect
that dimorphism will be expressed identically throughout the skeleton (see, for instance,
McHenry, 1986, 1991; Lovejoy et al., 1989; O’Higgins et al., 1990; Wood et al., 1991), it is
pertinent to establish whether or not the differences convey any biological information. There
are several methodological explanations that may account for the apparent contrast between
mandible size and postcranial size, all based on the understanding that the A. afarensis sample
for each element is only one out of many possible outcomes. For example, the degree of sexual
dimorphism in A. afarensis may be greater than the mandible would suggest, the sampling of
postcranial elements that are highly divergent in size may represent an unusual sampling
event, or the CV could be a more appropriate measure of size variation than the max/min
ratio. The last explanation gains support from work by Cope & Lacy (1992, 1995) and suggests
greater consistency between the mandibular and postcranial results. The range may be low
relative to the CV in highly platykurtic (or bimodal) populations where the largest or smallest
individuals are not being sampled.
On the other hand, there are several biological explanations for why the mandible may be
less sexually dimorphic than the proximal femur or humerus relative to the standards of
extremely dimorphic modern hominoids. The first question is whether the results for
postcrania correspond to body weight dimorphism. The results for the proximal femur and
humerus are correlated because the bones of A.L. 288-1 serve as the lower bound in each case
(A.L. 288-1 is also a member of the mandibular sample but is not the smallest specimen). The
probabilities of sampling gorilla or orang-utan ranges as great as those of A. afarensis are
somewhat less than 50% for either the humerus or proximal femur, which indicates a
reasonable probability that sexual dimorphism in A. afarensis for these elements could be
slightly higher than in any of the extant species. McHenry (1991: p. 30) has proposed that
“strong sexual dimorphism may have characterized forelimb size in A. afarensis, but hindlimb joint size
indicates only moderate levels of body size dimorphism in this species.”
  
He cautioned that his earlier estimates of a high level of sexual dimorphism in the distal femur
(McHenry, 1986) were overestimates of body size dimorphism, because femoral shaft size in
early hominids is large relative to body weight (Ruff, 1988). Thus, although our results indicate
a similar degree of bone size dimorphism in the humerus and femur, these may not correspond
to body weight dimorphism if McHenry’s hypothesis is correct. Alternatively, if dimorphism in
bone size alone can be taken as a surrogate for body size dimorphism, it is clear that body size
dimorphism in A. afarensis was high.
With regard to the mandible, sexual dimorphism in mandibular size is probably related to
sexual dimorphism in canine size. As canine dimorphism in hominids is reduced relative to
body size dimorphism (Johanson & White, 1979; Kay, 1982; Leutenegger & Shell, 1987;
Fleagle, 1988; Plavcan & van Schaik, 1994), it may be that reduced sexual dimorphism in
mandibular size is a correlate of this reduced canine dimorphism. Distinguishing between this
possibility and the methodological arguments presented above can be facilitated by studying
the pattern of sexual dimorphism in other fossil hominids.
Conclusions
Our results demonstrate that taking into account the full samples available for fossil taxa
provides a more conservative view than that suggested by pairwise comparisons using exact randomization, even when the fossil sample numbers only three to five
specimens. We caution against the use of pairwise comparisons for hypothesis testing with
fossil samples, although such methods retain utility to assess the phenetic affinities between
two specimens.
To account for sample size, bootstrapping offers a simple method of comparing statistics of
size variation from fossil samples to simulated “fossil samples” of modern hominoids. An
application of this procedure to range-based statistics and CVs for A. afarensis suggests with high
probability that A. afarensis was more variable in size than are modern humans and chimpanzees,
and that its degree of sexual dimorphism approached, but did not exceed, that of the most
dimorphic modern hominoids. There is little justification for dividing the Hadar sample into two
species based on size alone, assuming that the sexual dimorphism displayed by gorillas and
orang-utans provides an appropriate extreme in living animals by which to judge early
hominids.
From a phylogenetic standpoint, A. afarensis represents the earliest hominid for which
sexual dimorphism has been examined in detail, and it apparently shares with the gorilla a
high degree of sexual dimorphism. For the two most likely interpretations of the evolutionary relationships among African apes and humans (a human/chimp clade and a chimp/gorilla
clade), the common ancestor of all three living species is most parsimoniously
interpreted to have exhibited a high level of sexual dimorphism. It follows that a reduced
level of sexual dimorphism has been derived independently in modern humans and
chimpanzees.
Acknowledgements
We extend our thanks to the Center for Research and Conservation of Cultural Heritage
(Ethiopian Ministry of Information and Culture) and the National Museum of Ethiopia for
permitting two of us (W.H.K., C.A.L.) to study the Hadar remains, and for their cooperation
in that endeavor. Dana Cope, John Fleagle and an anonymous reviewer provided helpful
comments on the manuscript. We are also grateful to Fred Grine and Ozzie Pearson for data
on Zulu humeri, and to the curators at the various institutions whose material is included in
the reference sample of modern taxa.
References
Aiello, L. C. (1994). Variable but singular. Nature 368, 399–400.
Cope, D. A. & Lacy, M. G. (1992). Falsification of a single species hypothesis using the coefficient of variation: a
simulation approach. Am. J. phys. Anthrop. 89, 359–378.
Cope, D. A. & Lacy, M. G. (1995). Comparative application of the coefficient of variation and range-based statistics
for assessing the taxonomic composition of fossil samples. J. hum. Evol. 29, 549–576.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1–26.
Efron, B. & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
Fleagle, J. G. (1988). Primate Adaptation and Evolution. New York: Academic Press.
Fleagle, J. G., Kay, R. F. & Simons, E. L. (1980). Sexual dimorphism in early anthropoids. Nature 287, 328–330.
Grine, F. E., Jungers, W. L. & Schultz, J. (1996). Phenetic affinities among early Homo crania from East and South
Africa. J. hum. Evol. 30, 189–225.
Johanson, D. C. & White, T. D. (1979). A systematic assessment of early African hominids. Science 202, 312–330.
Johanson, D. C., Lovejoy, C. O., Kimbel, W. H., White, T. D., Ward, S. C., Bush, M. E., Latimer, B. M. & Coppens,
Y. (1982). Morphology of the Pliocene partial hominid skeleton (AL288-1) from the Hadar Formation, Ethiopia.
Am. J. phys. Anthrop. 57, 403–451.
Johanson, D. C., Kimbel, W. H. & Rak, Y. (1994). New Pliocene hominids from the Hadar Formation, Ethiopia. Am.
J. phys. Anthrop. 18 (Suppl.), 116–117.
Josephson, S. C., Juell, K. E. & Rogers, A. R. (1996). Estimating sexual dimorphism by Method-of-Moments. Am. J.
phys. Anthrop. 100, 191–206.
Jungers, W. L., Falsetti, A. B. & Wall, C. E. (1995). Shape, relative size, and size-adjustments in morphometrics. Yearb.
phys. Anthrop. 38, 137–161.
Kay, R. F. (1982). Sexual dimorphism in Ramapithecinae. Proc. Nat. Acad. Sci. U.S.A. 79, 209–212.
Kimbel, W. H., Johanson, D. C. & Rak, Y. (1994). The first skull and other new discoveries of Australopithecus afarensis
at Hadar, Ethiopia. Nature 368, 449–451.
Kramer, A. (1993). Human taxonomic diversity in the Pleistocene: Does Homo erectus represent multiple hominid
species? Am. J. phys. Anthrop. 91, 161–171.
Leutenegger, W. & Shell, B. (1987). Variability and sexual dimorphism in canine size of Australopithecus and extant
hominoids. J. hum. Evol. 16, 359–367.
Lovejoy, C. O., Johanson, D. C. & Coppens, Y. (1982). Hominid lower limb bones recovered from the Hadar
Formation: 1974–1977 collections. Am. J. phys. Anthrop. 57, 679–700.
Lovejoy, C. O., Kern, K. F., Simpson, S. W. & Meindl, R. S. (1989). A new method for estimation of skeletal
dimorphism in fossil samples with an application to Australopithecus afarensis. In (G. Giacobini, Ed.) Hominidae,
pp. 103–108. Milano: Jaka Book.
Manly, B. F. J. (1991). Randomization and Monte Carlo Methods in Biology. New York: Chapman & Hall.
McHenry, H. M. (1986). Size variation in the postcranium of Australopithecus afarensis and extant species of
Hominoidea. J. hum. Evol. 15, 149–156.
McHenry, H. M. (1991). Sexual dimorphism in Australopithecus afarensis. J. hum. Evol. 20, 21–32.
Mosimann, J. E. (1970). Size allometry: size and shape variables with characteristics of the log normal and generalized
gamma distributions. J. Am. Stat. Assoc. 655, 930–945.
O’Higgins, P., Moore, W. J., Johnson, D. R., McAndrew, T. J. & Flinn, R. M. (1990). Patterns of cranial sexual
dimorphism in certain groups of extant hominoids. J. Zool. Lond. 222, 399–420.
Plavcan, J. M. (1994). Comparison of four simple methods of estimating sexual dimorphism in fossils. Am. J. phys.
Anthrop. 94, 465–476.
Plavcan, J. M. & van Schaik, C. P. (1994). Canine dimorphism. Evol. Anthropol. 2, 208–214.
Richmond, B. G. & Jungers, W. L. (1995). Size variation and sexual dimorphism in Australopithecus afarensis and living
hominoids. J. hum. Evol. 29, 229–245.
Ruff, C. (1988). Hindlimb articular surface allometry in Hominoidea and Macaca, with comparisons to diaphyseal
scaling. J. hum. Evol. 17, 687–714.
Sokal, R. R. & Braumann, C. A. (1980). Significance tests for coefficients of variation and variability profiles. Syst. Zool.
34, 449–456.
Sokal, R. R. & Rohlf, F. J. (1995). Biometry. New York: W. H. Freeman and Co.
Wood, B. A., Li, Y. & Willoughby, C. (1991). Intraspecific variation and sexual dimorphism in cranial and dental
variables among higher primates and their bearing on the hominid fossil record. J. Anat. 174, 185–205.