Distance and cosine measures of niche overlap

Social Networks 23 (2001) 141–165
Min-Woong Sohn∗
Department of Health Studies, The University of Chicago,
5841 S Maryland Avenue MC 2007, Chicago, IL 60637-1470, USA
Abstract
Niche overlap is increasingly used as a way of measuring the intensity of interorganizational
competition. This paper examines and compares various distance and cosine measures of niche
overlap. The analysis in this paper shows that Euclidean distance applied to the raw data as well as
to transformed data is not an appropriate measure of niche overlap. An alternative measure from the
cosine family is discussed and compared to Euclidean distance. Niche overlap theory promises to be
a big leap forward in measuring competition in sociological studies of organizations. But meticulous
attention to the measurement of overlap is needed to fulfill the promise. © 2001 Elsevier Science
B.V. All rights reserved.
Keywords: Niche overlap; Competition; Euclidean distance; Cosine; Asymmetry
1. Introduction
Niche overlap theory provides a way of operationalizing competition in organizational
analysis and thereby an important link between organizational ecology and network analysis (see Burt, 1992; Burt and Talmud, 1993). Organizational ecologists (McPherson, 1983;
Hannan and Freeman, 1989) originally borrowed the concept of niche overlap from evolutionary ecology (MacArthur and Levins, 1967) and applied it to organizational populations.
They have found that the theory has an intuitive application to organizational as well as
biological populations.
The overlap in habitats between two species is directly analogous to the overlap in resource
niches for organizational populations. To the extent that two species rely on the same
kind of food to sustain their populations, to that extent they are competitive with each
other. Organizations rely on their environment for resources to sustain themselves, and
so organizational populations (e.g. wine and beer producers) compete with each other to
∗ Tel.: +1-773-834-1793; fax: 1-773-702-1979.
E-mail address: [email protected] (M.-W. Sohn).
0378-8733/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S 0 3 7 8 - 8 7 3 3 ( 0 1 ) 0 0 0 3 9 - 9
M.-W. Sohn / Social Networks 23 (2001) 141–165
the extent that they rely on the same customer base or to the extent that the customers
find their products substitutable. Putting it differently, the level of competition between
two populations is proportional to the extent of overlap in their common resource niches 1
(Baum and Haveman, 1997).
Recently, sociologists started to apply niche overlap to other observational units such
as markets or individual organizations (see Burt, 1988; Burt and Carlton, 1989; Baum and
Singh, 1994; Podolny et al., 1996; Baum and Haveman, 1997). Despite its importance and
increasing popularity, relatively little attention has been paid to how niche overlap can be
measured. In this paper, I will compare several existing measures of niche overlap including
alpha coefficient (MacArthur and Levins, 1967), cosine (Pianka, 1973), and Euclidean distance (Burt, 1992; Burt and Talmud, 1993), examine whether they can validly measure what
they intend to measure, and propose a new operationalization based on cosine of the angle.
In examining these measures, I will first establish some criteria for a good measure of niche
overlap and evaluate them against these criteria. I will then use a real-life example from the
hospital industry and examine how validly these measures detect overlap patterns in niches
for hospitals. In the following, I will only consider niche overlap among organizations, but
the same discussion can easily be extended to organizational populations.
Special attention will be paid to Burt’s proposal that niche overlap be conceptualized
as a special case of structural equivalence and that it be measured by Euclidean distance.
Euclidean distance is an extremely popular method of measuring association or similarity
in social science in general and network analysis in particular.
In the following, I will establish conceptual criteria for a good method of measuring niche
overlap and evaluate various methods against them. I will then use data from a convenience
sample of six hospitals in Los Angeles and evaluate how various methods produce measures
of niche overlap consistent with overlap patterns that the data suggest.
2. Criteria for a measure of niche overlap
A niche typically consists of resources that either a species or an organization critically
depends on for survival and sustenance. Hannan and Freeman (1977) provide a useful
definition of niche: “The niche, then, consists of all those combinations of resource levels
at which the population can survive and reproduce itself” (p. 947). For a species, its niche
may consist of a natural habitat from which it collects food. An organization likewise has
resource needs, which it tries to satisfy from its own niches. An organization may require
many different kinds of resources, which it collects from many different resource dimensions
in the environment. I will assume for much of the discussion in the paper that there is only
1 Pianka (1981) and Hannan and Freeman (1989) both believe that niche overlap leads to direct competition
under conditions of resource scarcity and mutual exclusion. However, intentionality and perception of human
and corporate actors may also need to be considered in applying the niche overlap theory to organizations. For
example, firms may intentionally use collusion and other strategic maneuvers to avoid competition. On the other
hand, competitive intensity in the environment may not be fully perceived by decision makers within organizations
due to their cognitive and perceptual limitations (Tenbrunsel et al., 1996; Schwenk, 1988; Starbuck and Milliken,
1988). For these reasons, one must be cautious in inferring a direct and deterministic relationship between niche
overlap and competition. The relationship is rather stochastic and indicates competitive potential that may not
directly impede organizational performance but may still affect it by placing constraints on organizations.
one kind of resource distributed throughout many resource segments, and come back to
the issue of niche dimensionality in the last section.
Typical firm–niche relationships can be represented by a two-mode rectangular matrix.
The matrix X contains data for n organizations and r resource segments. A resource segment
(or position) for a business firm may consist of an organization, an individual, or a group
of them. Firm i acquires resources from one or more resource segments, which collectively
comprise the niche for the firm; xik represents the amount of resources that firm i takes from
resource segment k. Examples include units of goods shipped, dollar amount received, or,
in the case of hospitals, patients coming from a ZIP code. For business firms, niches consist
of not only buyers of their goods and services but also their suppliers of input resources.
Measuring niche overlap involves reducing this two-mode matrix to a one-mode n × n
square matrix whose cells contain the amount of firm i’s niche overlapped by firm j’s. The
value in each cell is commonly called the competition coefficient (Pianka, 1983; Hannan and
Freeman, 1989). The coefficient should ideally have the following properties (cf. Bonacich,
1972). It is zero if there is no overlap between the two niches and one when there is complete
overlap. It is in ratio scale so that overlap in niches for two firms can be expressed in ratios. It
is important to have a set of coefficients in ratio scale because it would make the cross-dyadic
comparisons more meaningful within a universe of firms. A competition coefficient in ratio
scale would permit one to state that a pair of firms is twice as competitive with each other as
another pair is. Such a coefficient also satisfies all the requirements of a metric (see Laumann,
1973, pp. 217–222). Lastly but most importantly, it must be able to take the differences in
organizational size into account. Organizational size is a crucially important dimension in
competition. Organizations tend to be different in their sizes and have different levels of
resource needs, and they try to grow in size to be more competitive. So neutralizing the effect
of size in measuring niche overlap will make the measure less useful or downright invalid.
There are two problems that arise from the disparity in size between organizations.
Bonacich (1972) discussed the first problem in the context of measuring centrality that is
independent of group size (p. 178). He required that his measure of centrality be independent
of group size because increasing or decreasing group size should not affect a measure if the
overlap between two groups also increases or decreases proportionately with their sizes.
When the niche size 2 and the overlap change proportionately, overlap relative to the niche
size of the firms involved does not change and a valid measure of niche overlap must be
able to detect no change.
The second problem is the mirror image of the first. It is how to explicitly incorporate
size differences into the measure of niche overlap without allowing the size differences to
dominate it. This problem occurs for firms with niches of unequal sizes. Fig. 1C illustrates
this point. The area of overlap occupies a smaller share of the niche for S1 than it does
2 The niche size is determined by the total amount of resources consumed by an organization. It is best illustrated
by the area under the curves in Fig. 1. More formally, the niche size of a firm is the sum of all resources utilized by
this firm, $\sum_k x_{ik}$ or $\|x_i\|_1$. In vector geometry, the niche size can also be indexed by the scalar $\|x_i\|_2$. A simplifying
assumption is made in this paper that the niche size is directly proportional to the size of an organization. It is
needed for semantic rather than substantive reasons. In ecological literature, terms such as niche width, breadth,
or size do not convey the meaning of the level of resource utilization. Organizational size as I use it here (unless
otherwise noted) is a vector concept that represents niche width and resource utilization levels in each niche segment,
which together indicate the total amount of resource utilization.
Fig. 1. Possible niche relationships (adapted from Pianka (1983), p. 243).
for S2. This means that the overlap has less competitive significance to S1 than to S2, or
that the amount of competition from S2 to S1 is smaller than that from S1 to S2. This is
the source of asymmetry in competitive intensity generated by a pair of firms towards each
other (McPherson, 1983; Brittain and Wholey, 1988).
One can capture this asymmetry in competition between two organizations with two
numeric quantities such that one value indicates the competitive intensity going from S1 to
S2 and the other going from S2 to S1. And both problems of size can be solved if one can
express the size of overlap relative to the total niche size of the firms involved. Thus, one
way of building asymmetry into the measure and solving the problem of size at the same
time is to express overlap as the proportion of a focal firm’s niche size that is overlapped
by the other firm (see McPherson, 1983).
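One possible reading of this proportional idea can be sketched in Python. The sketch rests on an assumption that is not taken from the text or from McPherson (1983): the overlap within a resource segment is taken to be the smaller of the two firms' utilization levels. The function name overlap_share and the toy vectors are hypothetical.

```python
def overlap_share(focal, other):
    """Share of the focal firm's niche size that is overlapped by the other firm.

    Assumption (not from the paper): the overlap in segment k is min(x_ik, x_jk).
    """
    total = sum(focal)
    if total == 0:
        raise ValueError("focal firm uses no resources")
    shared = sum(min(a, b) for a, b in zip(focal, other))
    return shared / total


# Hypothetical utilization vectors over four resource segments.
small = [10, 5, 0, 0]      # small firm with a narrow niche
large = [40, 30, 20, 10]   # large firm whose niche includes the small firm's

share_small = overlap_share(small, large)  # overlap relative to the small firm
share_large = overlap_share(large, small)  # overlap relative to the large firm
```

With these numbers the same overlap covers the whole niche of the small firm but only 15% of the niche of the large firm, which is the kind of asymmetry the two quantities are meant to capture.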
In general, there can be four possible patterns of overlap as shown in Fig. 1. There is
no overlap between two firms in the disjunct relationship (A). The amount of competition
between two firms due to niche overlap is zero. The two firms can still exercise competitive
pressure on each other either through the threat of entry or indirectly through a common
competitor.
The equal overlap (B) shows that the niche sizes of two firms are the same and the amount
of overlap in niches between these two has the same competitive effect on both. On the other
hand, the unequal overlap (C) is a pattern already discussed above. It is the most frequently
observed pattern and a good measure of niche overlap must be able to handle this pattern well.
The included pattern (D) is a special case of the unequal overlap where the niche of
the smaller firm is completely included in that of the larger firm. The dotted line shown
in Fig. 1D depicts the possibility that a smaller firm can outcompete a larger one in its
narrow range of resource segments. In the included relationship, the maximum amount of
competition must be limited by the niche size of the smaller firm. For the unequal patterns,
the measure of overlap must be able to discriminate the asymmetric nature of overlap, i.e.
it must be able to build size differences explicitly into the measure of overlap.
These four patterns exhaust possible niche relationships. A valid measure of niche overlap
should be able to discriminate competitive strengths in these four relationships. In summary,
Table 1
Beds, total discharges, and number of patient-originating ZIP codes of six hospitals in Los Angeles, 1991 a

Name                                                     Beds b  Total discharges  Patient ZIP codes
Los Angeles Community hospital                              138              6353                131
California medical center                                   308             17361                178
Queen of Angels Hollywood Presbyterian medical center       433             27249                206
Hollywood Community hospital                                 99              1476                 84
Cedars-Sinai medical center                                 922             45752                393
Midway hospital medical center                              230              5014                 91

a Data source: California office of statewide healthcare planning and development (OSHPD) patient discharge
abstracts and OSHPD financial disclosure file, 1991.
b Beds are the average available beds and obtained from the OSHPD financial disclosure file. Patient ZIP codes
show the number of ZIP codes that sent at least five patients to these hospitals. Total discharges are the total number
of discharges from these hospitals; patients originated from ZIP codes with less than five patients are excluded
from the total as are patients from outside the state.
the conceptual criteria for evaluating a valid method of measuring niche overlap include:
(1) it must measure the size of overlap relative to the size of niches for firms involved in
order to get around the problems of size; (2) it must be able to sensitively distinguish the
four patterns.
3. An example
As an example of firm–niche relationship, Table 1 presents data for six hospitals that will
be used for illustrative purposes in this paper. The data are extracted from the California
patient discharge abstracts for 1991 from the office of statewide healthcare planning and
development (OSHPD). This database includes nearly four million discharges for the year
from over 500 short-term general hospitals in the state. Among them, a convenience sample
of six hospitals in Los Angeles is selected for analysis so that they can show some meaningful
overlap in niches as well as the effect of size differences. The names of the hospitals and
some of their characteristics are shown in Table 1.
They are within 12 miles of each other and are widely different in their sizes (measured
in average available beds), total number of discharges, and number of ZIP code areas
from which they received their patients. If one takes the patient-originating ZIP code as
a resource segment for a hospital, the patient origin–destination matrix would produce an
example of firm–niche relationship data. The rationale for the use of a ZIP code as a resource
segment for hospitals has been discussed elsewhere (see Sohn, 2001). All told, there are six
hospitals and 402 patient-originating ZIP codes in the data, resulting in a 6 × 402 two-mode
matrix. 3
Fig. 2 plots for each pair of hospitals the number of patients who originated from each
ZIP code and went to these hospitals. Each dot in these plots represents a ZIP code. Both
3 ZIP code areas that sent fewer than five patients to at least one of the hospitals are deleted.
axes are in the same scale and the maximum is 2000 patients. 4 A dot in the middle of a
plot along the 45◦ line indicates that the corresponding ZIP code sent the same number
of patients to both hospitals, while a dot on either axis indicates that the ZIP code sent all
patients to one of the two hospitals and none to the other. Therefore, a plot in which ZIP
codes center around the 45◦ line indicates strong competition between two hospitals. All
ZIP codes falling neatly on the 45◦ line is a limiting case that indicates the maximum level of
competition; for each patient going to one hospital, there is another patient going to the other
hospital.
On the other hand, a plot where ZIP codes are clustered around both axes indicates that
the ZIP codes sent many more patients to one hospital than the other. This pattern indicates
weak competition between two hospitals. The limiting case in which all ZIP codes fall on
one axis or the other indicates no overlap. In general, the closer a ZIP code is to an axis, the
more “loyal” the ZIP code is to the hospital the axis represents. And the more ZIP codes
there are that are loyal to either of the hospitals, the less the two hospitals are subject to the
competitive pressure from each other.
Notice that three of the four possible niche relationships described in Fig. 1 are found in
these plots. California medical center, Queen of Angels Hollywood Presbyterian medical
center and Cedars-Sinai medical center show patterns of unequal overlap (Fig. 1C) with
one another. Los Angeles Community and Midway approximate the disjunct relationship
(Fig. 1A) in that, except for a small number of ZIP codes, almost all of them fall on either
axis. And the patterns of overlap involving Hollywood Community hospital, the smallest of
the six hospitals in bed size, show the included pattern (Fig. 1D). All ZIP codes that sent
patients to this hospital sent many more patients to other hospitals in the sample. In terms of
competitive intensity, Hollywood Community receives strong competition from the other
five hospitals, while it generates only a minimal amount of competition towards them. This
case most clearly shows the asymmetric effect of differences in organizational size.
These patterns of niche overlap can be used to evaluate various methods discussed below.
A method is valid to the extent that it indicates little competition between Los Angeles Community and Midway, considerable competition among California medical center, Queen of
Angels, and Cedars-Sinai, and clear asymmetry in competition between Hollywood Community and all the others.
4. Euclidean distance
Euclidean distance measures the straight line distance between two points in a multidimensional space. It is elegant in concept, easy to understand, and simple to compute.
For these reasons, it is by far the most widely used measure of association between two
4 In this paper, I used three modes of representing the firm–niche relationship data. The first is to use an axis to
represent a firm and use a dot to represent a niche segment as in Fig. 2. The second is to turn the first mode inside
out by using an axis to represent a niche segment and a dot a firm (see Figs. 3, 4 and 7). And the third is to use
an axis to represent a niche dimension and another to represent the fitness or the resource utilization level. Figs. 1
and 6 are examples of this mode. Note that a resource segment in these figures (see Fig. 6B) is represented by a
line segment on the relevant axis. This last mode allows me to represent niches and overlap as areas.
sets of scores. Given a matrix X as shown above, Euclidean distance is computed as
follows

$$ D_{ij} = \sqrt{\sum_k (x_{ik} - x_{jk})^2} \tag{1} $$
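As a minimal sketch of Eq. (1), the following Python code computes the distance for one pair of firms and then reduces a two-mode matrix to a one-mode n × n matrix. The function names and the small data matrix are hypothetical and serve only as an illustration.

```python
import math


def euclidean_distance(x_i, x_j):
    """Eq. (1): square root of the summed squared differences over segments k."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def distance_matrix(X):
    """Reduce an n x r two-mode matrix (list of rows) to an n x n distance matrix."""
    return [[euclidean_distance(row_i, row_j) for row_j in X] for row_i in X]


# Hypothetical data: three firms (rows) and two resource segments (columns).
X = [
    [3.0, 4.0],
    [6.0, 8.0],
    [3.0, 0.0],
]
D = distance_matrix(X)
```

The resulting matrix is symmetric with zeros on the diagonal, so a single value describes each pair of firms regardless of their sizes.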
Burt (1992) and Burt and Talmud (1993) proposed to use Euclidean distance as a measure
of niche overlap. 5 They argued that niche overlap could be conceptualized as a special case
of structural equivalence (Burt and Talmud, 1993, p. 137).
Levins (1968, p. 40) also suggested that the geometric distance might be used to indicate
the competitive intensity between two species represented as two points in a multidimensional hyperspace (see Footnote 4).
Because the mathematical properties of Euclidean distance and how it compares with
other measures of association (especially with correlation) have been thoroughly studied
elsewhere (see Cronbach and Gleser, 1953; Fox, 1982; Faust and Romney, 1985), I will
focus on showing its limitations as a measure of niche overlap.
The first and most obvious difficulty with Euclidean distance is its inability to handle
asymmetry in size between organizations. Even if it can be used to measure niche overlap,
it can be applied only to firms with niches of equal size. The second difficulty comes from
the nature of typical niche overlap data. Niche overlap data (e.g. membership affiliation,
hospital patient flow, food intake from resource niches, dollar amount in input–output tables
among industrial sectors, etc.) are almost always expressed in scales that have a meaningful
zero point (i.e. no food intake from a particular resource segment, no patient admitted from
a ZIP code area, no money transfers between two sectors, etc.). These data are said to be in
ratio scale and are different from those in interval scale such as temperature or geographic
coordinates in that the latter examples cannot be expressed in ratios (e.g. 80◦ is not twice as
warm as 40◦ ). When one applies Euclidean distance to typical raw firm–niche relationship
data, 6 the resulting overlap measure is heavily influenced by size differences between firms.
Fig. 3 illustrates this point.
Fig. 3A shows three pairs of firms (A and B, C and D, and E and F) and their use of two
resources X and Y. Suppose the distance between these pairs in the two-dimensional space
is exactly the same. If one used the Euclidean distance to measure the niche overlap among
the three pairs of firms, one would obtain the same overlap for these three pairs of firms.
The three pairs in Fig. 3A, however, have very different overlap patterns in that A and B
have larger overlap than C and D, and that E’s niche is completely included in F’s (i.e. E
takes fewer resources from both segments than F does). And so a good measure must be
able to distinguish these three different patterns of overlap. On the other hand, the pairs A
and B, and C and D in Fig. 3B are equal in their niche sizes and the overlap in niches is
proportionately the same in both cases. So given Bonacich’s requirement of independence
of size, an appropriate measure of overlap is expected to produce the same result. But
5 They recommend that the raw data be transformed before Euclidean distance is applied. Much of the discussion
in this section, therefore, does not apply to their method. Below, I will discuss the issue of data transformation and
what it exactly accomplishes.
6 The term “raw” here means “untransformed”. Sometimes the data may be collected in such a way that the data
as they are originally collected are on a transformed scale. They are not “raw” data as I use the term in this paper.
Fig. 3. Distance and cosine.
as the difference in lengths between the two pairs clearly shows, Euclidean distance errs
in indicating different amounts of overlap for these two pairs. This demonstrates why the
geometric distance between two points (representing two firms) cannot be used to indicate
niche overlap between them.
It is due to the well-known property of Euclidean distance: it is sensitive to any numeric
transformation. When two positions are located in a completely relative world with no
origin (e.g. (0, 0) point on a two-dimensional space), the same distance would mean the
same thing no matter where the points are located on the plane. However, when the size
of the resource niche matters, the same distance for pairs of firms as in Fig. 3A means
completely different things depending on where the pair are located relative to the origin.
5. Cosine of the angle
Cosine of the angle was first used as a measure of niche overlap by Pianka (1973). It is
computed by

$$ C^{*}_{ij} = \cos\theta = \frac{\sum_k x_{ik} x_{jk}}{\sqrt{\sum_k x_{ik}^2 \, \sum_k x_{jk}^2}} \tag{2} $$

Cij∗ varies between 0 and 1; it gets smaller as θ gets larger from 0 to π/2. In Fig. 3B, the two
pairs of firms A and B, and C and D, have the same cosine values because they are both
separated by the same angle θ. On the other hand, all three pairs of firms in Fig. 3A have
different cosine values.
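The contrast between the two measures can be illustrated with a short Python sketch of Eq. (2). The coordinates below are hypothetical and are not read off Fig. 3B; they only reproduce its logic, with C and D using resources in the same proportions as A and B but at three times the level.

```python
import math


def cosine(x_i, x_j):
    """Eq. (2): cosine of the angle between two utilization vectors."""
    dot = sum(a * b for a, b in zip(x_i, x_j))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_j = math.sqrt(sum(b * b for b in x_j))
    return dot / (norm_i * norm_j)


def euclidean_distance(x_i, x_j):
    """Eq. (1), repeated here so that the example is self-contained."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


# Hypothetical firms on two resource segments.
A, B = [4.0, 1.0], [1.0, 4.0]
C, D = [12.0, 3.0], [3.0, 12.0]   # same proportions as A and B, three times the size

cos_ab, cos_cd = cosine(A, B), cosine(C, D)
dist_ab, dist_cd = euclidean_distance(A, B), euclidean_distance(C, D)
```

Under these assumptions the two cosines coincide, while the Euclidean distance for C and D is three times that for A and B.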
The cosine produces similarity measures in the sense that the cosine value gets larger
as the two vectors become more nearly parallel in the r-dimensional space (they become
more similar or the overlap becomes larger). The cosine is only invariant to multiplicative transformation, and it is “a many-to-one transformation which effectively ignores the
relative magnitudes between the vectors” (Anderberg, 1973, p. 73). Pearson correlation
is a special case of cosine; correlation can be obtained by applying Eq. (2) to centered
data (i.e. data from which vector means are subtracted). Cosine is a ratio measure, and,
like Euclidean distance, produces a symmetric measure of overlap, that is to say, Cij∗ = Cji∗.
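These properties can be checked numerically. The Python sketch below uses hypothetical vectors and helper names; it obtains the Pearson correlation by applying the cosine to centered data and compares the effect of a multiplicative and an additive transformation on both quantities.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors, as in Eq. (2)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def center(v):
    """Subtract the vector mean from every entry."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def correlation(u, v):
    """Pearson correlation obtained as the cosine of the centered vectors."""
    return cosine(center(u), center(v))


# Hypothetical utilization vectors over four resource segments.
u = [1.0, 2.0, 3.0, 4.0]
v = [2.0, 1.0, 4.0, 3.0]

cos_raw = cosine(u, v)
cos_scaled = cosine([10 * a for a in u], v)                       # multiplicative change
cos_shifted = cosine([a + 10 for a in u], [b + 10 for b in v])    # additive change
r_raw = correlation(u, v)
r_shifted = correlation([a + 10 for a in u], [b + 10 for b in v])
```

In this example the cosine is unchanged by rescaling one vector but moves when a constant is added, whereas the correlation is unaffected by the added constant because centering removes it.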
Cosine solves one source of the problem of size, namely, size indifference for niches of
equal size. However, it ignores size differences for pairs A and D, and B and C, as well as
for A and B, and C and D in Fig. 3B. Cosine will produce the same measure of overlap
for all these pairs, whereas Euclidean distance is too sensitive to differences in niche sizes.
In this sense, both are unsatisfactory in dealing with the problem of size and thereby the
asymmetry in competitive intensity.
6. Data transformation
There are many ways raw data can be transformed before any algorithm for detecting
overlap is applied. The first common way of transforming the data might be to remove the
individual mean to make the data centered about an individual (or a firm). Another way is to
transform the raw data into z-scores by removing both the mean and the standard deviation
from each vector. Cronbach and Gleser (1953) and Faust and Romney (1985) considered
these modes of data transformation. I will instead focus on two modes of data transformation
that are particularly relevant to measuring niche overlap. These will be called proportional
and marginal transformations, respectively.
Proportional transformation (PT) is achieved by dividing vector entries by the vector
margin total ($x_{ik} / \sum_k x_{ik}$, or $x_i / \|x_i\|_1$). Let us define an n × r matrix P that is obtained by
dividing each element of X by appropriate row margin totals. Each value pik then indicates
the proportion of firm i’s total resources taken from the kth resource segment.
It is a widely used method of data transformation. The alpha coefficient (discussed below)
that early evolutionary ecologists used as a measure of competitive intensity between two
species is based on proportionally transformed data; pik in this case refers to the relative
utilization of resource k by species i (Levins, 1968). Griffith (1972) called the similarly
transformed data the commitment index in his study of hospital markets in the sense that
it indicates how “committed” each market segment is to a hospital. Burt (1988) used the
proportionally transformed data in his earlier study of the structure of American markets.
This is one way of neutralizing the size effect, because this transformation guarantees that
the row margin sums up to unity and all firms in the market then have the same row margin
totals.
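A minimal Python sketch of the proportional transformation follows. The function name and the data are hypothetical; the first two firms are chosen to differ tenfold in size while using the segments in the same proportions.

```python
def proportional_transform(X):
    """Divide each row of X by its row total (the 1-norm for non-negative data)."""
    P = []
    for row in X:
        total = sum(row)
        if total == 0:
            raise ValueError("a firm with no resource use cannot be transformed")
        P.append([x / total for x in row])
    return P


# Hypothetical data: three firms (rows) and three resource segments (columns).
X = [
    [30, 10, 0],
    [300, 100, 0],   # ten times the size of the first firm, same proportions
    [5, 5, 10],
]
P = proportional_transform(X)
row_sums = [sum(row) for row in P]
```

Every row of P sums to unity, and the first two firms become indistinguishable after the transformation, which is how the size effect is neutralized.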
The second method, marginal transformation (MT), is obtained by dividing xik by the maximum value on the ith row
($x_i / \|x_i\|_\infty$). Burt and Carlton (1989) argued that, for some data, more meaningful comparisons within the data could be made if they are transformed to indicate the strength
of relationship relative to the strongest relationship. Burt and Bittner (1981) applied this
mode of transformation to their analysis of ham’s cognitive relations data. Burt and Carlton
(1989) analyzed the Department of Commerce’s input–output tables using MT and obtained
Fig. 4. Marginal and proportional transformation on a two-dimensional space.
a different market structure from Burt’s earlier analysis (1988) that was based on PT. 7 Burt
(1992) and Burt and Talmud (1993) recommend MT be used in measuring niche overlap.
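The marginal transformation, followed by Euclidean distance on the transformed rows, can be sketched in Python as follows. The function names and data are hypothetical, and the code is only an illustration of the transformation as described here, not a reproduction of the cited analyses.

```python
import math


def marginal_transform(X):
    """Divide each row of X by its largest entry (the infinity-norm for non-negative data)."""
    M = []
    for row in X:
        largest = max(row)
        if largest <= 0:
            raise ValueError("a firm with no resource use cannot be transformed")
        M.append([x / largest for x in row])
    return M


def euclidean_distance(x_i, x_j):
    """Eq. (1), repeated here so that the example is self-contained."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


# Hypothetical data: three firms (rows) and three resource segments (columns).
X = [
    [30, 10, 0],
    [300, 100, 0],   # ten times the size of the first firm, same proportions
    [5, 5, 10],
]
M = marginal_transform(X)
d_same_shape = euclidean_distance(M[0], M[1])
d_diff_shape = euclidean_distance(M[0], M[2])
```

After the transformation the strongest relationship in each row equals one, and the distance between the first two firms is zero even though one is ten times larger than the other.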
Fig. 4 provides a graph that geometrically shows some mathematical properties of these
transformations. This graph represents a simple hypothetical situation where there are only
two firms, A and B, and two resource segments, X and Y. A point in this graph (xi, yi)
represents firm i that utilizes xi amount of resource X and yi amount of resource Y. Note
that, in this graph, all points on a straight line that passes through the origin are a set of
values that are proportionally equivalent. The ratio of resource utilization between the two
resource segments is the same regardless of the total amount of resource utilization.
7 These two studies are based on two different data sets and used two different multidimensional scaling algorithms to spatially represent the relationships among markets. However, Burt and Carlton (1989) report that these
differences are not the source of differences in the final “topology maps” in the two studies. Applying proportional
transformation (PT) to the data used in 1989 paper, Burt and Carlton (1989) could obtain a multidimensional
scaling map that is “just like” the map reported in Burt (1988). See Burt and Carlton (1989, p. 751) for further
discussion.
8 The proof is straightforward. All values in a vector will add up to unity in PT; the line that passes one in X and
one in Y is a line whose points always add up to unity. As an illustration, Ap is the only point that is proportionally
equivalent to A and whose X and Y values add up to unity. Therefore, A maps to Ap and B to Bp when proportionally
transformed. On the other hand, when marginally transformed, one of the values in a (xi, yi) pair will be one (i.e.
it means that the point will always lie on either upper or right edge of a square), and the other value will be the
slope that indicates how much the line will be removed from the X- or Y-axis at xi = 1 or yi = 1. So, MT maps
A and B to their proportionally equivalent points, Am and Bm. When both points are located below or above the
45◦ line, they are mapped to the same edge.
The points Am and Ap in this figure refer to the points to which point A or all other proportionally equivalent points will be mapped if it is marginally and proportionally transformed,
respectively. 8 By applying Euclidean distance to the transformed data, one measures the
straight line distance between Am and Bm (marginally transformed data) or Ap and Bp (proportionally transformed data). 9 The distance between two points for both proportionally
and marginally transformed data varies between zero when two points are on the same line
(i.e. they are proportionally equivalent) and √2 when they are located at (1, 0) and (0, 1).
These transformations have the following mathematical properties.
First, both transformations effectively ignore the size differences in the raw scale. No
matter where points A and B are located in the plane they will be either mapped to the
diagonal line or the upper or right edge of a unit square. For both transformations, what
determines where the point in the raw scale maps to is not the total amount of resource
utilization but the proportionate use of resources. Put differently, it is not how far a point
is away from the origin (total amount of resource utilization) but how far it is away from
either axis (proportionate use of resources) that determines where the transformed data
map. Therefore, what is actually left in the transformed data after both PT and MT is the
angular separation between two points (i.e. θ in Figs. 3 and 4).
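The size invariance described above can be checked with a minimal sketch. It assumes that PT divides a vector by its total and that MT divides it by its largest element (the form implied by footnote 8); the two resource vectors are hypothetical.

```python
def proportional_transform(v):
    """PT: divide each value by the vector total, so the values sum to one."""
    total = sum(v)
    return [x / total for x in v]


def marginal_transform(v):
    """MT (assumed form): divide each value by the largest value in the vector."""
    largest = max(v)
    return [x / largest for x in v]


# Hypothetical two-niche resource vectors: same proportions, tenfold size difference.
small = [30.0, 10.0]
large = [300.0, 100.0]

pt_small, pt_large = proportional_transform(small), proportional_transform(large)
mt_small, mt_large = marginal_transform(small), marginal_transform(large)

print(pt_small, pt_large)  # both images lie on the line x + y = 1
print(mt_small, mt_large)  # both images lie on an edge of the unit square
```

Because the two vectors differ only in size, each transformation maps them to the same point, which is why only the angular information survives.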
Second, the distance measured between two points on the transformed data (both proportional and marginal) does not linearly increase as the angular separation formed by them
increases. This poses a problem of how the distance may be interpreted. This point is illustrated by Fig. 5A. Suppose that two points on the arc in Fig. 4 are chosen, the first point is
fixed at (1, 0), and the second point changes by 5◦ from the first until it reaches 90◦ away
from the first. For each change in angular separation between two points, their proportionally equivalent points are calculated by a proper transformation, the distance between them
for both transformations is measured and plotted against the angular separation in Fig. 5A.
The line with an inflection point in the middle in Fig. 5A is the distance between two
marginally transformed points. For MT, the rate of change in distance increases at first, but,
once the second point passes the 45◦ line, the rate of change flattens out at first and then
slowly increases again. This is because until the second point is separated by 45◦ from the
first, both points are mapped to the same edge and the distance is measured on that line.
However, once it passes the 45◦ line, they are mapped to different edges and the distance
between them is not measured on the same line any more (see Footnote 9). This shows that
Euclidean distance on MT does not scale as well as the angular separation, nor does it form
a ratio scale.
9 Another mode of transformation is to divide the row vector by its two-norm ($\mathbf{x}/\lVert\mathbf{x}\rVert_2$). This transformation maps
A and B to Ac and Bc in Fig. 5. A valid way of measuring the distance between Ac and Bc is to use the length
of the arc rather than that of the straight line, because the arc constitutes the space that all points are mapped
to in this mode of transformation. The length of the arc is the angle θ in radians. By the same token, a valid
measure of distance for the marginally transformed data is not a straight line distance between two points Am and
Bm as in Fig. 5; it is rather the distance between these points along the edges. Forcing Euclidean distance on the
marginally transformed data is like producing the straight line distance between two geographic points when the
travel distance is called for. Euclidean distance applied to the MT is not a valid measure of distance any more than
the straight line distance is a valid measure of travel distance. This does not mean that the distance measured along
the edges will produce a valid measure of association between two points; that is an entirely different matter.
Fig. 5. Measurement error in distance measures owing to proportional and marginal transformations (the straight
line in (A) represents the angle in radians scaled to vary between 0 and √2. The straight line in (B) shows the
angle in radians).
On the other hand, PT seems to work much better, although its rate of increase is not
uniform throughout. The straight line represents the angular separation between two points
in radians scaled to vary between zero and √2. Since, as mentioned above, the angular separation is what is left in the transformed data, any deviation from this line represents error
in measurement. Euclidean distance applied to marginally transformed data always overestimates the true extent of angular separation, while distance applied to the proportionally
transformed data sometimes under- and sometimes overestimates it.
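The comparison plotted in Fig. 5A can be approximated numerically. The sketch below is illustrative only: it assumes MT divides by the largest element, fixes the first point at (1, 0), moves the second point along the unit arc in 5° steps, and compares each Euclidean distance with the angle rescaled to run from zero to √2.

```python
import math


def pt(v):
    """Proportional transformation: values sum to one."""
    total = sum(v)
    return [x / total for x in v]


def mt(v):
    """Marginal transformation (assumed form): largest value becomes one."""
    largest = max(v)
    return [x / largest for x in v]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distances_at(angle_deg):
    """Return (scaled angle, ED/PT, ED/MT) for (1, 0) versus a point angle_deg away."""
    t = math.radians(angle_deg)
    first, second = [1.0, 0.0], [math.cos(t), math.sin(t)]
    scaled_angle = math.sqrt(2) * t / (math.pi / 2)  # runs from 0 to sqrt(2)
    return scaled_angle, euclid(pt(first), pt(second)), euclid(mt(first), mt(second))


rows = {deg: distances_at(deg) for deg in range(5, 90, 5)}
for deg, (ref, d_pt, d_mt) in rows.items():
    print(f"{deg:2d}  angle={ref:.3f}  ED/PT={d_pt:.3f}  ED/MT={d_mt:.3f}")
```

In this setup ED/MT lies above the scaled angle at every step, while ED/PT lies above it below 45° and below it beyond 45°.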
Third, both transformations do not always produce identical distance between two points
with the same angular separation. Suppose two points are chosen on the arc in Fig. 4 such
that they are separated by 15◦ . Their locations are varied so that the lower point starts from
(1, 0) and moves away from it by 5◦ . Fig. 5B shows how the two points with the same angle
in different locations produce different distances for both transformations. The values on
the X-axis show the location of the second point in terms of how much it is separated from
the X-axis in degrees. The PT shows a U-shape; the distance is the greatest when two points
are closest to either axis, while the distance is the smallest when they are closest to the 45◦
line.
The marginal transformation again shows an inconsistent pattern. In Fig. 5B, the distance
is the smallest when two points are closest to either axis; it is the largest when one point
is located at the vertex (1, 1). When the two points are located on two different edges,
the distance between them is smaller than another set of two points that are only 15◦ away
from them. This clearly shows that Euclidean distance applied to either transformation does
not produce a consistent measure. Two points that are separated by the same angle do
not necessarily have the same meaning when Euclidean distance is used as a method of
measuring association between them. In summary, given the problems discussed above,
Euclidean distance applied to either the PT or the MT is not a reliable measure of niche
overlap.
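The inconsistency described for Fig. 5B can be reproduced with a short sketch under the same assumptions as before (MT divides by the largest element; points lie on the unit arc). Two points are kept 15° apart while the lower point moves from the X-axis in 5° steps.

```python
import math

SEPARATION = 15  # fixed angular separation in degrees


def pt(v):
    total = sum(v)
    return [x / total for x in v]


def mt(v):
    largest = max(v)
    return [x / largest for x in v]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_distances(lower_deg):
    """Return (ED/PT, ED/MT) for two arc points at lower_deg and lower_deg + 15."""
    lo = math.radians(lower_deg)
    hi = math.radians(lower_deg + SEPARATION)
    a = [math.cos(lo), math.sin(lo)]
    b = [math.cos(hi), math.sin(hi)]
    return euclid(pt(a), pt(b)), euclid(mt(a), mt(b))


table = {deg: pair_distances(deg) for deg in range(0, 80, 5)}
for deg, (d_pt, d_mt) in table.items():
    print(f"lower point at {deg:2d} deg  ED/PT={d_pt:.3f}  ED/MT={d_mt:.3f}")
```

The same 15° separation yields different distances depending on where the pair sits: ED/PT is largest near the axes and smallest around the 45° line, and ED/MT peaks when one point maps to the vertex (1, 1) and drops once the pair straddles two edges.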
7. Cosine family measures of niche overlap
I will now introduce three asymmetric measures of niche overlap that belong to what I
would call “the cosine family”. These measures share the same operations in measuring
the extent of overlap that are easier to illustrate than to explain (see Fig. 7). The earliest
formula in this family was called the alpha coefficient (MacArthur and Levins, 1967) and
was based on the proportionally transformed data:
$$\alpha_{ij} = \frac{\sum_k p_{ik}\, p_{jk}}{\sum_k p_{ik}^2} \qquad (3)$$
where α ij indicates the amount of competition species i receives from species j. Because
the denominators are different, α ij and α j i have different values and are asymmetric. Many
ecologists subsequently noted several problems with this formula (Colwell and Futuyama,
1971; see Krebs, 1989). For organizations such as hospitals which differ widely in size,
the amount of resources organizations require from their environment varies widely as well
and the difference in size is one of the most important sources of asymmetry. By using the
proportions, however, the alpha coefficient sets the margin totals to one (e.g. $\sum_k p_{ik} = 1$)
and ignores the difference in the raw margin totals (i.e. total resource utilization). It is not
a true asymmetric measure as Fig. 5B graphically illustrates. For example, in a two-firm,
two-niche hypothetical example as shown in Fig. 7, the alpha coefficient is computed as the
ratio in the length of two line segments: that is, $\alpha_{ba} = OA'_p / OB_p$. Note that a firm whose
point is located closer to either axis will always have a larger coefficient regardless of its
total resource use.
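A minimal sketch of Eq. (3) makes the point concrete. The resource vectors are hypothetical; the function name `alpha` is chosen here for illustration.

```python
def proportions(x):
    """Proportional transformation of a raw resource vector."""
    total = sum(x)
    return [v / total for v in x]


def alpha(x_i, x_j):
    """Eq. (3): competition that i receives from j, computed on proportions."""
    p_i, p_j = proportions(x_i), proportions(x_j)
    return sum(a * b for a, b in zip(p_i, p_j)) / sum(a * a for a in p_i)


# Hypothetical raw utilization over three niche segments.
specialist = [90.0, 10.0, 0.0]
generalist = [400.0, 300.0, 300.0]

a_sg = alpha(specialist, generalist)
a_gs = alpha(generalist, specialist)

# Scaling one firm's raw use tenfold leaves alpha unchanged.
bigger_specialist = [10 * v for v in specialist]
a_scaled = alpha(bigger_specialist, generalist)

print(a_sg, a_gs, a_scaled)
```

The two directions differ, but only because the proportion profiles differ; a tenfold change in total resource use has no effect on the coefficient.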
An alternative is to use the raw numbers rather than proportions as in
$$A_{ij} = \frac{\sum_k x_{ik}\, x_{jk}}{\sum_k x_{ik}^2} \qquad (4)$$
Fig. 6. Niche overlap in continuous and discrete environments.
Aij varies from zero to infinity. Aij can be directly estimated by regressing xjk on xik with
no intercept (i.e. Aij = β). One big problem with Aij is that a large xjk relative to xik
can completely dominate the amount of competition being measured (i.e. Aij > 1 when
xik < xjk). This formula does not meet the requirement that the measure of niche overlap
be limited by the total niche size of the focal firm.
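The behaviour of Eq. (4) can be illustrated with a small sketch on hypothetical raw counts. It also checks numerically that Aij coincides with the no-intercept least-squares slope when xik is the predictor and xjk the response; the helper `sse` is introduced here only for that check.

```python
def a_coefficient(x_i, x_j):
    """Eq. (4): sum_k x_ik * x_jk / sum_k x_ik^2, computed on raw counts."""
    return sum(a * b for a, b in zip(x_i, x_j)) / sum(a * a for a in x_i)


def sse(beta, x_i, x_j):
    """Squared error of predicting x_jk by beta * x_ik (no intercept)."""
    return sum((b - beta * a) ** 2 for a, b in zip(x_i, x_j))


# Hypothetical raw utilization: a small firm and a much larger one.
small = [20.0, 10.0, 5.0]
large = [200.0, 150.0, 100.0]

a_sl = a_coefficient(small, large)  # competition the small firm receives
a_ls = a_coefficient(large, small)  # competition the large firm receives

print(a_sl, a_ls)
```

Here the coefficient for the small firm exceeds one, which is the unboundedness the text objects to.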
To capture the overlap correctly, one can start with continuous resource niches. The gray
area in Fig. 6A indicates the overlap and can be calculated by integrating the lower of the
two curves, i.e. min(f(x), g(x)): $\int_0^{x_1} g(x)\,dx + \int_{x_1}^{\infty} f(x)\,dx$. If we assume that f(x) is the
function for the resource utilization pattern of hospital i and g(x) the function for j, then
$$C_{ij} = \frac{\int_0^{x_1} g(x)\,dx + \int_{x_1}^{\infty} f(x)\,dx}{\int_0^{\infty} f(x)\,dx} = \frac{\int_0^{\infty} \min(f(x), g(x))\,dx}{\int_0^{\infty} f(x)\,dx}$$
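As a rough numerical illustration of the continuous case, the sketch below integrates the lower of two curves with a midpoint rule. The bell-shaped curves, their parameters, and the truncation of the upper limit at 20 are all assumptions made for the example, not values from the paper.

```python
import math


def bell(center, width, height):
    """Return a hypothetical bell-shaped utilization curve."""
    return lambda x: height * math.exp(-((x - center) / width) ** 2)


f = bell(center=4.0, width=1.5, height=1.0)  # focal firm i
g = bell(center=6.0, width=2.0, height=2.0)  # competitor j


def integrate(func, lo=0.0, hi=20.0, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * dx) for k in range(steps)) * dx


overlap = integrate(lambda x: min(f(x), g(x)))
c_ij = overlap / integrate(f)  # share of i's niche overlapped by j
c_ji = overlap / integrate(g)  # share of j's niche overlapped by i

print(c_ij, c_ji)
```

The overlap area is the same in both directions; the asymmetry comes entirely from dividing by each firm's own total niche.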
The same graph for discrete niche states is shown in Fig. 6B. The general formula for Cij
for the discrete niche segments is obtained by
$$C_{ij} = \frac{\sum_k w_{ik} \min(x_{ik}, x_{jk})}{\sum_k w_{ik}\, x_{ik}} \qquad (5)$$
The weight, wik , is necessary if the discrete niche segments are not evenly divided (see Colwell and Futuyama, 1971). The weight (wik ) indicates the relative abundance of resources
in each discrete resource segment (i.e. total amount of resources available from each segment), or it alternatively indicates how significant a segment is to a firm. When all the niche
segments are the same, wik cancels out and the resulting formula is essentially the same
as the one for the continuous niche positions given above. However, if the niche segments
are different in their widths or resource endowment, the overlap in each niche segment has
different significance and the same overlap in different niche segments may have different
competitive implications for the firms involved.
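The role of the weights in Eq. (5) can be sketched as follows. The utilization vectors and weight values are hypothetical and serve only to show that equal weights cancel while unequal weights change the coefficient.

```python
def weighted_overlap(x_i, x_j, w_i):
    """Eq. (5): sum_k w_ik * min(x_ik, x_jk) / sum_k w_ik * x_ik."""
    num = sum(w * min(a, b) for w, a, b in zip(w_i, x_i, x_j))
    den = sum(w * a for w, a in zip(w_i, x_i))
    return num / den


# Hypothetical raw utilization over three niche segments.
x_i = [40.0, 40.0, 20.0]
x_j = [10.0, 60.0, 0.0]

c_equal = weighted_overlap(x_i, x_j, [1.0, 1.0, 1.0])
c_equal_rescaled = weighted_overlap(x_i, x_j, [5.0, 5.0, 5.0])
c_uneven = weighted_overlap(x_i, x_j, [3.0, 1.0, 1.0])

print(c_equal, c_equal_rescaled, c_uneven)
```

With equal weights the coefficient is simply the overlapped share of firm i's total use; emphasizing the first segment, where the overlap is small relative to i's use, lowers it.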
If the weight is chosen such that wik = pik (i.e. the proportion of firm i’s total resource
utilization that comes from k’s niche segment), Eq. (5) becomes
$$C_{ij} = \frac{\sum_k x_{ik} \min(x_{ik}, x_{jk})}{\sum_k x_{ik}^2} \qquad (6)$$
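The step from Eq. (5) to Eq. (6) can be written out explicitly. The shorthand $T_i = \sum_m x_{im}$ for firm i's total resource use is introduced here only for this derivation and is not part of the original notation. With $w_{ik} = p_{ik} = x_{ik}/T_i$,

$$C_{ij} = \frac{\sum_k p_{ik} \min(x_{ik}, x_{jk})}{\sum_k p_{ik}\, x_{ik}} = \frac{\frac{1}{T_i}\sum_k x_{ik} \min(x_{ik}, x_{jk})}{\frac{1}{T_i}\sum_k x_{ik}^2} = \frac{\sum_k x_{ik} \min(x_{ik}, x_{jk})}{\sum_k x_{ik}^2}$$

so the total $T_i$ cancels and only the raw values remain.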
When the resource segments are the same, the weight in Eq. (5) cancels out. When the
weights are different, however, only the denominator in the weight cancels out and Eq. (6)
is obtained. When xik > xjk for all k, Eq. (6) is identical to Eq. (4) and has values between 0
and 1, not inclusive. When xik < xjk for all k (i.e. when the niche of i is included in that of j), Eq. (6)
reduces to one. The only difference between Eqs. (4) and (6) is that min(xik, xjk) is substituted for
xjk so that Cij is restricted to lie between 0 and 1. Fig. 7 graphically shows this:
$C_{ij} = OA'/OB$, where $OA' = OB$ if $OA' > OB$. Cij = Aij whenever $OA'$ is less than or
equal to OB. However, $OA'$ will be set to OB whenever the former is larger than the latter.
Therefore, Cij will always be less than or equal to Aij.
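The relationship between Eqs. (4) and (6) can be checked on a few hypothetical vectors. This is a sketch of the two formulas as stated, with function names chosen here for illustration.

```python
def a_coefficient(x_i, x_j):
    """Eq. (4): unbounded asymmetric measure on raw counts."""
    return sum(a * b for a, b in zip(x_i, x_j)) / sum(a * a for a in x_i)


def c_coefficient(x_i, x_j):
    """Eq. (6): competition coefficient, with x_jk capped at x_ik."""
    return sum(a * min(a, b) for a, b in zip(x_i, x_j)) / sum(a * a for a in x_i)


# Hypothetical raw utilization over three niche segments.
small = [20.0, 10.0, 5.0]
large = [200.0, 150.0, 100.0]   # exceeds `small` in every segment
partial = [30.0, 5.0, 0.0]      # exceeds `small` in one segment only

c_included = c_coefficient(small, large)    # niche of small lies inside large
c_dominant = c_coefficient(large, small)    # large's use exceeds small's everywhere
c_partial = c_coefficient(small, partial)

print(c_included, c_dominant, c_partial)
print(a_coefficient(small, large), a_coefficient(large, small), a_coefficient(small, partial))
```

When one niche is included in the other, C equals one for the included firm, and C coincides with A for the firm whose use is larger in every segment; in the mixed case C stays below A.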
The numerator in Eq. (6) then provides the total area of overlap and the denominator
the total resource niche of firm i. The competition coefficient (Cij ) varies between 0 and
1, with one indicating complete overlap and zero no overlap. Eq. (6) is a measure that
satisfies all requirements of a good niche overlap measure as discussed above. The competition
coefficient is interpreted as the proportion of the resource niche of a firm overlapped by a
competitor; a competition coefficient of 0.5 means an overlap of 50% of a firm’s resource
niche. Notice however, that Eq. (6) is obtained when one uses pik as the weight in Eq. (5);
different weights can also be used if one so prefers or if there is a theoretical reason to
do so. 10
10 The $p_{ik}$ is used in calculating the alpha coefficient (Eq. (3)) and in a method of computing market concentration in
the hospital industry proposed by Zwanziger et al. (1990). Other weights can be considered such as the perceived
importance of a resource segment from a focal firm’s point of view (e.g. Brooks, 1995) or the proportion of total
revenue that comes from a resource segment.
Fig. 7. Cosine family measures of overlap (note: $\alpha_{ba} = OA'_p/OB_p$; $\cos\theta = OA'_c/OB_c$; $A_{ba} = OA'/OB$;
$C_{ba} = OA'/OB$, where $OA' = OB$ if $OA' > OB$).
8. Comparison of methods
Both symmetric and asymmetric measures of niche overlap are applied to the sample of
six hospitals for comparison. Table 2 shows descriptive statistics of various niche overlap
measures for 30 pairs of hospitals obtained from the following methods: Euclidean distance
using Eq. (1) applied to the untransformed data (ED/Raw), Euclidean distance applied to
the proportionally transformed data (ED/PT), Euclidean distance applied to the marginally
transformed data (ED/MT), cosine of the angle using Eq. (2) applied to the untransformed
data (cosine), alpha coefficient using Eq. (3) (alpha), asymmetric measure using Eq. (4)
applied to the untransformed data (A), and competition coefficient using Eq. (6) applied to
the untransformed data (C). The first three are distance measures, while the last four are
measures in the cosine family. Euclidean distance measures dissimilarity in the sense that
a smaller value indicates larger overlap, while all algorithms in the cosine family (cosine,
alpha, A, and C) measure similarity and indicate larger overlap with a larger value.

Table 2
Descriptive statistics of various niche overlap measures

Variable name     n        Mean        S.D.     Minimum     Maximum
ED/Raw^a         30    4159.117    1766.539    1078.663    6359.657
ED/PT            30       0.209       0.043       0.117       0.296
ED/MT            30       2.188       0.497       1.390       2.964
Cosine           30       0.382       0.203       0.073       0.777
Alpha            30       0.398       0.258       0.061       1.266
A                30       1.592       3.486       0.019      14.792
C                30       0.322       0.323       0.013       1.000

a ED/Raw is obtained by applying Euclidean distance algorithm (Eq. (1)) to the untransformed data, ED/PT
from Eq. (1) applied to the proportionally transformed data, and ED/MT from Eq. (1) applied to the marginally
transformed data. Cosine is obtained by Eq. (2) applied to the untransformed data. Alpha is obtained by Eq. (3)
(using the proportionally transformed data). Both A and C are obtained by applying Eqs. (4) and (5) to the
untransformed data, respectively.

Table 3
Correlation coefficients among niche overlap measures for six selected Los Angeles hospitals (n = 30)

                       1          2          3          4          5          6          7
(1) ED/Raw^a      1.0000
(2) ED/PT        −0.5090     1.0000
(3) ED/MT         0.0027     0.6271     1.0000
(4) Cosine        0.1775    −0.8439    −0.8866     1.0000
(5) Alpha         0.1680    −0.6999    −0.7528     0.8416     1.0000
(6) A             0.1415    −0.4727    −0.3625     0.4234     0.2227     1.0000
(7) C             0.1019    −0.3696    −0.4282     0.4592     0.1604     0.7470     1.0000

a See the footnote to Table 2 for the meaning of row labels.

Table 3 shows correlations among seven measures. Since Euclidean distance is a dissimilarity measure and cosine is a similarity measure, a negative correlation between distance- and cosine-based measures indicates consistent results. Three patterns stand out. First, the
correlations between ED/Raw and other methods are low and ED/Raw and ED/PT in particular show a negative and significant correlation. As we have seen above, this pattern of
correlations between ED/Raw and other distance-based methods is not surprising and is
evidence that ED/Raw is unduly influenced by differences in size.
Second, the four measures, ED/PT, ED/MT, cosine, and alpha, show strong correlations
among themselves, indicating that they measure something similar. Cosine has the strongest
correlations with the other three (over 0.8), while ED/PT and ED/MT show the weakest
correlation with each other of all possible pairs. Given Fig. 4 and related discussions above,
one can infer from this that cosine is the preferred method of computing the symmetric
measure of niche overlap. Euclidean distance on either PT or MT creates measurement
error that makes them less appropriate (see Section 6 above). The pattern of correlations
further suggests that alpha is much closer to a symmetric than to an asymmetric measure
such as A or C. The asymmetry that alpha captures does not come from differences in size
but from the angle at which the line connecting the origin and the location of the firm
intersects the X-axis. Again, since alpha is not a true asymmetric measure of overlap, cosine
may be preferred to alpha (see Pianka, 1975; May, 1975).
Third, C correlates strongly with A (r = 0.75), but neither A nor C correlates
above 0.5 with any other method. Especially notable is the weak correlation between C and
alpha. In the absence of a gold standard with which these methods can be compared, the
correlational analysis can only allow one to make some conjectures about these patterns
of similarity and difference among the methods. In order to better understand how these
methods work, one can, however, visually inspect the overlap patterns shown in Fig. 2
and compare them to the various indices of niche overlap shown in Table 4. As discussed above,
Fig. 2 shows patterns of (1) no or minimal overlap between Los Angeles Community and
Midway (Fig. 1A); (2) market dominance between Hollywood Community and the others
(Fig. 1D); and, finally, (3) unequal overlap among California medical center, Queen of
Angels Hollywood Presbyterian, and Cedars-Sinai (Fig. 1C).
The figures in Table 4 are calculated using the seven methods discussed above and are
all converted into a common scale so that they have consistent meanings with the value
of one indicating the maximum competition among all pairs of hospitals. The first four
are symmetric measures and have identical values in corresponding cells along the main
diagonal. For the two true measures of asymmetry, Aij or Cij indicates the amount of
hospital i’s niche overlapped by hospital j’s, where i indicates the hospital on the row and j
one on the column in Table 4.
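The conversion to a common scale can be sketched as a min-max rescaling in which distance measures are reversed so that one always marks the pair with the greatest overlap. The exact formula is not stated in the text, so the function below is an assumption consistent with the description of Table 4, and the input values are hypothetical.

```python
def to_common_scale(values, is_distance):
    """Rescale to [0, 1] so that 1 marks the pair with the greatest overlap.

    Assumed min-max form; distance measures are reversed because a
    smaller distance indicates a larger overlap.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the scale is undefined")
    if is_distance:
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values for three pairs of hospitals.
distances = [2.0, 5.0, 8.0]      # e.g. a Euclidean-distance measure
similarities = [0.2, 0.5, 0.8]   # e.g. a cosine-family measure

scaled_distances = to_common_scale(distances, is_distance=True)
scaled_similarities = to_common_scale(similarities, is_distance=False)

print(scaled_distances)
print(scaled_similarities)
```

After rescaling, the closest pair under the distance measure and the most similar pair under the cosine-family measure both receive the value one.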
8.1. No or minimal overlap
The scatter plots in Fig. 2 suggest that ZIP codes that sent patients to either Los Angeles
Community or Midway did not generally send them to the other hospital. There are a small
number of ZIP codes that did but the magnitude of boundary-crossing was small. So a
good method should produce an overlap measure that is small or close to zero for this pair
of hospitals. ED/Raw indicates considerably high (0.890) and ED/MT moderate (0.155)
overlap between them, while all other methods consistently indicate low or no overlap.
On the other hand, C61 is 0.051 and C16 is 0.019; both indicate low levels of overlap as
expected.
8.2. Patterns of market dominance
Cedars-Sinai is the largest among the six hospitals in the data set, while Hollywood
Community is the smallest. Therefore, a pattern of market dominance or the included
relationship is expected of these hospitals. ED/Raw indicates moderate overlap, while the
other three symmetric measures indicate a high degree of overlap. For the asymmetric
measures, Hollywood Community generates almost no competition towards Cedars-Sinai
judging from the minimal overlap (A54 = 0.000 and C54 = 0.009), while the latter generates
extremely strong competition towards the first (A45 = 0.849 and C45 = 0.981). Notice that
Hollywood Community according to alpha is a stronger competitor than Cedars-Sinai. It
shows that all the other symmetric measures as well as alpha cannot accurately capture this
pattern of overlap. They consistently indicate moderate to strong competition between the
two (except for Cedars-Sinai and Hollywood, where ED/Raw finds low competition), and
this is what is expected of symmetric measures that ignore asymmetry.
8.3. Unequal overlap
The three largest hospitals among the six show unequal overlap patterns. They are California
medical center, Queen of Angels, and Cedars-Sinai. Unequal overlap here means a pattern
that falls somewhere between dominance and equal overlap. California medical center and
Queen of Angels show considerable overlap according to C (C23 = 0.581 and C32 =
0.300), but A indicates only a moderate to low overlap. This overlap pattern suggests that
the latter is the stronger competitor of the two. On the other hand, Cedars-Sinai proves
to be a much stronger competitor vis-à-vis either California medical center or Queen of
Angels. The amount of competition generated by Cedars-Sinai towards these two hospitals
is considerable (C25 = 0.342 and C35 = 0.339), while the amount it receives from them is
low (C52 = 0.090 and C53 = 0.165). This larger than expected asymmetry comes from a
number of ZIP codes that sent patients exclusively to Cedars-Sinai. Note that many of the
ZIP codes that sent patients to either California medical center or Queen of Angels also
sent substantial numbers of patients to Cedars-Sinai. Again the symmetric measures as well
as alpha cannot correctly capture the asymmetry for the unequal patterns of overlap, and
A does not produce coefficients that are consistent with the patterns of overlap in Fig. 2 in
some cases.

Table 4
Various indices of niche overlap for six hospitals in Los Angeles (n = 30)^a

                   (1)       (2)       (3)       (4)       (5)       (6)
(1) Los Angeles Community hospital
  ED/Raw             -    0.5619    0.2947    0.9718    0.0358    0.8897
  ED/PT              -    0.3214    0.3425    0.3532    0.3738    0.0000
  ED/MT              -    0.3334    0.0000    0.2061    0.0055    0.1554
  Cosine             -    0.2979    0.2020    0.2137    0.0250    0.0000
  Alpha              -    0.1860    0.1072    0.1122    0.0001    0.0207
  A                  -    0.0523    0.0555    0.0018    0.0286    0.0034
  C                  -    0.2714    0.3540    0.0298    0.2185    0.0189
(2) California medical center
  ED/Raw        0.5619         -    0.4301    0.5527    0.0237    0.5865
  ED/PT         0.3214         -    0.6765    0.4661    0.4829    0.2933
  ED/MT         0.3334         -    0.6378    0.6501    0.4794    0.7775
  Cosine        0.2979         -    0.7112    0.4105    0.2716    0.4397
  Alpha         0.1812         -    0.3666    0.2107    0.0964    0.3217
  A             0.0056         -    0.0528    0.0005    0.0299    0.0074
  C             0.0721         -    0.5806    0.0126    0.3424    0.1032
(3) Queen of Angels Hollywood Presbyterian medical center
  ED/Raw        0.2947    0.4301         -    0.3112    0.0000    0.3402
  ED/PT         0.3425    0.6765         -    1.0000    0.6536    0.3744
  ED/MT         0.0000    0.6378         -    0.9318    0.2907    0.4562
  Cosine        0.2020    0.7112         -    1.0000    0.4039    0.4750
  Alpha         0.1504    0.4918         -    0.5894    0.1762    0.4015
  A             0.0025    0.0266         -    0.0015    0.0290    0.0054
  C             0.0301    0.2977         -    0.0278    0.3390    0.0806
(4) Hollywood Community hospital center
  ED/Raw        0.9718    0.5527    0.3112         -    0.0636    1.0000
  ED/PT         0.3532    0.4661    1.0000         -    0.7918    0.4539
  ED/MT         0.2061    0.6501    0.9318         -    0.6909    0.7811
  Cosine        0.2137    0.4105    1.0000         -    0.6444    0.5813
  Alpha         0.1596    0.2839    0.5984         -    0.2862    0.4884
  A             0.0724    0.3332    1.0000         -    0.8487    0.1478
  C             0.2701    0.6999    0.9780         -    0.9806    0.7600
(5) Cedars-Sinai medical center
  ED/Raw        0.0358    0.0237    0.0000    0.0636         -    0.1886
  ED/PT         0.3738    0.4829    0.6536    0.7918         -    0.7266
  ED/MT         0.0055    0.4794    0.2907    0.6909         -    1.0000
  Cosine        0.0250    0.2716    0.4039    0.6444         -    0.9257
  Alpha         0.0595    0.2752    0.3360    0.5155         -    1.0000
  A             0.0000    0.0090    0.0180    0.0002         -    0.0081
  C             0.0000    0.0895    0.1647    0.0088         -    0.1271
(6) Midway hospital medical center
  ED/Raw        0.8897    0.5865    0.3402    1.0000    0.1886         -
  ED/PT         0.0000    0.2933    0.3744    0.4539    0.7266         -
  ED/MT         0.1554    0.7775    0.4562    0.7811    1.0000         -
  Cosine        0.0000    0.4397    0.4750    0.5813    0.9257         -
  Alpha         0.0000    0.2194    0.2016    0.2459    0.2933         -
  A             0.0040    0.0761    0.1136    0.0059    0.2547         -
  C             0.0509    0.5726    0.8157    0.0876    1.0000         -

a For each pair of hospitals, seven indices of niche overlap are reported, calculated using the following methods: (1)
ED/Raw; (2) ED/PT; (3) ED/MT; (4) cosine; (5) alpha; (6) A; (7) C. All the values are scaled to be in the same
metric: the pair with the minimum level of overlap has the value of 0 and the pair with the maximum overlap
has 1.

In summary, these comparisons show that the symmetric measures and alpha cannot
handle the patterns of dominance and unequal overlap adequately and that A in some cases
suggests overlap that is not consistent with the visual pattern in the plots. On the other hand,
C is consistent with the patterns in Fig. 2. These comparisons do not provide conclusive
evidence to accept C as the best measure of niche overlap. However, the visual comparisons help eliminate the first five methods (three distance-based methods plus cosine and
alpha) as inadequate measures of niche overlap because of their inability to handle some
overlap patterns (e.g. unequal overlap and included patterns) and their inconsistency with
the patterns shown in the plots. A on the other hand has its own problem as discussed above
and it sometimes suggests overlap not consistent with the actual patterns. Thus, by way of
elimination, C can be accepted as the best and most valid among the alternatives considered
in this paper.
9. Discussion and conclusion
I have so far reviewed and compared several proposed methods of measuring niche
overlap. Among them, special attention was paid to whether Euclidean distance could
be used as an appropriate measure of niche overlap. Mathematical properties as well as
potential measurement error resulting from applying Euclidean distance to the raw and
transformed data were examined.
I conclude that Euclidean distance is not an appropriate method for measuring niche
overlap for the following three reasons. First, Euclidean distance on a raw (untransformed)
data set is not a good method of measuring niche overlap, because it is overly sensitive to
size differences between organizations.
Second, data transformation is not a solution to the problem with Euclidean distance
just discussed. Both PT and MT ignore differences in organizational size, and whatever
is measured from the transformed data is the angular separation rather than the extent
of overlap in niches. Moreover, Euclidean distance applied to the transformed data lacks
consistency and comparability within the measure itself.
Third, a more fundamental problem with Euclidean distance is that it is a symmetric
measure. No matter what transformation Euclidean distance is applied to, it only captures
the symmetric aspect of niche overlap. However, when there is a large disparity in size
among organizations and in their resource needs, their competitive relationship is necessarily asymmetric and directional. A method that detects asymmetry in the competitive
relationship is thus preferable, and Euclidean distance does not qualify as one. In general,
Euclidean distance is not the right method of measuring association between two vectors
in a ratio scale.
How much difference does the choice of one method over another make in measuring
niche overlap? Burt (1988) and Burt and Carlton (1989) provide a good illustration: they
applied Euclidean distance to similar data using respectively PT and MT and obtained
substantially different results. Burt and Carlton (1989) claimed that ED/MT is “the more
useful measure for sociological studies of market boundaries for organizational analysis because they more clearly reveal variation in the resource-flow patterns that define structurally
equivalent (substitutable) production activities as a market” (p. 749). But the analysis in
this paper suggests that the different results obtained in these two papers may derive more
from methodological artifacts than from any meaningful difference in substance. These two
papers highlight the different results obtained by two different data transformation methods.
Faust and Romney (1985) earlier showed how “profound” the differences between applying the ED/MT
and the Pearson correlation to the same data could be.
It was shown that, among the measures considered in this paper, C is the only adequate
measure of overlap that can also correctly detect the asymmetric and directional nature of
overlap between two firms. In applying the niche overlap theory for the purpose of obtaining
competitive intensity between organizations in general and in using C in particular, one
must bear in mind that the measure is only as good as how well the resource segments
are identified for the organizations under study. Two issues need to be emphasized when
resource segments are defined in order to detect overlap between two organizations using
C or other methods of computing overlap.
First, all critical resource segments need to be identified and included in the computation. Some resources may be important for organizational survival but may generate little
competitive pressure for the organizations involved. These would be resources over which
an organization enjoys structural autonomy (Burt, 1992). They need not be included, but
others with competitive implications must all be included in order for the niche overlap to
accurately reflect the true competitive intensity among organizations.
Second, niches need to be broken down into as homogeneous segments as possible.
Competition is multidimensional and many factors contribute to the competitive intensity
between organizations in complicated ways. A method of niche overlap must then be able
to combine multiple factors into a single summary measure. When resource segments are
broadly defined, it is possible that these segments contain qualitatively disparate resources
and organizations might utilize them in different proportions. The resulting measure of
overlap then might not correctly reflect the way resource niches overlap. This is an issue of
niche dimensionality: two species that occupy the same niche may not necessarily compete
with each other if they tap into different dimensions in their niches (Pianka, 1983). For
example, for two species that feed on the same kind of trees, one species may collect food
only from the lower part of the trees and the other from the higher part. Another example
from the hospital industry: two hospitals that draw patients from the same area may provide
services to two different populations, one to children and the other to adult patients. Cij can
be extended to multidimensional niches as follows:
$$C_{ij} = \sum_k w^{*} C_{ijk} \qquad (7)$$
where w∗ is the weight that indicates the relative importance of each dimension to firm i
and Cijk the competition coefficient (Cij) computed for each resource dimension k.
As an illustration, Sohn (2001) divided patients from each ZIP code into 49 groups
according to their diagnoses to make sure that patients within each segment are as similar
to one another as possible from the service provider’s point of view. He applied Eq. (6)
separately to these 49 groups of patients, obtained a matrix of competition coefficients for
each patient group, and combined them into a single summary matrix by using weighted
averages. The weights were the proportions of total number of admissions that were in these
service groups.
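The stratified computation can be sketched for a hypothetical two-group, two-ZIP-code case. The weights are assumed here to be the focal hospital's own share of admissions in each group, which is one reading of the description above; the data and function names are illustrative.

```python
def c_coefficient(x_i, x_j):
    """Eq. (6) for one patient group; returns 0.0 if hospital i has no admissions."""
    denom = sum(a * a for a in x_i)
    if denom == 0:
        return 0.0
    return sum(a * min(a, b) for a, b in zip(x_i, x_j)) / denom


def combined_c(groups_i, groups_j):
    """Eq. (7) sketch: weighted average of per-group competition coefficients.

    Weights (assumption): hospital i's share of admissions in each group.
    """
    totals = [sum(x) for x in groups_i]
    grand_total = sum(totals)
    return sum(
        (t / grand_total) * c_coefficient(x_i, x_j)
        for t, x_i, x_j in zip(totals, groups_i, groups_j)
    )


# Hypothetical admissions from two ZIP codes, split by patient group.
children_i, adults_i = [30.0, 10.0], [20.0, 40.0]   # hospital i treats both groups
children_j, adults_j = [0.0, 0.0], [50.0, 50.0]     # hospital j treats adults only

stratified = combined_c([children_i, adults_i], [children_j, adults_j])

# Ignoring the patient groups pools admissions by ZIP code only.
pooled_i = [c + a for c, a in zip(children_i, adults_i)]
pooled_j = [c + a for c, a in zip(children_j, adults_j)]
pooled = c_coefficient(pooled_i, pooled_j)

print(stratified, pooled)
```

Pooling suggests complete overlap because both hospitals draw equally from the same ZIP codes, whereas the stratified coefficient is lower because hospital j does not compete for children at all.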
There are several other methods not considered in this paper. One notable method was
proposed by McPherson (1983). This in fact was the first sociological application of the niche
overlap theory to organizational analysis and is still used in many studies (McPherson and
Smith-Lovin, 1988; Popielarz and McPherson, 1995; McPherson and Rotolo, 1996). It relies
on identifying the niche breadth on each niche dimension and computing the “rectangular”
area of overlap by multiplying breadths of all relevant dimensions. 11 This method is similar
to the method proposed in this paper in that they can both detect asymmetry and their
coefficients vary between 0 and 1.
However, this method ignores resource utilization levels at each resource position and
only uses information about how widely an organizational population collects its resources
from. One consequence of such an omission is that generalists with broader niches are
always found to be stronger competitors than specialists with narrower niches (McPherson,
1983, p. 526) and that it cannot correctly detect a pattern described by the dotted line in
Fig. 1D (i.e. a specialist outcompeting a generalist in its own narrow range). In addition, the
resource dimensions like age, education, and occupation must be correlated and deriving
the area of overlap multiplicatively would more than likely overestimate the actual overlap
(see McPherson, 1983, p. 529). One can get around both problems by using Cij on data
stratified by occupational groups, sex, and educational attainment (e.g. each stratum would
comprise an n × r matrix with n being the number of organizational types and r the number
of age groups). So, for example, if there were six occupational groups, two sexes, and four
educational levels, there would be 48 (6 × 2 × 4) separate Cijk’s to be computed and to be
combined into a single index by Eq. (7).
The competition coefficient is the cornerstone of dynamically modeling the growth and
decline of organizations and in describing the equilibrium state where many organizations
can coexist (Levins, 1968; McPherson, 1983). The competition coefficient as discussed in
this paper assumes a static environment. However, it can be used in a longitudinal analysis
if one observes the competitive intensity among firms in a period in time and examines
how competition affects organizational performance or effectiveness in the next period (see
Sohn et al., 1999).
11 Niche breadth for the age dimension, for example, is defined as the ±1.5 standard deviation around the mean
age of people.
The new economic sociology takes as its starting point the realization that economic action
is structurally embedded (Granovetter, 1985). Most studies have so far looked at the effect
of positive or cooperative ties (formal and informal) that affect organizational performance
and decision-making (e.g. Uzzi, 1996, 1997). However, there is a dearth of studies that focus
on the structural embeddedness of organizations in negative or competitive ties. Much of this gap
can be attributed to the lack of a good theoretical framework that allows a more sociological
conceptualization of competition and a robust method of measuring it. Niche overlap theory
can fill the void by allowing competition to be defined and measured in a more rigorous
way than previously possible. When the niches are fully specified, the resulting overlap
measure can accurately indicate the intensity of competition an organization faces from its
neighbors. Currently, the limited availability of fine-grained firm–niche relationship data seems to hamper
a wider acceptance of the theory, but this situation should improve as time goes on. The
theory promises to be a big step forward in both network analysis of organizations and
economic sociology. However, before we pick the fruits of this theory, we need to pay more
attention to the measurement issues involved in computing niche overlap.
Acknowledgements
I thank Willard Manning, Thomas D’Aunno, Ronald Thisted, Edward Laumann, Phil
Schumm, and Ted Karrison for their helpful comments on earlier drafts of this paper.
References
Anderberg, M.R., 1973. Cluster Analysis for Applications. Academic Press, New York.
Baum, J.A.C., Singh, J.V., 1994. Organizational niches and the dynamics of organizational mortality. American
Journal of Sociology 100, 346–380.
Baum, J.A.C., Haveman, H.A., 1997. Love thy neighbor: differentiation and agglomeration in the Manhattan hotel
industry, 1898–1990. Administrative Science Quarterly 42, 304–338.
Bonacich, P., 1972. Technique for analyzing overlapping memberships. In: Costner, H.L. (Ed.), Sociological
Methodology. Jossey-Bass, San Francisco, pp. 176–185.
Brittain, J.W., Wholey, D.R., 1988. Competition and coexistence in organizational communities: population
dynamics in electronic components manufacturing. In: Glenn, R.C. (Ed.), Ecological Models of Organizations.
Ballinger Publishing Company, Cambridge, MA, pp. 195–222.
Brooks, G.R., 1995. Defining market boundaries. Strategic Management Journal 16, 535–549.
Burt, R.S., 1988. The stability of American markets. American Journal of Sociology 93, 356–395.
Burt, R.S., 1992. Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge,
MA.
Burt, R.S., Talmud, I., 1993. Market niche. Social Networks 15, 133–149.
Burt, R.S., Bittner, W.M., 1981. A note on inferences regarding network subgroups. Social Networks 3, 71–88.
Burt, R.S., Carlton, D.S., 1989. Another look at the network boundaries of American markets. American Journal
of Sociology 94, 723–753.
Colwell, R.K., Futuyma, D.J., 1971. On the measurement of niche breadth and overlap. Ecology 52, 567–576.
Cronbach, L.J., Gleser, G.C., 1953. Assessing similarity between profiles. Psychological Bulletin 50, 456–473.
Faust, K., Romney, A.K., 1985. Does STRUCTURE find structure? A critique of Burt’s use of distance as a
measure of structural equivalence. Social Networks 7, 77–103.
Fox, J., 1982. Selective aspects of measuring resemblance for taxonomy. In: Hudson, H.C. (Ed.), Classifying
Social Data. Jossey-Bass, San Francisco, pp. 127–151.
Granovetter, M., 1985. Economic action and social structure: the problem of embeddedness. American Journal of
Sociology 91, 481–510.
Griffith, J.R., 1972. Quantitative Techniques for Hospital Planning and Control. Lexington Books, Lexington,
MA.
Hannan, M.T., Freeman, J., 1977. The population ecology of organizations. American Journal of Sociology 82,
929–964.
Hannan, M.T., Freeman, J., 1989. Organizational Ecology. Harvard University Press, Cambridge, MA.
Krebs, C.J., 1989. Ecological Methodology. Harper and Row, New York.
Laumann, E.O., 1973. Bonds of Pluralism: The Form and Substance of Urban Social Networks. Wiley, New York.
Levins, R., 1968. Evolution in Changing Environments. Princeton University Press, Princeton, NJ.
McPherson, J.M., 1983. An ecology of affiliation. American Sociological Review 48, 519–535.
McPherson, J.M., Smith-Lovin, L., 1988. A comparative ecology of five nations. In: Glenn, R.C. (Ed.), Ecological
Models of Organizations. Ballinger Publishing Company, Cambridge, MA, pp. 85–109.
McPherson, J.M., Rotolo, T., 1996. Testing a dynamic model of social composition: diversity and change in
voluntary groups. American Sociological Review 61, 179–202.
MacArthur, R.H., Levins, R., 1967. The limiting similarity, convergence, and divergence of coexisting species.
American Naturalist 101, 377–385.
May, R.M., 1975. Some notes on estimating the competition matrix, α. Ecology 56, 737–741.
Pianka, E.R., 1973. The structure of lizard communities. Annual Review of Ecology and Systematics 4, 53–74.
Pianka, E.R., 1975. Niche relations of desert lizards. In: Martin, L.C., Jared, M.D. (Eds.), Ecology and Evolution
of Communities. The Belknap Press of Harvard University, Cambridge, MA, pp. 292–314.
Pianka, E.R., 1981. Competition and Niche Theory. In: Robert, M.M. (Ed.), Theoretical Ecology, 2nd Edition.
Blackwell, Oxford, pp. 167–196.
Pianka, E.R., 1983. Evolutionary Ecology, 2nd Edition. Harper and Row, New York.
Podolny, J.M., Stuart, T.E., Hannan, M.T., 1996. Networks, knowledge, and niches: competition in the worldwide
semiconductor industry, 1984–1991. American Journal of Sociology 102, 659–689.
Popielarz, P.A., McPherson, J.M., 1995. On the edge or in between: niche position, niche overlap, and the duration
of voluntary association memberships. American Journal of Sociology 101, 628–720.
Schwenk, C.R., 1988. The cognitive perspective on strategic decision making. Journal of Management Studies 25,
41–55.
Sohn, M.-W., 2001. Relational approach to measuring competition among hospitals. Health Services Research, in
press.
Sohn, M.-W., Manheim, L.M., Pearce, W.F., 1999. Market competition among hospitals and the diffusion of
medical technology: the case of PTCA. In: Paper Presented at the Association for Health Services Researchers
Meeting, Chicago, 1999.
Starbuck, W.H., Milliken, F.J., 1988. Executives’ perceptual filters: what they notice and how they make sense.
In: Donald, C.H. (Ed.), The Executive Effect: Concepts and Methods for Studying Top Managers. JAI Press,
Greenwich, CT, pp. 35–65.
Tenbrunsel, A.E., Galvin, T.L., Neale, M.A., Bazerman, M.H., 1996. Cognitions in organizations. In: Stewart,
R.C., Cynthia, H., Walter, R.N. (Eds.), Handbook of Organization Studies. Sage Publications, London.
Uzzi, B., 1996. The embeddedness and economic performance: the network effect. American Sociological Review
61, 674–698.
Uzzi, B., 1997. Social structure and competition in interfirm networks: the paradox of embeddedness.
Administrative Science Quarterly 42, 35–67.
Zwanziger, J., Glenn, A.M., Mann, J.M., 1990. Measures of hospital market structure: a review of the alternatives
and a proposed approach. Socio-Economic Planning Science 24, 81–95.