Social Networks 23 (2001) 141–165

Distance and cosine measures of niche overlap

Min-Woong Sohn∗

Department of Health Studies, The University of Chicago, 5841 S Maryland Avenue MC 2007, Chicago, IL 60637-1470, USA

Abstract

Niche overlap is increasingly used as a way of measuring the intensity of interorganizational competition. This paper examines and compares various distance and cosine measures of niche overlap. The analysis shows that Euclidean distance, whether applied to the raw data or to transformed data, is not an appropriate measure of niche overlap. An alternative measure from the cosine family is discussed and compared to Euclidean distance. Niche overlap theory promises to be a big leap forward in measuring competition in sociological studies of organizations, but meticulous attention to the measurement of overlap is needed to fulfill that promise. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Niche overlap; Competition; Euclidean distance; Cosine; Asymmetry

1. Introduction

Niche overlap theory provides a way of operationalizing competition in organizational analysis and thereby an important link between organizational ecology and network analysis (see Burt, 1992; Burt and Talmud, 1993). Organizational ecologists (McPherson, 1983; Hannan and Freeman, 1989) originally borrowed the concept of niche overlap from evolutionary ecology (MacArthur and Levins, 1967) and applied it to organizational populations. They have found that the theory has an intuitive application to organizational as well as biological populations. The overlap in habitats between two species is directly analogous to the overlap in resource niches for organizational populations. Two species compete with each other to the extent that they rely on the same kind of food to sustain their populations. Organizations rely on their environment for resources to sustain themselves, and so organizational populations (e.g.
wine and beer producers) compete with each other to the extent that they rely on the same customer base or to the extent that the customers find their products substitutable. Put differently, the level of competition between two populations is proportional to the extent of overlap in their common resource niches¹ (Baum and Haveman, 1997).

∗ Tel.: +1-773-834-1793; fax: +1-773-702-1979. E-mail address: [email protected] (M.-W. Sohn).
0378-8733/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0378-8733(01)00039-9

Recently, sociologists started to apply niche overlap to other observational units such as markets or individual organizations (see Burt, 1988; Burt and Carlton, 1989; Baum and Singh, 1994; Podolny et al., 1996; Baum and Haveman, 1997). Despite its importance and increasing popularity, relatively little attention has been paid to how niche overlap can be measured. In this paper, I will compare several existing measures of niche overlap, including the alpha coefficient (MacArthur and Levins, 1967), cosine (Pianka, 1973), and Euclidean distance (Burt, 1992; Burt and Talmud, 1993), examine whether they can validly measure what they intend to measure, and propose a new operationalization based on the cosine of the angle. In examining these measures, I will first establish some criteria for a good measure of niche overlap and evaluate them against these criteria. I will then use a real-life example from the hospital industry and examine how validly these measures detect overlap patterns in niches for hospitals. In the following, I will only consider niche overlap among organizations, but the same discussion can easily be extended to organizational populations. Special attention will be paid to Burt’s proposal that niche overlap be conceptualized as a special case of structural equivalence and that it be measured by Euclidean distance.
Euclidean distance is an extremely popular method of measuring association or similarity in social science in general and network analysis in particular. In the following, I will establish conceptual criteria for a good method of measuring niche overlap and evaluate various methods against them. I will then use data from a convenience sample of six hospitals in Los Angeles and evaluate whether the various methods produce measures of niche overlap consistent with the overlap patterns that the data suggest.

¹ Pianka (1981) and Hannan and Freeman (1989) both hold that niche overlap leads to direct competition under conditions of resource scarcity and mutual exclusion. However, the intentionality and perception of human and corporate actors may also need to be considered in applying niche overlap theory to organizations. For example, firms may intentionally use collusion and other strategic maneuvers to avoid competition. On the other hand, competitive intensity in the environment may not be fully perceived by decision makers within organizations due to their cognitive and perceptual limitations (Tenbrunsel et al., 1996; Schwenk, 1988; Starbuck and Milliken, 1988). For these reasons, one must be cautious in inferring a direct and deterministic relationship between niche overlap and competition. The relationship is rather stochastic and indicates competitive potential that may not directly impede organizational performance but may still affect it by placing constraints on organizations.

2. Criteria for a measure of niche overlap

A niche typically consists of resources that either a species or an organization critically depends on for survival and sustenance. Hannan and Freeman (1977) provide a useful definition of niche: “The niche, then, consists of all those combinations of resource levels at which the population can survive and reproduce itself” (p. 947). For a species, its niche may consist of a natural habitat from which it collects food. An organization likewise has resource needs, which it tries to satisfy from its own niches. An organization may require many different kinds of resources, which it collects from many different resource dimensions in the environment. I will assume for much of the discussion in the paper that there is only one kind of resource, distributed throughout many resource segments, and come back to the issue of niche dimensionality in the last section.

Typical firm–niche relationships can be represented by a two-mode rectangular matrix. The matrix X contains data for n organizations and r resource segments. A resource segment (or position) for a business firm may consist of an organization, an individual, or a group of them. Firm i acquires resources from one or more resource segments, which collectively comprise the niche for the firm; $x_{ik}$ represents the amount of resources that firm i takes from resource segment k. Examples include units of goods shipped, dollar amount received, or, in the case of hospitals, patients coming from a ZIP code. For business firms, niches consist of not only buyers of their goods and services but also their suppliers of input resources.

Measuring niche overlap involves reducing this two-mode matrix to a one-mode n × n square matrix whose cells contain the amount of firm i’s niche overlapped by firm j’s. The value in each cell is commonly called the competition coefficient (Pianka, 1983; Hannan and Freeman, 1989). The coefficient should ideally have the following properties (cf. Bonacich, 1972). It is zero if there is no overlap between the two niches and one when there is complete overlap. It is on a ratio scale, so that overlap in niches for two firms can be expressed in ratios. It is important to have a set of coefficients on a ratio scale because it makes cross-dyadic comparisons more meaningful within a universe of firms.
A competition coefficient on a ratio scale would permit one to state that one pair of firms is twice as competitive with each other as another pair is. Such a coefficient also satisfies all the requirements of a metric (see Laumann, 1973, pp. 217–222). Last but most important, it must be able to take differences in organizational size into account. Organizational size is a crucially important dimension in competition. Organizations tend to differ in size and in their levels of resource needs, and they try to grow in order to be more competitive. So neutralizing the effect of size in measuring niche overlap will make the measure less useful or downright invalid.

There are two problems that arise from the disparity in size between organizations. Bonacich (1972) discussed the first problem in the context of measuring centrality that is independent of group size (p. 178). He required that his measure of centrality be independent of group size because increasing or decreasing group size should not affect a measure if the overlap between two groups also increases or decreases proportionately with their sizes. When the niche size² and the overlap change proportionately, overlap relative to the niche size of the firms involved does not change, and a valid measure of niche overlap must register no change.

The second problem is the mirror image of the first. It is how to explicitly incorporate size differences into the measure of niche overlap without allowing the size differences to dominate it. This problem occurs for firms with niches of unequal sizes. Fig. 1C illustrates this point. The area of overlap occupies a smaller share of the niche for S1 than it does for S2. This means that the overlap has less competitive significance to S1 than to S2, or that the amount of competition from S2 to S1 is smaller than that from S1 to S2. This is the source of asymmetry in the competitive intensity generated by a pair of firms towards each other (McPherson, 1983; Brittain and Wholey, 1988). One can capture this asymmetry in competition between two organizations with two numeric quantities, such that one value indicates the competitive intensity going from S1 to S2 and the other that going from S2 to S1. Both problems of size can be solved if one can express the size of overlap relative to the total niche size of the firms involved. Thus, one way of building asymmetry into the measure and solving the problem of size at the same time is to express overlap as the proportion of a focal firm’s niche size that is overlapped by the other firm (see McPherson, 1983).

² The niche size is determined by the total amount of resources consumed by an organization. It is best illustrated by the area under the curves in Fig. 1. More formally, the niche size of a firm is the sum of all resources utilized by this firm, $\sum_k x_{ik}$ or $\|x_i\|_1$. In vector geometry, the niche size can also be indexed by the scalar $\|x_i\|_2$. A simplifying assumption is made in this paper that the niche size is directly proportional to the size of an organization. It is needed for semantic rather than substantive reasons. In the ecological literature, terms such as niche width, breadth, or size do not convey the meaning of the level of resource utilization. Organizational size as I use it here (unless otherwise noted) is a vector concept that represents niche width and resource utilization levels in each niche segment, which together indicate the total amount of resource utilization.

Fig. 1. Possible niche relationships (adapted from Pianka (1983), p. 243).

In general, there can be four possible patterns of overlap, as shown in Fig. 1. There is no overlap between two firms in the disjunct relationship (A). The amount of competition between two firms due to niche overlap is zero.
The two firms can still exercise competitive pressure on each other either through the threat of entry or indirectly through a common competitor. The equal overlap (B) shows that the niche sizes of two firms are the same and that the overlap in their niches has the same competitive effect on both. The unequal overlap (C), on the other hand, is the pattern already discussed above. It is the most frequently observed pattern, and a good measure of niche overlap must be able to handle it well. The included pattern (D) is a special case of the unequal overlap in which the niche of the smaller firm is completely included in that of the larger firm. The dotted line shown in Fig. 1D depicts the possibility that a smaller firm can outcompete a larger one in its narrow range of resource segments. In the included relationship, the maximum amount of competition must be limited by the niche size of the smaller firm. For the unequal patterns, the measure of overlap must be able to discriminate the asymmetric nature of overlap, i.e. it must build size differences explicitly into the measure of overlap.

These four patterns exhaust possible niche relationships. A valid measure of niche overlap should be able to discriminate competitive strengths in these four relationships. In summary, the conceptual criteria for evaluating a valid method of measuring niche overlap include: (1) it must measure the size of overlap relative to the size of niches for the firms involved in order to get around the problems of size; (2) it must be able to sensitively distinguish the four patterns.

3. An example

As an example of a firm–niche relationship, Table 1 presents data for six hospitals that will be used for illustrative purposes in this paper. The data are extracted from the California patient discharge abstracts for 1991 from the office of statewide healthcare planning and development (OSHPD). This database includes nearly four million discharges for the year from over 500 short-term general hospitals in the state. Among them, a convenience sample of six hospitals in Los Angeles is selected for analysis so that they can show some meaningful overlap in niches as well as the effect of size differences. The names of the hospitals and some of their characteristics are shown in Table 1.

Table 1
Beds, total discharges, and number of patient-originating ZIP codes of six hospitals in Los Angeles, 1991 (a)

Name                                                   Beds (b)  Total discharges  Patient ZIP codes
Los Angeles Community hospital                              138              6353                131
California medical center                                   308             17361                178
Queen of Angels Hollywood Presbyterian medical center       433             27249                206
Hollywood Community hospital                                 99              1476                 84
Cedars-Sinai medical center                                 922             45752                393
Midway hospital medical center                              230              5014                 91

(a) Data source: California office of statewide healthcare planning and development (OSHPD) patient discharge abstracts and OSHPD financial disclosure file, 1991.
(b) Beds are the average available beds and are obtained from the OSHPD financial disclosure file. Patient ZIP codes show the number of ZIP codes that sent at least five patients to these hospitals. Total discharges are the total number of discharges from these hospitals; patients originating from ZIP codes with fewer than five patients are excluded from the total, as are patients from outside the state.

The hospitals are within 12 miles of each other and differ widely in size (measured in average available beds), total number of discharges, and number of ZIP code areas from which they received their patients. If one takes the patient-originating ZIP code as a resource segment for a hospital, the patient origin–destination matrix produces an example of firm–niche relationship data. The rationale for the use of a ZIP code as a resource segment for hospitals has been discussed elsewhere (see Sohn, 2001).
All told, there are six hospitals and 402 patient-originating ZIP codes in the data, resulting in a 6 × 402 two-mode matrix.³

³ ZIP code areas that did not send at least five patients to at least one of the hospitals are deleted.

Fig. 2 plots, for each pair of hospitals, the number of patients who originated from each ZIP code and went to these hospitals. Each dot in these plots represents a ZIP code. Both axes are on the same scale and the maximum is 2000 patients.⁴ A dot in the middle of a plot along the 45° line indicates that the corresponding ZIP code sent the same number of patients to both hospitals, while a dot on either axis indicates that the ZIP code sent all patients to one of the two hospitals and none to the other. Therefore, a plot in which ZIP codes center around the 45° line indicates strong competition between two hospitals. The limiting case in which all ZIP codes fall exactly on the 45° line indicates the maximum level of competition: for each patient going to one hospital, there is another patient going to the other hospital. On the other hand, a plot where ZIP codes are clustered around both axes indicates that the ZIP codes sent many more patients to one hospital than to the other. This pattern indicates weak competition between two hospitals. The limiting case in which all ZIP codes fall on one axis or the other indicates no overlap. In general, the closer a ZIP code is to an axis, the more “loyal” the ZIP code is to the hospital the axis represents. And the more ZIP codes there are that are loyal to either of the hospitals, the less the two hospitals are subject to competitive pressure from each other.

Notice that three of the four possible niche relationships described in Fig. 1 are found in these plots.
California medical center, Queen of Angels Hollywood Presbyterian medical center and Cedars-Sinai medical center show patterns of unequal overlap (Fig. 1C) with one another. Los Angeles Community and Midway approximate the disjunct relationship (Fig. 1A) in that, except for a small number of ZIP codes, almost all of them fall on either axis. And the patterns of overlap involving Hollywood Community hospital, the smallest of the six hospitals in bed size, show the included pattern (Fig. 1D). All ZIP codes that sent patients to this hospital sent many more patients to other hospitals in the sample. In terms of competitive intensity, Hollywood Community receives strong competition from the other five hospitals, while it generates only a minimal amount of competition towards them. This case most clearly shows the asymmetric effect of differences in organizational size.

These patterns of niche overlap can be used to evaluate the various methods discussed below. A method is valid to the extent that it indicates little competition between Los Angeles Community and Midway, considerable competition among California medical center, Queen of Angels, and Cedars-Sinai, and clear asymmetry in competition between Hollywood Community and all the others.

⁴ In this paper, I use three modes of representing the firm–niche relationship data. The first is to use an axis to represent a firm and a dot to represent a niche segment, as in Fig. 2. The second is to turn the first mode inside out by using an axis to represent a niche segment and a dot a firm (see Figs. 3, 4 and 7). And the third is to use one axis to represent a niche dimension and another to represent the fitness or the resource utilization level. Figs. 1 and 6 are examples of this mode. Note that a resource segment in these figures (see Fig. 6B) is represented by a line segment on the relevant axis. This last mode allows me to represent niches and overlap as areas.

4. Euclidean distance

Euclidean distance measures the straight line distance between two points in a multidimensional space. It is elegant in concept, easy to understand, and simple to compute. For these reasons, it is by far the most widely used measure of association between two sets of scores. Given a matrix X as shown above, Euclidean distance is computed as follows:

$$D_{ij} = \sqrt{\sum_k (x_{ik} - x_{jk})^2} \qquad (1)$$

Burt (1992) and Burt and Talmud (1993) proposed to use Euclidean distance as a measure of niche overlap.⁵ They argued that niche overlap could be conceptualized as a special case of structural equivalence (Burt and Talmud, 1993, p. 137). Levins (1968, p. 40) also suggested that the geometric distance might be used to indicate the competitive intensity between two species represented as two points in a multidimensional hyperspace (see Footnote 4). Because the mathematical properties of Euclidean distance and how it compares with other measures of association (especially with correlation) have been thoroughly studied elsewhere (see Cronbach and Gleser, 1953; Fox, 1982; Faust and Romney, 1985), I will focus on showing its limitations as a measure of niche overlap.

The first and most obvious difficulty with Euclidean distance is its inability to handle asymmetry in size between organizations. Even if it can be used to measure niche overlap, it can be applied only to firms with niches of equal size. The second difficulty comes from the nature of typical niche overlap data. Niche overlap data (e.g. membership affiliation, hospital patient flow, food intake from resource niches, dollar amount in input–output tables among industrial sectors, etc.) are almost always expressed in scales that have a meaningful zero point (i.e. no food intake from a particular resource segment, no patient admitted from a ZIP code area, no money transfers between two sectors, etc.).
These data are said to be on a ratio scale and are different from those on an interval scale, such as temperature or geographic coordinates, in that the latter cannot be expressed in ratios (e.g. 80° is not twice as warm as 40°). When one applies Euclidean distance to typical raw firm–niche relationship data,⁶ the resulting overlap measure is heavily influenced by size differences between firms. Fig. 3 illustrates this point. Fig. 3A shows three pairs of firms (A and B, C and D, and E and F) and their use of two resources X and Y. Suppose the distance within each of these pairs in the two-dimensional space is exactly the same. If one used Euclidean distance to measure niche overlap, one would obtain the same overlap for all three pairs of firms. The three pairs in Fig. 3A, however, have very different overlap patterns in that A and B have larger overlap than C and D, and that E’s niche is completely included in F’s (i.e. E takes fewer resources from both segments than F does). A good measure must therefore be able to distinguish these three different patterns of overlap. On the other hand, the pairs A and B, and C and D in Fig. 3B are equal in their niche sizes, and the overlap in niches is proportionately the same in both cases. So, given Bonacich’s requirement of independence of size, an appropriate measure of overlap is expected to produce the same result for both. But, as the difference in lengths between the two pairs clearly shows, Euclidean distance errs in indicating different amounts of overlap for these two pairs.

⁵ They recommend that the raw data be transformed before Euclidean distance is applied. Much of the discussion in this section, therefore, does not apply to their method. Below, I will discuss the issue of data transformation and what exactly it accomplishes.

⁶ The term “raw” here means “untransformed”. Sometimes the data may be collected in such a way that the data as originally collected are on a transformed scale. They are not “raw” data as I use the term in this paper.

Fig. 3. Distance and cosine.

This demonstrates why the geometric distance between two points (representing two firms) cannot be used to indicate niche overlap between them. It is due to the well-known property of Euclidean distance: it is sensitive to any numeric transformation. When two positions are located in a completely relative world with no origin (such as the (0, 0) point in a two-dimensional space), the same distance would mean the same thing no matter where the points are located on the plane. However, when the size of the resource niche matters, the same distance for pairs of firms as in Fig. 3A means completely different things depending on where the pair is located relative to the origin.

5. Cosine of the angle

Cosine of the angle was first used as a measure of niche overlap by Pianka (1973). It is computed by

$$C^*_{ij} = \cos\theta = \frac{\sum_k x_{ik} x_{jk}}{\sqrt{\sum_k x_{ik}^2 \, \sum_k x_{jk}^2}} \qquad (2)$$

$C^*_{ij}$ varies between 0 and 1 and gets smaller as θ gets larger from 0 to π/2. In Fig. 3B, the two pairs of firms, A and B, and C and D, have the same cosine values because they are both separated by the same angle θ. On the other hand, all three pairs of firms in Fig. 3A have different cosine values.

The cosine produces similarity measures in the sense that the cosine value gets larger as the two vectors become more nearly parallel in the r-dimensional space (they become more similar, or the overlap becomes larger). The cosine is only invariant to multiplicative transformation, and it is “a many-to-one transformation which effectively ignores the relative magnitudes between the vectors” (Anderberg, 1973, p. 73). Pearson correlation is a special case of cosine; correlation can be obtained by applying Eq. (2) to centered data (i.e. data from which vector means are subtracted).
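The contrast between Eqs. (1) and (2) can be checked numerically. The following is a minimal Python sketch, not part of the original analysis: the coordinates are hypothetical values chosen only to mimic the situations described for Fig. 3A (pairs equally far apart but at different distances from the origin) and Fig. 3B (pairs that differ only by a multiplicative factor), and the function and variable names are illustrative.

```python
import math

def euclidean(x, y):
    """Eq. (1): straight line distance between two resource-use vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine(x, y):
    """Eq. (2): cosine of the angle between two resource-use vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# Hypothetical pairs in the spirit of Fig. 3A: the two firms in each pair are
# the same distance apart, but the second pair lies farther from the origin.
near = ((1.0, 3.0), (3.0, 1.0))
far = ((11.0, 13.0), (13.0, 11.0))

# Hypothetical pairs in the spirit of Fig. 3B: the second pair is the first
# pair multiplied by 3, so the angle between the two firms is unchanged.
small = ((1.0, 2.0), (2.0, 1.0))
large = ((3.0, 6.0), (6.0, 3.0))

for label, (p, q) in [("near", near), ("far", far),
                      ("small", small), ("large", large)]:
    print(f"{label:5s} distance={euclidean(p, q):.3f} cosine={cosine(p, q):.3f}")
```

Under these assumed coordinates, the distance is identical for the "near" and "far" pairs while the cosine differs, and the cosine is identical for the "small" and "large" pairs while the distance differs.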
Cosine is a ratio measure and, like Euclidean distance, produces a symmetric measure of overlap, that is to say, $C^*_{ij} = C^*_{ji}$. Cosine solves one source of the problem of size, namely, size indifference for niches of equal size. However, it ignores size differences for pairs A and D, and B and C, as well as for A and B, and C and D in Fig. 3B. Cosine will produce the same measure of overlap for all these pairs, whereas Euclidean distance is too sensitive to differences in niche sizes. In this sense, both are unsatisfactory in dealing with the problem of size and thereby with the asymmetry in competitive intensity.

6. Data transformation

There are many ways raw data can be transformed before any algorithm for detecting overlap is applied. The first common way of transforming the data is to remove the individual mean so that the data are centered about an individual (or a firm). Another way is to transform the raw data into z-scores by subtracting the mean of each vector and dividing by its standard deviation. Cronbach and Gleser (1953) and Faust and Romney (1985) considered these modes of data transformation. I will instead focus on two modes of data transformation that are particularly relevant to measuring niche overlap. These will be called proportional and marginal transformations, respectively.

Proportional transformation (PT) is achieved by dividing vector entries by the vector margin total ($x_{ik} / \sum_k x_{ik}$, or $x_i / \|x_i\|_1$). Let us define an n × r matrix P that is obtained by dividing each element of X by the appropriate row margin total. Each value $p_{ik}$ then indicates the proportion of firm i’s total resources taken from the kth resource segment. It is a widely used method of data transformation. The alpha coefficient (discussed below) that early evolutionary ecologists used as a measure of competitive intensity between two species is based on proportionally transformed data; $p_{ik}$ in this case refers to the relative utilization of resource k by species i (Levins, 1968).
Griffith (1972) called the similarly transformed data the commitment index in his study of hospital markets, in the sense that it indicates how “committed” each market segment is to a hospital. Burt (1988) used proportionally transformed data in his earlier study of the structure of American markets. This is one way of neutralizing the size effect, because this transformation guarantees that each row sums to unity, so that all firms in the market have the same row margin totals.

The second method, marginal transformation (MT), is obtained by dividing $x_{ik}$ by the maximum value on the ith row ($x_i / \|x_i\|_\infty$). Burt and Carlton (1989) argued that, for some data, more meaningful comparisons within the data could be made if they are transformed to indicate the strength of a relationship relative to the strongest relationship. Burt and Bittner (1981) applied this mode of transformation to their analysis of ham’s cognitive relations data. Burt and Carlton (1989) analyzed the Department of Commerce’s input–output tables using MT and obtained a different market structure from Burt’s earlier analysis (1988) that was based on PT.⁷ Burt (1992) and Burt and Talmud (1993) recommend that MT be used in measuring niche overlap.

Fig. 4. Marginal and proportional transformation on a two-dimensional space.

Fig. 4 provides a graph that geometrically shows some mathematical properties of these transformations. This graph represents a simple hypothetical situation where there are only two firms, A and B, and two resource segments, X and Y. A point in this graph $(x_i, y_i)$ represents firm i, which utilizes $x_i$ amount of resource X and $y_i$ amount of resource Y. Note that, in this graph, all points on a straight line that passes through the origin form a set of values that are proportionally equivalent. The ratio of resource utilization between the two resource segments is the same regardless of the total amount of resource utilization.
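The two transformations can be written down in a few lines. The sketch below is only an illustration under stated assumptions: it assumes non-negative rows with at least one positive entry, the firm vectors are hypothetical, and the function names are not taken from any of the cited studies.

```python
def proportional_transform(row):
    """PT: divide each entry by the row total, so the row sums to one."""
    total = sum(row)
    return [value / total for value in row]

def marginal_transform(row):
    """MT: divide each entry by the row maximum, so the largest entry is one."""
    peak = max(row)
    return [value / peak for value in row]

# Two hypothetical firms on the same line through the origin: the second
# uses ten times as much of each resource, in the same proportions.
firm_small = [2.0, 1.0]
firm_large = [20.0, 10.0]

print(proportional_transform(firm_small), proportional_transform(firm_large))
print(marginal_transform(firm_small), marginal_transform(firm_large))
```

Both hypothetical firms map to the same point under either transformation, which is one way of seeing that proportionally equivalent points are no longer distinguishable once the data are transformed.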
The points $A_m$ and $A_p$ in Fig. 4 refer to the points to which point A, or any other proportionally equivalent point, will be mapped if it is marginally and proportionally transformed, respectively.⁸ By applying Euclidean distance to the transformed data, one measures the straight line distance between $A_m$ and $B_m$ (marginally transformed data) or between $A_p$ and $B_p$ (proportionally transformed data).⁹ The distance between two points for both proportionally and marginally transformed data varies between zero, when the two points are on the same line through the origin (i.e. they are proportionally equivalent), and $\sqrt{2}$, when they are located at (1, 0) and (0, 1).

⁷ These two studies are based on two different data sets and used two different multidimensional scaling algorithms to spatially represent the relationships among markets. However, Burt and Carlton (1989) report that these differences are not the source of differences in the final “topology maps” in the two studies. Applying proportional transformation (PT) to the data used in the 1989 paper, Burt and Carlton (1989) could obtain a multidimensional scaling map that is “just like” the map reported in Burt (1988). See Burt and Carlton (1989, p. 751) for further discussion.

⁸ The proof is straightforward. All values in a vector add up to unity in PT; the line that passes through one on the X-axis and one on the Y-axis is the line whose points always add up to unity. As an illustration, $A_p$ is the only point that is proportionally equivalent to A and whose X and Y values add up to unity. Therefore, A maps to $A_p$ and B to $B_p$ when proportionally transformed. On the other hand, when a point is marginally transformed, one of the values in the $(x_i, y_i)$ pair will be one (i.e. the point will always lie on either the upper or the right edge of a unit square), and the other value will be the slope that indicates how far the line is removed from the X- or Y-axis at $x_i = 1$ or $y_i = 1$. So, MT maps A and B to their proportionally equivalent points, $A_m$ and $B_m$. When both points are located below or above the 45° line, they are mapped to the same edge.

These transformations have the following mathematical properties. First, both transformations effectively ignore the size differences in the raw scale. No matter where points A and B are located in the plane, they will be mapped either to the diagonal line or to the upper or right edge of a unit square. For both transformations, what determines where a point in the raw scale maps to is not the total amount of resource utilization but the proportionate use of resources. Put differently, it is not how far a point is from the origin (total amount of resource utilization) but how far it is from either axis (proportionate use of resources) that determines where the transformed data map. Therefore, what is actually left in the transformed data after both PT and MT is the angular separation between two points (i.e. θ in Figs. 3 and 4).

Second, the distance measured between two points on the transformed data (both proportional and marginal) does not increase linearly as the angular separation between them increases. This poses a problem of how the distance may be interpreted. This point is illustrated by Fig. 5A. Suppose that two points on the arc in Fig. 4 are chosen, the first point is fixed at (1, 0), and the second point moves away from the first in 5° steps until it is 90° away from it. For each angular separation between the two points, their proportionally equivalent points are calculated by the appropriate transformation, and the distance between them for both transformations is measured and plotted against the angular separation in Fig. 5A. The line with an inflection point in the middle in Fig. 5A is the distance between two marginally transformed points.
For MT, the rate of change in distance increases at first, but, once the second point passes the 45° line, the rate of change flattens out at first and then slowly increases again. This is because until the second point is separated by 45° from the first, both points are mapped to the same edge and the distance is measured on that line. However, once it passes the 45° line, they are mapped to different edges and the distance between them is not measured on the same line any more (see Footnote 9). This shows that Euclidean distance on MT does not scale as well as the angular separation, nor does it form a ratio scale. On the other hand, PT seems to work much better, although its rate of increase is not uniform throughout. The straight line represents the angular separation between two points in radians scaled to vary between zero and √2. Since, as mentioned above, the angular separation is what is left in the transformed data, any deviation from this line represents error in measurement. Euclidean distance applied to marginally transformed data always overestimates the true extent of angular separation, while distance applied to the proportionally transformed data sometimes under- and sometimes overestimates it.

9 Another mode of transformation is to divide the row vector by its two-norm ($x / \lVert x \rVert_2$). This transformation maps A and B to Ac and Bc in Fig. 5. A valid way of measuring the distance between Ac and Bc is to use the length of the arc rather than that of the straight line, because the arc constitutes the space that all points are mapped to in this mode of transformation. The length of the arc is the angle θ in radians. By the same token, a valid measure of distance for the marginally transformed data is not a straight line distance between two points Am and Bm as in Fig. 5; it is rather the distance between these points along the edges. Forcing Euclidean distance on the marginally transformed data is like producing the straight line distance between two geographic points when the travel distance is called for. Euclidean distance applied to the MT is not a valid measure of distance any more than the straight line distance is a valid measure of travel distance. This does not mean that the distance measured along the edges will produce a valid measure of association between two points; that is an entirely different matter.

Fig. 5. Measurement error in distance measures owing to proportional and marginal transformations (the straight line in (A) represents the angle in radians scaled to vary between 0 and √2. The straight line in (B) shows the angle in radians).

Third, neither transformation always produces an identical distance between two points with the same angular separation. Suppose two points are chosen on the arc in Fig. 4 such that they are separated by 15°. Their locations are varied so that the lower point starts from (1, 0) and moves away from it in steps of 5°. Fig. 5B shows how the two points with the same angle in different locations produce different distances for both transformations. The values on the X-axis show the location of the second point in terms of how much it is separated from the X-axis in degrees. The PT shows a U-shape; the distance is the greatest when two points are closest to either axis, while the distance is the smallest when they are closest to the 45° line. The marginal transformation again shows an inconsistent pattern. In Fig. 5B, the distance is the smallest when two points are closest to either axis; it is the largest when one point is located at the vertex (1, 1). When the two points are located on two different edges, the distance between them is smaller than another set of two points that are only 15° away from them.
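The two thought experiments behind Fig. 5 can be retraced numerically. The following is a minimal Python sketch, not the code used for the paper: the function names are hypothetical, and the marginal transformation is implemented as division by the largest value, which is one reading of footnote 8 for the two-dimensional illustration.

```python
import math

def proportional(v):
    """PT: divide each value by the vector total, so the values sum to one."""
    total = sum(v)
    return [x / total for x in v]

def marginal(v):
    """MT as read from footnote 8 (an assumption): divide by the largest
    value, so the point lands on the upper or right edge of the unit square."""
    peak = max(v)
    return [x / peak for x in v]

def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def on_arc(degrees):
    """Point on the unit arc at the given angle from the X-axis."""
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r)]

def distances(deg_a, deg_b):
    """Return (ED/PT, ED/MT) for two points given by their angles."""
    a, b = on_arc(deg_a), on_arc(deg_b)
    return (euclidean(proportional(a), proportional(b)),
            euclidean(marginal(a), marginal(b)))

# First point fixed at 0 degrees, second point moving away (cf. Fig. 5A).
# The last column is the angle rescaled to run from 0 to sqrt(2).
for deg in range(0, 91, 15):
    ed_pt, ed_mt = distances(0, deg)
    scaled_angle = math.radians(deg) * math.sqrt(2) / (math.pi / 2)
    print(f"{deg:3d}  PT={ed_pt:.3f}  MT={ed_mt:.3f}  angle={scaled_angle:.3f}")

# Same 15-degree separation at different locations (cf. Fig. 5B).
for start in (0, 15, 30, 45, 60, 75):
    ed_pt, ed_mt = distances(start, start + 15)
    print(f"{start:2d}-{start + 15:2d}  PT={ed_pt:.3f}  MT={ed_mt:.3f}")
```

If the sketch is faithful to the transformations, the second loop should show the PT distance shrinking toward the 45° line and the MT distance peaking when one point sits at the vertex (1, 1), even though the angle between the two points never changes.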
This clearly shows that Euclidean distance applied to either transformation does not produce a consistent measure. Two points that are separated by the same angle do not necessarily have the same meaning when Euclidean distance is used as a method of measuring association between them. In summary, given the problems discussed above, Euclidean distance applied to either the PT or the MT is not a reliable measure of niche overlap.

7. Cosine family measures of niche overlap

I will now introduce three asymmetric measures of niche overlap that belong to what I would call “the cosine family”. These measures share the same operations in measuring the extent of overlap that are easier to illustrate than to explain (see Fig. 7). The earliest formula in this family was called the alpha coefficient (MacArthur and Levins, 1967) and was based on the proportionally transformed data:

$$\alpha_{ij} = \frac{\sum_k p_{ik}\, p_{jk}}{\sum_k p_{ik}^2} \qquad (3)$$

where αij indicates the amount of competition species i receives from species j. Because the denominators are different, αij and αji have different values and are asymmetric. Many ecologists subsequently noted several problems with this formula (Colwell and Futuyama, 1971; see Krebs, 1989). For organizations such as hospitals which differ widely in size, the amount of resources organizations require from their environment varies widely as well and the difference in size is one of the most important sources of asymmetry. By using the proportions, however, the alpha coefficient sets the margin totals to one (e.g. $\sum_k p_{ik} = 1$) and ignores the difference in the raw margin totals (i.e. total resource utilization). It is not a true asymmetric measure as Fig. 5B graphically illustrates. For example, in a two-firm, two-niche hypothetical example as shown in Fig. 7, the alpha coefficient is computed as the ratio in the length of two line segments: that is, αba = (OAp/OBp).
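As a small illustration of Eq. (3), the sketch below computes the alpha coefficient for two made-up firms. The function name and the utilization numbers are hypothetical and chosen only to show that alpha is asymmetric yet blind to total size.

```python
def alpha(x_i, x_j):
    """Alpha coefficient of Eq. (3): competition firm i receives from firm j,
    computed on proportionally transformed utilization vectors."""
    total_i, total_j = sum(x_i), sum(x_j)
    p_i = [x / total_i for x in x_i]
    p_j = [x / total_j for x in x_j]
    numerator = sum(a * b for a, b in zip(p_i, p_j))
    denominator = sum(a * a for a in p_i)
    return numerator / denominator

# Hypothetical raw utilization over two niche segments.
small = [8, 2]       # small firm concentrated in the first segment
large = [300, 700]   # much larger firm spread toward the second segment

print(alpha(small, large))   # competition the small firm receives
print(alpha(large, small))   # competition the large firm receives
print(alpha(small, [3, 7]))  # shrinking the large firm 100-fold changes nothing
```

Because only proportions enter the formula, the last two calls involving the large firm give the same answer whether it uses 1000 units of resources or 10; the asymmetry that remains comes from the shape of each profile, not from size.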
Note that a firm whose point is located closer to either axis will always have a larger coefficient regardless of its total resource use. An alternative is to use the raw numbers rather than proportions as in

$$A_{ij} = \frac{\sum_k x_{ik}\, x_{jk}}{\sum_k x_{ik}^2} \qquad (4)$$

Aij varies from zero to infinity. Aij can be directly estimated by regressing xjk on xik with no intercept (i.e. Aij = β). One big problem with Aij is that a large xjk relative to xik can completely dominate the amount of competition being measured (i.e. Aij > 1 when xik < xjk). This formula does not meet the requirement that the measure of niche overlap be limited by the total niche size of the focal firm.

Fig. 6. Niche overlap in continuous and discrete environments.

To capture the overlap correctly, one can start with continuous resource niches. The gray area in Fig. 6A indicates the overlap and can be calculated by integrating the lower of the two curves, i.e. min(f(x), g(x)): $\int_0^{x_1} g(x)\,dx + \int_{x_1}^{\infty} f(x)\,dx$. If we assume that f(x) is the function for the resource utilization pattern of hospital i and g(x) the function for j, then

$$C_{ij} = \frac{\int_0^{x_1} g(x)\,dx + \int_{x_1}^{\infty} f(x)\,dx}{\int_0^{\infty} f(x)\,dx} = \frac{\int_0^{\infty} \min(f(x), g(x))\,dx}{\int_0^{\infty} f(x)\,dx}$$

The same graph for discrete niche states is shown in Fig. 6B. The general formula for Cij for the discrete niche segments is obtained by

$$C_{ij} = \frac{\sum_k w_{ik} \min(x_{ik}, x_{jk})}{\sum_k w_{ik}\, x_{ik}} \qquad (5)$$

The weight, wik, is necessary if the discrete niche segments are not evenly divided (see Colwell and Futuyama, 1971). The weight (wik) indicates the relative abundance of resources in each discrete resource segment (i.e. total amount of resources available from each segment), or it alternatively indicates how significant a segment is to a firm. When all the niche segments are the same, wik cancels out and the resulting formula is essentially the same as the one for the continuous niche positions given above.
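To make the contrast between Eq. (4) and Eq. (5) concrete, here is a minimal Python sketch. The function names, the utilization vectors, and the weights are hypothetical; they are chosen so that the arithmetic can be checked by hand.

```python
def raw_ratio(x_i, x_j):
    """A_ij of Eq. (4): sum_k x_ik * x_jk divided by sum_k x_ik^2, raw counts."""
    numerator = sum(a * b for a, b in zip(x_i, x_j))
    denominator = sum(a * a for a in x_i)
    return numerator / denominator

def weighted_overlap(x_i, x_j, w_i):
    """C_ij of Eq. (5): weighted overlap min(x_ik, x_jk) relative to firm i's
    own weighted resource use."""
    numerator = sum(w * min(a, b) for w, a, b in zip(w_i, x_i, x_j))
    denominator = sum(w * a for w, a in zip(w_i, x_i))
    return numerator / denominator

# Hypothetical firms: i is small and fully covered by the larger j.
x_i = [10, 20, 0]
x_j = [40, 30, 30]
equal = [1, 1, 1]

print(raw_ratio(x_i, x_j))                # 1000 / 500: exceeds one
print(weighted_overlap(x_i, x_j, equal))  # 30 / 30: capped at one
print(weighted_overlap(x_j, x_i, equal))  # 30 / 100: j is barely overlapped

# Unequal weights change the answer when overlap differs across segments.
x_k = [5, 30, 30]
print(weighted_overlap(x_i, x_k, equal))      # (5 + 20) / 30
print(weighted_overlap(x_i, x_k, [3, 1, 1]))  # (15 + 20) / 50
```

The first two lines of output show the problem the text describes: with raw products the larger firm pushes the ratio above one, while the min-based overlap cannot exceed the focal firm's own niche.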
However, if the niche segments are different in their widths or resource endowment, the overlap in each niche segment has different significance and the same overlap in different niche segments may have different competitive implications for the firms involved. If the weight is chosen such that wik = pik (i.e. the proportion of firm i’s total resource utilization that comes from niche segment k), Eq. (5) becomes

$$C_{ij} = \frac{\sum_k x_{ik} \min(x_{ik}, x_{jk})}{\sum_k x_{ik}^2} \qquad (6)$$

When the resource segments are the same, the weight in Eq. (5) cancels out. When the weights are different, however, only the denominator in the weight cancels out and Eq. (6) is obtained. When xik > xjk for all k, Eq. (6) is identical to Eq. (4) and has values between 0 and 1, not inclusive. When xik < xjk (i.e. when the niche of i is included in that of j), Eq. (6) reduces to one. The only difference between Eqs. (4) and (6) is that min(xik, xjk) is substituted for xjk so that Cij is restricted to lie between 0 and 1. Fig. 7 graphically shows this: Cij = (OA/OB), where OA = OB if OA > OB. Cij = Aij whenever OA is less than or equal to OB. However, OA will be set to OB whenever the former is larger than the latter. Therefore, Cij will always be less than or equal to Aij. The numerator in Eq. (6) then provides the total area of overlap and the denominator the total resource niche of firm i. The competition coefficient (Cij) varies between 0 and 1, with one indicating the complete overlap and zero no overlap. Eq. (6) is a measure that satisfies all requirements of a good measure of niche overlap as discussed above. The competition coefficient is interpreted as the proportion of the resource niche of a firm overlapped by a competitor; a competition coefficient of 0.5 means an overlap of 50% of a firm’s resource niche. Notice, however, that Eq. (6) is obtained when one uses pik as the weight in Eq. (5); different weights can also be used if one so prefers or if there is a theoretical reason to do so.
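The properties claimed for Eq. (6) can be checked on a toy example. This is a minimal Python sketch under stated assumptions: the function names and the three hypothetical firms are illustrative only, and no guard is included for a focal firm with an all-zero profile.

```python
def raw_ratio(x_i, x_j):
    """A_ij of Eq. (4), kept here only for comparison."""
    return sum(a * b for a, b in zip(x_i, x_j)) / sum(a * a for a in x_i)

def competition(x_i, x_j):
    """C_ij of Eq. (6): share of firm i's niche overlapped by firm j,
    with each segment weighted by firm i's own use of it."""
    numerator = sum(a * min(a, b) for a, b in zip(x_i, x_j))
    denominator = sum(a * a for a in x_i)
    return numerator / denominator

# Hypothetical raw utilization over three niche segments.
small = [10, 20, 0]
large = [40, 30, 30]
other = [0, 0, 25]

print(competition(small, large))  # small's niche lies inside large's
print(competition(large, small))  # large is only partly overlapped
print(competition(small, other))  # no shared segment

# C never exceeds A, and the two agree when x_ik >= x_jk for all k.
print(raw_ratio(small, large), competition(small, large))
print(raw_ratio(large, small), competition(large, small))
```

In this example the small firm is completely overlapped while the large firm is overlapped on less than a third of its niche, which is the directional pattern that the symmetric measures cannot express.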
10 The pik is used in calculating alpha coefficient (Eq. (3)) and in a method of computing market concentration in the hospital industry proposed by Zwanziger et al. (1990). Other weights can be considered such as the perceived importance of a resource segment from a focal firm’s point of view (e.g. Brooks, 1995) or the proportion of total revenue that comes from a resource segment.

Fig. 7. Cosine family measures of overlap (note: αba = OAp/OBp; cos θ = OAc/OBc; Aba = OA/OB; Cba = OA/OB, where OA = OB if OA > OB).

8. Comparison of methods

Both symmetric and asymmetric measures of niche overlap are applied to the sample of six hospitals for comparison. Table 2 shows descriptive statistics of various niche overlap measures for 30 pairs of hospitals obtained from the following methods: Euclidean distance using Eq. (1) applied to the untransformed data (ED/Raw), Euclidean distance applied to the proportionally transformed data (ED/PT), Euclidean distance applied to the marginally transformed data (ED/MT), cosine of the angle using Eq. (2) applied to the untransformed data (cosine), alpha coefficient using Eq. (3) (alpha), asymmetric measure using Eq. (4) applied to the untransformed data (A), and competition coefficient using Eq. (6) applied to the untransformed data (C). The first three are distance measures, while the last four are measures in the cosine family. Euclidean distance measures dissimilarity in the sense that a smaller value indicates larger overlap, while all algorithms in the cosine family (cosine, alpha, A, and C) measure similarity and indicate larger overlap with a larger value.

Table 2
Descriptive statistics of various niche overlap measures

Variable name    n     Mean        S.D.        Minimum     Maximum
ED/Raw (a)       30    4159.117    1766.539    1078.663    6359.657
ED/PT            30       0.209       0.043       0.117       0.296
ED/MT            30       2.188       0.497       1.390       2.964
Cosine           30       0.382       0.203       0.073       0.777
Alpha            30       0.398       0.258       0.061       1.266
A                30       1.592       3.486       0.019      14.792
C                30       0.322       0.323       0.013       1.000

(a) ED/Raw is obtained by applying Euclidean distance algorithm (Eq. (1)) to the untransformed data, ED/PT from Eq. (1) applied to the proportionally transformed data, and ED/MT from Eq. (1) applied to the marginally transformed data. Cosine is obtained by Eq. (2) applied to the untransformed data. Alpha is obtained by Eq. (3) (using the proportionally transformed data). Both A and C are obtained by applying Eqs. (4) and (5) to the untransformed data, respectively.

Table 3
Correlation coefficients among niche overlap measures for six selected Los Angeles hospitals (n = 30)

                   1         2         3         4         5         6         7
(1) ED/Raw (a)   1.0000
(2) ED/PT       −0.5090    1.0000
(3) ED/MT        0.0027    0.6271    1.0000
(4) Cosine       0.1775   −0.8439   −0.8866    1.0000
(5) Alpha        0.1680   −0.6999   −0.7528    0.8416    1.0000
(6) A            0.1415   −0.4727   −0.3625    0.4234    0.2227    1.0000
(7) C            0.1019   −0.3696   −0.4282    0.4592    0.1604    0.7470    1.0000

(a) See the footnote to Table 2 for the meaning of row labels.

Table 3 shows correlations among seven measures. Since Euclidean distance is a dissimilarity measure and cosine is a similarity measure, a negative correlation between distance- and cosine-based measures indicates consistent results. Three patterns stand out. First, the correlations between ED/Raw and other methods are low and ED/Raw and ED/PT in particular show a negative and significant correlation. As we have seen above, this pattern of correlations between ED/Raw and other distance-based methods is not surprising and is evidence that ED/Raw is unduly influenced by differences in size. Second, the four measures, ED/PT, ED/MT, cosine, and alpha, show strong correlations among themselves, indicating that they measure something similar.
Cosine has the strongest correlations with the other three (over 0.8), while ED/PT and ED/MT show the weakest correlation with each other of all possible pairs. Given Fig. 4 and related discussions above, one can infer from this that cosine is the preferred method of computing the symmetric measure of niche overlap. Euclidean distance on either PT or MT creates measurement error that makes them less appropriate (see Section 6 above). The pattern of correlations further suggests that alpha is much closer to a symmetric than to an asymmetric measure such as A or C. The asymmetry that alpha captures does not come from differences in size but from the angle at which the line connecting the origin and the location of the firm intersects the X-axis. Again, since alpha is not a true asymmetric measure of overlap, cosine may be preferred to alpha (see Pianka, 1975; May, 1975). Third, C correlates strongly with A (r = 0.75), but neither A nor C correlates above 0.5 with any other method. Especially notable is the weak correlation between C and alpha.

In the absence of a gold standard with which these methods can be compared, the correlational analysis can only allow one to make some conjectures about these patterns of similarity and difference among the methods. In order to better understand how these methods work, one can, however, visually inspect the overlap patterns shown in Fig. 2 and compare them with the various indices of niche overlap shown in Table 3. As discussed above, Fig. 2 shows patterns of (1) no or minimal overlap between Los Angeles Community and Midway (Fig. 1A); (2) market dominance between Hollywood Community and the others (Fig. 1D), and, finally; (3) unequal overlap among California medical center, Queen of Angels Hollywood Presbyterian, and Cedars-Sinai (Fig. 1C).
The figures in Table 4 are calculated using the seven methods discussed above and are all converted into a common scale so that they have consistent meanings with the value of one indicating the maximum competition among all pairs of hospitals. The first four are symmetric measures and have identical values in corresponding cells on either side of the main diagonal. For the two true measures of asymmetry, Aij or Cij indicates the amount of hospital i’s niche overlapped by hospital j’s, where i indicates the hospital on the row and j one on the column in Table 4.

8.1. No or minimal overlap

The scatter plots in Fig. 2 suggest that ZIP codes that sent patients to either Los Angeles Community or Midway did not generally send them to the other hospital. There are a small number of ZIP codes that did but the magnitude of boundary-crossing was small. So a good method should produce an overlap measure that is small or close to zero for this pair of hospitals. ED/Raw indicates considerably high (0.890) and ED/MT moderate (0.155) overlap between them, while all other methods consistently indicate low or no overlap. On the other hand, C61 is 0.051 and C16 is 0.019; both indicate low levels of overlap as expected.

8.2. Patterns of market dominance

Cedars-Sinai is the largest among the six hospitals in the data set, while Hollywood Community is the smallest. Therefore, a pattern of market dominance or the included relationship is expected of these hospitals. ED/Raw indicates moderate overlap, while the other three symmetric measures indicate a high degree of overlap. For the asymmetric measures, Hollywood Community generates almost no competition towards Cedars-Sinai judging from the minimal overlap (A54 = 0.000 and C54 = 0.009), while the latter generates extremely strong competition towards the first (A45 = 0.849 and C45 = 0.981). Notice that Hollywood Community according to alpha is a stronger competitor than Cedars-Sinai.
This shows that the symmetric measures as well as alpha cannot accurately capture this pattern of overlap. They consistently indicate moderate to strong competition between the two (except for Cedars-Sinai and Hollywood, where ED/Raw finds low competition), and this is what is expected of symmetric measures that ignore asymmetry.

8.3. Unequal overlap

The three largest hospitals among the six show unequal overlap patterns. They are California medical center, Queen of Angels, and Cedars-Sinai. Unequal overlap here means a pattern that falls somewhere between dominance and equal overlap. California medical center and Queen of Angels show considerable overlap according to C (C23 = 0.581 and C32 = 0.300), but A indicates only a moderate to low overlap. This overlap pattern suggests that the latter is the stronger competitor of the two. On the other hand, Cedars-Sinai proves to be a much stronger competitor vis-à-vis either California medical center or Queen of Angels. The amount of competition generated by Cedars-Sinai towards these two hospitals is considerable (C25 = 0.342 and C35 = 0.339), while the amount it receives from them is low (C52 = 0.090 and C53 = 0.165). This larger than expected asymmetry comes from a number of ZIP codes that sent patients exclusively to Cedars-Sinai. Note that many of the ZIP codes that sent patients to either California medical center or Queen of Angels also sent a substantial number of patients to Cedars-Sinai. Again the symmetric measures as well as alpha cannot correctly capture the asymmetry for the unequal patterns of overlap, and A does not produce coefficients that are consistent with the patterns of overlap in Fig. 2 in some cases.

Table 4
Various indices of niche overlap for six hospitals in Los Angeles (n = 30) (a)

Hospitals: (1) Los Angeles Community hospital; (2) California medical center; (3) Queen of Angels Hollywood Presbyterian medical center; (4) Hollywood Community hospital center; (5) Cedars-Sinai medical center; (6) Midway hospital medical center.

Row i  Col j   ED/Raw   ED/PT    ED/MT    Cosine   Alpha    A        C
1      2       0.5619   0.3214   0.3334   0.2979   0.1860   0.0523   0.2714
1      3       0.2947   0.3425   0.0000   0.2020   0.1072   0.0555   0.3540
1      4       0.9718   0.3532   0.2061   0.2137   0.1122   0.0018   0.0298
1      5       0.0358   0.3738   0.0055   0.0250   0.0001   0.0286   0.2185
1      6       0.8897   0.0000   0.1554   0.0000   0.0207   0.0034   0.0189
2      1       0.5619   0.3214   0.3334   0.2979   0.1812   0.0056   0.0721
2      3       0.4301   0.6765   0.6378   0.7112   0.3666   0.0528   0.5806
2      4       0.5527   0.4661   0.6501   0.4105   0.2107   0.0005   0.0126
2      5       0.0237   0.4829   0.4794   0.2716   0.0964   0.0299   0.3424
2      6       0.5865   0.2933   0.7775   0.4397   0.3217   0.0074   0.1032
3      1       0.2947   0.3425   0.0000   0.2020   0.1504   0.0025   0.0301
3      2       0.4301   0.6765   0.6378   0.7112   0.4918   0.0266   0.2977
3      4       0.3112   1.0000   0.9318   1.0000   0.5894   0.0015   0.0278
3      5       0.0000   0.6536   0.2907   0.4039   0.1762   0.0290   0.3390
3      6       0.3402   0.3744   0.4562   0.4750   0.4015   0.0054   0.0806
4      1       0.9718   0.3532   0.2061   0.2137   0.1596   0.0724   0.2701
4      2       0.5527   0.4661   0.6501   0.4105   0.2839   0.3332   0.6999
4      3       0.3112   1.0000   0.9318   1.0000   0.5984   1.0000   0.9780
4      5       0.0636   0.7918   0.6909   0.6444   0.2862   0.8487   0.9806
4      6       1.0000   0.4539   0.7811   0.5813   0.4884   0.1478   0.7600
5      1       0.0358   0.3738   0.0055   0.0250   0.0595   0.0000   0.0000
5      2       0.0237   0.4829   0.4794   0.2716   0.2752   0.0090   0.0895
5      3       0.0000   0.6536   0.2907   0.4039   0.3360   0.0180   0.1647
5      4       0.0636   0.7918   0.6909   0.6444   0.5155   0.0002   0.0088
5      6       0.1886   0.7266   1.0000   0.9257   1.0000   0.0081   0.1271
6      1       0.8897   0.0000   0.1554   0.0000   0.0000   0.0040   0.0509
6      2       0.5865   0.2933   0.7775   0.4397   0.2194   0.0761   0.5726
6      3       0.3402   0.3744   0.4562   0.4750   0.2016   0.1136   0.8157
6      4       1.0000   0.4539   0.7811   0.5813   0.2459   0.0059   0.0876
6      5       0.1886   0.7266   1.0000   0.9257   0.2933   0.2547   1.0000

(a) The seven values for each pair of hospitals represent indices of niche overlap calculated using the following methods: (1) ED/Raw; (2) ED/PT; (3) ED/MT; (4) cosine; (5) alpha; (6) A; (7) C. All the values are scaled to be in the same metric: the pair with the minimum level of overlap has the value of 0 and the pair with the maximum overlap has 1.
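The note to Table 4 says that all indices are put on a common metric in which 0 marks the pair with the least overlap and 1 the pair with the most. The paper does not give the formula, so the following Python sketch is only one plausible reading: min-max scaling, with the direction reversed for distance measures because a small distance means a large overlap. The function name and the input values are hypothetical.

```python
def to_common_scale(values, dissimilarity=False):
    """Min-max scale a list of indices to the range 0..1.

    For dissimilarity measures (e.g. Euclidean distance) the direction is
    reversed, so the smallest distance becomes the largest overlap.
    This is an assumed reading of the note to Table 4, not a stated formula.
    """
    low, high = min(values), max(values)
    span = high - low
    if dissimilarity:
        return [(high - v) / span for v in values]
    return [(v - low) / span for v in values]

# Hypothetical similarity scores and distances for three pairs of firms.
print(to_common_scale([2.0, 4.0, 6.0]))                      # larger is more overlap
print(to_common_scale([2.0, 4.0, 6.0], dissimilarity=True))  # smaller is more overlap
```

Under this reading, a value of 1.0000 in a distance column of Table 4 identifies the pair with the smallest distance, and a value of 0.0000 the pair with the largest.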
In summary, these comparisons show that the symmetric measures and alpha cannot handle the patterns of dominance and unequal overlap adequately and that A in some cases suggests overlap that is not consistent with the visual pattern in the plots. On the other hand, C is consistent with the patterns in Fig. 2. These comparisons do not provide conclusive evidence to accept C as the best measure of niche overlap. However, the visual comparisons help eliminate the first five methods (three distance-based methods plus cosine and alpha) as inadequate measures of niche overlap because of their inability to handle some overlap patterns (e.g. unequal overlap and included patterns) and their inconsistency with the patterns shown in the plots. A on the other hand has its own problem as discussed above and it sometimes suggests overlap not consistent with the actual patterns. Thus, by way of elimination, C can be accepted as the best and most valid among the alternatives considered in this paper.

9. Discussion and conclusion

I have so far reviewed and compared several proposed methods of measuring niche overlap. Among them, special attention was paid to whether Euclidean distance could be used as an appropriate measure of niche overlap. Mathematical properties as well as potential measurement error resulting from applying Euclidean distance to the raw and transformed data were examined. I conclude that Euclidean distance is not an appropriate method for measuring niche overlap for the following three reasons.

First, Euclidean distance on a raw (untransformed) data set is not a good method of measuring niche overlap, because it is overly sensitive to size differences between organizations.

Second, data transformation is not a solution to the problem with Euclidean distance just discussed. Both PT and MT ignore differences in organizational size, and whatever is measured from the transformed data is the angular separation rather than the extent of overlap in niches.
Moreover, Euclidean distance applied to the transformed data lacks consistency and comparability within the measure itself.

Third, a more fundamental problem with Euclidean distance is that it is a symmetric measure. No matter what transformation Euclidean distance is applied to, it only captures the symmetric aspect of niche overlap. However, when there is a large disparity in size among organizations and in their resource needs, their competitive relationship is necessarily asymmetric and directional. A method that detects asymmetry in the competitive relationship is thus preferable, and Euclidean distance does not qualify as one. In general, Euclidean distance is not the right method of measuring association between two vectors in a ratio scale.

How much difference does the choice of one method over another make in measuring niche overlap? Burt (1988) and Burt and Carlton (1989) provide a good illustration: they applied Euclidean distance to similar data using respectively PT and MT and obtained substantially different results. Burt and Carlton (1989) claimed that ED/MT is “the more useful measure for sociological studies of market boundaries for organizational analysis because they more clearly reveal variation in the resource-flow patterns that define structurally equivalent (substitutable) production activities as a market” (p. 749). But the analysis in this paper suggests that the different results obtained in these two papers may derive more from methodological artifacts than from any meaningful difference in substance. These two papers highlight the different results obtained by two different data transformation methods. Faust and Romney (1985) earlier showed how “profound” a difference applying ED/MT or the Pearson correlation to the same data could make.
It was shown that, among the measures considered in this paper, C is the only adequate measure of overlap that can also correctly detect the asymmetric and directional nature of overlap between two firms. In applying the niche overlap theory for the purpose of obtaining competitive intensity between organizations in general and in using C in particular, one must bear in mind that the measure is only as good as the identification of the resource segments for the organizations under study. Two issues need to be emphasized when resource segments are defined in order to detect overlap between two organizations using C or other methods of computing overlap.

First, all critical resource segments need to be identified and included in the computation. Some resources may be important for organizational survival but may generate little competitive pressure for the organizations involved. These would be resources over which an organization enjoys structural autonomy (Burt, 1992). They need not be included, but others with competitive implications must all be included in order for the niche overlap to accurately reflect the true competitive intensity among organizations.

Second, niches need to be broken down into segments that are as homogeneous as possible. Competition is multidimensional and many factors contribute to the competitive intensity between organizations in complicated ways. A method of niche overlap must then be able to combine multiple factors into a single summary measure. When resource segments are broadly defined, it is possible that these segments contain qualitatively disparate resources and organizations might utilize them in different proportions. The resulting measure of overlap then might not correctly reflect the way resource niches overlap. This is an issue of niche dimensionality: two species that occupy the same niche may not necessarily compete with each other if they tap into different dimensions in their niches (Pianka, 1983).
For example, for two species that feed on the same kind of trees, one species may collect food only from the lower part of the trees and the other from the higher part. Another example from the hospital industry: two hospitals that draw patients from the same area may provide services to two different populations, one to children and the other to adult patients. Cij can be extended to multidimensional niches as follows

$$C_{ij} = \sum_k w^{*}\, C_{ijk} \qquad (7)$$

where w∗ is the weight that indicates the relative importance of each dimension to firm i and Cijk the competition coefficient (Cij) computed for each resource dimension k. As an illustration, Sohn (2001) divided patients from each ZIP code into 49 groups according to their diagnoses to make sure that patients within each segment are as similar to one another as possible from the service provider’s point of view. He applied Eq. (6) separately to these 49 groups of patients, obtained a matrix of competition coefficients for each patient group, and combined them into a single summary matrix by using weighted averages. The weights were the proportions of total number of admissions that were in these service groups.

There are several other methods not considered in this paper. One notable method was proposed by McPherson (1983). This in fact was the first sociological application of the niche overlap theory to organizational analysis and is still used in many studies (McPherson and Smith-Lovin, 1988; Popielarz and McPherson, 1995; McPherson and Rotolo, 1996). It relies on identifying the niche breadth on each niche dimension and computing the “rectangular” area of overlap by multiplying breadths of all relevant dimensions. 11 This method is similar to the method proposed in this paper in that they can both detect asymmetry and their coefficients vary between 0 and 1.
However, this method ignores resource utilization levels at each resource position and only uses information about how widely an organizational population collects its resources. One consequence of such an omission is that generalists with broader niches are always found to be stronger competitors than specialists with narrower niches (McPherson, 1983, p. 526) and that it cannot correctly detect a pattern described by the dotted line in Fig. 1D (i.e. a specialist outcompeting a generalist in its own narrow range). In addition, the resource dimensions like age, education, and occupation must be correlated and deriving the area of overlap multiplicatively would more than likely overestimate the actual overlap (see McPherson, 1983, p. 529). One can get around both problems by using Cij on data stratified by occupational groups, sex, and educational attainment (e.g. each stratum would comprise an n × r matrix with n being the number of organizational types and r the number of age groups). So, for example, if there were six occupational groups, two sexes, and four educational levels, there would be 48 (6 × 2 × 4) separate Cijk’s to be computed and to be combined into a single index by Eq. (7).

11 Niche breadth for the age dimension, for example, is defined as the ±1.5 standard deviation around the mean age of people.

The competition coefficient is the cornerstone of dynamically modeling the growth and decline of organizations and of describing the equilibrium state where many organizations can coexist (Levins, 1968; McPherson, 1983). The competition coefficient as discussed in this paper assumes a static environment. However, it can be used in a longitudinal analysis if one observes the competitive intensity among firms in a period in time and examines how competition affects organizational performance or effectiveness in the next period (see Sohn et al., 1999).

The new economic sociology takes as its starting point the realization that economic action is structurally embedded (Granovetter, 1985). Most studies have so far looked at the effect of positive or cooperative ties (formal and informal) that affect organizational performance and decision-making (e.g. Uzzi, 1996, 1997). However, there is a dearth of studies that focus on the structural embeddedness of organizations in negative or competitive ties. Much of this can be attributed to the lack of a good theoretical framework that allows a more sociological conceptualization of competition and a robust method of measuring it. Niche overlap theory can fill the void by allowing competition to be defined and measured in a more rigorous way than previously possible. When the niches are fully specified, the resulting overlap measure can accurately indicate the intensity of competition an organization faces from its neighbors. Currently, the limited availability of fine firm–niche relationship data seems to hamper a wider acceptance of the theory, but this situation should improve as time goes on. The theory promises to be a big step forward in both network analysis of organizations and economic sociology. However, before we pick the fruits of this theory, we need to pay more attention to the measurement issues involved in computing niche overlap.

Acknowledgements

I thank Willard Manning, Thomas D’Aunno, Ronald Thisted, Edward Laumann, Phil Schumm, and Ted Karrison for their helpful comments on earlier drafts of this paper.

References

Anderberg, M.R., 1973. Cluster Analysis for Applications. Academic Press, New York.
Baum, J.A.C., Singh, J.V., 1994. Organizational niches and the dynamics of organizational mortality. American Journal of Sociology 100, 346–380.
Baum, J.A.C., Haveman, H.A., 1997. Love thy neighbor: differentiation and agglomeration in the Manhattan hotel industry, 1898–1990. Administrative Science Quarterly 42, 304–338.
Bonacich, P., 1972. Technique for analyzing overlapping memberships. In: Costner, H.L. (Ed.), Sociological Methodology. Jossey-Bass, San Francisco, pp. 176–185.
Brittain, J.W., Wholey, D.R., 1988. Competition and coexistence in organizational communities: population dynamics in electronic components manufacturing. In: Glenn, R.C. (Ed.), Ecological Models of Organizations. Ballinger Publishing Company, Cambridge, MA, pp. 195–222.
Brooks, G.R., 1995. Defining market boundaries. Strategic Management Journal 16, 535–549.
Burt, R.S., 1988. The stability of American markets. American Journal of Sociology 93, 356–395.
Burt, R.S., 1992. Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge, MA.
Burt, R.S., Talmud, I., 1993. Market niche. Social Networks 15, 133–149.
Burt, R.S., Bittner, W.M., 1981. A note on inferences regarding network subgroups. Social Networks 3, 71–88.
Burt, R.S., Carlton, D.S., 1989. Another look at the network boundaries of American markets. American Journal of Sociology 94, 723–753.
Colwell, R.K., Futuyma, D.J., 1971. On the measurement of niche breadth and overlap. Ecology 52, 567–576.
Cronbach, L.J., Gleser, G.C., 1953. Assessing similarity between profiles. Psychological Bulletin 50, 456–473.
Faust, K., Romney, A.K., 1985. Does STRUCTURE find structure? A critique of Burt’s use of distance as a measure of structural equivalence. Social Networks 7, 77–103.
Fox, J., 1982. Selective aspects of measuring resemblance for taxonomy. In: Hudson, H.C. (Ed.), Classifying Social Data. Jossey-Bass, San Francisco, pp. 127–151.
Granovetter, M., 1985. Economic action and social structure: the problem of embeddedness. American Journal of Sociology 91, 481–510.
Griffith, J.R., 1972. Quantitative Techniques for Hospital Planning and Control. Lexington Books, Lexington, MA.
Hannan, M.T., Freeman, J., 1977. The population ecology of organizations. American Journal of Sociology 82, 929–964.
Hannan, M.T., Freeman, J., 1989. Organizational Ecology. Harvard University Press, Cambridge, MA.
Krebs, C.J., 1989. Ecological Methodology. Harper and Row, New York.
Laumann, E.O., 1973. Bonds of Pluralism: The Form and Substance of Urban Social Networks. Wiley, New York.
Levins, R., 1968. Evolution in Changing Environments. Princeton University Press, Princeton, NJ.
McPherson, J.M., 1983. An ecology of affiliation. American Sociological Review 48, 519–535.
McPherson, J.M., Smith-Lovin, L., 1988. A comparative ecology of five nations. In: Glenn, R.C. (Ed.), Ecological Models of Organizations. Ballinger Publishing Company, Cambridge, MA, pp. 85–109.
McPherson, J.M., Rotolo, T., 1996. Testing a dynamic model of social composition: diversity and change in voluntary groups. American Sociological Review 61, 179–202.
MacArthur, R.H., Levins, R., 1967. The limiting similarity, convergence, and divergence of coexisting species. American Naturalist 101, 377–385.
May, R.M., 1975. Some notes on estimating the competition matrix, α. Ecology 56, 737–741.
Pianka, E.R., 1973. The structure of lizard communities. Annual Review of Ecology and Systematics 4, 53–74.
Pianka, E.R., 1975. Niche relations of desert lizards. In: Martin, L.C., Jared, M.D. (Eds.), Ecology and Evolution of Communities. The Belknap Press of Harvard University, Cambridge, MA, pp. 292–314.
Pianka, E.R., 1981. Competition and niche theory. In: Robert, M.M. (Ed.), Theoretical Ecology, 2nd Edition. Blackwell, Oxford, pp. 167–196.
Pianka, E.R., 1983. Evolutionary Ecology, 2nd Edition. Harper and Row, New York.
Podolny, J.M., Stuart, T.E., Hannan, M.T., 1996. Networks, knowledge, and niches: competition in the worldwide semiconductor industry, 1984–1991. American Journal of Sociology 102, 659–689.
Popielarz, P.A., McPherson, J.M., 1995. On the edge or in between: niche position, niche overlap, and the duration of voluntary association memberships. American Journal of Sociology 101, 628–720.
Schwenk, C.R., 1988. The cognitive perspective on strategic decision making. Journal of Management Studies 25, 41–55.
Sohn, M.-W., 2001. Relational approach to measuring competition among hospitals. Health Services Research, in press.
Sohn, M.-W., Manheim, L.M., Pearce, W.F., 1999. Market competition among hospitals and the diffusion of medical technology: the case of PTCA. In: Paper Presented at the Association for Health Services Researchers Meeting, Chicago, 1999.
Starbuck, W.H., Milliken, F.J., 1988. Executives’ perceptual filters: what they notice and how they make sense. In: Donald, C.H. (Ed.), The Executive Effect: Concepts and Methods for Studying Top Managers. JAI Press, Greenwich, CT, pp. 35–65.
Tenbrunsel, A.E., Galvin, T.L., Neale, M.A., Bazerman, M.H., 1996. Cognitions in organizations. In: Stewart, R.C., Cynthia, H., Walter, R.N. (Eds.), Handbook of Organization Studies. Sage Publications, London.
Uzzi, B., 1996. The embeddedness and economic performance: the network effect. American Sociological Review 61, 674–698.
Uzzi, B., 1997. Social structure and competition in interfirm networks: the paradox of embeddedness. Administrative Science Quarterly 42, 35–67.
Zwanziger, J., Glenn, A.M., Mann, J.M., 1990. Measures of hospital market structure: a review of the alternatives and a proposed approach. Socio-Economic Planning Sciences 24, 81–95.