
Appendix 2: Identifying Physician Networks
Our approach to identifying PPCs used group-finding techniques common to social network
analysis (SNA). SNA refers to a set of mathematical and graphical techniques for measuring and
analyzing how units are connected to one another.1 In general, a social network consists of
two classes of data: nodes (physicians in this case), and edges, which capture the level of interaction
between pairs of nodes. Physicians are connected (edges) by the patients they share. The joint sets of all
nodes and edges in a population constitute the network. Our method for finding PPCs involved
constructing the entire network of physicians who share patients and then identifying smaller groups
within the network as the “physician practice communities.”
Defining physician networks
The first stage of this process is illustrated in the figure at
right. Any patient visiting more than one physician creates a
link between each pair of physicians they visit. For example,
Patient 3 creates links between AE, EF, and AF. Networks extend through overlapping patients; for
example, physicians A and F share two patients, and F and C also share two. A and C have no patients in common
but are connected indirectly through F. The set of all nodes reachable through a chain of shared patients is
called the largest component of the graph.
Our interest is in identifying local physician practice communities, so we emphasized strong and
relatively local ties in three ways that differ from the simple counting model implied in the figure above.
First, we coded each patient’s contribution to the strength of tie between two physicians as the minimum
number of times either physician had seen the patient. We then summed over all common patients to
identify the total edge strength between the two physicians. Second, if a pair of physicians shared only a
single patient seen once by each (an edge value of 1), we recoded the edge value to 0.025. Finally, we
excluded ties between pairs of physicians in the top decile of geographic distance.
The logic behind weighting ties as the minimum number of distinct visits to either physician,
rather than a simple count of shared patients, is to capture the joint familiarity between physicians through
shared treatment. If, for example, physician A sees a patient 10 different times and physician B only once,
it seems incorrect to count their connection as equivalent to two physicians who each see the same patient
10 times. By choosing the minimum, we discount patients who were shared in a limited way in favor of
stronger, more established relationships. Since this is summed over all shared patients, the number of
shared patients is embedded in the measure.
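As a minimal sketch of this weighting scheme (illustrative Python; the visits structure and all identifiers are hypothetical, and our actual data manipulations were done in SAS):

    from collections import defaultdict
    from itertools import combinations

    # Hypothetical input: visits[patient][physician] = number of times
    # that physician saw the patient.
    visits = {
        "patient_1": {"A": 10, "B": 1},
        "patient_3": {"A": 1, "E": 2, "F": 4},
    }

    # A patient's contribution to a physician pair is the minimum number
    # of visits either physician had with that patient; edge strength is
    # the sum of these contributions over all shared patients.
    edge_weight = defaultdict(float)
    for patient, counts in visits.items():
        for doc_i, doc_j in combinations(sorted(counts), 2):
            edge_weight[(doc_i, doc_j)] += min(counts[doc_i], counts[doc_j])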
Across our 15 state-year combinations, the average
correlation between the simple count of shared patients and the sum, across all shared patients, of the
minimum number of days the patient was seen by either physician is 0.87 (SD 0.01), and the relation is
clearly linear in a scatter plot (on a log-log scale; figure available on request). Since the group
detection model uses a weighted tie value and these measures correlate very highly, we expect minimal
differences in the final results.
The primary purpose of recoding low-value ties created by a single shared patient-visit from 1 to
a small-but-nonzero value (we used 0.025) was to mitigate the variability created by patient sampling
and interaction with border-state physicians, and to emphasize substantive shared-patient relations (in
keeping with the insights from Barnett et al. (2011)). Upon examination, many low-volume edges were
either “pendant” ties -- a single node sharing one patient with another physician in the
largest component -- or a wider set of weak ties among out-of-state physicians. While we contemplated
removing these edges entirely, we felt giving them a light weight would allow pendant nodes to link to
their partners while minimizing the effect of ad hoc ties in creating non-substantive bridges between real
groups. To check the model's sensitivity to this transformation, we compared results in Pennsylvania in
2010 – the largest network, and hence the one with the most recoded values – by running the clustering
routines with and without the recoded values. The resulting partitions are very similar. For the first-stage
clustering, used to identify large regional clusters, the two approaches match with a Cramér's V of 0.97
and an adjusted Rand index of 0.93, indicating a very high correspondence. We then ran the second,
local-level clustering within each of the larger groups. The average adjusted Rand index was 0.88 with a
median of 0.92, the difference due to a single low outlier, which upon investigation turned out to be
composed entirely of low-volume, out-of-state relations. Both coding schemes consistently assigned
within-state physician pairs to the same clusters. Since we remove any cluster that is majority
out-of-state from the sample for the care analyses, these results would have had no effect on our ultimate
modeling. Thus, while we think the lower weighting is a reasonable hedge against ad hoc connections in
these lower-sampled ties, it appears to have little substantive effect on the set of resulting PPCs we use
in our modeling.
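Continuing the hypothetical sketch above, the recoding rule can be expressed as follows (the 0.025 value is the one used in our analysis; the code itself is illustrative):

    # Recode single-patient, single-visit ties (weight 1) to a small
    # nonzero value so pendant nodes stay attached to their partners
    # without creating strong, spurious bridges between real groups.
    PENDANT_WEIGHT = 0.025
    for pair, weight in edge_weight.items():
        if weight == 1:
            edge_weight[pair] = PENDANT_WEIGHT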
The choice to exclude long-distance ties is primarily to help bound the network around the region
in each selected state. Since edges link nodes indirectly, long chains of patients can connect thousands of
physicians into a single component; our preliminary work indicated that even local samples of patients
generate nationally extensive networks. For example, the figure at left shows that our sample of patients
from Pennsylvania generated links to physicians around the nation, with understandably greater volume
within the state (here, degree is the number of other physicians in this sample to which each physician is
connected). While this larger network would be important for studying problems such as diffusion, the
full structure is not likely to be useful for studying physician practice communities – the local networks
in which patients and their physicians are embedded. In
large urban areas near state borders, such as Philadelphia, PA or Vancouver, WA, it is likely that some
PPCs span the border, so a simple hard-coded rule that excluded all out-of-state ties was not appropriate.
Since the shape of the distance distribution is similar in all five states, differing only in scale, selecting
a common percentile cutoff from the edge-distance distribution provides a consistent rule that is also
tailored to the variable geography of each state.
The figure below provides the cumulative distribution of cases by distance for each state and year
(within-state distributions are very similar, so points overlap). The 90% cut-off value corresponds to the
leveling off of the tail of the distribution. Since the weight of an edge is typically much lower for
long-distance ties, the edges removed were largely low-weight ties, and as such, we expect this to have
little or no effect on the within-region community detection process.
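A self-contained sketch of this distance rule under the same hypothetical data structures (the toy distances are invented; only the 90th-percentile cutoff comes from our procedure):

    import numpy as np

    # Toy stand-in: precomputed geographic distance (km) for each pair.
    edge_distance = {("A", "E"): 12.0, ("A", "F"): 450.0, ("E", "F"): 8.0}
    edge_weight = {("A", "E"): 3.0, ("A", "F"): 1.0, ("E", "F"): 2.0}

    # Drop ties in the top decile of the edge-distance distribution.
    cutoff = np.percentile(list(edge_distance.values()), 90)
    edge_weight = {pair: w for pair, w in edge_weight.items()
                   if edge_distance[pair] <= cutoff}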
Identifying Physician Practice Communities within the network
There is a large literature on the difficult task of identifying communities within networks.2-5 We made
primary use of the well-known Blondel model,6 as implemented in the software package PAJEK,7 with all
other data manipulations computed in SAS. Two key choices that inform this process are (a) the use of
the Blondel et al. detection method rather than alternative detection routines and (b) the selection of the
resolution parameter for identifying PPCs within regions. We discuss each in detail below.
In general, community detection involves partitioning network nodes into mutually exclusive
groups to maximize within-group ties and minimize between-group ties. While there are other definitions
of network groups, the relative density formulation is by far the most standard and appropriate for this
project (Porter et al 2009; Moody and Coleman 2013).4,5 The emerging standard metric for community
detection is the modularity index,8 calculated as:
\[
Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(C_i, C_j)
\]
where $m$ is the number of edges, $k_i$ is the degree of node $i$, $A_{ij}$ is the edge weight between
physicians $i$ and $j$, $\delta(C_i, C_j)$ is an indicator that equals 1 if nodes $i$ and $j$ are in the
same community, and $\gamma$ is a “resolution parameter” that identifies the scale at which clustering is
observed (Reichardt and Bornholdt 2006; Fortunato and Barthelemy 2007).9,10 Substantively,
$k_i k_j / 2m$ represents the null model – the expected contact between nodes of these degrees – so
$(A_{ij} - \gamma k_i k_j / 2m)$ is the connectivity above random expectation, normalized by the
total volume of ties in the network. Modularity reaches a maximum of 1 if all ties fall within distinct
groups and has a value of zero if ties are as likely within as between communities. The resolution
parameter is a key feature, particularly as networks become very large (as in the national Figure above).
Optimization of Q in these types of geographically grounded large networks tends to be biased toward
finding a small number of large groups, so a naïve search to maximize Q can lead to unsatisfactory results
as multiple small groups are lumped into larger aggregates. We carefully developed multiple strategies for
combating this tendency, resulting in a uniquely tuned tool for identifying comparatively small (~150
physician) groups.
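For concreteness, Q can be computed for a candidate partition using the networkx implementation of the same formula (a stand-in for illustration only; we used PAJEK, and the toy graph below is invented):

    import networkx as nx
    from networkx.algorithms.community import modularity

    # Toy weighted graph standing in for a shared-patient network.
    G = nx.Graph()
    G.add_weighted_edges_from([("A", "E", 1.0), ("A", "F", 2.0),
                               ("E", "F", 3.0), ("F", "C", 2.0)])

    # Q for a candidate partition, with resolution parameter gamma.
    Q = modularity(G, [{"A", "E", "F"}, {"C"}], weight="weight",
                   resolution=1.25)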
The Blondel model uses a local aggregation strategy, first finding many small groups, then
treating those resulting groups as a “super-node” in a (now smaller) network, and then repeating this
process at the higher level, continuing until no improvement in modularity is identified. In general, such
“greedy algorithm” approaches can fail if an initial assignment is poor (as each step builds on the prior
step). A unique feature of the Blondel approach is that early assignments are tested against multiple
alternatives at later levels, allowing corrections to the assignment process that other “fast and greedy”
style algorithms do not. Importantly, the runtime for the Blondel model on sparse graphs (such as ours) is
linear in |V|, and thus practical for graphs as large as ours.
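For illustration, a comparable single pass using the Louvain (Blondel) implementation in networkx, continuing the toy graph above (we ran our actual models in PAJEK):

    from networkx.algorithms.community import louvain_communities

    # One pass of the Louvain (Blondel) local-aggregation heuristic on
    # the weighted toy graph; `seed` fixes the stochastic sweep order.
    groups = louvain_communities(G, weight="weight", resolution=1.0,
                                 seed=42)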
We spent considerable energy fine-tuning and calibrating our community detection algorithm and
have high confidence in the stability and reliability of the method. Still, plausible alternative community
detection routines exist, and it is worth discussing the implications of algorithm choice. Landon et al.11
used the Girvan-Newman edge-betweenness algorithm. The Girvan-Newman model recursively deletes
edges from the network that link parts of the network that are otherwise less connected (those with high
“edge betweenness”) – effectively removing the weakest links connecting disconnected sets. This
betweenness score is recalculated after every edge is removed, generating a tree of nodes that remain
together the longest. The user then identifies communities based on the edge-removal step along the tree
that maximizes the modularity score. A key advantage of the G-N algorithm is that it is essentially
deterministic, with randomness only coming into play to break edge-weight ties. The critical
disadvantage, however, is that the method is computationally intense – making it impossible to
implement on networks the size we are dealing with here (our networks have millions of edges). For
example, the iGraph implementation of the algorithm has runtime O(|V||E|^2). We initially attempted
running the iGraph implementation of the G-N algorithm, but it did not return a result after days of
running. A theoretically attractive alternative is the Oslom hierarchical statistical model.12 The Oslom
model has an attractive multi-level structure that could be used to automate our two-level process and
incorporates a statistical testing framework for assessing cluster detection. Unfortunately, initial tests
proved unworkable in these data, as we were unable to get the models to run. Finally, given the dynamic
nature of our data, fully dynamic methods13 would be attractive, but they are not currently implemented
for networks of this scale; methods to do so might be developed in the future.
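For reference, the kind of call we attempted is sketched below using python-igraph (the toy graph is invented; on our full networks this computation never completed):

    import igraph as ig

    # Girvan-Newman edge-betweenness clustering via python-igraph;
    # feasible on toy graphs, but the O(|V||E|^2) runtime is impractical
    # at the scale of our networks.
    g = ig.Graph.TupleList([("A", "E", 1.0), ("A", "F", 2.0),
                            ("E", "F", 3.0), ("F", "C", 2.0)],
                           weights=True)
    dendrogram = g.community_edge_betweenness(weights="weight")
    clusters = dendrogram.as_clustering()  # cut at max-modularity step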
To identify PPCs in these large regional networks, we used a two-stage implementation of the
basic Blondel model, run separately by state. In the top-level stage, we ran the model on the full network
with a low resolution parameter (0.75) to identify a small number of very large clusters (typically fewer
than 10 in each state). This results in large regional concentrations that are highly segmented (with
modularity scores over 0.9). We then re-applied the group detection routine within these large regional
clusters at a higher resolution level to identify the PPCs. The primary calibration step used here
involves setting the resolution parameter. All modularity-maximizing approaches (including the
Girvan-Newman edge-betweenness algorithm) must set a resolution parameter that governs the size of the
clusters ultimately identified. This is a fundamental feature of all network clustering routines, analogous
to choosing an alpha level for statistical tests or a factor-loading cutoff in scale construction. We can use
this to our advantage by searching over a wide range of values and selecting a value that returns
consistently reasonable results; result stability over a range of parameters indicates an underlying reality
to the groups found. While computationally intensive, this provides a grounded way to choose a
parameter. We evaluated a sweep of solutions, as sketched below, with resolutions in the range of 0.5 to
2.0 (evaluated in 0.25-step intervals) with respect to overall clustering, group size (relative to a target
median size of between 100 and 150 and a maximum of less than 1,000), and cluster stability (measured
as the adjusted Rand index across multiple runs). As a general guide, we computed a composite fit score
based on these four values and compared the distribution of fit across the range of resolution values. We
repeated this in each state and selected the single resolution value that seemed most robust across all
states/years, which produced a consensus resolution parameter value of 1.25.
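A condensed, illustrative sketch of the two-stage procedure and resolution sweep, continuing the toy graph G above, with networkx and scikit-learn standing in for our PAJEK/SAS workflow and the composite fit score simplified to comments:

    import numpy as np
    from networkx.algorithms.community import louvain_communities
    from sklearn.metrics import adjusted_rand_score

    # Stage 1: a handful of large regional clusters at low resolution.
    regions = louvain_communities(G, weight="weight", resolution=0.75,
                                  seed=1)

    # Stage 2: within each region, sweep the resolution and score each
    # value on group size and run-to-run stability (adjusted Rand index).
    for gamma in np.arange(0.5, 2.25, 0.25):
        for region in regions:
            sub = G.subgraph(region)
            nodes = sorted(sub)
            labelings = []
            for seed in range(5):  # repeated runs to measure stability
                comms = louvain_communities(sub, weight="weight",
                                            resolution=gamma, seed=seed)
                label = {v: i for i, c in enumerate(comms) for v in c}
                labelings.append([label[v] for v in nodes])
            stability = np.mean([adjusted_rand_score(labelings[0], run)
                                 for run in labelings[1:]])
            sizes = [len(c) for c in comms]
            # A composite fit would compare median(sizes), max(sizes),
            # and stability against the targets described in the text.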
The effect of this calibration step is fairly direct: had we selected a markedly lower resolution value, we
would have identified a smaller number of larger PPCs (this would have had most of its effect on the
maximum group sizes observed), while selecting a markedly higher resolution value would have
generated a larger number of smaller groups. While the Blondel algorithm has proved fast and accurate
in very large networks, we employed a final node-level reassignment sweep, sketched below, to ensure
that nodes were placed in the group where the majority of their neighbors resides.3 This sweep affected
fewer than 1% of nodes.
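A sketch of this reassignment logic (illustrative; membership is a hypothetical mapping from physician to PPC label):

    from collections import Counter

    def reassignment_sweep(G, membership):
        # Move any node whose neighbors mostly sit in a different group.
        moved = 0
        for v in G.nodes:
            tallies = Counter(membership[u] for u in G.neighbors(v))
            if not tallies:
                continue
            best, _ = tallies.most_common(1)[0]
            if best != membership[v]:
                membership[v] = best
                moved += 1
        return moved  # in our data, fewer than 1% of nodes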
There are usually many PPCs within the same local area serving the same hospitals. The figure at right
highlights two small PPCs that both admit patients to one hospital in Pennsylvania (both PPCs see
patients from other hospitals as well; placement is approximate, based on jittered zip code centroids).
The mixing matrix is typical of geographically proximate PPCs: while there are shared patients across
PPCs, the rate of patient sharing is many times greater within PPCs than between them (roughly 5 times
higher in this example), and only the strongest ties fall within PPCs (darker edges in panel c).