Statistical Network Approach to Patent Citation

PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Materials for IPSC Presentation Aug. 11-12, 2006
Katherine J. Strandburg
Kinetics of the Patent Citation Network: A Physics Approach to Understanding the
Patent System
I have not yet prepared a law review style article about this research. For purposes of this
presentation, I have attached:
1. An NSF proposal that gives results of the research so far and some ideas for future
work. (Non-technical)
2. A draft paper that we will be submitting to the Proceedings of the National Academy
of Sciences (fairly technical).
I am very interested in your reactions to and suggestions for this work!
1
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Collaborative Research:
A Statistical Physics Approach to Patent Citation Networks
A Proposal to the National Science Foundation
Principal Investigators: Katherine J. Strandburg, Jan Tobochnik, Peter Erdi
1.
Introduction
We propose to apply network analysis approaches from statistical physics to a
study of the patent citation network. The goal of our study is to gain a better
understanding of the structure and dynamics of the United States patent system and to
investigate the implications of current patenting behavior for innovation policy and law.
To our knowledge, our study is the first to address the current groundswell of concern
about the effects of increased patenting on innovation by combining the recent
availability of extensive and detailed computerized data about United States patents with
recent advances in the statistical physics study of complex networks. The project will not
only advance our understanding of the patent system -- a matter of critical significance
for the long term health of United States innovation and economic growth -- but will also
help to demonstrate the important role that network theory and other techniques of
statistical physics might play in furthering the interdisciplinary approach to legal theory
that is exemplified by the extremely influential law and economics line of attack.
The study proposed here thus stands at a particularly propitious crossroads of
societal and technical events: recently available data, recently developed methodology,
and currently pressing legal concerns. The principal investigators are also singularly
equipped for this research: Dr. Strandburg is a patent law professor and former litigator
with a background as a statistical physicist performing numerical investigations of critical
phenomena; Dr. Tobochnik is a statistical physicist presently working in the area of
network theory and an expert on the application of numerical techniques to statistical
physics problems; and Dr. Erdi is an expert on networks and complex systems.
2.
Background
Recent years have seen a major upsurge in patenting, an expansion of the range of
innovations which are eligible for patent protection, and a perception that the United
States economy relies more and more heavily on knowledge and innovation for its
success. (See, e.g., Merrill, 2004; Federal Trade Commission, 2003; Cohen, 2003) At
the same time, developments in the law, including the establishment of a single appellate
court that hears the vast majority of patent appeals in the United States have led to
suggestions that, in one way or another, the legal system is becoming increasingly
“patent-friendly;” that patents are being issued for lower quality innovations; and that the
legal rights awarded to patentees are becoming stronger. (For discussion of these issues
see, e.g., Allison, 2002; Burk, 2003; Burk, 2005; Cohen, 2003; Dreyfuss, 1989; Dreyfuss,
2004; Federal Trade Commission, 2003; Hall, 2005; Heller, 1998; Jaffe, 2004; Landes,
2004; Litmann, 1997; Lunney, 2000/2001; Merges, 1990; Merges, 2000; Merrill, 2004;
Rai, 2003; Shapiro, 2001; Thomas, 2003; Turner, 2005.) Along with empirical evidence
that increasingly calls into question the extent to which patents actually provide
incentives for research and development, (see, e.g., Bessen, 2005; Cohen, 2003; Hall,
2
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
2001; Heald, 2005; Long, 2002; Malchup, 1958; Wagner, 2005), these trends have
converged to raise growing concerns among academics and policymakers about whether
patent law and policy are adequately designed to “promote the Progress of . . . useful
Arts.” U.S. Const., Art. I, sec. 8. To accomplish this constitutionally mandated
objective, the benefits of patent protection must be carefully balanced against the costs
that patents impose on attempts to improve upon existing technology.
United States patents are issued by the Patent and Trademark Office (USPTO)
after applications are examined to determine, among other things, whether the patent
“claims” meet the legal requirements of novelty and non-obviousness. (Patent claims are
specific statements of the scope of the legal coverage of a patent.) In the course of this
examination, claims are compared against potential “prior art,” consisting in large part of
prior patents and other publications in relevant technical fields. See 35 U.S.C. §§ 102,
103, and 112 for the main statutory requirements for patentability; see also the USPTO
Manual of Patent Examining Procedures. Potential prior art is identified by applicants
and their patent attorneys and by the official patent examiners. References relevant to the
examination are cited in the issued patent document. See MPEP § 707.05.
Historically, empirical analyses of the effects of patent protection have been very
difficult historically due to a lack of usable data. Though important information about
the economic value of patents (such as most records of royalties and licensing) is
proprietary, in some respects the United States patent system is extremely transparent.
Complete records of patent prosecution, patent citations, and patent litigation over many
years are publicly available. In the past, however, the form in which the data were
available made quantitative analysis a difficult and painstaking process. (For examples
of past uses of patent citation data and other patent statistics, see Griliches, 1990;
Trajtenberg, 1990.) Recent advances in computer technology, along with the efforts of
empirical economists (see especially, Hall, 2003), have rendered a wealth of publicly
available patent data suitable for large-scale quantitative analysis. A surge of attempts to
capitalize on this new availability by economists and economically-inclined legal
academics has resulted. (See, e.g., Allison, 2004; Bessen, 2005; Duguet, 2005;
Hagedoorn, 2003; Hall, 2005; Harhoff, 1998; Harhoff, 2003; Jaffe, 2003; Lanjouw, 2004;
Marco, 2005; Maurseth, 2005; Moore, 2005; Ziedonis, 2004.) This work has applied
econometric techniques to connect patent characteristics -- such as the number of
citations made and received, the number of patent claims and whether the patent was
renewed; inventor characteristics -- such as geographical location and employment
context; and, most recently, patent examiner characteristics -- such as length of
experience -- to financial, economic and technological indicators. In this way, patent data
has been used to investigate knowledge flows and spillovers; to attempt to determine the
characteristics of valuable patents; and to try to understand the role innovation plays in
the behavior of various types of institutions, from universities to small and large firms.
Our approach is different from, but complementary to, these econometric studies.
We apply techniques from statistical physics to the analysis of patent citation data. Our
proposed study is particularly timely and promising for several reasons. First, one way to
view the patent system is as a network of nodes (patents) connected by citation links.
While the precise interpretation of a particular citation of one patent by a later patent
varies, on average citations convey valuable information about the relationships between
the technologies covered by the citing and cited patent. Current econometric approaches
3
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
probe the connections between citation counts and various quantities of interest, such as
probability of licensing, probability of renewal, and probability of litigation. However,
because of its connectivity, the network of patents and citations is a much richer source of
information about the patent system and the associated technological development than is
captured by these techniques. Recently, statistical physicists have undertaken the study
of a wide variety of networks with great vigor. (See, e.g., Albert, 2002; Barabasi, 2002;
Barabasi, 1999; Dorogovtsev, 1999; Dorogovtsev, 2003; Klemm, 2002; Krapivsky, 2000;
Newman, 2003; Pastor-Satorras, 2004; Redner, 2005; Watts, 1998; Watts, 2002; Zhu,
2003.) These studies have looked at the structural properties of networks and the
dynamics of their growth both empirically and theoretically, cataloguing and exploring
the similarities and differences between various types of networks. Applying network
analysis to the patent citation network has great potential both to complement existing
econometric studies of patent citations and to advance the understanding of statistical
networks in general because of the fact that the patent citation network is one of the
largest and most completely characterized networks available for study.
A second reason to take a statistical physics approach to analyzing the patent
system is the preponderance of highly skewed, power-law-like statistical distributions
that characterize both the citation network and other measures of patent system attributes,
such as the distribution of innovation values. (See, e.g., Astebro, 2003; Lemley, 2005;
Marsili, 2005; Scherer, 2000a; Scherer, 2000b.) Statistical physics, as a discipline, has a
long history of dealing with such highly nonlinear behaviors in the contexts of critical
phenomena and self-organized criticality, (see, e.g., Bak, 1999; Hohenberg, 1977; Jensen,
1998; Ma, 2000; Stanley, 1971; Stanley, 1999), and has developed both intuitions and
analytical techniques tailored to such phenomena that may be extremely useful in
understanding the patent system. To the extent that the law must deal with skewed and
heavy-tailed probability distributions in other contexts, this particular application of
insights from physics may prove exemplary.
Finally, our work is part of a broader movement attempting to apply physics-style
analysis to economic questions. Such approaches have begun to bear fruit in the study of
finance, but application of physics-like analysis to the study of innovation is in its
infancy. (See, e.g., Chakrabarti, 2005; Cowan, 2004; Frenken, 2005; Mandelbrot, 2004;
Mantegna, 1999; Silverberg, 2005; Wille, 2002.)
3.
Initial Results
We have already begun the studies proposed here and achieved interesting results.
After a preliminary investigation of some basic network properties, our initial study has
focused on a kinetic description of the growing patent network. A short technical paper
outlining the results of this initial work has been submitted for publication. Preliminary
results were presented at the March 2005 meeting of the American Physical Society and
at the 2005 Intellectual Property Scholars Conference at Cardozo Law School. In the
analysis we have used the Hall, Jaffe, and Trajtenberg dataset (Hall, 2003) which
includes the approximately 16 million citations made by the more than two million
patents issued by the USPTO between 1975 and 1999.
Our results show that, to a good approximation, the growth of the patent system
can be modeled by a random attachment model in which the probability that an existing
4
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
patent will be the target of a particular citation is given by a separable function of its
current number of citations, its age, and the time at which it issued. Thus, P(l,k,t) 
A(k,l)/S(t), where l is the age of the patent as expressed in patent numbers, k is its current
number of citations received (also called in the literature “forward citations”), and t is the
time of issue, also expressed in patent numbers. S(t) is an overall scale factor determined
by the sum of A(k,l) over all existing patents at time t.
To analyze the data we have developed a novel iterative technique which we have
used to extract A(k,l) and S(t) from the citation data. We do not assume a particular
functional form for these functions, but find empirically that A(k,l) is approximately
given by a product of two functions, which we denote Ak(k) and Al(l). See Figures 1 and
2. Each of these functions exhibits a power-law decay for large values of its argument.
3.1
Dependence on Citations Received
The dependence on citations received, Ak(k) ~ k, where   1.19, indicates that
the patent system exhibits the well-known “preferential attachment” phenomenon in
which highly cited patents are more likely to be cited in the future. From network
studies, this type of kinetics is widely understood to produce a highly skewed node
degree distribution with a relatively long tail (see Albert, 2002; Newman, 2003)
(corresponding in our case to the well-known skewed distribution of citations received
per patent). When  = 1, preferential attachment can lead to a power law, or Pareto, node
degree distribution. Theoretical models suggest that growing networks with super-linear
preferential attachment ( > 1) undergo a “condensation” in which nearly all nodes in the
system have very low connectivity and are connected to a small (finite even in an infinite
network) number of very highly connected nodes. (Krapivsky, 2000.) The fact that  >
1 in the patent network suggests that citations are highly concentrated -- in other words, a
relatively small number of patents receive most of the citations. In the patent system, the
accumulation of citations by highly cited nodes is mitigated by a decay of citation
probability with age, so the sharp condensation will probably not occur, but the relatively
high value of  still means a highly unequal distribution of patent “citability.”
3.2
Dependence on Patent Age
As noted, the age dependence of the citation probability is approximately
independent of k, especially at large l and k, as shown in Figure 2 for several values of k.
The observed Al(l) has a peak for relatively small l and decays as l- for large l, where  
1.6. This observed aging function highlights the complicated dynamics of technical
innovation. Since, presumably, the citation probability function reflects underlying
substantive differences between patents, one way to interpret the functional form of Al(l)
is that there are “typical” patents, which are most likely to be cited relatively soon after
they issue, along with a long tail of “pioneer”-type patents that continue to influence
innovation over long periods of time.
The age dependence appears to be described most consistently by using the patent
number as a measure of age. Given that patents have been issuing at a rapidly
accelerating rate, (see, e.g., Jaffe, 2003), the fact that patent number provides a
meaningful measure of age is significant, suggesting, for example, that the twenty-year
5
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
patent term is becoming effectively longer as measured by the pace of innovation.
Similarly, the peak age for citation is approximately 200,000 patent numbers,
corresponding in 1999 to just over a year. The accelerating issuance of patents means
that this peak age is becoming shorter and shorter in real time. Nonetheless, though the
peak age for citation is relatively short (and getting shorter), patents have a nonnegligible chance of being cited at any age. Moreover, while highly cited patents are
more likely to be cited, the form of Al(l) is relatively independent of in-degree, i.e.,
citation probability decays slowly even for low k. This means that, at least on average,
patents are never entirely “dead.” There are “sleeper” patents that go without citation for
long periods of time, only to re-emerge at a later time. If one makes the reasonable
assumption that receiving a citation is some indication of social value, this observation
suggests the difficulty of predicting the social value of innovations in the long term.
3.3
Time Dependence
3.3.1 Time Dependence of Scale Factor S(t)
Independently of the approximate decomposition of citation probability into
separate k and l dependent components, our analysis yields a time-dependent scale factor,
S(t), which is normalized such that the time-dependent probability that a brand new
patent (with k = 0 and l = 1) will be cited by a given citation is 1/S(t). (See Figure 3.)
Over time, this scale factor increases (and the probability that a new patent will be the
target of any particular citation decreases) because the number of patents increases while
older patents continue to be citable. This increase dilutes the chance that a given patent
will receive any particular citation. However, we find that this decay has been
compensated by an increasing number of citations made by each patent so that the
probability of a brand new patent being cited by a given patent has been slowly growing
over time -- patents do not become “lost in the crowd” of new patents. (See Figure 4.)
These results might suggest that, despite the notoriety of dubious patents on
crustless peanut butter and jelly sandwiches and so forth, (see, e.g., Crouch, 2005;
Varian, 2005) there has not been an average increase of truly “outlier” patents that are far
beyond the technological mainstream. In other words, patent applicants and examiners
have on the average felt that the number of “material” patents to be cited by any given
patent has been increasing at least as fast as the number of patents. The observed time
dependence is consistent, however, with a picture of increasing patent “density”, in which
patents constitute smaller and smaller technical advances, leading to a crowded “thicket”
of closely related patents, all of which must be cited by each new patent in the vicinity,
even though none may gather citations for long. The interpretation of these average
results is complicated, however, by the highly skewed distributions of citations made and
received and by the possibility that citation practices may have changed over time due,
for example, to the increasing availability of electronically searchable databases. Thus, a
more detailed understanding of the behavior underlying the average increase in the
probability of being cited by a given patent is part of our ongoing research.
3.3.2 Time Dependence of Power Law Decay
6
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
While it is well known that there has been a significant increase in the number of
US patents granted per year since 1984 (see, e.g., Jaffe, 2003), the underlying reason for
this increase is not clear. Has there simply been an acceleration of technological
development in the last twenty years or has there been a more fundamental change in the
patent system, perhaps as a result of increased leniency in the legal standard for obtaining
a patent? Our study may provide some insight into this question. Our kinetic model
permits us to measure whether there has been any deep change in the growth kinetics of
the patent citation network. Because we measure time in units of patent number, a mere
acceleration of technological progress should leave A(k,l) unchanged in patent number
“time.” A change in A(k,l) over time would indicate that something more is going on.
To begin to investigate the possible time dependence of A(k,l), we allowed  and
 to vary with time. To perform time-dependent fits, we averaged over a 500,000-patent
sliding time window and calculated the parameters after every 100,000 patents. There is
a significant variation in over time. No significant time dependence was observed in 
to within the statistical errors. There are two regimes for . Prior to about 1991, 
decreases with time, while there is a significant increase afterward. As noted earlier, the
 parameter has some very important consequences for the growth of the network: the
higher , the more “condensed” the network will be. An increasing  indicates increasing
stratification -- a smaller and smaller fraction of the patents is receiving a larger and
larger fraction of the citations. The turn-around from decreasing to increasing  suggests
a more fundamental change in the characteristics of patents that are being issued than a
simple acceleration of technological progress.
4.
Proposed Plan of Research
In the near term, we plan to focus on three specific directions: 1) further research
on patent citation network growth kinetics and dynamics, focusing on possible time
dependence of the growth behavior and on investigating variation with category of
technology; 2) use of path length measures from statistical network theory to investigate
relationships between different technical areas, particularly focusing on whether the
currently used patent classification schemes reflect the observed relationships between
patented innovations; and 3) use of clustering coefficients and generalized correlation
functions to develop new measures of patent “generality” and “originality,” (discussed
below) that do not rely on patent office classification schemes and to explore whether
there are different structural “signatures” for citations that reflect complementary and
substitute technologies.
4.1
Citation Network Growth Kinetics
4.1.1. Technological Differences in Patent System Kinetics
We plan to investigate the behavior of A(k,l) for particular subsets of patents. In
our initial study we will determine the behavior of Al(l) and Ak(k) for six different
technology classes defined by Hall, Jaffe and Trajtenberg based on a classification
scheme employed by the USPTO. (Hall, 2003) Very preliminary results suggest that the
decay exponents vary from class to class in an interesting way. In particular, the category
7
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
of “Computers and Communication” seems to have a larger value of  than the others. A
larger value of  for this category is consistent with the general intuition that the
computer industry is “fast-paced.” We plan to pursue a more detailed investigation of the
technology dependence of A(k,l). If the apparent distinctiveness of the Computers and
Communications category holds up upon further study, we would like to perform a more
thorough investigation with the goal of exploring the behavior of software patents.
Software patents have been the subject of considerable controversy and analysis, (see,
e.g., Bessen, 2003; Burk, 2003; Graham, 2003; Lunney, 2001/2001; Mann, 2005) and the
proposed research should contribute valuable insights to this debate. We also plan to
explore possible variations in time dependence among technological categories.
Depending on data availability, it may also be possible in future work to compare the
time dependent evolution of the United States patent network with others internationally.
4.1.2 Modeling the Underlying “Dynamics” of the Patent System
So far we have only a probabilistic understanding of the evolution of the network.
Especially because citations are contributed by inventors, their attorneys, and patent
examiners, and because patents clearly have quite heterogeneous intrinsic technical merit,
it is not at all obvious how the observed dependence on number of citations previously
received and on patent age arises from the “microscopic mechanisms” of patent citation.
Indeed, it is quite interesting -- and perhaps surprising -- that a very recent extensive
study of physics journal citations has uncovered rather similar kinetics despite the very
different processes by which journal and patent citations are made. (Redner, 2005.) We
also have no theoretical explanation at present for the degree distribution for citations
made, which, according to our initial studies, has a power law tail unlike the previously
reported exponentially decaying distributions of citations made for scientific journals.
(Vazquez, 2001.) Since there is no obvious “preferential attachment” or multiplicative
mechanism to explain this power law distribution (citations are “made” in one fell
swoop), the observed distribution of citations made is surprising.
One way to attempt to gain some understanding of these underlying mechanisms
is through simulations of “microscopic” models that incorporate factors such as
heterogeneous patent quality. As our understanding of the network improves based upon
the studies already described we hope to develop models that will provide more
“microscopic” understanding of the evolution of the patent citation network.
4.2
Statistical Network Measures of Technological Relationships
Patent citations are a source of information about the technological relationships
between patents and may serve, indirectly, as a source of insight into technological
relationships more generally. Previous economic studies have used patent citation data as
a proxy for knowledge flow. (See, e.g., Jaffe, 2003.) Because many patent citations are
provided not by patent applicants, but by their attorneys and by patent examiners, the
usefulness of patent citations for this purpose must be carefully assessed. Data about
which citations have been provided by patent examiners and which by patentees have
only recently been made available by the USPTO. Regardless of the extent to which
citations reflect knowledge flows, citations do reflect technological relationships. The
8
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
techniques of statistical network theory, which are focused on understanding the complex
structure of networks, should be useful in exploring these relationships.
One particularly promising direction for such exploration is the potential to
compare technological relationships embedded in the citation network with the technical
categorization of patents by patent examiners. Each patent application, when received by
the USPTO, is categorized according to a classification scheme which has been
developed in an ad hoc manner over the years to assist examiners and patent applicants in
searching for relevant prior art. The examiner-assigned patent classifications have also
been used by scholars of the patent system to assess such things as the “generality” of a
patent (evaluated by the extent to which it is cited by patents from different technological
classes), the “originality” of a patent (evaluated by the extent to which it cites patents
from different technological classes) (Hall 2003), and the “technological closeness” of
firms involved in patent litigation (evaluated by the extent to which the patents assigned
to those firms are in the same technological classes). These quantities have limited
explanatory power, however. For example, one recent study determined that patent
litigation often involves firms that do not have significant “technological closeness” as
measured by the classifications. (Meurer, 2005.)
Statistical network analysis provides another way to measure technological
relationships, which can be used both to evaluate the existing technological
classifications and as an alternative to measures based on those classifications. Of
course, the citation network is not entirely independent of the classification scheme, since
both examiners and applicants derive some of their citations from searches based upon it.
However, the universe of citations is broader than those found using the classification
scheme. Moreover, as explained in somewhat more detail below, statistical network
theory provides a metric for measuring relationships between patents that is more finegrained and quantitative than comparing classifications. We propose two specific
projects that will use citation data to assess technological relationships. The first will use
path length measures from statistical network theory to measure technological closeness.
The second will build on the studies of technological closeness to develop and apply
network clustering algorithms to the patent citation network.
4.2.1 Network Path Length as a Measure of Technological Closeness
Physicists studying the properties of networks have employed various measures of
network distance based essentially on counting the number of steps or “links” between
network nodes. (See, e.g., Albert, 2002; Newman, 2003.) Thus, one may define the
average path length between two nodes, the shortest path length between two nodes, and
distributions of path lengths between nodes. In the patent citation network, one may use
these measures to explore the “technological distance” between patented technologies.
Direct citations will on average indicate closer technical relationships than second, third,
and later generation citations. (Of course, not all citations indicate equal technical
relationship and it is certainly possible to imagine “weighting” different citations
according to other properties, ranging from numbers of citations in common between
citing and cited patents to textual overlap of the underlying patent documents. For
numerical tractability and conceptual simplicity, we propose to begin with a simple
treatment in which each link is given equal weight.)
9
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
As an initial matter, we have measured the “size” of the patent citation network
by looking at the shortest path lengths (sometimes called “geodesic distances”) between
patent nodes. We determined that the network contains one “giant” connected
component, and some very small components. Thus, there is a “small world effect”
indicating that most patents are connected to most other patents by a relatively small
number of steps. (Albert, 2002; Newman, 2003.) The longest path length between nodes
in the “giant” component (of nearly 4 million patents) is only 24 steps. The “technology
space” encompassed by the patent system is perhaps surprisingly closely connected. This
calculation will be expanded to study the distribution of path lengths in more detail.
An understanding of how closely connected the patent space is could have
significant implications for the interpretation of important concepts in patent law. For
example, many legal questions depend on the concept of the “person having ordinary
skill in the art,” where the “art” is the technical field of the invention. These legal
inquiries assume that the “art” to which a patent pertains may be sensibly defined.
Studies of the connectedness of the patent network may provide insight into the extent to
which such an assumption is meaningful.
The network path length may also be used to evaluate and explore existing
classification schemes. Distances between patents that are classified in the same category
may be compared with distances between patents in different categories to evaluate the
accuracy of the classification systems. Moreover, average distances between different
categories can be computed. These path length measures may provide a more finegrained (and complementary) measure of technological closeness. They may also permit
generalized measures of generality and originality to be devised that account for the
citation distance between different categories.
4.2.2 Network Clustering Approach to Patent Categorization
After exploring the path length statistics, we will attempt to apply network
clustering techniques to explore the structure of the patent citation network. The
development of numerical techniques to partition large complex networks into
meaningful “communities” is an evolving area of research. (See, e.g., Newman, 2004a;
Newman, 2004b; Palla, 2005; Song, 2005; Wu, 2004; Zhou, 2003.) Clustering is difficult
in such networks, in part because their large size necessitates efficient algorithms, but
more fundamentally because of the conceptual difficulty in defining meaningful clusters
in such highly connected and often hierarchically structured systems. The study of
clustering in the patent network may have practical implications for patent classification,
but, on a more fundamental level, the attempt to devise a sensible clustering approach
may yield insights into the structure of the underlying “technological space.”
4.3
Clustering Coefficients and Network Correlations and the Study of Patent Context
Among the fundamental tools of statistical network theory are clustering
coefficients (sometimes called “transitivity”) and degree-degree correlations. (Albert,
2002; Newman, 2003.) The basic clustering coefficient measures the likelihood that two
nodes that are connected to a specific third node are also connected to one another. We
can also imagine defining clustering coefficients for higher-order neighbors, e.g., the
10
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
likelihood that two nodes two “steps” away from a particular node are connected to one
another. The study of clustering coefficients and correlations in the patent network is
likely to provide interesting insights in its own right as well as to identify quantities that
may be useful inputs to more traditional econometric analysis. For example, clustering
coefficients are a promising alternative to measures of “generality” or “originality” based
on the patent classification systems. Like previously defined measures of generality and
originality, clustering coefficients are sensitive to the relationships between the patents
that cite or are cited by a particular patent. Moreover, clustering coefficients and
correlation functions are much more nuanced and fine-grained than those previously
defined measures and are less dependent on ad hoc classification schemes.
Because patent citation networks are “directed networks” (i.e., the citation link
between two patents has a defined direction), several distinct varieties of local clustering
coefficient may be defined. Thus, a highly cited patent with a small incidence of citation
among its nearest neighbors is likely to be more general than a highly cited patent with a
highly connected set of citing patents. We propose to define a “generality” clustering
coefficient by looking at the fraction of patents citing a given patent that also cite one
another. (See Fig. 5a) Similarly, a patent that cites a number of patents which tend to
cite one another may be less “original” than a patent that cites patents that do not tend to
cite one another. An “originality” clustering coefficient might be defined by looking at
the fraction of patents cited by a given patent that cite one another. (See Fig. 5b.)
Clustering coefficients may also be useful in distinguishing “lovely” citations (which cite
to an earlier patent because they build on its technology) from “dangerous” citations
(which cite to an earlier patent because they replace or distinguish its technology). (See
Maurseth, 2005, for a discussion of this distinction.) A patent that cites a patent which
makes a “lovely” citation to an earlier patent may be more likely also to cite the earlier
patent than a patent that cites a patent which makes a “dangerous” citation to the earlier
patent. (See Fig. 5c.) It is thus possible that clustering coefficients will be able to
provide insight into the extent to which patents in a given technical field tend to be
competing substitutes or “blocking patent” complements.
5.
Broader Impact of the Proposed Research
This project addresses a subject of great importance to the technical and economic
health of society -- the functioning of the patent system in promoting innovative activity.
The proposed research will help to provide a stronger empirical basis for the current
debate among legal academics and policymakers about the legal design of the system. In
addition, the research will contribute to understanding the fundamental science of
networks, a topic that has far-reaching implications in many areas of physical, biological,
and social science. Because the Principal Investigators for this proposal come from
diverse scholarly communities, we are well positioned to disseminate the results of this
research broadly. Our initial results have already been presented at conferences dealing
with physics, applied mathematics, law, and complexity theory; we anticipate continuing
to publish and present our results in broad-based fora and to facilitate inter-disciplinary
communication on subjects related to the research. We also anticipate involving a wide
range of students -- including Ph.D. students from the KFKI Research Institute of the
11
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Hungarian Academy of Sciences, undergraduates from Kalamazoo College, and law
students from DePaul University College of Law, in the project.
12
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
Figure 1: Ak(k) as a function of in-degree, k, for various values of patent age, l.
13
12:33 AM
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Figure 2: Top: A(k,l) as a function of patent age l, in units of patent number, for various
values of in-degree, k. Bottom: The decaying portion of Al(l) as a function of patent age
l, for various values of k.
14
PLEASE DO NOT QUOTE OR CITE!!
Figure 3: 1/S(t) as a function of time in patent number units.
15
7/29/2017
12:33 AM
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Figure 4: Top: Average number of citations made per patent, E(t), as a function of time
in patent number units. Bottom: E(t)/S(t) (probability that a new patent will be cited by a
given patent) as a function of time in patent number units.
16
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
(a)
(b)
(c)
Figure 5: Illustrations of definitions of clustering coefficients relevant to (a) generality;
(b) originality; (c) “loveliness” or “dangerousness” of citation from B to C
17
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
REFERENCES
Albert, Reka and Albert-Laszlo Barabasi, Statistical Mechanics of Complex Networks, 74
REVIEWS OF MODERN PHYSICS 47 (2002)
Allison, John R. and Mark A. Lemley, The Growing Complexity of the United States
Patent System, 82 BOSTON UNIVERSITY LAW REVIEW 77 (2002)
Allison, John R., Mark A. Lemley, Kimberly A. Moore and R. Derek Trunkey, Valuable
Patents, 92 GEORGETOWN LAW JOURNAL 435 (2004)
Astebro, Thomas, The Return to Independent Invention: Evidence of Unrealistic
Optimism, Risk Seeking or Skewness Loving?, 113 THE ECONOMIC JOURNAL 226-39
(2003)
Bak, Per, HOW NATURE WORKS: THE SCIENCE OF SELF-ORGANIZED CRITICALITY,
Springer-Verlag Telos (1999)
Barabasi, Albert-Laszlo and Reka Albert, Emergence of Scaling in Random Networks,
286 SCIENCE 509-512 (1999)
Barabasi, Albert-Laszlo, LINKED: THE NEW SCIENCE OF NETWORKS, Perseus Publishing
(2002)
Bessen, James and Robert M. Hunt, The Software Patent Experiment, PROCEEDINGS OF
OECD CONFERENCE ON PATENTS, INNOVATION AND ECONOMIC PERFORMANCE, OECD,
(April 2003)
Bessen, James and Michael J. Meurer, Lessons For Patent Policy From Empirical
Research On Patent Litigation, 9 LEWIS & CLARK LAW REVIEW 1 (2005)
Burk, Dan L. and Mark A. Lemley, Policy Levers in Patent Law, 79 VIRGINIA LAW
REVIEW 101 (2003).
Burk, Dan L. and Mark A. Lemley, Designing Optimal Software Patents, in
INTELLECTUAL PROPERTY RIGHTS IN FRONTIER INDUSTRIES: SOFTWARE AND
BIOTECHNOLOGY, Robert Hahn, ed., (2005)
Chakrabarti, Bikas K., Arnab Chatterjee, and Sudhakar Yarlagadda, eds., ECONOPHYSICS
OF WEALTH DISTRIBUTIONS (NEW ECONOMIC WINDOWS), Springer (forthcoming 2005)
Cohen, Wesley M. and Stephen A. Merrill, Eds., PATENTS IN THE KNOWLEDGE-BASED
ECONOMY, National Research Council of the National Academies (2003)
18
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Cowan, Robin, Network Models of Innovation and Knowledge Diffusion, Technical
Report 016, Maastricht: MERIT, Maastricht Economic Research Institute on Innovation
and Technology (2004), available at http://ideas.repec.org/p/dgr/umamer/2004016.html
Crouch, Dennis, Children Rejoice -- Peanut Butter and Jelly Patent Rejected on Appeal,
Patently-O Blog,http://patentlaw.typepad.com (April 8, 2005)
Dorogovtsev, S. N. and J. F. F. Mendes, Evolution of Networks with Aging of Sites, 62
PHYSICAL REVIEW E 1842-1845 (2000)
Dorogovtsev, S. N. and J. F. F. Mendes EVOLUTION OF NETWORKS. FROM BIOLOGICAL
NETS TO THE INTERNET AND WWW, Oxford University Press (2003)
Dreyfuss, Rochelle Cooper, The Federal Circuit: A Case Study in Specialized Courts, 64
New York University Law Revew 1 (1989)
Dreyfuss, Rochelle Cooper, The Federal Circuit: A Continuing Experiment in
Specialization, 54 CASE WESTERN RESERVE LAW REVIEW 769 (2004)
Duguet, Emmanuel and Megan MacGarvie, How Well Do Patent Citations Measure
Flows of Technology? Evidence from French Innovation Surveys, 14 ECONOMICS OF
INNOVATION AND NEW TECHNOLOGY 374-93 (2005)
Federal Trade Commission, To Promote Innovation: The Proper Balance of Competition
and Patent Law and Policy, Report (October 2003)
Frenken, Koen, Technological Innovation and Complexity Theory, ECONOMICS OF
INNOVATION AND NEW TECHNOLOGY, (forthcoming 2005)
Graham, Stuart J. H. and David C. Mowery, Intellectual Property Protection in the U.S.
Software Industry, in Cohen, Wesley M. and Stephen A. Merrill, eds., PATENTS IN THE
KNOWLEDGE-BASED ECONOMY, National Research Council of the National Academies
(2003)
Griliches, Zvi, Patent Statistics as Economic Indicators: A Survey, 28 JOURNAL OF
ECONOMIC LITERATURE 1661-1707 (1990)
Hagedoorn, John and Myriam Cloodt, Measuring Innovative Performance: Is There an
Advantage in Using Multiple Indicators?, 32 RESEARCH POLICY 1365-1379 (2003)
Hall, Bronwyn & Rosemarie Ziedonis, The Patent Paradox Revisited: An Empirical
Study of Patenting in the U.S. Semiconductor Industry, 1979-1995, 32 RAND JOURNAL
OF ECONOMICS 101 (2001)
Hall, Bronwyn H., Adam B. Jaffe and Manuel Trajtenberg, The NBER Patent Citation
Data File: Lessons, Insights and Methodological Tools, in Adam B. Jaffe and Manuel
19
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Trajtenberg, eds., PATENTS, CITATIONS, & INNOVATIONS: A WINDOW ON THE
KNOWLEDGE ECONOMY, MIT Press (2003)
Hall, Bronwyn H., Exploring the Patent Explosion, 30 JOURNAL OF TECHNOLOGY
TRANSFER 35-48 (2005)
Harhoff, D., F. Narin, F.M. Scherer, and K. Vopel, Citation Frequency and the Value of
Patented Inventions, 81 REVIEW OF ECONOMICS AND STATISTICS 511-15 (1998)
Harhoff, Dietmar and Scherer, Frederic M. & Vopel, Katrin, Citations, Family Size,
Opposition and the Value of Patent Rights, 32 RESEARCH POLICY 1343-63, Elsevier
(2003).
Heald, Paul J., A Transaction Costs Theory of Patent Law, 66 OHIO STATE LAW JOURNAL
___ (forthcoming 2005)
Heller, Michael and Rebecca S. Eisenberg, Can Patents Deter Innovation? The
Anticommons in Biomedical Research, 280 SCIENCE 698-701 (May 1998)
Hohenberg, P.C. and B. I. Halperin, Theory of Dynamic Critical Phenomena, 49
REVIEWS OF MODERN PHYSICS 435 (1977)
Jaffe, Adam B. and Manuel Trajtenberg, PATENTS, CITATIONS, & INNOVATIONS: A
WINDOW ON THE KNOWLEDGE ECONOMY, MIT Press (2003)
Jaffe, Adam B. and Josh Lerner, INNOVATION AND ITS DISCONTENTS: HOW OUR BROKEN
PATENT SYSTEM IS ENDANGERING INNOVATION AND PROGRESS AND WHAT TO DO ABOUT
IT, Princeton University Press (2004)
Jensen, Henrik Jeldtoft, SELF-ORGANIZED CRITICALITY: EMERGENT COMPLEX BEHAVIOR
IN PHYSICAL AND BIOLOGICAL SYSTEMS, Cambridge Lecture Notes in Physics (No. 10)
(1998)
Klemm, Konstantin and Victor M. Eguiluz, Highly Clustered Scale-Free Networks, 65
PHYSICAL REVIEW E 036123 (2002)
Krapivsky, P.L., S. Redner, and F. Leyvraz, Connectivity of Growing Random Networks,
85 PHYSICAL REVIEW LETTERS 4629-32 (2000)
Landes, William M. and Richard A. Posner, An Empirical Analysis of the Patent Court,
71 UNIVERSITY OF CHICAGO LAW REVIEW 111 (2004)
Lanjouw, Jean O. and Mark Schankerman, Patent Quality And Research Productivity:
Measuring Innovation With Multiple Indicators, 114 ECONOMIC JOURNAL 441–65, Royal
Economic Society (2004)
20
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Lemley, Mark A. and Carl Shapiro, Probabilistic Patents, 19 JOURNAL OF ECONOMIC
PERSPECTIVES 75-98 (2005)
Litmann, Allan N., Restoring the Balance of Our Patent System, 37 IDEA 545 (1997)
Long, Clarisa, Patent Signals, 69 UNIVERSITY OF CHICAGO LAW REVIEW 625 (2002)
Lunney, Glynn S., Jr., E-Commerce And Equivalence: Definining The Proper Scope Of
Internet Patents Symposium: E-Obviousness, 7 MICHIGAN TELECOMMUNICATIONS AND
TECHNOLOGY LAW REVIEW 363 (2000 / 2001)
Ma, Shanggeng and Shang-Keng Ma, MODERN THEORY OF CRITICAL PHENOMENA,
Perseus (2000)
Malchup, Fritz, An Economic Review of the Patent System, Study No. 15 of the
Subcomm. On Patents , Trademarks, and Copyrights of the Committee on the Judiciary,
85th Cong., 2d sess. (1958)
Mandelbrot, Benoit, and Richard L. Hudson, THE (MIS)BEHAVIOR OF MARKETS, Basic
Books (2004)
Mann, Ronald J., The Myth of the Software Patent Thicket: An Empirical Investigation of
the Relationship Between Intellectual Property and Innovation in Software Firms, 83
TEXAS LAW REVIEW 961 (2005)
Mantegna, Rosario N. and H. Eugene Stanley, AN INTRODUCTION TO ECONOPHYSICS:
CORRELATIONS AND COMPLEXITY IN FINANCE, Cambridge University Press (1999)
Marco, Alan C., The Option Value of Patent Litigation: Theory and Evidence, REVIEW
OF FINANCIAL ECONOMICS (forthcoming 2005), Vassar College Department of Economics
Working Paper Series No. 52
Marsili, Orietta and Ammon Salter, ‘Inequality” of Innovation: Skewed Distributions
and the Returns to Innovation in Dutch Manufacturing, 14 ECONOMICS OF INNOVATION
AND NEW TECHNOLOGIES 83-102 (2005)
Maurseth, Per Botolf, Lovely but Dangerous: The Impact of Patent Citations on Patent
Renewal, 14 ECONOMICS OF INNOVATION AND NEW TECHNOLOGIES 351-74 (2005)
Merges, Robert P. and Richard R. Nelson, On the Complex Economics of Patent Scope,
90 COLUMBIA LAW REVIEW 839 (1990)
Merges, Robert P., One Hundred Years of Solicitude: Intellectual Property Law, 19002000, 88 CALIFORNIA LAW REVIEW 2187 (2000)
Merrill, Stephen A., Richard C. Levin, and Mark B. Myers, eds., A PATENT SYSTEM FOR
THE 21ST CENTURY, National Research Council of the National Academies (2004)
21
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Meurer, Michael and James Bessen, The Patent Litigation Explosion, Presentation at the
Annual Meeting of the American Law and Economics Association (May 6-7, 2005)
Moore, Kimberly A., Worthless Patents, 20 BERKELEY TECHNOLOGY LAW JOURNAL
(forthcoming 2005)
Newman, M. E. J., The Structure and Function of Complex Networks, 45 SIAM REVIEW
167-256 (2003)
Newman, M. E. J., Fast Algorithm for Detecting Community Structure in Networks, 69
PHYSICAL REVIEW E 066133 (2004)
Newman, M. E. J. and M. Girvan, Finding and Evaluating Community Structure in
Networks, 69 PHYSICAL REVIEW E 026113 (2004)
Palla, Gergely, Imre Der´enyi, Ill´es Farkas, and Tam´as Vicsek, Uncovering the
Overlapping Community Structure of Complex Networks in Nature and Society, 435
NATURE 814-18 (Jun 9, 2005)
Pastor-Satorras, Romualdo and Alessandro Vespignani, EVOLUTION AND STRUCTURE OF
THE INTERNET : A STATISTICAL PHYSICS APPROACH, Cambridge University Press (2004)
Rai , Arti K., Engaging Facts And Policy: A Multi-Institutional Approach To Patent
System Reform, 103 COLUMBIA LAW REVIEW 1035 (2003)
Redner, S., Citation Statistics from 110 Years of Physical Review, 58 PHYSICS TODAY 49
(2005)
Scherer, F.M. and Dietmar Harhoff, Technology Policy for a World of Skew-Distributed
Outcomes, 29 RESEARCH POLICY 559-66 (2000)
Scherer, F. M., Dietmar Harhoff, and Jorg Kukies, Uncertainty and the Size Distribution
of Rewards from Innovation, 10 JOURNAL OF EVOLUTIONARY ECONOMICS 175-200 (2000)
Shapiro, Carl, Navigating the Patent Thicket: Cross Licenses, Patent Pools, and
Standard-Setting in INNOVATION POLICY AND THE ECONOMY, VOLUME I, Adam Jaffe,
Joshua Lerner, and Scott Stern, eds., MIT Press (2001)
Silverberg, Gerald and Bart Verspagen, A Percolation Model of Innovation in Complex
Technology Spaces, 29 JOURNAL OF ECONOMIC DYNAMICS AND CONTROL 225-44 (2005)
Song, Chaoming, Shlomo Havlin, and Hern´an A. Makse, Self-similarity of Complex
Networks, 433 NATURE 392-95 (Jan. 27, 2005)
Stanley, H. Eugene, INTRODUCTION TO PHASE TRANSITIONS AND CRITICAL PHENOMENA
(1971)
22
PLEASE DO NOT QUOTE OR CITE!!
7/29/2017
12:33 AM
Stanley, H. Eugene, Scaling, Universality, and Renormalization: Three Pillars of Modern
Critical Phenomena, 71 REVIEWS OF MODERN PHYSICS S358 (1999)
Thomas, John R., Formalism At The Federal Circuit, 52 AMERICAN UNIVERSITY LAW
REVIEW 771 (2003)
Trajtenberg, Manuel, A Penny for your Quotes: Patent Citations and the Value of
Innovations, 21 RAND JOURNAL OF ECONOMICS 172-87 (1990)
Turner, John L., In Defense of the Patent Friendly Court Hypothesis: Theory and
Evidence (under review 2005), http://www.terry.uga.edu/~jlturner/PatentFCH.pdf
Varian, Hal R., Patent Protection Gone Awry, NEW YORK TIMES, Economic Scene
(October 21, 2004)
Vazquez, Alexei, Citations Networks, http://arxiv.org/PS_cache/condmat/pdf/0105/0105031.pdf (2001)
Wagner, R. Polk, and Gideon Parchomovsky, Patent Portfolios, 154 UNIVERSITY OF
PENNSYLVANIA LAW REVIEW __ (forthcoming 2005)
Watts, Duncan J. and Steven H. Strogatz, Collective dynamics of small world networks,
393 NATURE 440- 42 (1998).
Watts, Duncan J., SIX DEGREES: THE SCIENCE OF A CONNECTED AGE, W.W. Norton &
Co. (2002)
Wille, Luc T., NEW DIRECTIONS IN STATISTICAL PHYSICS: ECONOPHYSICS,
BIOINFORMATICS, AND PATTERN RECOGNITION, Springer (2002)
Wu, Fang and Bernardo A. Huberman, Finding Communities in Linear Time: A Physics
Approach, 38 EUROPEAN PHYSICS JOURNAL B 331-38 (2004)
Zhu, Han, Xinran Wang, and Jian-Yang Zhu, Effect of Aging on Network Structure, 68
PHYSICAL REVIEW E 056121 (2003)
Zhou, Haijun, Distance, Dissimilarity Index, and Network Community Structure, 67
PHYSICAL REVIEW E 061901 (2003)
Ziedonis, Arvids and Bhaven N. Sampat, Patent Citations and the Economic Value of
Patents: A Preliminary Assessment, in HANDBOOK OF QUANTITATIVE SCIENCE AND
TECHNOLOGY RESEARCH, Henk Moed, Wolfgang Glänzel, and Ulrich Schmoch, eds.,
Kluwer Academic Publishers (2004).
23
PLEASE DO NOT QUOTE OR CITE!!
24
7/29/2017
12:33 AM