What is the Most Important Theorem?

Andy J. Reagan, Christopher M. Danforth, & Peter Sheridan Dodds
Computational Story Lab, Department of Mathematics & Statistics, Vermont Complex Systems Center, & Vermont Advanced Computing Core
Abstract
By considering the difference between outgoing and incoming degrees, we can find the most
fundamental result (highest differential of outgoing over incoming degree, or largest
out-component) and the most important, or end-of-the-road, result (highest differential of
incoming over outgoing degree, or largest in-component). In Rudin’s text, the most fundamental
result is De Morgan’s Laws, and the most important result is the Multivariate Change of
Variables in Integration Theorem (MCVIT; that’s a mouthful).
[Figure: three panels. Degree Distribution (count vs. degree); Out Component Size Distribution (count vs. out-component size); Zipfian Degree, Out Component Distribution (log10 degree or size vs. log10 rank).]
Related work and future directions
The online Proof Wiki laid out by Gephi’s Force Atlas, colored by community as
detected by Gephi’s built-in modularity algorithm.
Network Statistics
The degree distribution of this network visually appears to follow a Poisson distribution, which
is typical of random (Erdős-Rényi) networks and unexpected in this case. Looking at the size of
the out-component, we see that most theorems (the left side of the histogram) have few others
that rely on them. However, there are a few theorems upon which many results rely, which we can
see on the right side of the histogram. To test whether these distributions could represent a
scale-free network, we look at their shape plotted on a logarithmic scale à la Zipf. The small
size of this network perhaps obscures the relationship, and we can tentatively conclude (a
formal goodness-of-fit test, such as Kolmogorov-Smirnov, remains to be run) that this is not a
scale-free network, contrary to what one might expect.
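One way to carry out a Zipf-style check of this kind is to sort the degrees from largest to smallest and compare log10 rank with log10 degree; a roughly straight line would hint at a power law. The following is a minimal sketch only: the helper names `zipf_points` and `least_squares_slope` are hypothetical, and the degree list is made up for illustration rather than taken from the Rudin network.

```python
import math

def zipf_points(degrees):
    """Return (log10 rank, log10 degree) pairs, largest degree first.

    Nodes with degree 0 are skipped because log10(0) is undefined.
    """
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    return [(math.log10(rank), math.log10(d))
            for rank, d in enumerate(ranked, start=1)]

def least_squares_slope(points):
    """Ordinary least-squares slope of y against x for a list of (x, y)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Illustrative degrees only (an assumption, not data from the poster).
example_degrees = [100, 50, 33, 25, 20, 0]
points = zipf_points(example_degrees)
slope = least_squares_slope(points)
```

A slope fitted this way is only a visual aid; a straight line on a log-log plot is not by itself evidence of a power law.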
On the left is Chapter 11 of Rudin’s Analysis, on Lebesgue theory. We see that a naive layout
of this network does not reveal any of the structure, but by pulling all of the edges into one
direction we see a form of branching network. This fits with the toy example given in
constructing the network, where we are able to lay out the network as a planar graph with
directed edges all pointing east. In general, however, this is not the case. As the number of
connections in the network increases, it quickly can no longer be laid out in this way.
http://www.uvm.edu/storylab
Similar techniques have been successful in many other areas, including the two I talk about
below: predicting the growth of the economic space of nations and the success of recipes
based on ingredients [3] [6]. To predict economic development, Hidalgo et al. looked at the
The structure of knowledge
Walter Rudin’s Principles of Mathematical Analysis [5]. Node size is weighted by in-component
size, nodes are colored by chapter, and the layout is by Gephi’s Force Atlas.
In agreement with Gonzaga et al. [2], we find a very large strongly connected component
comprising 97% of the network. However, by considering only links contained in the proofs, we
find evidence for a power law describing the degree distribution. It is also clear that the
two-dimensional structure of this network is not the branching structure one might expect, and
that the connections between results are complicated.
The cartoon on the left is from Douglas Hofstadter’s book Gödel, Escher, Bach [4], and is a
depiction of any axiomatic system of knowledge. Gödel’s landmark Incompleteness Theorem proved
that there will always be unreachable truths within a sufficiently powerful axiomatic system,
and here Hofstadter makes the analogy of our knowledge reaching out into that space in a
branching manner, which we now see to be a forlorn hope.
(Top right) The Zipfian degree and out-degree distributions form a nearly straight line in
log-log space, suggesting a power law relationship. However, the maximum likelihood estimate
(MLE) and the associated Kolmogorov-Smirnov goodness-of-fit test do not support a power law fit
of the degree distribution, with α = 3.23 ± 0.10 and p = 0.03. There are 14,025 ± 335 theorems
in this power law region, with x_min = 10 ± 1.52.
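To make the fitting step concrete, here is a hedged sketch of a power-law MLE and Kolmogorov-Smirnov distance for the tail x ≥ x_min. It uses the continuous-data estimator α = 1 + n / Σ ln(x_i / x_min) as a simplifying assumption; degree data are discrete, so this is not necessarily the procedure behind the numbers quoted here, and it does not compute the p-value (which requires resampling). The function name `fit_power_law` and the synthetic sample are illustrative.

```python
import math

def fit_power_law(data, x_min):
    """Continuous power-law MLE for the tail x >= x_min.

    Returns (alpha, standard_error, ks_distance).
    """
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum == 0.0:
        raise ValueError("need tail values strictly greater than x_min")
    alpha = 1.0 + n / log_sum
    std_err = (alpha - 1.0) / math.sqrt(n)

    # KS distance: largest gap between the empirical and model CDFs.
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and at x.
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return alpha, std_err, ks

# Synthetic tail placed at evenly spaced quantiles of a power law
# with alpha = 3 and x_min = 10 (illustrative values only).
n = 1000
sample = [10.0 * (1.0 - (i + 0.5) / n) ** (-1.0 / 2.0) for i in range(n)]
alpha, std_err, ks = fit_power_law(sample, x_min=10.0)
```

On this synthetic sample the estimate lands close to the exponent used to generate it, which is a basic sanity check before applying the same function to real degree data.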
Looking at an online repository of recipes, Teng et al. were able to identify both ingredients
that can substitute for one another and ingredients that go well together. Applying this
methodology could lead to a more complete exploration of the possible proofs, by considering
similar theorems that could be used as replacements in proofs. Classifying which theorems
belong together could more clearly define the boundaries between different areas of mathematics.
(Middle right) The degree complementary cumulative distribution function (CCDF) plotted with
the fit obtained from the MLE, using the asymptotically unbiased analysis of Clauset et al. [1].
In contrast to Gonzaga et al. [2], we include only links within proofs. It is therefore
important to note that these are not all of the links, and that these statistics are not an
artifact of the fact that these are internet links.
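For readers unfamiliar with the CCDF, a small sketch of how an empirical one can be computed from a list of degrees follows. It assumes the convention P(X ≥ x); the helper name `empirical_ccdf` and the example values are hypothetical.

```python
from collections import Counter

def empirical_ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for the distinct values."""
    counts = Counter(values)
    n = len(values)
    ccdf = []
    remaining = n  # number of observations >= the current x
    for x in sorted(counts):
        ccdf.append((x, remaining / n))
        remaining -= counts[x]
    return ccdf

# Made-up degrees for illustration only.
example = [1, 1, 2, 3, 3, 3, 10]
ccdf = empirical_ccdf(example)
```

Plotting these pairs on log-log axes gives the kind of curve against which a fitted power law can be compared.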
In the future, a tool could be built to automate this network generation from textbooks. Finally,
the original inspiration for the network, studying the structure of theorems in upper-level
analysis, can be realized by making the network interactive and turning it into a tool for
education.
Acknowledgments
(Bottom right) The closeness centrality is shown with counts, as a measure of the shortest
distance between any two theorems. This is defined as
\[
C_i =
\begin{cases}
\left( \sum_{j \in R_i} d_{ij} \right)^{-1}, & \text{if } R_i \neq \emptyset, \\
0, & \text{otherwise.}
\end{cases}
\]
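The closeness definition above can be sketched with a breadth-first search from each node. This is a minimal illustration that assumes R_i is the set of theorems reachable from theorem i along directed edges and d_ij is the shortest-path length from i to j; neither is spelled out on the poster, and the function name `closeness` and the toy graph are hypothetical.

```python
from collections import deque

def closeness(graph):
    """Closeness C_i = 1 / (sum of shortest-path distances to reachable nodes).

    `graph` maps every node to a list of its out-neighbors (directed edges).
    Nodes that reach nothing get a score of 0.
    """
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph.get(node, []):
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())  # the source itself contributes 0
        scores[source] = 1.0 / total if total > 0 else 0.0
    return scores

# Toy directed graph, for illustration only.
toy = {"A": ["B", "C"], "B": ["C"], "C": []}
scores = closeness(toy)
```

Network tools often normalize closeness differently (for example by the number of reachable nodes), so values from this sketch need not match those reported by a particular package.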
The authors wish to acknowledge the Vermont Advanced Computing Core, which is supported by NASA
(NNX-08AO96G) at the University of Vermont, and which provided High Performance Computing
resources that contributed to the research results reported within this poster. AJR was
supported by an EPSCoR research assistantship, and PSD was supported by NSF CAREER Award
#0846668. CMD and PSD were also supported by a grant from the MITRE Corporation.
[Plot annotations: α = 3.2, D = 4.82 × 10^{-3}]
Most theorems are very close together, but we see that there is another peak at distance 1.5,
indicating that most theorems are only one neighbor away and almost all of the rest are only
slightly farther than that. The normalized count of the PageRank scores for the whole network
is reported in the inset, where we can see that there are few theorems that PageRank would
classify as relevant. The global average clustering coefficient is 0.065.
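As a rough illustration of how PageRank scores like these can be obtained, the sketch below runs a plain power iteration on a small directed graph. The damping factor 0.85 and the iteration count are conventional choices assumed here, not values stated on the poster, and the function name `pagerank` and the toy graph are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [out-neighbors]}.

    Every neighbor must also appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node in nodes:
            out = graph[node]
            for neighbor in out:
                new_rank[neighbor] += damping * rank[node] / len(out)
        rank = new_rank
    return rank

# Toy graph: an edge A -> B means A is used in proving B (illustrative only).
toy = {"A": ["B", "C"], "B": ["C"], "C": []}
ranks = pagerank(toy)
```

With edges pointing from a used result to the result it helps prove, rank accumulates on results that sit at the end of long chains of dependencies.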
network of products and modeled which new products can be reached as a function of the current
product space. Following the work of [3], I hope to further study how future directions of
growth can be predicted. Our current mathematical machinery is ripe for certain discoveries, as
can be seen in the concurrent and independent development of important theory (e.g., Newton
and Leibniz with calculus), so this application has a foundation in history.
[3] C. A. Hidalgo, B. Klinger, A.-L. Barabási, and R. Hausmann.
The product space conditions the development of nations.
Science 317, 482 (2007).
[4] Douglas Hofstadter.
Gödel, Escher, Bach: an Eternal Golden Braid.
Basic Books, New York, 1999.
[5] Walter Rudin.
Principles of Mathematical Analysis.
McGraw-Hill, New York, 1976.
[1] Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman.
Power-law distributions in empirical data.
SIAM Review 51, 661-703 (2009).
[2] Flavio B. Gonzaga, Valmir C. Barbosa, and Geraldo B. Xexéo.
The network structure of mathematical knowledge according to the Wikipedia, MathWorld, and
DLMF online libraries, 2012.
Rudin’s Analysis
The network itself can be built differently, by changing which theorems are included or which
are used to prove others, and the present structures all combine historical development with
how a writer structures these chunks in his or her mind. So perhaps this structure is also a
reflection of the natural structuring of complex ideas in the human mind.
In the bottom right we include the famous Heine-Borel theorem. The proof of Heine-Borel in
Walter Rudin’s Analysis relies on results that he has already proved, namely the Archimedean
Property as well as De Morgan’s Laws. In this fashion, we add all of the theorems contained in
Rudin’s Analysis and the ProofWiki to their respective networks.
The node with the highest outgoing degree, potentially the most useful result in these proofs,
is the Definition of Mapping, with 228 outgoing links. Kleinberg’s HITS ranks “State Code
Function is Primitive Recursive” as the most authoritative proof, a very technical result in
the study of primitive recursive functions. Looking at incoming degree, we find that Proof 1 of
Sequentially Compact Metric Space is Totally Bounded relies on a whopping 49 definitions and
theorems and is ranked most relevant by Page’s PageRank, followed far behind by the 38 incoming
links of Proof 2 of Complete and Totally Bounded Metric Space is Sequentially Compact.
So the Fundamental Theorem of Calculus falls short of the mark with a net incoming degree of
19, not even half of MCVIT’s net incoming degree of 45. And it is not the axioms of the real
numbers that are the most fundamental, with the Existence of R having a net outgoing degree of
94, but instead the properties of sets shown by De Morgan, with a whopping net outgoing degree
of 122. Larry Page’s PageRank (the original algorithm behind Google) and Jon Kleinberg’s HITS
algorithm also both rate the MCVIT as the most important result.
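The net-degree comparison can be illustrated with a few lines of code. This sketch assumes that net outgoing degree means out-degree minus in-degree (and net incoming degree its negative), which is how the numbers above read; the function name `net_out_degree` and the toy edge list are hypothetical and far smaller than the actual network.

```python
def net_out_degree(edges):
    """Map each node to (out-degree minus in-degree) for directed edges."""
    net = {}
    for source, target in edges:
        net[source] = net.get(source, 0) + 1
        net[target] = net.get(target, 0) - 1
    return net

# Toy edges in the form (used result, result it helps prove); illustrative only.
toy_edges = [
    ("Existence of R", "Archimedean Property"),
    ("Archimedean Property", "Heine-Borel"),
    ("De Morgan's Laws", "Heine-Borel"),
]
net = net_out_degree(toy_edges)

# Rank from most "fundamental" (largest net outgoing degree) to most
# "important" (largest net incoming degree); ties keep insertion order.
ranking = sorted(net, key=net.get, reverse=True)
```

In this toy example the last entry of `ranking` is the end-of-the-road result, since it is used by nothing and relies on two earlier results.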
On the right is a sample construction of this
network. At the top right, since the
Archimedean Property relies on the
Existence of R, we draw a directed edge
from Existence of R to the Archimedean
Property.
The structure that we find is a human construction itself. One could prove the Fundamental
Theorem of Calculus (which sounds important but could be just good branding) with nothing
more than the axioms of ZFC set theory. But such a proof would be so long and tedious that
any hope of conveying a clear understanding would be lost. Imagine taking all the atoms that
make up a duck and trying to stick them together to create a duck; this would be the worst
Lego kit ever. And so in any mathematical analysis textbook, the theorems contain small
stories of logic that are meaningful to mathematicians, and theorems that are connected are
neither too close nor too far apart.
Each individual theorem contains a small
logical construction, and these
encapsulations allow more technical
theorems to be proven succinctly. If
theorem A is used in proving theorem B,
here we draw a link from theorem A
pointing to theorem B, a directed edge
A → B. In the case of Rudin’s Analysis, we
considered explicit mentions of prior results
to make links. For the ProofWiki, we
include links from only the proofs of
theorems, in this same way.
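The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the input format (a mapping from each theorem to the results its proof cites), the function names `build_theorem_network` and `out_component_size`, and the tiny example are all hypothetical, and the out-component is taken to be the set of results reachable by following edges forward.

```python
def build_theorem_network(cited_in_proof):
    """Build out-neighbor lists from a {theorem: [results its proof cites]} map.

    An edge A -> B means result A is used in the proof of theorem B.
    """
    successors = {}
    for theorem, cited in cited_in_proof.items():
        successors.setdefault(theorem, set())  # every theorem is a node
        for result in cited:
            successors.setdefault(result, set()).add(theorem)
    return {node: sorted(targets) for node, targets in successors.items()}

def out_component_size(network, start):
    """Count the results reachable from `start` by following edges forward."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in network.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) - 1  # exclude the starting node itself

# Tiny example following the Heine-Borel illustration; not the full data set.
citations = {
    "Archimedean Property": ["Existence of R"],
    "Heine-Borel": ["Archimedean Property", "De Morgan's Laws"],
}
network = build_theorem_network(citations)
```

For instance, `out_component_size(network, "Existence of R")` counts both the Archimedean Property and Heine-Borel, since each depends on it directly or indirectly.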
First, there are some things that we notice just from looking at this small graph. We find that
Lebesgue theory (capstoned by Lebesgue Dominated Convergence) lives on the fringe, not
nearly as tied up with the properties of the real numbers as the Riemann-Stieltjes integral or
the integration of differential forms. Visually, it appears that the integration of differential
forms and functions of several variables rely the most on prior results. Over on the right,
we’ve got things going on with sequences and series, where the well-known Cauchy Convergence
criterion is labeled. By sizing the nodes proportional to their outgoing degree (i.e., the number
of theorems they lead to), we observe that the basic properties of R, of sets, and of topology
(purple) lie at the core.
Building a theorem network
Results
The Proof Wiki
Mathematical truths are organized in an incredibly structured manner. We start with the basic
properties of the natural numbers, called axioms, and slowly, painfully work our way up; first
reaching the real numbers then the joys of calculus and far, far beyond. To prove new
theorems, we make use of old theorems, creating a network of interconnected results: a
mathematical house of cards.
So what is the big-picture view of this web of theorems? Here, we take a first look at a part of
the “Theorem Network”, and uncover surprising facts about the ones that are important. We
use Walter Rudin’s “Principles of Mathematical Analysis” [5] for the network, and use Gephi for
visualization. We find that the Multivariate Change of Variables in Integration relies on the most
previous results. A basic result about sets known as De Morgan’s Laws proves the most useful,
leading to the biggest connected component.
Discussion
[6] Chun-Yuen Teng, Yu-Ru Lin, and Lada A. Adamic.
Recipe recommendation using ingredient networks, 2011.
@compstorylab