2D Graph Exploration Using Irrational Numbers

UNIVERSITY OF LIVERPOOL
2D Graph Exploration Using
Irrational Numbers
by
Andrew Paul Collins
A dissertation submitted in partial fulfilment for the
degree of Master of Science in Advanced Computer Science
in the
Faculty of Science
Department of Computer Science
September 2009
UNIVERSITY OF LIVERPOOL
Abstract
Faculty of Science
Department of Computer Science
Master of Advanced Computer Science
by Andrew Paul Collins
Irrational numbers such as Pi are often suggested to be random. This project
studies whether Pi and log 2 are sufficiently random to be suitable for use
within random number generators for the purposes of performing random walks.
To do this, Pi and log 2 were generated to 600,000 places using the
Bailey-Borwein-Plouffe (BBP) algorithms, and random walks were performed in
2-dimensional graphs of various sizes. In addition to performing the walks,
the Chi Square distribution of the sequences is calculated to measure whether
randomness truly affects the walk. To benchmark the significance of the
results, both pseudo-random (Linear Congruential Generator and Mersenne
Twister) and true-random (HotBits and RANDOM.ORG) generators are also used in
the experiments to provide a measurement of how well the irrational numbers
really perform.
The results show that log 2 and Pi are interesting alternatives to traditional
generators; however, further work is needed. Specifically, the deterministic
nature of irrational numbers, along with their known good randomness qualities,
means that regardless of the seed, the quality of the sequence is more likely
to be good. Further, a series of interesting observations unrelated to the
goals of the project were also made, such as an indication that the irrational
numbers may become more random as the start position increases and,
additionally, some evidence that the movement controlled by the digits of Pi
within a 2D graph is circular. Finally, a potentially interesting application
for BBP in cryptography is shown, due to its relatively low time and space
complexity.
Acknowledgements
I would like to acknowledge and thank those who have provided advice, guidance
and support throughout this project:
Prof. Leszek A. Gąsieniec (Supervisor) for his help and guidance throughout this
project.
Dr. Russell Martin (Second Supervisor), for his positive and constructive feedback
throughout the project assessment stages, in particular those relating to the usage
of true random number generators.
Dr. Prudence Wong (Assessor) for her positive and constructive feedback throughout the project assessment stages.
Finally, I would like to thank family and friends who have tolerated the many
hours I have dedicated purely to this project over the previous months.
Contents
Abstract
Acknowledgements
List of Figures
List of Tables
Abbreviations
Symbols
1 Introduction
1.1 Scope
1.2 Problem Statement
1.3 Approach
1.4 Outcome
1.5 Outline
2 Background
2.1 Random Walk in Robotics
2.2 Random Number Generators
2.2.1 Pseudo Random Number Generators
2.2.2 True Random Number Generators
2.3 Irrational Numbers
2.3.1 Generation of Irrational Numbers
2.4 Chi Square Test
3 Methodology
3.1 Proposed Solution
3.2 Pre-Processing
3.2.1 Storage of Data
3.2.2 Generation of Digits
3.3 Processing
3.3.1 2D Walks
3.3.2 Chi Square Test
3.3.3 Standard Deviation
3.4 Post-Processing
3.4.1 Log Analyser
3.4.2 Heat Maps
3.5 Testing
4 Experimental Work
4.1 Generation of Digits
4.2 Performance of Walks
4.2.1 Experiment One
4.2.2 Experiment Two
4.2.3 Experiment Three
4.2.4 Overall Analysis
4.3 Further Observations
5 Evaluation
5.1 Strengths
5.2 Weaknesses
5.3 Recommendations
6 Professional Issues
6.1 Code of Conduct
6.2 Code of Good Practice
7 Conclusions
7.1 Main Findings
7.2 Future Work
A Project Specification
B Project Specification
C Project Implementation
Bibliography
List of Figures
3.1 Example of the Log Analyser output
3.2 Summary of walks where x = y from 10 to 125
3.3 Heat map of 85x85 grid using Pi
3.4 Heat map colour scale
4.1 Percentage of 50x50 grids with a good Chi Square distribution at various starting points
4.2 Percentage of 25x25 grids with a good Chi Square distribution at various starting points
4.3 Heat map of 122x122 grid using Pi
List of Tables
2.1 Chi Square Distribution Table
4.1 Summary of experiment one – Walk speed
4.2 Summary of experiment one – Chi Square Test
4.3 Summary of experiment one – Standard Deviation
4.4 Summary of experiment two – Walk speed
4.5 Summary of experiment two – Chi Square Test
4.6 Summary of experiment two – Standard Deviation
4.7 Summary of experiment three – Walk speed
4.8 Summary of experiment three – Chi Square Test
4.9 Summary of experiment three – Standard Deviation
Abbreviations
BBP     Bailey Borwein Plouffe
¹³⁷Cs   Caesium-137
LCG     Linear Congruential Generator
OOP     Object Orientated Programming
PRNG    Pseudo Random Number Generator
PHP     PHP: Hypertext Preprocessor
TRNG    True Random Number Generator
Symbols
χ²      Chi Square Test
∞       Infinity
π       Pi
log 2   Logarithm of 2 in base-10
φ       Golden Ratio
e       Euler’s Number
Chapter 1
Introduction
The generation of random numbers is too important to be left to
chance.
Robert R. Coveyou (1915-1996)
1.1 Scope
Exploration using robots is a rapidly growing topic owing to its interesting
and sometimes vital applications. Exploration robots are currently used for
tasks such as the exploration of other planets [22], the exploration of
heavily polluted or hazardous environments [30], and domestic vacuum
cleaning [23].
In the majority of these cases the robots are remotely controlled by a human,
either because the complexity of the task is beyond current technology (e.g.
explosive defusal) or because of social concerns (e.g. the unmanned combat air
vehicle, UCAV). However, there are many applications where an autonomous robot
could be critical, such as the exploration of areas where the terrain is
unsuitable for wired or even wireless contact (e.g. the exploration of Lunar or
Martian lava tubes).
In this project, the objective is to explore the concept of autonomous robots,
in particular a new method for robots to make decisions when faced with
multiple choices, such as which of several doors to pass through. While this
project focuses purely on a robot being placed within an initial room and then
autonomously visiting all linked rooms (i.e. exploration), the concepts shown
could be used in other decision making processes, such as which object to
interact with first. Further, the key aspect of this project is to investigate
memoryless decision making, where decisions are made without any knowledge of
previous actions. Specifically, the decision making process of the robots in
this project will mimic that of a probabilistic method.
A probabilistic robot is not only memory efficient; it also enables interesting
applications because its actions are unpredictable. A potential application of
an unpredictable robot is a sentry that patrols a building checking each room
for intruders but follows no specific pattern as to the order in which rooms
are checked. Memory efficiency also becomes important when a robot must explore
something so vast that its memory requirements could become a burden.
1.2 Problem Statement
The problem with memoryless robots is that they require a method of generating
decisions, such as random walks, which in turn require random number
generators. However, regardless of the random number source used, the speed at
which a robot visits every room can be slow in comparison to a deterministic
robot. While speed is not an issue in all cases, the distribution of visits to
rooms can be: for a sentry or vacuum cleaning robot, for example, it would be
preferable if each room were visited an equal or near-equal number of times.
Therefore two factors are explored in this project in the search for an
alternative source of random numbers:
• The method must be faster in completing the walk.
• The distribution of visits must be more equal even at the sacrifice of speed.
Preferably, a new method would have both factors, but this is not strictly necessary.
1.3 Approach
In this project, the approach used to solve this problem is a method that is
deterministic but could be viewed as probabilistic.
In various prior works, attempts have been made to prove that the expansions
of various irrational numbers such as π are random, making them in essence an
infinite series of pre-generated random numbers. In this project the expansions
of irrational numbers are used as a random number generator in place of a
pseudo random number generator (PRNG) or true random number generator (TRNG).
To test the performance of irrational numbers a series of experiments using the expansions of irrational numbers are performed and in addition, a comparison using
both poor quality and good quality PRNGs is included. Further, a comparison
with two TRNG services is also performed.
For the experiments, 2D graph (G = (V, E)) exploration is used, whereby the
set of vertices, V, are the rooms and the set of edges, E, are doorways leading
between rooms. The exploration of the graphs is performed using a form of
random walk [13], or a deterministic counterpart, whereby an agent makes
decisions probabilistically using a random number generator.
To validate against the PRNGs and TRNGs, walks are performed on various
graph sizes using each irrational number generator and then a measure of speed
and distribution are taken to identify which performed best. Further, a measure of
the randomness of the sequence used to complete a walk is taken, to see whether
randomness is a factor in the quality of a walk.
For the purposes of this work, while the graph model is used, all graphs are treated
as 2D grids, which could be viewed as the grid model being used.
1.4 Outcome
To summarise, the project was completed successfully; however, a number of
weaknesses were discovered, particularly the use of incorrect tools for the
task, simply because alternatives were not known to exist. While some evidence
has been shown that the expansions of irrational numbers do provide an
alternative, there is insufficient data to support the number of experiments
needed to establish that this would be worthwhile. What has been discovered in
this project are interesting observations as to the randomness of the
expansions of irrational numbers. First and foremost, it is shown that log 2
appears to become more random as the starting position of a sequence increases.
Secondly, it is shown that the movement of π within a random walk is not evenly
distributed, and instead is circular in that it starts from a central point and
slowly moves outwards, frequently visiting its starting area as it explores.
Finally, in the conclusion and evaluation, a number of ideas are presented for
future projects as well as new applications for the BBP algorithms that are not
known to have been discussed previously.
1.5 Outline
This work provides a background introduction to the types of random number
generators available, as well as the best known methods of generating irrational
numbers in binary bases. In the third chapter, the method of how the experiments
were performed is explained, such as the creation of the software and the reasoning
for why the software was required to be broken up into three components.
In chapter four, the results of the three main experiments performed in this
project are discussed, along with the results of the additional work performed,
including the interesting observations made that were unrelated to the scope of
this project. In later chapters a critical evaluation of the work is provided
with a discussion of the weaknesses and strengths of the project, along with
suggestions of how to improve this project should a repeat of the experiments
be of interest. At the end of the document, the conclusion is provided with an
overview of the experimental work and a discussion of the potential
applications that may be available should further work be done in this area.
Chapter 2
Background
Exploring pi is like exploring the universe.
David Chudnovsky (1947–Present)
2.1 Random Walk in Robotics
The concept of random walks in robotics as a method of movement has been shown
to be viable in [23], which used a random number generator to perform random
walks in an autonomous vacuum cleaner. While the focus of this work was on the
technical challenges of implementing such a robot, from the results (202 m covered
in 1 hour 32 minutes) it is clear to see that the speed of such a method is incredibly
slow. However, [23] differs from work performed in this project as floor coverage is
not of interest, instead the focus will be purely on the speed of visiting all rooms
within a set of rooms. Further work that uses the concept of a random walk is the
iRobot Roomba which utilises a random walk as a fall back algorithm should the
robot come into difficulty (i.e. lost or stuck) or when a floor plan is not known
[21].
Both of these examples highlight that random walks can be implemented
successfully within robotics. While the speed of room coverage for [23] was slow
in comparison to a human doing the same task, the speed is not realistically an
issue if the task can be completed without human interaction. The cost-benefit of
this becomes more significant if viewed using examples of autonomous robots in
other-world environments where a human may not necessarily be able to interact
with the robot.
2.2 Random Number Generators
In both of the previous robotic vacuum cleaner examples, random number
generators were used as the probabilistic element of the random walks. In
neither case was it stated exactly which random number generator was used, and
there are many possibilities.
Random number generators can be classified into two groups: true random number
generators (TRNG), whereby the random numbers are gathered from a random
(e.g. quantum) source, and pseudo-random number generators (PRNG), whereby the
random numbers are generated using a deterministic procedure.
A TRNG is often not available within standard computing devices due to the
lack of perceived need for one to be included [11]. While TRNG hardware is
available to those that specifically require this functionality, they are not standard
equipment and are often highly expensive. Due to this lack of a TRNG being
available as standard and in the majority of cases the quality of a TRNG rarely
being necessary, an alternative exists – PRNG.
2.2.1 Pseudo Random Number Generators
A PRNG does not use a true random source like a TRNG; instead its random
numbers are generated through the execution of an algorithm. Perhaps the best
known and most widely implemented PRNG algorithm is the Linear Congruential
Generator (LCG), which uses the algorithm shown in [20]. The LCG algorithm is
fast and memory efficient, making it highly suitable for applications where simple
random numbers are required. However, it has been shown in many cases that
the quality of the random numbers generated is poor [26] and as such the LCG
should not be used in any applications that depend on good random numbers, e.g.
lotteries. Further, in the majority of LCG implementations, the period size is
2^32 [40].
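The recurrence behind an LCG can be illustrated with a minimal sketch. This is not the generator used in the project: the class name LcgSketch is hypothetical, the multiplier, increment and modulus (1664525, 1013904223 and 2^32) are commonly published constants assumed here for illustration, and taking the two highest bits of the state as a base-4 digit is likewise an assumption.

```java
/** Minimal LCG sketch: state = (A * state + C) mod 2^32. Constants are assumed, not taken from this project. */
final class LcgSketch {
    private static final long A = 1664525L;        // multiplier (assumed, commonly published)
    private static final long C = 1013904223L;     // increment (assumed, commonly published)
    private static final long MASK = 0xFFFFFFFFL;  // masking reduces the value modulo 2^32

    private long state; // the single word of memory the generator needs

    LcgSketch(long seed) {
        this.state = seed & MASK;
    }

    /** Advances the recurrence and returns the new 32-bit state. */
    long next() {
        state = (A * state + C) & MASK;
        return state;
    }

    /** Returns a base-4 digit (0..3) taken from the two highest bits of the state. */
    int nextBase4Digit() {
        return (int) (next() >>> 30);
    }

    public static void main(String[] args) {
        LcgSketch lcg = new LcgSketch(0);
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            digits.append(lcg.nextBase4Digit());
        }
        System.out.println(digits); // 20 base-4 digits from seed 0
    }
}
```

The high bits are used because the low-order bits of a power-of-two-modulus LCG repeat with very short periods, which is one reason such generators score poorly in randomness tests.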
While periodicity is a concern, not all PRNGs have a small period. A prominent
example of a good PRNG is the Mersenne Twister [24]. The Mersenne Twister is
designed to pass tests for randomness, such that it can be used within
applications that depend on randomness (e.g. Monte Carlo methods). Not only
does it perform well in randomness testing, it also has a period of
(2^19937 − 1) [24], far larger than that of the LCG. Unfortunately, while it
performs well in randomness tests and has a period far beyond any possible need
(see footnote 1), it is not without its disadvantages. While the Mersenne
Twister is faster in more recent work [29], it is still slow at starting up in
comparison to the lightweight LCG. Further, it requires a much greater amount
of memory: whereas the LCG requires only one word, the Mersenne Twister
requires 623 words (i.e. 2,492 bytes if using 32 bit integers, compared to
4 bytes if using the LCG).
The LCG and the Mersenne Twister show two extremes, where one is fast yet does
not provide good random numbers, and the other is slow yet provides good random
numbers. For the purpose of this project, it is believed it would be interesting to
see how well irrational numbers compare to both of these PRNGs, and as a subtask it would be interesting to see whether good random numbers are any better
than bad random numbers in the case of random walks, or whether it has no effect.
2.2.2 True Random Number Generators
As mentioned previously, for a TRNG to be used, a source of randomness is required. While there are multiple possible sources within most personal computers
[11], none are standardised or even natively available to the system for usage.
Footnote 1: 2^19937 − 1 is approximately 4.3 × 10^6001, which is far greater than the predicted
number of atoms (approximately 10^80) in the current observable universe.
Fortunately, peripheral hardware capable of generating true random numbers does
exist; however, it is often expensive.
An inexpensive alternative is to use Internet-based services available on the
WWW. There are two services that are perceived to be TRNGs: RANDOM.ORG (see
footnote 2) and HotBits (see footnote 3).
RANDOM.ORG generates random bits using atmospheric noise. The atmospheric
noise is provided by tuning a very inexpensive and old radio (such that it does not
have noise filters) into an unused frequency so that static is generated which can
be used to generate bits [16].
HotBits differs in that it instead uses the radioactive decay of Caesium-137
(¹³⁷Cs) as a means of generating bits [33].
2.3 Irrational Numbers
Irrational numbers are numbers whose expansions are infinite and non-periodic.
Possibly the most popular irrational number is π, which is the ratio of a
circle’s circumference to its diameter. However, there are many others, such as
log 2, √2, e (Euler’s Number), and φ (Golden Ratio). Further,
these can also be combined or manipulated while also remaining irrational. For
example, π^2 and π√2 remain irrational.
Definition 2.1. “An irrational number is a number that cannot be expressed as
a fraction p/q for any integers p and q.” [38]
Further, while many authors show that irrational numbers, such as π, appear
normal over many known intervals of their expansions, there is as yet no formal
proof that the numbers are truly normal [6].
Footnote 2: RANDOM.ORG: True Random Number Service, http://www.random.org
Footnote 3: HotBits: Genuine Random Numbers, http://www.fourmilab.ch/hotbits/
Definition 2.2. “A normal number is an irrational number for which any finite
pattern of numbers occurs with the expected limiting frequency in the expansion
in a given base (or all bases).” [39]
However, what can often be agreed is that the property of normality holds for
the digits discovered so far for many irrational numbers; whether this will
remain true as more digits are discovered can only be assumed. This creates an
interesting prospect, as not only do the expansions of irrational numbers lack
periodicity, they are also so far believed to be normal, which indicates that
certain irrational numbers may in fact be random [5].
This indicates that the expansions of irrational numbers could be utilised in a
random number generator, assuming the property of normality holds. In a recent
work, Mitsui [25] shows that π has the potential to be utilised as a random
number generator and, in the tests undertaken, in fact outperforms the LCG and
Mersenne Twister.
2.3.1 Generation of Irrational Numbers
For the generation of the expansions of various irrational numbers the Bailey–
Borwein–Plouffe (BBP) Algorithm [7] is one of the most suitable methods. It
was first shown to be capable of generating π (eq. 2.1) in base-16 with very low
memory and processing requirements.
\[
\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right) \tag{2.1}
\]
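To show how eq. 2.1 yields an individual digit without computing the preceding ones, the following is a minimal sketch of BBP digit extraction. It is not the project's implementation: the class and method names are hypothetical, and it assumes ordinary double precision arithmetic, which is only dependable for modest digit positions. Each hexadecimal digit is split into two base-4 digits.

```java
/** Sketch of BBP digit extraction for pi using double arithmetic (an assumption; suitable for modest positions only). */
final class BbpPiSketch {

    /** Computes 16^exp mod m by square-and-multiply. */
    static long pow16Mod(long exp, long m) {
        if (m == 1) return 0;
        long result = 1;
        long base = 16 % m;
        while (exp > 0) {
            if ((exp & 1) == 1) result = (result * base) % m;
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }

    /** Fractional part of the sum over k of 16^(d-k) / (8k + j). */
    static double series(int j, int d) {
        double sum = 0.0;
        for (int k = 0; k <= d; k++) {           // terms with a non-negative power of 16
            long denom = 8L * k + j;
            sum += (double) pow16Mod(d - k, denom) / denom;
            sum -= Math.floor(sum);              // only the fractional part matters
        }
        double power = 1.0 / 16.0;               // 16^(d-k) for k = d + 1
        for (int k = d + 1; k <= d + 16; k++) {  // a few rapidly shrinking tail terms
            sum += power / (8L * k + j);
            power /= 16.0;
        }
        return sum - Math.floor(sum);
    }

    /** Hexadecimal digit of pi at zero-based position d after the point. */
    static int hexDigit(int d) {
        double x = 4 * series(1, d) - 2 * series(4, d) - series(5, d) - series(6, d);
        x -= Math.floor(x);                      // map into [0, 1), also when x is negative
        return (int) (16 * x);
    }

    /** Base-4 digit of pi at zero-based position d; each hex digit holds two base-4 digits. */
    static int base4Digit(int d) {
        int hex = hexDigit(d / 2);
        return (d % 2 == 0) ? (hex >> 2) : (hex & 3);
    }
}
```

The loop in series runs once per preceding position, which is why the cost of extracting a single digit grows with the position requested.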
Since the initial work on BBP, the binary expansions of many other irrational numbers have been shown to be capable of being generated using BBP-style algorithms
giving rise to P-notation [5] as shown in eq. 2.2.
\[
P(s, b, n, A) = \sum_{k=0}^{\infty} \frac{1}{b^k} \sum_{j=1}^{n} \frac{a_j}{(kn+j)^s} \tag{2.2}
\]
π and many other irrational numbers have been written in this form [1, 4, 36].
This makes BBP of great interest to the project, as it would allow the binary
expansions of many irrational numbers to be added to the software quickly once
initial support for P-notation is added; however, this is not a necessary
objective of the project. While not expected to be necessary to this project,
BBP has also been shown to be capable of generating digits in non-binary bases
such as decimal [14, 28].
Further to this, because the BBP algorithm generates binary digits, irrational
numbers can be generated in any binary base (a base that is a power of two),
such as base-4. Because the grid model is used in the graph exploration, the
irrational numbers can be generated natively in base-4 and each digit mapped
directly to an individual port. This removes the unfairness that would arise
if, for example, base-10 digits needed to be mapped to four possible edges.
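The mapping of base-4 digits to ports can be sketched as a walk over a grid. Everything specific here is an assumption made for illustration: the class name GridWalkSketch, the port order (0 = north, 1 = east, 2 = south, 3 = west), starting in the centre cell, and treating a move into a wall as a consumed digit with no movement.

```java
import java.util.function.IntSupplier;

/** Sketch of a memoryless grid walk driven by base-4 digits (port order and wall handling are assumptions). */
final class GridWalkSketch {
    // Assumed port mapping: 0 = north, 1 = east, 2 = south, 3 = west.
    private static final int[] DX = {0, 1, 0, -1};
    private static final int[] DY = {-1, 0, 1, 0};

    /** Returns the number of digits consumed until every cell is visited, or -1 if maxSteps is reached first. */
    static long coverSteps(int width, int height, IntSupplier digits, long maxSteps) {
        boolean[][] visited = new boolean[height][width];
        int x = width / 2;
        int y = height / 2;
        visited[y][x] = true;
        int remaining = width * height - 1;
        long steps = 0;
        while (remaining > 0 && steps < maxSteps) {
            int d = digits.getAsInt() & 3;   // one base-4 digit selects one port
            steps++;
            int nx = x + DX[d];
            int ny = y + DY[d];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) {
                continue;                    // wall: digit is consumed, agent stays put
            }
            x = nx;
            y = ny;
            if (!visited[y][x]) {
                visited[y][x] = true;
                remaining--;
            }
        }
        return remaining == 0 ? steps : -1;
    }
}
```

Because the walk only asks for the next digit, any source can be supplied as the IntSupplier, whether an irrational expansion, a PRNG or a TRNG database.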
2.4 Chi Square Test
While the performance of the random walks themselves will be a major factor
in the outcome of the results, it is also of interest to test the randomness
of the sequences that completed the walks. This assists in understanding
whether a good sequence is better than a bad sequence. The Chi Square (χ²)
Test provides an indication of the probability of a given sequence occurring.
The χ² test in this usage does not necessarily indicate how random a sequence
is, since for a sequence to be random every possible sequence must be able to
occur, even one of several hundred consecutive zeros. However, what χ² does
provide is an indication of whether a sequence should be viewed with some
suspicion, due to the low probability of such a case occurring. In particular,
it is of interest which performs best: those sequences which arouse suspicion
or those that do not.
\[
\chi^2 = \sum_{i=0}^{n} \frac{(E_i - O_i)^2}{E_i} \tag{2.3}
\]
The formula used to calculate the χ² statistic from given expected and observed
data is shown in eq. 2.3, where E is the expected frequency and O is the
observed frequency. In this project E = (number of digits) / b, where b is the
base that the digits were generated in, which in this case is base-4.
        p = 1%    p = 5%    p = 25%   p = 50%   p = 75%   p = 95%   p = 99%
v = 1   0.00016   0.00393   0.1015    0.4549    1.323     3.841     6.635
v = 2   0.02010   0.1026    0.5754    1.386     2.773     5.991     9.210
v = 3   0.1148    0.3518    1.213     2.366     4.108     7.815     11.34
v = 4   0.2974    0.7107    1.923     3.357     5.385     9.488     13.28
v = 5   0.5543    1.1455    2.675     4.351     6.626     11.07     15.09
Table 2.1: χ² distribution table [19, pg. 44]
Calculating the χ² statistic produces a decimal value. To interpret this value,
a look up table (such as table 2.1) is required that converts it to a
probability. Each row within a χ² table corresponds to a particular number of
degrees of freedom v; in this project v = 3. The reason for this is that all
sequences are in base-4, which results in k = 4 possible values, and the
degrees of freedom is v = k − 1.
Using these look up tables, the probability of a given sequence occurring can
be established. Realistically, for a sequence to be considered random its χ²
value should fall between the 25% and 75% points, i.e. between 1.213 and 4.108
for v = 3.
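The calculation in eq. 2.3 and the 25% to 75% band for v = 3 can be sketched as follows. The class and method names are hypothetical, and the code assumes the digits are supplied as an int array with values from 0 to base − 1; the thresholds 1.213 and 4.108 are the v = 3 entries of table 2.1.

```java
/** Sketch of the chi square statistic for a digit sequence, with the v = 3 band from table 2.1. */
final class ChiSquareSketch {

    /** Computes the sum over all digit values of (E - O)^2 / E, with E = length / base. */
    static double chiSquare(int[] digits, int base) {
        long[] observed = new long[base];
        for (int d : digits) {
            observed[d]++;                   // count how often each digit value occurs
        }
        double expected = (double) digits.length / base;
        double chi = 0.0;
        for (long o : observed) {
            double diff = expected - o;
            chi += diff * diff / expected;
        }
        return chi;
    }

    /** True if the statistic lies between the 25% and 75% points for v = 3. */
    static boolean withinMiddleBand(double chi) {
        return chi >= 1.213 && chi <= 4.108;
    }
}
```

For example, 24 digits with counts 10, 6, 4 and 4 give E = 6 and χ² = (16 + 0 + 4 + 4) / 6 = 4.0, which lies inside the band, whereas a perfectly even count gives χ² = 0 and falls outside it for being too regular.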
Chapter 3
Methodology
Any one who considers arithmetical methods of producing
random digits is, of course, in a state of sin.
John von Neumann (1903–1957)
3.1 Proposed Solution
To identify whether irrational numbers are a viable solution to the problem of
performing memory efficient walks, it was proposed that a software solution be
created that was capable not only of generating the expansions of irrational
numbers, but also of performing the 2D random walks as a means of testing
whether irrational numbers are a viable alternative.
This chapter describes how the software solution was constructed to generate
the required results. The solution consists of three stages: a pre-processing
stage, a processing stage, and a post-processing stage, each of which is
discussed in its respective section.
Further, this chapter shows how the software was tested.
In the proposed solution, the BBP algorithm will be used for the generation of
base-4 digits as it was shown capable of doing so in §2.3.1. Specifically, due
to two PRNGs (LCG, and Mersenne Twister) and two TRNGs (HotBits, and
RANDOM.ORG) being implemented, two BBP algorithms were proposed to be
used, a constant – π – and a logarithm – log 2.
This chapter contains a combination of the design and implementation phases of
the project. For the initial project proposal please see appendix A. For further
information on the separate design and implementation stages please see the
post-stage documentation in appendix B and appendix C respectively.
3.2 Pre-Processing
A requirement of the software was that the expansions of the irrational numbers
must be generated in base-4 such that they can be fairly mapped to a specific
movement direction. Due to the large number of grids that were expected to be
used, it would be inefficient to generate digits on demand for each grid,
especially as the time complexity of generating an individual digit using the
BBP algorithm is linear.
The solution to this was to perform a stage of pre-processing whereby databases for
each digit generator are created prior to the walks. This allowed digit generation
to be performed once for all walks. As a side effect, this results in disk space being
required to store the many digits which are generated, however, the processing
time savings easily make this a negligible cost.
3.2.1 Storage of Data
One potential concern was that the digits are in base-4 (0, 1, 2, 3) having a storage
requirement of 2 bits (00, 01, 10, and 11 respectively) which could potentially cause
overheads as it is common for programming languages to operate natively in bytes.
As one digit is stored per byte, this creates an overhead of 6 bits per digit stored.
To highlight the problem, if the software were to generate up to its potential
internal limits (assuming 32 bit unsigned integers) then it would be generating
up to n = 2^32 digits, resulting in a disk space usage of O(n) bytes (4 GB) per
generator. Assuming that there are multiple irrational number generators,
PRNGs, and TRNGs, the disk space requirements could become an issue. A simple
yet effective solution designed to resolve this was to utilise bit packing,
whereby multiple digits are packed into a single byte. This allows four digits
to be stored in every byte with no overhead, reducing the disk space
requirement to O(n/4) bytes (1 GB).
While realistically it was not expected that 2^32 digits would be generated, for the
purposes of future proofing the software it was reasonable that efficient storage
methods should be utilised.
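The bit packing described in this section can be sketched as below. The class and method names are hypothetical, and placing the first digit of each group in the most significant two bits of the byte is an assumption; the source does not state the order used by the project software.

```java
/** Sketch of packing four base-4 digits into each byte (first digit in the highest bits is an assumption). */
final class DigitPackingSketch {

    /** Packs digits 0..3 at two bits each; the final byte is zero-padded if needed. */
    static byte[] pack(int[] digits) {
        byte[] out = new byte[(digits.length + 3) / 4];
        for (int i = 0; i < digits.length; i++) {
            int shift = 6 - 2 * (i % 4);                // 6, 4, 2, 0 within each byte
            out[i / 4] |= (digits[i] & 3) << shift;
        }
        return out;
    }

    /** Reads back the digit stored at the given zero-based position. */
    static int digitAt(byte[] packed, long index) {
        int shift = 6 - 2 * (int) (index % 4);
        return (packed[(int) (index / 4)] >> shift) & 3; // the mask discards sign-extension bits
    }
}
```

With this layout the digits 0, 1, 2, 3 occupy the single byte 00011011, so a lookup needs only an integer division and a shift, and no unpacking pass is required before a walk.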
3.2.2 Generation of Digits
To ensure the software is expandable it utilises concepts of Object Orientated
Programming (OOP) which are natively available within the Java programming
language.
One of the first features used was an interface. An interface simply provides a
means of specifically stating exactly which methods must exist within any class
that implements it. Effectively this standardises the methods available within a
class. This becomes advantageous at later stages as functionality that intends to
use these classes only needs to be written to handle one set of methods rather than
many.
Secondly, the work requirement for each generator was reduced by increasing the
amount of re-usable code. To do this a superclass was created. A superclass is a
full or partially written class that can be extended at a later time. In this case
the superclass adds the majority of the functionality that would be required by
each generator, such as the ability to write to a log file, or performing efficient
modular exponentiation. However, the superclass does not specifically have any
functionality of how to generate digits.
The final feature used is subclasses. A subclass is a class that inherits the
functionality of a superclass. In the software, the subclasses inherit the
features of the previously mentioned superclass, giving them all the core
functionality without the need to re-write it. This has allowed the core
functionality to be centralised such that only the features that make a
generator unique need to be written, such as the implementation of a BBP
algorithm or the usage of a PRNG. Further, all subclasses not only inherit from
the superclass, they also implement the interface, allowing the generators to
be accessed uniformly.
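The interface, superclass, and subclass arrangement described above might look roughly like the following sketch. All type and method names here (`DigitGenerator`, `AbstractDigitGenerator`, `LcgDigitGenerator`, `nextDigit`) are hypothetical, and the example generator simply wraps `java.util.Random`, which is itself a linear congruential generator.

```java
import java.util.Random;

/** Hypothetical interface: every generator exposes the same method. */
interface DigitGenerator {
    /** Returns the next base-4 digit (0..3). */
    int nextDigit();
}

/** Hypothetical superclass: shared helpers, but no digit-generation logic. */
abstract class AbstractDigitGenerator implements DigitGenerator {

    /** Computes base^exponent mod modulus by repeated squaring (modulus assumed below 2^31). */
    protected static long modPow(long base, long exponent, long modulus) {
        long result = 1 % modulus;
        base %= modulus;
        while (exponent > 0) {
            if ((exponent & 1) == 1) {
                result = (result * base) % modulus;
            }
            base = (base * base) % modulus;
            exponent >>= 1;
        }
        return result;
    }

    /** Stand-in for the log-file writer mentioned in the text. */
    protected void log(String message) {
        System.err.println(getClass().getSimpleName() + ": " + message);
    }
}

/** Subclass: only the part that makes this generator unique is written here. */
final class LcgDigitGenerator extends AbstractDigitGenerator {
    private final Random random;

    LcgDigitGenerator(long seed) {
        this.random = new Random(seed);
    }

    @Override
    public int nextDigit() {
        return random.nextInt(4);
    }

    public static void main(String[] args) {
        DigitGenerator generator = new LcgDigitGenerator(2009L);
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            digits.append(generator.nextDigit());
        }
        System.out.println(digits);
    }
}
```

Code that consumes digits only needs to know about `DigitGenerator`, so a BBP-based or file-backed generator could be swapped in without changing the walk logic.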
As stated previously, π and log 2 were implemented using the BBP algorithms, [7]
and [1] respectively. The LCG was implemented with relative ease due to support
being natively available within the Java libraries [31] and further the Mersenne
Twister was also implemented with similar ease due to an existing implementation
available within the Colt Project libraries [10].
The two TRNGs did, however, present some initial difficulties. Both TRNGs are
Internet services and as such have been designed to prevent abuse by limiting
the number of digits generated. While RANDOM.ORG provides an allowance of
1,000,000 digits per day, this created difficulty when trying to debug the
project software or generate various test cases. Fortunately, RANDOM.ORG seems
to have foreseen the problem of its users requiring high volumes of data, as it
provides pre-generated files which are updated every 24 hours [17]. Each
pre-generated file is simply a binary file containing 1 MB of data (4,194,304
base-4 digits). In addition, the service keeps an archive of all previously
generated files, providing a huge amount of data to work with. Conveniently,
the data files use the same bit packing as the project software uses for its
own databases, so the files are suitable for use without any modification. The
project software has been designed to automatically download the latest of
these files and use it as a digit database. In cases where this functionality
is not required, databases can be manually downloaded and added to the
software.
The second TRNG implemented, HotBits, suffered from the same issue, as its
daily allowance is 16,384 bits, making it completely unsuitable for use in the
software. Further, HotBits does not provide pre-generated files as RANDOM.ORG
does. However, after contact with the HotBits author [35], two 16 MB
(67,108,864 base-4 digits) data files were made available to the project, one
file per HotBits generation hardware, which had been pre-generated for previous
statistical testing [34]. Once again, as the data files are bit packed they can
simply be loaded into the software.
3.3
Processing
Once the digit databases have been created, the required walks can be performed.
Within the main processing stage the previously built databases are used to
perform the walks. Further, because the number sequences produced by the
generators are used directly in this stage, the additional tests, namely the χ²
test and the calculation of the standard deviation, are performed immediately
after the walks complete. As both tests depend on the results of the walks, it is
sensible to compute these values before the result data is released from memory,
reducing additional processing later.
3.3.1
2D Walks
The 2D walks are a core aspect of this project, as it is in these tests that the
performance of the expansions of irrational numbers is expected to be visible.
The walks traverse either squares (x × y where x = y) or rectangles (x × y where
x ≠ y)¹. Further, while the graphs are rectangular or square, the edges of the
shapes wrap around, as each graph is treated as a torus.
¹ For all cases, x and y are assumed to be integers greater than 0.
Because the graphs are treated as grids, the most efficient method of
implementation is a multi-dimensional array, a feature commonly available in
most modern programming languages. Further, by using an integer array rather
than a boolean array, it is possible to record the number of visits the agent
makes to any given vertex. By using an array, the memory required to perform
the walks is O(x × y), plus two bytes that act as a pointer to the vertex the
agent is currently positioned in. A potential alternative is to use linked
lists; however, this would require substantially more memory, as each vertex
would be required to store a reference to all four of its edges. If the graphs
were not square or rectangular in shape, linked lists could possibly be more
memory efficient than the grid method that is used.
The walks use the previously generated databases. For each walk the database is
opened and read sequentially, unpacking each byte into four digits as it is
read. Each extracted digit is mapped to a direction (x ± 1 ∨ y ± 1) and the
agent performs one step in that direction. In a physical environment this could
be envisioned as a robot going through one of four doors and into a different
room.
At the initial state all vertices will be set to 0 – in the array – indicating that
the vertex has not yet been visited. Upon the agent visiting a vertex, the vertex
increments its value by 1, thus recording how many visits occurred in a given
vertex. Once all vertices are greater than 0 the walk is then considered complete
as all vertices can be classified as being visited. The end result is that a matrix of
size x × y holding at each vertex the number of visits is produced. These matrices
are stored to disk as a particular interest is how well irrational numbers perform in
generating a relatively equal amount of visits to each vertex. This will be discussed
further in §3.3.3. Further, these matrices can also be used to generate a heat map
which will also be covered further in §3.4.2.
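The walk procedure described in this section can be sketched as below. This is an illustrative reading, not the project's code: the class name `TorusWalk`, the particular digit-to-direction mapping, and the decision to count the start vertex (0,0) as visited before the first step are assumptions. The digit source is passed in as an `IntSupplier`, so any generator producing values 0..3 can drive the walk.

```java
import java.util.Random;
import java.util.function.IntSupplier;

/** Hypothetical sketch of a covering walk on an x-by-y grid treated as a torus. */
final class TorusWalk {

    /** Walks until every vertex has been visited and returns the visit-count matrix. */
    static int[][] walk(int width, int height, IntSupplier digits) {
        int[][] visits = new int[width][height];   // all vertices start at 0 (unvisited)
        int unvisited = width * height;
        int x = 0;
        int y = 0;
        visits[x][y] = 1;                          // assumption: the start vertex counts as visited
        unvisited--;
        while (unvisited > 0) {
            // Assumed mapping of base-4 digits to the four directions.
            switch (digits.getAsInt()) {
                case 0:  x = Math.floorMod(x + 1, width);  break;
                case 1:  x = Math.floorMod(x - 1, width);  break;
                case 2:  y = Math.floorMod(y + 1, height); break;
                default: y = Math.floorMod(y - 1, height); break;
            }
            if (visits[x][y]++ == 0) {
                unvisited--;                       // first visit to this vertex
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        Random random = new Random(1L);
        int[][] visits = walk(10, 10, () -> random.nextInt(4));
        long steps = -1;                           // the initial placement is not a step
        for (int[] column : visits) {
            for (int count : column) {
                steps += count;
            }
        }
        System.out.println("Steps needed to cover a 10 x 10 torus: " + steps);
    }
}
```

`Math.floorMod` keeps the coordinates in range when stepping below zero, which is what makes the edges wrap around.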
3.3.2
Chi Square Test
During the walks, a frequency list is also maintained of the number of times
each digit occurs within the sequence. The list contains four values, one for
every base-4 digit, and as such is implemented as a 1D array indexed by digit.
In terms of the χ² test, this list is the observed data.
With both the observed values and the expected values² required by the χ² test
now available, the χ² distribution can be calculated as per the algorithm in
eq. 2.3.
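As an illustration of the calculation, the sketch below computes the usual Pearson statistic, the sum of (observed − expected)² / expected over the four digit counts. Eq. 2.3 and §2.4 are not reproduced here, so this assumes that they define the standard statistic and that each base-4 digit is expected equally often; the class and method names are hypothetical.

```java
/** Hypothetical sketch: Pearson chi-square statistic for digit frequencies. */
final class ChiSquare {

    /** Returns sum((observed - expected)^2 / expected), assuming a uniform expectation. */
    static double statistic(long[] observed) {
        long total = 0;
        for (long count : observed) {
            total += count;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no digits observed");
        }
        double expected = (double) total / observed.length;  // assumption: uniform digits
        double chiSquare = 0.0;
        for (long count : observed) {
            double difference = count - expected;
            chiSquare += difference * difference / expected;
        }
        return chiSquare;
    }

    public static void main(String[] args) {
        long[] frequencies = {30, 20, 25, 25};  // counts of digits 0, 1, 2, 3
        System.out.println(statistic(frequencies));
    }
}
```

With four categories the statistic has three degrees of freedom, which is what the classification thresholds would be read against.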
3.3.3
Standard Deviation
To calculate how evenly the visits are distributed across the vertices, the
standard deviation of each matrix produced in the walks is calculated. The
standard deviation provides a measure of how far a given series of numbers
deviates from its mean.
In the best case the result will be 0, however, it is unlikely this will ever occur.
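A minimal sketch of this measure over a visit matrix is given below. It assumes the population form of the standard deviation (dividing by the number of vertices); the text does not say whether the population or sample form was used, and the names are hypothetical.

```java
/** Hypothetical sketch: standard deviation of the visit counts in a walk matrix. */
final class VisitSpread {

    /** Population standard deviation of all entries (assumed form). */
    static double standardDeviation(int[][] visits) {
        long cells = 0;
        double sum = 0.0;
        for (int[] column : visits) {
            for (int count : column) {
                sum += count;
                cells++;
            }
        }
        double mean = sum / cells;
        double squaredDifferences = 0.0;
        for (int[] column : visits) {
            for (int count : column) {
                squaredDifferences += (count - mean) * (count - mean);
            }
        }
        return Math.sqrt(squaredDifferences / cells);
    }

    public static void main(String[] args) {
        int[][] visits = {{1, 3}, {1, 3}};
        System.out.println(standardDeviation(visits));
    }
}
```

A result of 0 would require every vertex to be visited exactly the same number of times, which is why it is unlikely in practice.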
3.4
Post-Processing
In the processing stage, an individual walk (or multiple unrelated walks) is performed and the additional statistics are calculated relating to them. After each
walk, the results of the walks are recorded to a log file so that a collection of walks
can be analysed at once. It is the analysis of these log files which occurs in the
post-processing stage.
By reviewing all the log files together, the performance of the different
generators at each grid size can be compared to see how well the irrational
numbers perform in the walks.
² See §2.4 for how the expected data value is calculated.
3.4.1
Log Analyser
To analyse the log files for several walks, a PHP (PHP: Hypertext Preprocessor)
based web front end was created. The PHP script reads all of the log files and
reports which digit generator performed the best in each measurement per grid
size on the basis of walk speed. Further, for each grid and generator, the χ²
distribution is colour coded to indicate whether the values were rejected,
suspect or almost suspect, so that it can be seen whether the randomness of a
sequence affects the speed. An example of the log analyser output can be found
in fig. 3.1.
Figure 3.1: Example of the Log Analyser output
A further feature of the log analyser is that it provides a summary of how well
each generator performed overall in the areas of speed, χ² and standard
deviation for every grid size tested in a given session. An example of this
output can be seen in fig. 3.2.
Figure 3.2: Summary of walks where x = y from 10 to 125
3.4.2
Heat Maps
The log analyser also provides functionality to view the matrices that were
produced in the processing stage. These matrices are rendered as heat maps. A
heat map is a visual representation of a matrix with all of the values
normalised to colour values. In the heat map generator every 10th percentile is
normalised to a certain colour. The colours selected were generated by
ColorBrewer³ [18].
Using these maps it will be possible to see the variance of the distribution of visits
within a walk. While the standard deviation will provide an indication of the
walks, the heat maps will show any interesting patterns that may appear. An
example heat map generated by the software is in fig. 3.3 and the colour scale
used is available in fig. 3.4.
³ ColorBrewer: Color Advice for Maps, http://www.colorbrewer2.org
Figure 3.3: Heat map of 85x85 grid using Pi
Figure 3.4: Heat map colour scale
3.5
Testing
The primary function to test in the software was the generation of the
irrational number expansions. To verify that the generators were working, the
digits at positions 10⁶, 10⁷, and 10⁸ of both π and log 2 were generated and
compared to the results provided within [7]. Once the digits matched, it was
believed the generators were performing as designed.
Chapter 4
Experimental Work
I am ashamed to tell you to how many places of figures I carried
these computations, having no other business at the time.
Sir Isaac Newton (1643-1727)
4.1
Generation of Digits
While the generation of digits for both the PRNGs and the TRNGs was trivial,
due to the external data available or the speed of their algorithms as discussed
earlier, the case was not the same for the BBP algorithms. The issue that arose
with the BBP algorithm is that, because it operates as a spigot algorithm,
generating a sequence requires generating each digit from 1 to n independently.
Taking the cost of generating the digit at position d to be O(d), the time
complexity of generating the whole sequence becomes O(n(n+1)/2) (i.e. that of a
triangular number), as each digit must be generated independently:

\[ O\left(\frac{n(n+1)}{2}\right) = \sum_{d=0}^{n} O(d) \tag{4.1} \]
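To make the per-digit cost concrete, the following sketch extracts single hexadecimal digits of π using the standard BBP formula, π = Σ 16^(−k) (4/(8k+1) − 2/(8k+4) − 1/(8k+5) − 1/(8k+6)). It is an illustration only and not the project's implementation: it works in base 16 with double-precision arithmetic (adequate for modest positions), it splits each hexadecimal digit into two base-4 digits as an assumed conversion, and all names are hypothetical. The first loop in `series` runs d + 1 times, which is where the growth of cost with the digit position d comes from.

```java
/** Hypothetical sketch: BBP digit extraction for pi in base 16. */
final class BbpPi {

    /** Computes base^exponent mod modulus by repeated squaring. */
    static long modPow(long base, long exponent, long modulus) {
        long result = 1 % modulus;
        base %= modulus;
        while (exponent > 0) {
            if ((exponent & 1) == 1) {
                result = (result * base) % modulus;
            }
            base = (base * base) % modulus;
            exponent >>= 1;
        }
        return result;
    }

    /** Fractional part of the sum over k >= 0 of 16^(d-k) / (8k + j). */
    static double series(int j, int d) {
        double sum = 0.0;
        for (int k = 0; k <= d; k++) {             // d + 1 terms needing modular exponentiation
            long denominator = 8L * k + j;
            sum += (double) modPow(16, d - k, denominator) / denominator;
            sum -= Math.floor(sum);
        }
        for (int k = d + 1; k <= d + 16; k++) {    // short tail; later terms are below double precision
            sum += Math.pow(16.0, d - k) / (8.0 * k + j);
        }
        return sum - Math.floor(sum);
    }

    /** Hexadecimal digit of pi at position d + 1 after the point (d = 0 gives the first). */
    static int hexDigit(int d) {
        double x = 4.0 * series(1, d) - 2.0 * series(4, d) - series(5, d) - series(6, d);
        x -= Math.floor(x);
        return (int) (16.0 * x);
    }

    /** First 2 * hexCount base-4 digits, two per hexadecimal digit (assumed conversion). */
    static int[] base4Digits(int hexCount) {
        int[] digits = new int[2 * hexCount];
        for (int d = 0; d < hexCount; d++) {
            int hex = hexDigit(d);                 // each call restarts from scratch
            digits[2 * d] = hex >> 2;
            digits[2 * d + 1] = hex & 0b11;
        }
        return digits;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (int d = 0; d < 8; d++) {
            hex.append(Integer.toHexString(hexDigit(d)));
        }
        System.out.println("First hexadecimal digits of pi after the point: " + hex);
    }
}
```

Because `base4Digits` calls `hexDigit` afresh for every position, the total work for a sequence is the sum of the per-digit costs, matching the triangular-number argument above.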
This disadvantage of the BBP algorithm results in an incredibly slow processing
time when generating large sequences. Due to this, and due to the limited
resources available to the project, 600,000 digits were generated for both π and
log 2. While this is a considerable number, generating it required 3 weeks of
processing time on the author's system.
4.2
Performance of Walks
Using the 600,000 digits, random walks were performed on grids of size 10 × 10
up to 100 × 100. For every x from 10 up to 100, y was set from 10 up to 100 (see
algorithm 1).
Algorithm 1 Generation of grids
for x = 10 to 100 do
    for y = 10 to 100 do
        doWalk(x, y)
    end for
end for
This created 8,281 (91 × 91) independent walks that were performed by each of
the six generators. To make the experiments fair for the PRNGs and TRNGs, the
experiments were performed three times. The aim of this was to balance out the
chance of an extremely good or extremely bad sequence being generated by a PRNG
and skewing the experiment.
The next three subsections give a brief analysis of noteworthy results within
each of the three experiments. Each of these sections reproduces the summary
tables that were generated in the post-processing stage. When the
post-processing stage is performed, every generator is ranked against each other
generator on a grid-by-grid basis. In the case of walk speed and standard
deviation, rankings are ordered with the smallest value being best (i.e.
shortest walk, or most even distribution), with the best generator ranked 1st
and the worst ranked 6th for those tests. The summary tables are frequency
matrices of how many times each generator was ranked at each position across all
tests performed. Due to this, a generator that performs well would be expected
to score highly in the first three rows and poorly in the remaining three in
comparison to other generators.
Overall, the sum of the percentages for one generator (i.e. the sum of all
percentages in one column) should equal 100%. Equally, the sum of all
percentages in an individual row should also equal 100%, as a generator cannot
be ranked at multiple positions and a rank will generally not be shared; shared
ranks are possible, but only in very rare cases.
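The construction of such a frequency matrix can be sketched as follows. The data layout (`scores[grid][generator]`, smaller is better) and the tie rule (tied generators share the better rank) are assumptions made for illustration, as are the names.

```java
import java.util.Arrays;

/** Hypothetical sketch: rank generators per grid and tally how often each rank is achieved. */
final class RankSummary {

    /**
     * scores[grid][generator] holds a value where smaller is better (e.g. walk length).
     * Returns counts[rank][generator], with rank 0 being the best.
     */
    static int[][] rankCounts(double[][] scores) {
        int generators = scores[0].length;
        int[][] counts = new int[generators][generators];
        for (double[] grid : scores) {
            for (int g = 0; g < generators; g++) {
                int rank = 0;
                for (double other : grid) {
                    if (other < grid[g]) {
                        rank++;                    // ties share the better rank (assumed rule)
                    }
                }
                counts[rank][g]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] walkLengths = {
            {10, 20, 30},   // grid 1: generator 0 fastest
            {30, 10, 20},   // grid 2: generator 1 fastest
        };
        for (int[] row : rankCounts(walkLengths)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

When there are no ties, every row and every column of the result sums to the number of grids, which corresponds to the 100% property noted above.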
The tables of grid-by-grid comparisons from which the summaries are taken are
very large, as each contains 8,281 records. As such they are not included in
this document and are instead available on the attached CD.
4.2.1
Experiment One
The summary results of the first experiment are available in tables 4.1, 4.2,
and 4.3. In the first test, for walk speed, RANDOM.ORG is a clear winner, as not
only is it the fastest random number generator in 24.37% of walks, it is also
the second fastest in a further 19.86% of walks. Further, it also has the fewest
walks ranked in 4th, 5th, and 6th place. This positive result for TRNGs is also
reflected in the second TRNG tested: HotBits also dominates the upper rankings
and has the second-fewest walks in the lowest three ranks. This is clearly a
positive result for TRNGs, as in this first experiment they appear to be the
most recommended choice when random walks must be completed quickly.
The next prominent result is in the PRNGs. The LCG is known for its poor
randomness and that is possibly reflected in this experiment. The LCG had the
fewest walks ranked within the top three positions, and achieved 1st place in
only 11.25% of walks, making it the worst generator to use in terms of speed.
However, regardless of the LCG's result, the Mersenne Twister performed much
better, as it holds more positions within the top three ranks. Further, the
number of tests which were ranked in the top three is similar to that of the
TRNG HotBits.
Finally, for the generators of most interest to this project, the irrational
numbers, the results vary. First and foremost, π is ranked fastest in only
12.49% of walks, the second-lowest share after the LCG, a difference of just
over 1%. Overall, π performs better than the LCG, but is far slower than the
TRNGs or the Mersenne Twister in the majority of tests. Interestingly though,
log 2 seems to perform much better and even outperforms the Mersenne Twister in
terms of speed in slightly more tests. However, it is still far from the level
of performance of the TRNGs.
Rank  Log2           Pi             LCG            MT             HotBits        RandomOrg
1     1,412 17.05%   1,034 12.49%     932 11.25%   1,320 15.94%   1,565 18.90%   2,018 24.37%
2     1,423 17.18%   1,156 13.96%   1,136 13.72%   1,413 17.06%   1,509 18.22%   1,645 19.86%
3     1,471 17.76%   1,484 17.92%   1,278 15.43%   1,429 17.26%   1,306 15.77%   1,313 15.86%
4     1,420 17.15%   1,618 19.54%   1,418 17.12%   1,395 16.85%   1,268 15.31%   1,162 14.03%
5     1,392 16.81%   1,481 17.88%   1,648 19.90%   1,416 17.10%   1,293 15.61%   1,050 12.68%
6     1,163 14.04%   1,508 18.21%   1,869 22.57%   1,308 15.80%   1,340 16.18%   1,093 13.20%
Table 4.1: Summary of experiment one – Walk speed
                Log2           Pi             LCG            MT             HotBits        RandomOrg
Rejects         1,182 14.27%     111  1.34%       3  0.04%      13  0.16%     711  8.59%       0  0.00%
Suspects        1,833 22.14%     841 10.16%     506  6.11%      87  1.05%   2,779 33.56%     106  1.28%
Almost Suspect  1,198 14.47%     829 10.01%   1,443 17.43%     155  1.87%   1,248 15.07%     602  7.27%
Normal          4,068 49.12%   6,500 78.49%   6,329 76.43%   8,026 96.92%   3,543 42.78%   7,573 91.45%
Average         0.7907         1.7882         1.2621         1.7912         6.6407         1.7835
Average (Norm)  0.6667         1.7033         1.1644         1.7739         1.5085         1.7451
Table 4.2: Summary of experiment one – Chi Square Test
Rank  Log2           Pi             LCG            MT             HotBits        RandomOrg
1     1,588 19.18%     853 10.30%     816  9.85%   1,173 14.16%   1,904 22.99%   1,947 23.51%
2     1,221 14.74%   1,128 13.62%     950 11.47%   1,446 17.46%   1,531 18.49%   2,005 24.21%
3     1,473 17.79%   1,443 17.43%   1,288 15.55%   1,536 18.55%   1,237 14.94%   1,304 15.75%
4     1,625 19.62%   1,645 19.86%   1,401 16.92%   1,530 18.48%   1,134 13.69%     946 11.42%
5     1,331 16.07%   1,885 22.76%   1,662 20.07%   1,415 17.09%   1,130 13.65%     858 10.36%
6     1,043 12.60%   1,327 16.02%   2,164 26.13%   1,181 14.26%   1,345 16.24%   1,221 14.74%
Table 4.3: Summary of experiment one – Standard Deviation
4.2.2
Experiment Two
The summary results of the second experiment are available in tables 4.4, 4.5,
and 4.6. The overall results for walk speed in this experiment have changed
dramatically from those in the first experiment. In this experiment RANDOM.ORG
has performed very badly and was the fastest generator in only 8.51% of walks.
Its results for the other top three ranks were similarly weak, and this is
further reinforced by the fact that RANDOM.ORG was in 6th position 29.28% of the
time. HotBits has, however, maintained its position to an extent, as
approximately 48% of its walks are in the top three positions.
The Mersenne Twister is shown to be the best generator overall, with the most
walks in both the first and second positions, outperforming the TRNGs. While
the sequence generated was an improvement over the sequence used in the last
experiment, it is not leading by far. Notably, the LCG has shown considerable
improvement, as its performance is now almost level with that of HotBits, and it
also easily outperforms RANDOM.ORG.
Finally, the irrational numbers have also shown considerable improvement in this
experiment, possibly due to the very weak TRNGs. While π has performed
reasonably well, better than both HotBits and the LCG, it is log 2 that has
shown the greatest improvement. While log 2 performed well in the first
experiment, in this experiment it is second best to the Mersenne Twister, with a
similar number of walks gaining the top ranks.
Rank  Log2           Pi             LCG            MT             HotBits        RandomOrg
1     1,815 21.92%   1,232 14.88%   1,521 18.37%   1,851 22.35%   1,160 14.01%     705  8.51%
2     1,497 18.08%   1,450 17.51%   1,375 16.60%   1,566 18.91%   1,350 16.30%   1,040 12.56%
3     1,342 16.21%   1,598 19.30%   1,299 15.69%   1,363 16.46%   1,531 18.49%   1,149 13.88%
4     1,358 16.40%   1,480 17.87%   1,384 16.71%   1,261 15.23%   1,461 17.64%   1,336 16.13%
5     1,188 14.35%   1,412 17.05%   1,397 16.87%   1,151 13.90%   1,508 18.21%   1,626 19.64%
6     1,081 13.05%   1,109 13.39%   1,305 15.76%   1,089 13.15%   1,271 15.35%   2,425 29.28%
Table 4.4: Summary of experiment two – Walk speed
                Log2           Pi             LCG            MT             HotBits        RandomOrg
Rejects         1,182 14.27%     111  1.34%      15  0.18%     244  2.95%      10  0.12%     748  9.03%
Suspects        1,833 22.14%     841 10.16%       6  0.07%   1,333 16.10%     681  8.22%   2,096 25.31%
Almost Suspect  1,198 14.47%     829 10.01%      31  0.37%   1,367 16.51%   1,159 14.00%     742  8.96%
Normal          4,068 49.12%   6,500 78.49%   8,229 99.37%   5,337 64.45%   6,431 77.66%   4,695 56.70%
Average         0.7907         1.7882         3.0389         5.7617         3.8435         6.5047
Average (Norm)  0.6667         1.7033         2.9838         2.7693         2.1108         2.3320
Table 4.5: Summary of experiment two – Chi Square Test
Rank  Log2           Pi             LCG            MT             HotBits        RandomOrg
1     2,036 24.59%   1,148 13.86%   1,650 19.93%   1,718 20.75%   1,280 15.46%     449  5.42%
2     1,284 15.51%   1,443 17.43%   1,637 19.77%   1,805 21.80%   1,428 17.24%     684  8.26%
3     1,376 16.62%   1,542 18.62%   1,418 17.12%   1,489 17.98%   1,505 18.17%     951 11.48%
4     1,493 18.03%   1,650 19.93%   1,306 15.77%   1,172 14.15%   1,530 18.48%   1,130 13.65%
5     1,197 14.45%   1,508 18.21%   1,342 16.21%   1,169 14.12%   1,472 17.78%   1,593 19.24%
6       895 10.81%     990 11.96%     928 11.21%     928 11.21%   1,066 12.87%   3,474 41.95%
Table 4.6: Summary of experiment two – Standard Deviation
4.2.3
Experiment Three
The summary results of the third experiment are available in tables 4.7, 4.8,
and 4.9. Once again, the overall best performing generator has changed. In this
experiment the LCG far outperforms any of the other generators, with 25.91% of
its walks being ranked the best (1st), and in total 62% of its walks ranked
within the top three. This result is incredibly surprising, especially as it is
a clear winner over not only the Mersenne Twister, but also both TRNGs.
Finally, π performs relatively poorly again in this experiment, possibly due to
the extremely good sequence generated by the LCG. In this experiment log 2 also
has a mediocre performance, outperforming only π and RANDOM.ORG.
Rank  Log2           Pi             LCG            MT             HotBits        RandomOrg
1     1,224 14.78%     868 10.48%   2,146 25.91%   1,409 17.01%   1,462 17.65%   1,173 14.16%
2     1,248 15.07%   1,121 13.54%   1,680 20.29%   1,491 18.01%   1,511 18.25%   1,231 14.87%
3     1,347 16.27%   1,328 16.04%   1,369 16.53%   1,502 18.14%   1,468 17.73%   1,265 15.28%
4     1,495 18.05%   1,478 17.85%   1,243 15.01%   1,359 16.41%   1,396 16.86%   1,313 15.86%
5     1,509 18.22%   1,663 20.08%     992 11.98%   1,311 15.83%   1,299 15.69%   1,504 18.16%
6     1,458 17.61%   1,823 22.01%     851 10.28%   1,209 14.60%   1,145 13.83%   1,795 21.68%
Table 4.7: Summary of experiment three – Walk speed
                Log2           Pi             LCG            MT             HotBits        RandomOrg
Rejects         1,182 14.27%     111  1.34%     974 11.76%     695  8.39%      28  0.34%     154  1.86%
Suspects        1,833 22.14%     841 10.16%   2,128 25.70%   2,326 28.09%     406  4.90%     532  6.42%
Almost Suspect  1,198 14.47%     829 10.01%   2,316 27.97%     554  6.69%   1,123 13.56%     487  5.88%
Normal          4,068 49.12%   6,500 78.49%   2,863 34.57%   4,706 56.83%   6,724 81.20%   7,108 85.84%
Average         0.7907         1.7882         7.7648         6.1291         4.1831         2.9223
Average (Norm)  0.6667         1.7033         1.7849         2.0553         2.9859         2.3816
Table 4.8: Summary of experiment three – Chi Square Test
Rank  Log2           Pi             LCG            MT             HotBits        RandomOrg
1     1,264 15.26%     672  8.11%   2,481 29.96%   1,268 15.31%   1,617 19.53%     979 11.82%
2     1,186 14.32%     935 11.29%   1,698 20.50%   1,593 19.24%   1,644 19.85%   1,225 14.79%
3     1,192 14.39%   1,160 14.01%   1,491 18.01%   1,557 18.80%   1,523 18.39%   1,358 16.40%
4     1,291 15.59%   1,383 16.70%   1,124 13.57%   1,618 19.54%   1,456 17.58%   1,409 17.01%
5     1,648 19.90%   1,857 22.42%     822  9.93%   1,262 15.24%   1,205 14.55%   1,487 17.96%
6     1,700 20.53%   2,274 27.46%     665  8.03%     983 11.87%     836 10.10%   1,823 22.01%
Table 4.9: Summary of experiment three – Standard Deviation
4.2.4
Overall Analysis
The results of the three experiments show the difficulty in trying to distinguish
the best random number generator. The problem is that in one instance a random
number generator can perform badly, yet in the same experiment with a different
seed the generator can perform impressively. A clear example of this effect is
the LCG: in experiment one it appears to perform terribly, yet in experiment
three, with a different seed, it performs the best.
This effect also causes problems, as it makes it harder to distinguish whether
other generators performed consistently or whether they also had performance
changes. However, this is partly mitigated by looking at the summary tables for
both the χ² test and the standard deviation. As the values differ per
experiment, it is clear that the sequence qualities vary.
One observation can be made about the irrational number generators. Throughout
all these experiments the same irrational number sequences have been used, and
regardless of how well the other generators have performed, the performance of
the irrational numbers has always been consistent. This is interesting because,
while the irrational numbers do not prove to be the best, neither are they
always the worst, and log 2 in particular proves a strong candidate in every
experiment. Further experiments, which simply repeated experiment one with the
starting point of the sequences advanced by 50,000 digits, showed the same.
While all other generators seemed to vary in performance, the performance of
log 2 remained consistent.
A further observation made within these experiments concerns the standard
deviation tests. In these tests it was hoped that the generator providing the
best distribution of visits would become visible. Unfortunately, this test
appears to be flawed, as the standard deviation has become related to the speed
of the walks: the fewer digits required to complete a walk, the lower the
probability of a high deviation. Due to this, the standard deviation has become
an inaccurate measure. However, it should be noted that the connection does not
always hold, and in some cases the fastest generator is not the one with the
best standard deviation.
In §2.4, it was stated that it would be of interest to know if the randomness
of a sequence affects the performance of a walk. The answer to this question
lies with the walk speed and χ² summary tables. At first, from reviewing the
results of RANDOM.ORG it appears as though it does, as RANDOM.ORG is the best
generator in experiment one with 91.45% of walks having a good sequence
distribution and is the worst generator in experiment two with only 56.70%.
However, this correlation clearly does not appear with the LCG: in the first
experiment, where it scored the worst, 76.43% of its walks had good sequences,
yet when it clearly outperformed all other generators in experiment three only
34.57% of its walks had good distributions. This seems to be further reinforced
by the Mersenne Twister, which in experiment two performed the best in terms of
speed yet had only 64.45% of walks with good sequences.
While the summary tables can be misleading, further analysis of the full tables
themselves shows that the different χ² distributions have no effect on whether
a walk will be fast or slow.
4.3
Further Observations
Using the software produced for this project, it was possible to not only perform
walks from the starting digit but also from any arbitrary digit. By performing
multiple smaller sets of walks such as up to 50 × 50 it was possible to observe the
effects this had on the irrational numbers. Due to the limited performance time
available, these side experiments were limited to the irrational numbers only.
In the first series of these experiments, walks were performed on grids of up
to 50 × 50, starting at every 50,000-digit interval. In this experiment, the
number of digits tested varies, as the digits are tested only after the walk
completes. The number of tests that scored well in the χ² test was then summed
for each starting point and plotted on a chart, giving the number of tests with
good χ² distributions for both π and log 2 (see fig. 4.1). In this chart it
became noticeable that as the start point increased, the number of walks with
good χ² distributions also increased steadily, with a peak of 100% at 300,000
digits, indicating that regardless of the length of the sequence, any sequence
from the 300,000th digit up to the 350,000th digit appears to hold good
randomness qualities. Strangely, immediately after this point the percentage
drops sharply, indicating the opposite.
To see if this same pattern occurs at other intervals, a similar series of
tests was conducted using grids of up to 25 × 25 at each interval of 10,000
digits, and the results were once again plotted on a chart (fig. 4.2). While the
same effect cannot be observed in this graph, what can be observed is that the
overall trend in the number of sequences with good distributions seems to be
increasing. By adding a trend line to the first chart, the same can also be seen
to occur. Further, from viewing the second chart, it is possible to see that the
overall number of good sequences appears to be converging towards the trend
line. Unfortunately, due to the limit of only 600,000 digits generated, it is
hard to see if this is a trend that would continue as the starting point
increases.
Figure 4.1: Percentage of 50x50 grids with a good Chi Square distribution at
various starting points (plotted series: Log2, Pi, and a linear trend line for
each; starting points from 0 to 500,000 digits)
Figure 4.2: Percentage of 25x25 grids with a good Chi Square distribution at
various starting points (plotted series: Log2, Pi, and a linear trend line for
each)
A further interesting observation was made when generating some of the largest
possible grids with the digits available. In these grids for π it is possible to
observe a pattern: in the centre mass of the image the distribution of visits is
quite low, while in the corners the distribution is higher, forming a cross
shape within the middle of the image. As the agent is placed in the upper-left
corner (0,0) and the grid is treated as a torus, this indicates that the
exploration of π grows as a circle. A possible reason for this is that the agent
starts at the centre of the circle and progressively explores outward, moving
back and forth across the starting area and building the circle outwards. While
this has not been verified, the pattern seems to occur in most grids and becomes
progressively more distinct as the number of digits increases. The most distinct
of the heat maps showing this is in fig. 4.3.
This effect has not been observed in either of the TRNGs or PRNGs.
Figure 4.3: Heat map of 122x122 grid using Pi
Chapter 5
Evaluation
Everything that can be counted does not necessarily count;
everything that counts cannot necessarily be counted.
Albert Einstein (1879-1955)
5.1
Strengths
In terms of meeting the objectives of the project, the software has been successful
as it is capable of generating digits to user-defined limits and storing them very
efficiently (zero overheads), performing 2D walks, performing the χ² test, and
calculating the standard deviation. In addition, the generated matrix of a walk is
saved along with a log file entry so that further analysis can be undertaken outside
of the software.
Looking specifically at the strengths, this project makes an interesting
contribution that the author has not previously seen in other works: it tests
variable-length sequences for randomness based on a rule, rather than sequences
of fixed-length intervals. In the case of this project the rule is the walks.
The effect of this is that the sequences are tested, a further few digits are
added to the sequences, and then the sequences are tested again. This process is
repeated 8,281 times using sequences from a starting length of approximately 500
digits to a finishing length of approximately 600,000 digits. This concept has
proved to be interesting, as it has shown, particularly in the small experiments
which were completed, that the expansions of irrational numbers appear to become
more random. Further, it also provides a more realistic perspective, as it tests
sequences of a range of sizes rather than of an arbitrary fixed length.
A further strength of the project is the addition of heat maps which have shown
the movement of the digits that would not normally be seen by using tests such
as the standard deviation alone. Particularly, this functionality resulted in the
observation reported in §4.3.
5.2
Weaknesses
While all targets have been met, the project has not been able to tackle
sufficiently large data sets to see if any of the observations hold in much
larger sequences. In particular, the problem is the time complexity of the BBP
algorithm for generating sequences, which as was mentioned is O(n²) where n is
the length of the sequence required. While there are various other algorithms
available for generating the expansions of irrational numbers, none yet appear
to exist that operate natively in binary bases. Due to this, the size of the
sequences was limited, resulting in only relatively small grids being suitable
for use in walks. While the walks performed have produced interesting insights,
it would be interesting to see if the observations that were made, such as that
of the irrational number π exploring from a central point outwards, can also be
observed in much larger grids.
A further weakness of the project is its use of the standard deviation as a
measure of visit distribution. While the standard deviation algorithm performed
as expected, it did not give a fully accurate measure because the length of the walks
varies between generators, making it unsuitable for comparing generators even across
grids of the same size.
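The dependence on walk length can be seen in a small sketch. The class and method names below are illustrative only and are not taken from the project software, and the use of the population (rather than sample) standard deviation is an assumption.

```java
final class VisitSpread {
    /** Population standard deviation of the per-cell visit counts (assumed variant). */
    static double standardDeviation(int[][] visits) {
        long total = 0;
        int cells = 0;
        for (int[] row : visits) {
            for (int v : row) {
                total += v;
                cells++;
            }
        }
        double mean = (double) total / cells;
        double sumOfSquares = 0.0;
        for (int[] row : visits) {
            for (int v : row) {
                double diff = v - mean;
                sumOfSquares += diff * diff;
            }
        }
        return Math.sqrt(sumOfSquares / cells);
    }
}
```

If every cell of a grid is visited twice as often, the shape of the distribution is unchanged but the value returned doubles, which is why walks of different lengths on the same grid are hard to compare with this measure alone.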
5.3
Recommendations
While the software development has so far been successful, one potential issue that
appears to be arising is the large amount of processing power needed to generate
large sequences using the BBP algorithm. While there are some known faster BBP
formulas for π [8], these do not improve the performance of other BBP formulas
such as that for log 2. The issue is simply that BBP is not designed for generating digit
sequences of this scale; instead it is best suited to generating individual digits.
If the project were to be repeated, it would benefit greatly from a similar algorithm
that is optimised towards generating binary digit sequences, but as was mentioned
previously, none are yet known to exist. However, the BBP algorithm does have
a strength which unfortunately could not be exploited in this project. First
and foremost, because BBP is a spigot algorithm, it is possible to distribute the
processing across several processors and merge the completed work, such that
each processor is assigned only one digit or a small portion of digits to generate.
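As a rough illustration of this idea on a single multi-core machine, the sketch below computes hexadecimal digits of π independently using the standard BBP digit-extraction scheme for the π formula, and spreads the positions over the available cores with a parallel stream. It is not the project's implementation: the class and method names are hypothetical, double precision is assumed to be sufficient (which holds only for modest positions), and a real deployment across several machines would need its own work-distribution layer.

```java
import java.util.stream.IntStream;

final class BbpPiDigits {
    /** Computes 16^exp mod m by square-and-multiply. */
    static long modPow16(int exp, long m) {
        long result = 1 % m;
        long base = 16 % m;
        while (exp > 0) {
            if ((exp & 1) == 1) {
                result = (result * base) % m;
            }
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }

    /** Fractional part of the sum over k of 16^(n-k) / (8k + j). */
    static double series(int j, int n) {
        double s = 0.0;
        // Terms with k <= n: only the fractional part matters, so reduce modulo 8k + j.
        for (int k = 0; k <= n; k++) {
            long d = 8L * k + j;
            s += (double) modPow16(n - k, d) / d;
            s -= Math.floor(s);
        }
        // Terms with k > n shrink by a factor of 16 each step; stop once negligible.
        double power = 1.0 / 16.0;
        for (int k = n + 1; power > 1e-17; k++) {
            s += power / (8L * k + j);
            power /= 16.0;
        }
        return s - Math.floor(s);
    }

    /** Hexadecimal digit of pi at position n after the point (n = 0 is the first). */
    static int hexDigit(int n) {
        double x = 4 * series(1, n) - 2 * series(4, n) - series(5, n) - series(6, n);
        x -= Math.floor(x);
        return (int) (16 * x);
    }

    /** Each position is independent, so the work can be split across cores. */
    static int[] hexDigitsParallel(int count) {
        return IntStream.range(0, count).parallel().map(BbpPiDigits::hexDigit).toArray();
    }
}
```

Because no digit depends on any other, the results can simply be merged in position order once every worker has finished.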
In addition to distributing the processing of individual digits, it is also possible
to distribute the processing of an individual digit itself due to the design of the
algorithm. The algorithm makes use of a series that is evaluated multiple times
with differing variables; this evaluation can itself be distributed to allow multiple
processors to work on one digit. For example, a future project could greatly
increase digit generation speed by first distributing the generation of digits across
multiple processor systems, each of which may distribute the generation of the
assigned digit across multiple processing cores.
Even then, the process of calculating an individual series can be broken down further
so that only a small range of terms is calculated, a method which proved to be
successful in earlier projects which used BBP formulas [27].
Additionally, a feature of the BBP algorithm is that it does not generate only a
single digit; it generates several, up to the precision of the floating point arithmetic
used. One potential method of reducing some processing is to avoid the use of
floating point arithmetic and instead use integer arithmetic of a very large word
size, if available [2], such that many digits can be generated in one attempt rather
than only a few; however, this will come at the cost of memory consumption.
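One way the large-integer idea might look in Java is sketched below, using java.math.BigInteger as a stand-in for a very large word size. The fixed-point layout, the 256-bit precision and all names are assumptions made for illustration; the number of digits requested should stay well below PRECISION_BITS / 4 so that some guard bits absorb the rounding error of the integer divisions.

```java
import java.math.BigInteger;

final class BbpPiFixedPoint {
    /** Assumed fixed-point precision; every value is a fraction scaled by 2^PRECISION_BITS. */
    static final int PRECISION_BITS = 256;

    /** Fractional part of the sum over k of 16^(n-k) / (8k + j), in fixed point. */
    static BigInteger series(int j, int n) {
        BigInteger scale = BigInteger.ONE.shiftLeft(PRECISION_BITS);
        BigInteger sixteen = BigInteger.valueOf(16);
        BigInteger sum = BigInteger.ZERO;
        // Terms with k <= n, reduced modulo the denominator before scaling.
        for (int k = 0; k <= n; k++) {
            BigInteger d = BigInteger.valueOf(8L * k + j);
            BigInteger r = sixteen.modPow(BigInteger.valueOf(n - k), d);
            sum = sum.add(r.shiftLeft(PRECISION_BITS).divide(d));
        }
        // Terms with k > n: 2^PRECISION_BITS / (16^(k-n) * d), until they vanish.
        for (int k = n + 1; 4 * (k - n) < PRECISION_BITS; k++) {
            BigInteger d = BigInteger.valueOf(8L * k + j);
            sum = sum.add(BigInteger.ONE.shiftLeft(PRECISION_BITS - 4 * (k - n)).divide(d));
        }
        return sum.mod(scale);
    }

    /** Returns `count` consecutive hexadecimal digits of pi starting at position n. */
    static int[] hexDigits(int n, int count) {
        BigInteger scale = BigInteger.ONE.shiftLeft(PRECISION_BITS);
        BigInteger x = series(1, n).shiftLeft(2)          // 4 * S1
                .subtract(series(4, n).shiftLeft(1))      // - 2 * S4
                .subtract(series(5, n))                   // - S5
                .subtract(series(6, n))                   // - S6
                .mod(scale);                              // keep the fractional part
        int[] digits = new int[count];
        for (int i = 0; i < count; i++) {
            x = x.shiftLeft(4);                           // multiply by 16
            digits[i] = x.shiftRight(PRECISION_BITS).intValue();
            x = x.mod(scale);
        }
        return digits;
    }
}
```

A single evaluation at position n then yields a run of digits rather than the handful available from double precision, at the price of holding several multi-hundred-bit integers in memory.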
Further, a new and more adequate method of calculating the standard deviation is recommended, i.e. one that provides a more accurate ratio to the sum of the list it
is performed upon. This would enable the standard deviations to be compared
across walks of various lengths. While the author did experiment with using the
standard deviation in a ratio to the walk length, it did not provide any useful
result.
Finally, with the project in its current form, it would be interesting to see how
the different generators perform using graphs of shapes other than
square or rectangular, for example triangular shapes. As
a further expansion to this, it would also be interesting to see how the generators
perform when the graphs are not treated as a torus.
Chapter 6
Professional Issues
When it comes to professionalism, it makes sense to talk about
being professional in IT. Standards are vital so that IT
professionals can provide systems that last.
Sir Tim Berners-Lee (1955–Present)
6.1
Code of Conduct
This project complies with the British Computer Society (BCS) Code of Conduct,
as at no stage did the project deviate from these standards. With regard to the
public interest, the project has no interaction with the public or the environment
and thus cannot harm them. Further, where third parties were interacted
with, such as the TRNG services, their rights were respected throughout.
In addition to this, all known national, regional or international laws were complied
with. For example, no personal data is stored by this project and any interaction
with third parties was carried out in compliance with their rules. This project
was also not in any way shaped by any form of coercion.
The practices of the authorities were respected throughout the development of this
project. Further, all deadlines were met and indications were given at the relevant
stages as to any deviations from the previously specified project schedules. Finally, at
no stage within the project have the results ever been manipulated or withheld for
any reason.
Throughout the project, the author's skills were greatly refined; in particular, the
author became competent in his understanding of BBP algorithms and further
increased his skills in the Java programming language. Also, the author took care
to carefully monitor this specific research area to ensure that any new works were
utilised at the most convenient possible time. In addition, the author made clear
the extent of his skills early in the project, which was reflected in the decisions
made later in the project, such as the lack of a Java GUI.
6.2
Code of Good Practice
This project also complies with the BCS Code of Good Practice, as once again
the project complies with all standards within the document. Throughout the
project, the author ensured his technical competence within the subject area was
maintained through careful monitoring of publications within the area. In
terms of the software implementation, the tools required were also monitored to ensure
that new methods or tools released during the project were identified and utilised.
Specifically, multiple Java Development Kit updates were released throughout this
project and were adopted by the author; however, it was ensured that backward
compatibility was maintained. Further, the standards and regulations of the organisation, and the laws of the country, were complied with.
The author ensured that the level of his ability in specific areas was indicated
before proceeding at any point, and that the likely actions required as a reflection of that
level were also indicated. Further to this, the author ensured that the workload
undertaken was reasonable for his ability and that he had the necessary resources
such that all work could be completed within the time scales given.
Specific to this project, the project complies with the requirements that the research is beneficial and not damaging to society. Further, the potential applications
have been made clear for this project, along with its pitfalls, of which none are known so far.
Further, as no biological material (animal or human), or even personal data,
is used within this project, the project complies with the standards expected for all
ethical and legal concerns in these areas. Finally, work by other authors such as that on BBP
has been evaluated by this project and the results are made available within this
document. Further to this, the results produced by this project itself are also
made publicly available within this document.
Chapter 7
Conclusions
The most exciting phrase to hear in science, the one that heralds
the most discoveries, is not ’Eureka!’, but ’That’s funny...’
Isaac Asimov (1920–1992)
7.1
Main Findings
The two factors that were hoped would be observed within this project were:
• The method must be faster in completing the walk.
• The distribution of visits must be more equal even at the sacrifice of speed.
While this project has not found any direct evidence for either of these factors, it
has found something of interest related to them. Because the
expansions of irrational numbers are deterministic, yet their digits also hold a good
quality of randomness, an interesting effect is observed. For the
TRNGs, the experiments showed that the results tended to vary drastically
from one sequence to another: in one experiment a sequence would perform
exceptionally well, while in another experiment the sequence generated would be
extremely poor. This effect also occurs in the PRNGs; it appears very much
that the quality of the series generated depends on the seed used for the
PRNGs.
However, as the expansions of irrational numbers do not change in this way,
this flaw does not occur. By using the expansion of an irrational number as a
random number generator, the sequence can always be of a good quality rather
than running the risk of being very poor, although this also gives up the
possibility of a very good quality sequence. While π did not prove to perform
well in the experiments, or at least as well as was hoped, log 2 shows potential
as 15%-20% of its walks performed the best. Further, this observation seemed to
hold when the starting point for the sequences was incremented in intervals of
50,000 from 0 up to 250,000.
Additionally, as the starting point increased, the performance of π within the
results also improved in some cases.
With regard to the second objective, this was complex to observe with the methods
used. However, through analysis using the heat maps, the irrational number π
did show a pattern emerging as to where the digits become distributed. Further,
while no generator produced a consistently smooth surface, log 2 did generate a
pattern that has a relatively good distribution. In comparison to π, the standard
deviation on identical grids always tended to be lower for log 2, even when log 2
required more digits to complete a walk.
Overall, π and log 2 have shown interesting results in the walks; while they have
not shown any noteworthy improvement over existing random number generators,
they do indicate that potential applications may exist in this area. The memory
requirement of the BBP algorithm is lower than that of the Mersenne Twister,
yet the performance is almost equivalent in many experiments, and it also avoids
the financial cost of implementing a TRNG. However, because the cost of generating
a digit with the BBP algorithm grows linearly with its position, generation becomes increasingly slow as a sequence progresses. If this
were to be applied within a real-world agent such as a physical robot, the robot may
become increasingly slow in its actions over time, whereas with a PRNG or TRNG
it would not. Until the time complexity of the BBP algorithms can be reduced this
may limit the time for which a BBP-powered robot can be deployed, or require it to
reset or switch algorithms after an arbitrary position.
7.2
Future Work
As mentioned, a number of interesting observations have been made, in particular
relating to the increasing randomness of log 2 as the starting point increases. Unfortunately, due to the limit on the number of digits, little work was completed in
studying this; however, it is certainly an aspect that should be studied further in
future.
Further, due to the existence of patterns emerging in π, it would be interesting to
see if there are certain graphs that would benefit from this movement pattern.
However, for either of these two concepts to be realised, it would be of immense
interest to study whether a more efficient generator for the expansions of
irrational numbers can be found, such as one which is logarithmic.
The possibility of an irrational number random number generator does seem realistic,
as the quality of the sequences is shown to be random and the performance in
experiments has been good in general. Further, these qualities were shown to
hold, or even improve, if the random number generator were to be seeded (i.e.
started from an arbitrary position).
Finally, a further interesting application that does not appear to have been
discussed previously is the use of the BBP algorithm in cryptography. While
irrational number generators have been described in the past as bad choices for
cryptography due to their predictable nature [25], what has not been realised is
that calculating an individual digit takes only linear time. This would allow two nodes
to select an arbitrary position and use the digits at these points for encrypted
communication, incrementing the position used by some value for each message.
This would allow a small sub-sequence to be generated with relative ease even
with large values of n; however, to find the sub-sequence would require work of
O(n²) complexity. Further, due to the number of BBP algorithms available, this
would become O(m × n²), where m is the number of available BBP algorithms. If
a secret increment value is used, it could be incredibly complex to eavesdrop even
if the key is broken for one message in the conversation.
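The gap between the two costs can be made explicit with a short worked sum. This is only a sketch under the assumption used in the text, namely that extracting the digit at position i costs time proportional to i; the constant c and the sub-sequence length ℓ are illustrative symbols that do not appear elsewhere in this document.

\[
T_{\text{all}}(n) = \sum_{i=1}^{n} c\,i = \frac{c\,n(n+1)}{2} = O(n^2),
\qquad
T_{\text{sub}}(n,\ell) = \sum_{i=n}^{n+\ell-1} c\,i = c\,\ell\,n + \frac{c\,\ell(\ell-1)}{2} = O(\ell\,n).
\]

For a short sub-sequence (ℓ much smaller than n) the parties who know the position therefore pay a cost roughly linear in n, while generating every digit up to position n is quadratic.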
Appendix A
Project Specification
Graph Exploration in 2D-Grids
Using Irrational Numbers
Project Specification
Andrew Collins
Project Description
In this project we will perform the exploration of 2D grids using the graph model
[13], whereby we have a set of nodes each connected by a set of undirected edges
(G = (V, E)) to the four closest nodes. While in principle we will treat this as a
graph model it may also be seen as a geometric model [13]. For the exploration of
the graphs we will use a deterministic counterpart of a random walk [13] using the
base-4 expansions of irrational numbers (e.g. π, π², log 2), which are believed to have
no periodicity [12]. From performing these walks we hope to see the
randomness of consecutive sequences of each irrational number. In addition we
will further verify this using the Chi Square (χ²) Test, which provides a measure
of the deviation of a given sample [37].
By using the expansions of multiple irrational numbers we will provide a benchmark as to which number performs best in given graphs and further provide an
indication of the randomness of the generated expansions. From these tests we
will provide an indication of how random the consecutive digits of the expansions
of irrational numbers are, in comparison to existing pseudo-random number generators such as the Linear Congruential Generators [26], Mersenne Twister
[24], and Blum–Blum–Shub [9], which are known to have a periodicity after some
number of digits [12].
To complete this project we must be able to generate the expansions of irrational
numbers efficiently, in terms of both memory and processing, as there is a possibility that some walks may require many digits to be produced before a graph can be
covered. The generated expansions must also be in base-4 so that each digit
can be easily, and fairly, mapped to a direction.
One of the most efficient ways to generate expansions of specific irrational numbers
is the Bailey–Borwein–Plouffe (BBP) Algorithm [7], which has been shown to be
capable of generating the expansions of many irrational numbers [1] with very low
memory and processing requirements. The originally proposed BBP formula
for π is shown in eq. A.1; however, more recent work has improved on this formula
by 47% [8].
\[
\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right) \tag{A.1}
\]
While the BBP Algorithm is designed towards the generation of digits in base-16,
it is not restricted to this: digits in any base that is a power of two, such as base-4
or even base-2, can also be produced from this formula. For the generation of other bases such as base-10,
different BBP-style algorithms are required [14, 28].
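Since 16 = 4², each base-16 digit corresponds to exactly two base-4 digits, so the conversion is a matter of splitting bits. A minimal sketch follows; the class and method names are illustrative and not taken from the project software.

```java
final class PowerOfTwoBases {
    /** Splits each hexadecimal digit (0-15) into two base-4 digits, most significant first. */
    static int[] hexToBase4(int[] hexDigits) {
        int[] base4 = new int[hexDigits.length * 2];
        for (int i = 0; i < hexDigits.length; i++) {
            base4[2 * i] = (hexDigits[i] >> 2) & 0x3; // high two bits
            base4[2 * i + 1] = hexDigits[i] & 0x3;    // low two bits
        }
        return base4;
    }
}
```

For example, the leading hexadecimal digits 2, 4, 3, F of the fractional part of π become the base-4 digits 0, 2, 1, 0, 0, 3, 3, 3.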
In this project, we shall produce software capable of generating the expansions
of a selection of irrational numbers which can then be used to perform walks
in 2D grids. The software will output the number of moves required for each
irrational number to complete the walks within the grid such that comparisons can
be made. Further, the software will also be capable of verifying the randomness of
the expansions which were generated. From this we will produce documentation
stating the performance of each irrational number in terms of both speed¹ and
randomness.
From the initial proposal we have refined the problem to using the expansions of
irrational numbers as a means of performing walks through grids as an alternative
to pseudo-random numbers. Further we have specified that we would like to test
for randomness in the numbers that were used to complete the walks.
Conduct of The Project
In preparation for the project, background research has been required and completed in the study of the BBP algorithm such that it can be understood and modified to generate base-4 expansions in the most optimal manner. As the project
continues we may require further research in finding suitable methods for proving
the correctness of the base-4 expansions of the selected irrational numbers that we
implement.
For the implementation of the software required by the project, the Java programming language will be used. The student feels confident in his ability
to develop the implementation sufficiently. However, additional skills will be required in the “Swing” library to be able to develop a GUI for the implementation.
It should be noted, however, that while a GUI is of interest, it may be unsuitable
due to the limit on the size of grids that can be displayed. As such, a GUI will be seen as
a value-added extra and not a functional requirement. Finally, the student must
also develop an extremely competent understanding of the BBP algorithm so as
to be capable of using it to generate the expansions of many irrational numbers.
As the software produced by this project will be bespoke, no additional software
will be required to run it. However, additional software will be used for the development of
the software itself. In particular, the student intends to use the Eclipse IDE² so as
¹ The number of expansions required to be generated to complete a walk rather than as a measure of time.
² Available from: http://www.eclipse.org/
to increase the ease and speed with which the software is developed. In addition, the
Sun Microsystems Java Development Kit (JDK)³ will be used for the compilation
of the source code. The compiled byte-code can be executed using any available
Java Virtual Machine (JVM), such as that provided within the JDK and Java
Runtime Environment (JRE).
Statement of Deliverables
From this project we will present four deliverables by the completion of the project:
• Documentation – By completion of the project we will create multiple documents such as the specification, design and dissertation. In particular, the
latter will contain details as to how the expansions of irrational numbers in
base-4 were generated. Further it will contain the results of the experiments
as well as an analysis of these results. Finally we hope to produce additional
separate documentation should we discover any new BBP-type algorithms
or improvements.
• Software – To perform the experiments required, a bespoke software solution will be developed by the student. The aim of the solution will be
to generate the expansions of multiple irrational numbers and then perform
walks through grids of differing sizes. Further, the software will analyse at
the completion of each walk the randomness of the expansions that were
used to complete the walk.
• Experiments – Through using the software we will benchmark a range of
irrational numbers. Firstly, the winner of each benchmark will be determined
through the speed at which the irrational numbers manage to complete a
walk through a given grid. In addition to this test we will also perform the χ²
test on the generated sequences to gather a further measure of randomness.
³ Available from: http://java.sun.com/javase/
• Evaluation Methods – To evaluate the speed aspect of the random walks
we will simply compare the number of expansions required to complete the
walk to find which irrational number completed the walk the quickest. Further, we will evaluate the randomness of the walks both through the results
of the walks within the grids and also through using the χ² test.
Plan
A plan of how the project will be completed is available in table A.1. While the
project should follow this plan strictly, it does contain some
risk. Primarily, the experimentation phase is scheduled to run for approximately
2 weeks; however, as the experiments performed within this project may become
CPU intensive as the size of the grids grows, this may run over schedule. We
expect to mitigate this risk by reducing the scale of the experiments or,
preferably, by finishing the software development phase early. As mentioned
previously, the student feels that this aspect should not be too troublesome and,
as some algorithmic aspects have already been prototyped in the “Background
Research” phase, this is a realistic possibility.
For a more detailed version of this plan, please see the attached Gantt Chart.
Start Date  End Date   Title                     Deliverables
25 May      3 June     Background Research       Understanding of req. algorithms
4 June      18 June    Project Specification     Specification document
19 June     19 June    Specification Submission  Submission of Specification
20 June     14 July    Design Documentation      Project Design document
15 July     19 July    Presentation Preparation  Presentation Slides
20 July     20 July    Design Presentation       Design Presentation
21 July     11 Aug.    Software Implementation   Software to be used in the project
12 Aug.     16 Aug.    Presentation Preparation  Presentation Slides and sample experiments
17 Aug.     17 Aug.    Software Presentation     Software Presentation
18 Aug.     27 Aug.    Experiments & Analysis    Experiment results and analysis documentation
28 Aug.     17 Sept.   Dissertation              Dissertation document
18 Sept.    18 Sept.   Project Completion        Submission of Dissertation
Table A.1: Plan as to how the project will be completed including milestones
in bold and deliverables
Appendix B
Project Specification
Graph Exploration in 2D-Grids
Using Irrational Numbers
Project Design
Andrew Collins
Summary of Proposal
In this project we have proposed to perform the exploration of 2D grids using
the graph model [13], whereby we have a set of nodes each connected by a set
of undirected edges (G = (V, E)) to the four closest nodes. For the exploration
of the graphs we will use a deterministic counterpart of a random walk [13] using
the expansions of various irrational numbers (e.g. π, π², log 2) in base-4. From
performing these walks we hope to see the randomness of the consecutive sequences
of each irrational number. In addition we will further verify this using the Chi
Square Test (χ²) [37].
Through performing these walks we shall identify which irrational number performs the best in various grid sizes, and additionally we shall identify which irrational number produces the most random sequence for completing each grid
size. Further we will also compare the performance of the expansions of various
irrational numbers against Pseudo-Random Number Generators (PRNG), such as
the commonly available Linear Congruential Generators [26] and the Mersenne
Twister [24] as well as against True Random Number Generators (TRNG) such as
RANDOM.ORG [15] and HotBits [32].
For the generation of the expansions of various irrational numbers we intend to
use the Bailey–Borwein–Plouffe (BBP) Algorithm [7] which was first shown to be
capable of generating π (eq. B.1) in base-16 with very low memory and processing
requirements.
\[
\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right) \tag{B.1}
\]
Since the initial work on BBP, the binary expansions of many other irrational numbers have been shown to be capable of being generated using BBP-style algorithms,
giving rise to P-notation [5] as shown in eq. B.2.
\[
P(s, b, n, A) = \sum_{k=0}^{\infty} \frac{1}{b^k} \sum_{j=1}^{n} \frac{a_j}{(kn + j)^s} \tag{B.2}
\]
π and many other irrational numbers have been written in this form [1, 4, 36].
This makes BBP of great interest to the project as it would allow the binary
expansions of many irrational numbers to be added to the software quickly once
initial support for P-notation is added. While not expected to be necessary to this
project, BBP has also been shown to be capable of generating non-binary bases such as
decimal [14, 28].
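To make the notation concrete, the sketch below evaluates a truncated P(s, b, n, A) in double precision, taking A to be the coefficient vector (a_1, ..., a_n). It is a numerical check of the notation rather than a digit generator, and the class name, method name and truncation parameter are all assumptions. The π instance follows directly from eq. B.1; the log 2 instance is derived here from the series log 2 = Σ 1/(k·2^k) for k ≥ 1, which rearranges to ½·P(1, 2, 1, (1)).

```java
final class PNotation {
    /** Approximates P(s, b, n, A) using the terms k = 0 .. terms - 1 of the outer sum. */
    static double evaluate(int s, int b, int n, int[] a, int terms) {
        double total = 0.0;
        double scale = 1.0; // holds 1 / b^k
        for (int k = 0; k < terms; k++) {
            double inner = 0.0;
            for (int j = 1; j <= n; j++) {
                if (a[j - 1] != 0) {
                    inner += a[j - 1] / Math.pow((double) k * n + j, s);
                }
            }
            total += scale * inner;
            scale /= b;
        }
        return total;
    }
}
```

With these definitions, `PNotation.evaluate(1, 16, 8, new int[] {4, 0, 0, -2, -1, -1, 0, 0}, 20)` approximates π, and `0.5 * PNotation.evaluate(1, 2, 1, new int[] {1}, 60)` approximates log 2. Supporting a new constant then amounts to supplying a new parameter tuple.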
Design
To complete this project, a software solution is required to be developed that
will be capable of generating the expansions of various irrational numbers and
performing walks to measure randomness. Due to the nature of this project being
less focused on the software itself and more towards the actual output of the
software, we will use an incremental methodology. In this methodology, the design
process is treated as being cyclic with relatively simple objectives being specified
to be completed on each cycle. This allows the software to be built gradually
through refinement based upon what is learned on the previous iteration.
Because the complexity of implementing various BBP algorithms is unknown
to us, this allows us in the first cycle to focus solely upon implementing
the BBP algorithm for π. From here we can re-use what has been learnt to refine
the software such that P-notation is supported. Should this be successful then
other irrational numbers can be added as necessary. This iterative process of
refinement can be repeated until the software reaches what is expected of it.
A further advantage of this method is that the development process also becomes very
responsive to change, as new insights or issues can be factored into the software at
the start of the next cycle. Finally, through using this method we create the functional aspects relatively early, allowing us to start generating some
results before the software is fully complete, effectively increasing the experimental
period of the project.
In terms of the expected modules or components, the software will be required to be
capable of pre-generating and storing the base-4 expansions of various irrational
numbers, preferably using a space-efficient method as the number of generated
digits may be quite large: using one byte to store one digit would have an overhead
of 6 bits per digit. As each digit only requires 2 bits, it would be possible to perform
bit packing using bitwise operators so that 4 digits can be stored per byte. Further,
the database will be required to be read from as well as written or appended to.
The software will then be required to use these databases to perform walks.
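A minimal sketch of such bit packing is given below. The bit order (first digit in the two most significant bits of each byte) and the names are assumptions chosen for illustration, not a description of the project's actual file format.

```java
final class Base4Packer {
    /** Packs base-4 digits (0-3) four to a byte, first digit in the two high bits. */
    static byte[] pack(int[] digits) {
        byte[] packed = new byte[(digits.length + 3) / 4];
        for (int i = 0; i < digits.length; i++) {
            int shift = 6 - 2 * (i % 4);
            packed[i / 4] |= (byte) ((digits[i] & 0x3) << shift);
        }
        return packed;
    }

    /** Reads back the digit stored at the given position. */
    static int digitAt(byte[] packed, int index) {
        int shift = 6 - 2 * (index % 4);
        return (packed[index / 4] >> shift) & 0x3;
    }
}
```

Stored this way, 600,000 base-4 digits occupy 150,000 bytes rather than 600,000.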
To perform the walks, the software will need to be able to generate 2D grids of
various sizes. As the grids are strictly 2D and either square or rectangular in
shape, a simple 2D array can be used. If a 2D integer array is used, then it
becomes possible to store whether a grid square has been visited or not and, if it has,
then how often it was visited. From this, charts showing the frequency of visits can
be generated (e.g. heat maps).
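A walk over such an array might be sketched as follows. The mapping of digits to directions (0 = up, 1 = right, 2 = down, 3 = left), the wrap-around at the edges (treating the grid as a torus) and the names used are assumptions made for the example rather than details of the project software.

```java
final class TorusWalk {
    // Assumed mapping of base-4 digits to moves: 0 = up, 1 = right, 2 = down, 3 = left.
    private static final int[] ROW_STEP = {-1, 0, 1, 0};
    private static final int[] COL_STEP = {0, 1, 0, -1};

    /**
     * Walks the grid from the start cell, consuming one digit per move, and stops when
     * every cell has been visited or the digits run out. Returns the visit counts.
     */
    static int[][] walk(int rows, int cols, int startRow, int startCol, int[] digits) {
        int[][] visits = new int[rows][cols];
        int r = startRow;
        int c = startCol;
        visits[r][c]++;
        int unvisited = rows * cols - 1;
        for (int i = 0; i < digits.length && unvisited > 0; i++) {
            // floorMod wraps moves off one edge back onto the opposite edge.
            r = Math.floorMod(r + ROW_STEP[digits[i]], rows);
            c = Math.floorMod(c + COL_STEP[digits[i]], cols);
            if (visits[r][c]++ == 0) {
                unvisited--;
            }
        }
        return visits;
    }
}
```

The number of moves made is the sum of all counts minus one (the start cell is counted before any move), and the returned array can be rendered directly as a heat map.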
Finally, the last component of the software is the χ² test. Due to the simplicity
of the algorithm to compute this, it will be performed by the software itself. To
perform this test the software will be required to record the frequency with which each digit
occurred during the walk, which can then be compared against the expected number of occurrences
of each digit to generate the result.
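The calculation can be sketched as below, assuming that each of the four digits is expected equally often; the names are illustrative.

```java
final class ChiSquare {
    /** Chi-square statistic of base-4 digits against a uniform expectation. */
    static double statistic(int[] digits) {
        long[] observed = new long[4];
        for (int d : digits) {
            observed[d]++;
        }
        double expected = digits.length / 4.0;
        double chi = 0.0;
        for (long o : observed) {
            double diff = o - expected;
            chi += diff * diff / expected;
        }
        return chi;
    }
}
```

With four categories there are three degrees of freedom, so a statistic above roughly 7.815 would be significant at the 5% level.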
Review Against Plan
The plan which was produced for the specification stage of the project (see table
B.1) has so far been followed and has remained on schedule. In particular, the background research stage has identified the work in BBP as shown in the proposal
summary.
A minor change has been made to the design stage as the “Design Presentation”
has been moved to the 24th July from the previous 20th July. Irrespective of this, the
previous design stages have not been extended; instead they have been maintained so
as not to reduce the time allocated to the software implementation.
The remainder of the project plan will continue as is; however, it is still expected
that the software implementation stage may finish early due to some success already being made in the prototypes of the BBP algorithms within the earlier and
current stages. In addition, in response to feedback [41], subtasks have
been added to the remaining stages.
Start Date  End Date   Title                     Deliverables
25 May      3 June     Background Research       Understanding of req. algorithms
4 June      18 June    Project Specification     Specification document
19 June     19 June    Specification Submission  Submission of Specification
20 June     14 July    Design Documentation      Project Design document
15 July     19 July    Presentation Preparation  Presentation Slides
24 July     24 July    Design Presentation       Design Presentation
21 July     11 Aug.    Software Implementation   Software to be used in the project
21 July     23 July      ↳                       BBP for Pi
24 July     26 July      ↳                       BBP for Log2
27 July     31 July      ↳                       P-Notation
1 Aug.      2 Aug.       ↳                       PRNG
3 Aug.      4 Aug.       ↳                       TRNG
5 Aug.      6 Aug.       ↳                       χ² Test
7 Aug.      11 Aug.      ↳                       Testing of software
12 Aug.     16 Aug.    Presentation Preparation  Presentation Slides and sample experiments
17 Aug.     17 Aug.    Software Presentation     Software Presentation
18 Aug.     27 Aug.    Experiments & Analysis    Experiment results and analysis documentation
28 Aug.     17 Sept.   Dissertation              Dissertation document
28 Aug.     30 Aug.      ↳                       Abstract & Introduction
31 Aug.     2 Sept.      ↳                       Background
3 Sept.     6 Sept.      ↳                       Design
7 Sept.     10 Sept.     ↳                       Realisation
11 Sept.    14 Sept.     ↳                       Evaluation & Conclusion
15 Sept.    17 Sept.     ↳                       Clean-up for submission
18 Sept.    18 Sept.   Project Completion        Submission of Dissertation
Table B.1: Plan as to how the project will be completed including milestones
in bold and deliverables
Appendix C
Project Implementation
Graph Exploration in 2D-Grids
Using Irrational Numbers
Software Presentation
Andrew Collins
Summary of Proposal
In this project we have proposed to perform the exploration of 2D grids using
the graph model [13], whereby we have a set of nodes each connected by a set of
undirected edges (G = (V, E)) to the four closest nodes. For the exploration of
the graphs we will use a deterministic counterpart of a random walk [13] using the
expansions of irrational numbers (i.e. π and log 2) in base-4. From performing
these walks we hope to see the randomness of the consecutive sequences of each
irrational number. In addition we will further verify this using the Chi Square
Test (χ²) [19].
Through performing these walks we shall identify which irrational number performs the best in various grid sizes, and additionally we shall identify which irrational number produces the most random sequence for completing each grid size.
Further we will also compare the performance of the expansions of various irrational numbers against Pseudo-Random Number Generators (PRNG), such as the
commonly available Linear Congruential Generators (LCG) [20] and the Mersenne
Twister [24] as well as against True Random Number Generators (TRNG) such as
RANDOM.ORG [15] and HotBits [32].
Summary of Design
The project design stated that the software is to comprise three components. The
first is the generation of the digits necessary for performing the walks. For the
generation of irrational numbers it was decided that the BBP algorithm [7] would be
used, as it is not only very efficient at generating a variety of irrational numbers
but also works in binary bases. In addition to generating irrational numbers, the
software was also required to generate both pseudo-random and true-random number
sequences.
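As an illustration of why BBP suits binary bases, the following is a minimal Python sketch of digit extraction for π using double-precision floating point. It is not the project's Java implementation; the function names are hypothetical, and the cut-off of 1e-17 for the tail and the default of six digits per call are assumptions chosen to stay within floating point precision. Each hexadecimal digit is split into two base-4 digits.

```python
def bbp_series(j, n):
    """Fractional part of sum over k of 16^(n-k) / (8k + j)."""
    s = 0.0
    # Terms with k <= n: reduce 16^(n-k) modulo the denominator first,
    # so that only the fractional contribution of each term is kept.
    for k in range(n + 1):
        r = 8 * k + j
        s = (s + pow(16, n - k, r) / r) % 1.0
    # Terms with k > n shrink by a factor of 16 each; stop once negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:  # assumed cut-off, below double precision
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digits(n, count=6):
    """Hex digits of pi at positions n+1 .. n+count after the point."""
    x = (4 * bbp_series(1, n) - 2 * bbp_series(4, n)
         - bbp_series(5, n) - bbp_series(6, n)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        d = int(x)
        digits.append(d)
        x -= d
    return digits


def hex_to_base4(hex_digits):
    """Split each hex digit into its high and low base-4 digit."""
    out = []
    for d in hex_digits:
        out.extend((d >> 2, d & 3))
    return out


print(pi_hex_digits(0))                # [2, 4, 3, 15, 6, 10]
print(hex_to_base4(pi_hex_digits(0)))  # base-4 digits for the walk
```

Only the first few digits of each call are trustworthy with doubles, which is why the sketch returns six at a time.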
In the second component, the software is required to produce 2D grids and use
the previously generated digits to perform graph exploration, in a form similar to
a random walk.
Thirdly, the software must use the results of the prior walks to perform further
tests for randomness; in this case we intended to use the χ² test.
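The χ² statistic for a base-4 sequence can be sketched as follows, assuming the null hypothesis that each of the four digits is equally likely. The function name is hypothetical and the sketch is in Python rather than the project's Java. With four categories the statistic has three degrees of freedom.

```python
from collections import Counter


def chi_square_base4(digits):
    """Chi-square statistic of base-4 digits against a uniform expectation."""
    n = len(digits)
    expected = n / 4  # each digit 0..3 is expected n/4 times
    counts = Counter(digits)
    return sum((counts[d] - expected) ** 2 / expected for d in range(4))


print(chi_square_base4([0, 1, 2, 3] * 5))  # 0.0, perfectly uniform
print(chi_square_base4([0, 0, 0, 0]))      # 12.0, a single digit repeated
```

A value near zero indicates digit frequencies close to uniform; larger values indicate a stronger deviation.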
Produced Software
Of the many BBP formulas available, two numbers were selected: a constant, π, and
a logarithm, log 2. Both have been successfully implemented using the BBP formulas
for π [7] and log 2 [1]. In addition to generating irrational numbers, at this
stage we were also required to implement both PRNGs (LCG and Mersenne Twister)
and TRNGs (HotBits and RANDOM.ORG). The LCG
was implemented with relative ease due to support being natively available within
the Java libraries [31] and further the Mersenne Twister was also implemented with
similar ease due to an existing implementation available within the Colt Project
libraries [10].
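For illustration, the two pseudo-random sources can be sketched in Python. The LCG below uses the multiplier, increment and 48-bit modulus documented for java.util.Random; taking the top two bits of the state as a base-4 digit is an assumption, as the way the project derived digits from the Java generator is not stated here. Python's random module is itself a Mersenne Twister, so it stands in for the Colt implementation, although its output will not match Colt's for a given seed.

```python
import random


class JavaStyleLCG:
    """Linear congruential generator with java.util.Random's constants."""

    MULTIPLIER = 0x5DEECE66D
    INCREMENT = 0xB
    MASK = (1 << 48) - 1  # arithmetic is performed modulo 2^48

    def __init__(self, seed):
        # java.util.Random scrambles the seed with the multiplier.
        self.state = (seed ^ self.MULTIPLIER) & self.MASK

    def next_base4(self):
        """Advance the state and return its top two bits (assumed mapping)."""
        self.state = (self.state * self.MULTIPLIER + self.INCREMENT) & self.MASK
        return self.state >> 46


def mersenne_base4(seed, count):
    """Base-4 digits from Python's built-in Mersenne Twister."""
    rng = random.Random(seed)
    return [rng.getrandbits(2) for _ in range(count)]


lcg = JavaStyleLCG(42)
print([lcg.next_base4() for _ in range(10)])
print(mersenne_base4(42, 10))
```

The high-order bits are used because the low-order bits of a power-of-two-modulus LCG have short periods.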
The two TRNGs did, however, present some initial difficulties. Both are online
services and as such have been designed to prevent abuse by limiting the number of
digits generated. While RANDOM.ORG is very generous in providing an allowance of
1,000,000 digits per day, this limit still created difficulty when debugging the
project software or generating various test cases. Fortunately, RANDOM.ORG seems to
have foreseen the problem of its users requiring high volumes of data, as it provides
pre-generated files which are updated every 24 hours [17]. Each pre-generated file
is simply a binary file containing 1 MB of data (4,194,304 base-4 digits). In
addition, the site keeps an archive of all previously generated files, providing a
huge amount of data to work with. Conveniently, the data files use bit packing, just
as the project software does for its own databases, meaning that the files can be
used without any modification.
The project software has been designed to automatically download the latest of
these files and use them as a digit database. In cases where this functionality is
not required, databases can be manually downloaded and added to the software.
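The bit packing described above, with four base-4 digits per byte (hence 1 MB = 1,048,576 bytes = 4,194,304 digits), can be sketched as follows. The ordering of digits within a byte, most significant pair first, is an assumption, as are the function names; the project's actual database layout may differ.

```python
def pack_base4(digits):
    """Pack base-4 digits four to a byte, first digit in the top two bits."""
    out = bytearray()
    for i in range(0, len(digits), 4):
        chunk = list(digits[i:i + 4])
        chunk += [0] * (4 - len(chunk))  # pad a short final chunk with zeros
        byte = 0
        for d in chunk:
            byte = (byte << 2) | d
        out.append(byte)
    return bytes(out)


def unpack_base4(data):
    """Recover four base-4 digits from every byte."""
    digits = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            digits.append((byte >> shift) & 3)
    return digits


packed = pack_base4([0, 2, 1, 0, 0, 3, 3, 3])
print(packed.hex())          # 243f
print(unpack_base4(packed))  # [0, 2, 1, 0, 0, 3, 3, 3]
```

Because any byte stream unpacks to valid base-4 digits, a raw binary file of random bytes can serve directly as a digit database.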
The second TRNG implemented, HotBits, suffered from the same issue, as its daily
allowance is only 16,384 bits, making it completely unsuitable for use in the
project. Further, HotBits does not provide any pre-generated files as RANDOM.ORG
does. However, after contact with the HotBits author [35], two 16 MB data files
(67,108,864 base-4 digits each), one per HotBits hardware generator, were made
available to the project; these had been pre-generated for previous statistical
testing [34]. As we are testing for randomness and not uniqueness, these databases
should prove more than sufficient. Once again, as the data files are bit packed
they can simply be loaded into the software.
The 2D graph is, as planned, a 2D integer array generated to the requested size.
By default the integer array starts with all values set to zero.
Upon the walk entity moving into a node, the value at that position is incremented
by one. When all grid squares are greater than zero the walk is complete. Throughout
the walk, a record is kept of the frequency of each base-4 digit so that the χ²
distribution can be calculated and logged; in addition, an analysis of the generated
matrix (2D array) is performed. While earlier in the project there was a lack of
clarity on how this was to be performed [42], this has now been resolved. To
analyse the matrices the software calculates the standard deviation of the matrix
values at the end of the walk as a measure of the smoothness of the performed walk.
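The walk just described can be sketched in Python as below. Several details are assumptions made for illustration only: the mapping of digits to directions, the treatment of a move that would leave the grid (here the entity stays in place and the digit is still consumed), the counting of the starting node as visited, and the use of the population standard deviation.

```python
import statistics

# Assumed mapping of base-4 digits to (row, column) moves.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def walk(digits, size, start=(0, 0)):
    """Walk a size x size grid; return (grid, steps, covered, std_dev)."""
    grid = [[0] * size for _ in range(size)]
    r, c = start
    grid[r][c] = 1  # assumption: the starting node counts as visited
    unvisited = size * size - 1
    steps = 0
    for d in digits:
        if unvisited == 0:
            break  # every grid square is greater than zero
        dr, dc = MOVES[d]
        nr, nc = r + dr, c + dc
        steps += 1
        if 0 <= nr < size and 0 <= nc < size:
            r, c = nr, nc
            if grid[r][c] == 0:
                unvisited -= 1
            grid[r][c] += 1
    flat = [v for row in grid for v in row]
    return grid, steps, unvisited == 0, statistics.pstdev(flat)


grid, steps, covered, sd = walk([1, 2, 3, 0], size=2)
print(grid, steps, covered, sd)  # [[1, 1], [1, 1]] 3 True 0.0
```

A low standard deviation means the visits were spread evenly over the grid, which is the sense of smoothness used above.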
At the completion of the walk, and once both the χ² distribution and the standard
deviation have been calculated, the matrix and calculations are output to log files
which can themselves be analysed further at a later point.
To analyse the log files for several walks a PHP based web front end has been created. The PHP script reads the log files simultaneously and provides the viewer
with information as to which digit generator performed the best in each measurement per grid size. Further, the interface also provides a summary of all the
results showing which generator performed the best overall. While at this stage
both the software and log analyser are deemed complete, it is expected that the
web front end will receive further modifications throughout the experimentation
period as new and improved methods of analysing the results are found.
Evaluation
In terms of meeting the objectives of the project, the software has been successful: as described previously, it is capable of generating digits to user-defined
limits and storing them very efficiently (zero overheads), performing 2D walks,
performing the χ² test, and calculating the standard deviation. In addition, the
generated matrix is saved along with a log file entry so that further analysis can
be undertaken outside of the software.
While all targets have been met, one element of the original design has been removed. In later work by the BBP authors, new BBP formulas have been written
in P-notation form; the initial hope was that P-notation could be supported by
the project so that many other documented formulas could be added to the software. Unfortunately, conflicts were identified in the formulas (π: k = 0 / log 2:
k = 1 / P-notation: k = 0) which made verifying more exotic formulas (π√2) complex,
and communications with the author of P-notation failed to provide any
resolution [3]. Further, while prototypes did have some success, the P-notation
implementations suffered in performance due to the additional logic required. Due to the already limited processing power available to the project, this
stage was cancelled.
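To make the index conflict concrete, the sketch below evaluates P-notation numerically, using the definition P(s, b, n, A) = Σ_{k≥0} b^(−k) Σ_{j=1..n} a_j / (kn + j)^s given in the compendium [1]. Reading the conflict as a difference in the starting value of k is our interpretation and is not confirmed by the correspondence; the function name and the default of 100 terms are assumptions. The familiar series for log 2 starts at k = 1, so it must be re-indexed (and scaled by one half) before it fits the k = 0 form.

```python
import math


def p_notation(s, b, n, a, terms=100):
    """Numerically evaluate P(s, b, n, A) with the outer sum starting at k = 0."""
    total = 0.0
    for k in range(terms):
        inner = sum(a[j - 1] / (k * n + j) ** s for j in range(1, n + 1))
        total += inner * float(b) ** -k
    return total


# pi = P(1, 16, 8, (4, 0, 0, -2, -1, -1, 0, 0)), already indexed from k = 0.
pi_value = p_notation(1, 16, 8, [4, 0, 0, -2, -1, -1, 0, 0])

# log 2 = sum_{k>=1} 1 / (k 2^k); shifting the index to k = 0 gives (1/2) P(1, 2, 1, (1)).
log2_value = 0.5 * p_notation(1, 2, 1, [1])

print(abs(pi_value - math.pi) < 1e-12)        # True
print(abs(log2_value - math.log(2)) < 1e-12)  # True
```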
Future Suggestions
While the software development has so far been successful, one potential issue that
appears to be arising is the large amount of processing power needed to generate
large sequences using the BBP algorithm. While there are some known faster BBP
formulas for π [8], these do not improve the performance of other BBP formulas
such as that for log 2. The issue is simply that BBP is not designed for generating
digit sequences of this scale; instead it is best suited to generating individual digits.
If the project were to be repeated, it would benefit greatly from a similar algorithm
that is optimised towards generating binary digit sequences. However, the BBP
algorithm does have a strength which unfortunately could not be exploited
in this project. Because BBP is a spigot algorithm, it is
possible to distribute the processing across several processors and merge the
work done, such that each processor is assigned only one digit or a small portion of
the digits to generate.
In addition to distributing the processing of individual digits, it is also possible to
distribute the processing of an individual digit itself due to the design of the
algorithm. The algorithm makes use of a series that is performed multiple times
with differing variables; this function can itself be distributed to allow multiple
processors to work on one digit. For example, a future project could greatly
increase digit generation speed by first distributing the generation of digits across
multiple systems, each of which may in turn distribute the generation of its assigned
digit across multiple processing cores.
Even then, the process of calculating an individual series can be broken down further
so that only a small range of the series is calculated at a time, a method which
proved successful in earlier projects which used BBP formulas [27].
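The decomposition of a single series into ranges can be sketched for log 2, whose BBP-type series is Σ_{k≥1} 1/(k·2^k). The range of k is split into contiguous blocks, each block is summed independently, and the fractional parts are added at the end. The function names, the default of four workers and the 64-term tail are assumptions. Threads are used only to show the decomposition; in CPython a real speed-up would need separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def head_partial(n, k_lo, k_hi):
    """Fractional part of sum_{k=k_lo}^{k_hi-1} 2^(n-k) / k, for k <= n."""
    s = 0.0
    for k in range(k_lo, k_hi):
        s = (s + pow(2, n - k, k) / k) % 1.0
    return s


def tail(n, terms=64):
    """Terms with k > n, which shrink geometrically."""
    return sum(2.0 ** (n - k) / k for k in range(n + 1, n + 1 + terms))


def log2_base4_digits(m, count=8, workers=4):
    """Base-4 digits of log 2 at positions m+1 .. m+count after the point."""
    n = 2 * m  # one base-4 digit is two binary digits
    # Split k = 1..n into `workers` contiguous blocks.
    bounds = [1 + (n * i) // workers for i in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(head_partial, [n] * workers, bounds, bounds[1:]))
    x = (sum(partials) + tail(n)) % 1.0
    digits = []
    for _ in range(count):
        x *= 4
        d = int(x)
        digits.append(d)
        x -= d
    return digits


print(log2_base4_digits(0))  # [2, 3, 0, 1, 1, 3, 0, 2]
```

Since the blocks share no state, the same split applies unchanged whether the workers are threads, cores or separate systems.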
Finally, a feature of the BBP algorithm is that it does not generate only a lone digit;
it generates several, up to the precision of the floating point arithmetic used. One
potential method of reducing some processing is to avoid floating point
arithmetic and instead use integer arithmetic of a very large size, if available [2],
such that many digits can be generated in one attempt rather than only a few;
however, this will come at the cost of memory.
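A minimal sketch of this idea, using Python's arbitrary-precision integers as fixed-point numbers, is given below. Each series is accumulated as an integer scaled by 2^bits, so the number of digits obtained per evaluation is limited by the chosen width rather than by a double. The function names and the 32 guard bits that absorb truncation error are assumptions.

```python
def series_fixed(j, n, bits):
    """Fractional part of sum_k 16^(n-k) / (8k + j), scaled by 2^bits."""
    one = 1 << bits
    s = 0
    # Terms with k <= n, reduced modulo the denominator before scaling.
    for k in range(n + 1):
        r = 8 * k + j
        s = (s + (pow(16, n - k, r) << bits) // r) % one
    # Terms with k > n: 2^bits / (16^(k-n) * (8k + j)), until they vanish.
    k = n + 1
    while True:
        shift = bits - 4 * (k - n)
        if shift <= 0:
            break
        s += (1 << shift) // (8 * k + j)
        k += 1
    return s % one


def pi_hex_string(n, count, guard_bits=32):
    """Return `count` hex digits of pi starting at position n+1."""
    bits = 4 * count + guard_bits  # wider integers cost more memory
    one = 1 << bits
    x = (4 * series_fixed(1, n, bits) - 2 * series_fixed(4, n, bits)
         - series_fixed(5, n, bits) - series_fixed(6, n, bits)) % one
    top = x >> guard_bits  # discard the guard bits
    return format(top, "0{}X".format(count))


print(pi_hex_string(0, 32))  # 243F6A8885A308D313198A2E03707344
```

The memory cost mentioned above appears here as the width of the integers: each additional hex digit requested adds four bits to every intermediate value.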
Bibliography
[1] D. H. Bailey. A compendium of BBP-type formulas for mathematical constants. Available
from: http://crd.lbl.gov/~dhbailey/dhbpapers/bbp-formulas.pdf, 2009.
[2] D. H. Bailey. The BBP algorithm for pi. Available from:
http://crd.lbl.gov/~dhbailey/dhbpapers/bbp-alg.pdf, 2006.
[3] D. H. Bailey. Re: P-notation parser. Personal Communication, 2009.
[4] D. H. Bailey and J. M. Borwein. Mathematics by Experiment: Plausible Reasoning in the
21st Century, chapter 3.6, pages 127–131. Wellesley, MA: A K Peters, 2003.
[5] D. H. Bailey and R. E. Crandall. On the random character of fundamental constant expansions. Experimental Mathematics, 10(2):175–190, 2000.
[6] D. H. Bailey, J. M. Borwein, P. B. Borwein, and S. Plouffe. The quest for pi. Mathematical
Intelligencer, 19(1):50–57, 1997.
[7] D. H. Bailey, P. Borwein, and S. Plouffe. On the rapid computation of various polylogarithmic constants. Mathematics of Computation, 66(218):903–913, 1997. ISSN 0025-5718.
doi: http://dx.doi.org/10.1090/S0025-5718-97-00856-9.
[8] F. Bellard. A new formula to compute the n’th binary digit of pi. Available from:
http://fabrice.bellard.free.fr/pi/pi_bin.pdf, 1997.
[9] L. Blum, M. Blum, and M. Shub. A simple unpredictable pseudo random number generator.
SIAM Journal on Computing, 15(2):364–383, 1986. ISSN 0097-5397.
doi: http://dx.doi.org/10.1137/0215025.
[10] CERN - European Organization for Nuclear Research. Colt project. Available from:
http://acs.lbl.gov/~hoschek/colt/, 2004.
[11] D. Eastlake 3rd, J. Schiller, and S. Crocker. Randomness Requirements for Security. RFC
4086 (Best Current Practice), June 2005. URL http://www.ietf.org/rfc/rfc4086.txt.
[12] H. Ghodosi, C. Charnes, J. Pieprzyk, and R. Safavi-Naini. Pseudorandom sequences obtained from expansions of irrational numbers. In Pre-Proceedings of Cryptography Policy
and Algorithms Conference, pages 165–177, Brisbane, Australia, July 3-5 1995.
[13] L. Gąsieniec and T. Radzik. Memory efficient anonymous graph exploration. Graph-Theoretic
Concepts in Computer Science: 34th International Workshop, WG 2008, Durham,
UK, June 30 – July 2, 2008. Revised Papers, pages 14–29, 2008.
doi: http://dx.doi.org/10.1007/978-3-540-92248-3_2.
[14] X. Gourdon. Computation of the n-th decimal digit of π with low memory. Available from:
http://numbers.computation.free.fr/Constants/Algorithms/nthdecimaldigit.pdf, 2003.
[15] M. Haahr. RANDOM.ORG - true random number service. Available from:
http://www.random.org, 2009.
[16] M. Haahr. RANDOM.ORG - the history of random.org. Available from:
http://www.random.org/history/, 2009.
[17] M. Haahr. RANDOM.ORG - pregenerated random numbers. Available from:
http://random.org/files/, 2009.
[18] M. Harrower and C. A. Brewer. Colorbrewer.org: An online tool for selecting color schemes
for maps. The Cartographic Journal, 40(1):27–37, 2003.
[19] D. E. Knuth. Seminumerical Algorithms, volume 2 of The art of computer programming,
chapter 3.3.1, pages 42–48. Addison-Wesley, third edition, 1998.
[20] D. E. Knuth. Seminumerical Algorithms, volume 2 of The art of computer programming,
chapter 3.2.1, pages 10–26. Addison-Wesley, third edition, 1998.
[21] T. E. Kurt. Hacking Roomba, chapter 1, page 5. ExtremeTech. John Wiley & Sons, 2006.
[22] M. W. Maimone, P. C. Leger, and J. J. Biesiadecki. Overview of the mars exploration
rovers’ autonomous mobility and vision capabilities. In IEEE International Conference on
Robotics and Automation (ICRA) Space Robotics Workshop, Roma, Italy, Apr. 2007.
[23] E. Maningat, B. Monterola, E. Obrero, R. Samante, and J. Villafuerte. Random
walk application for autonomous vacuum cleaner robot. Available from:
http://www.electronicslab.ph/projects/autonomous-vacuum-cleaner-robot.pdf, 2007.
[24] M. Matsumoto and T. Nishimura. Mersenne twister: a 623-dimensionally equidistributed
uniform pseudo-random number generator. ACM Transactions on Modeling and Computer
Simulation, 8(1):3–30, 1998. ISSN 1049-3301.
doi: http://doi.acm.org/10.1145/272991.272995.
[25] T. Mitsui. The number π as a pseudo-random number generator. The science and engineering review of Doshisha University, 49(3):160–168, 2008.
[26] S. K. Park and K. W. Miller. Random number generators: good ones are hard to find.
Communications of the ACM, 31(10):1192–1201, 1988. ISSN 0001-0782.
doi: http://doi.acm.org/10.1145/63039.63042.
[27] C. Percival. The quadrillionth bit of pi is ’0’. Available from:
http://oldweb.cecm.sfu.ca/projects/pihex/announce1q.html, Dec. 2001.
[28] S. Plouffe. On the computation of the n’th decimal digit of various transcendental numbers.
Available from: http://pictor.math.uqam.ca/~plouffe/Simon/articlepi.html, 1996,
Revised 2003.
[29] M. Saito and M. Matsumoto. Simd-oriented fast mersenne twister: a 128-bit pseudorandom
number generator. In Monte Carlo and Quasi-Monte Carlo Methods 2006, pages 607–622.
Springer Berlin Heidelberg, 2008.
[30] D. W. Seward and M. J. Bakari. The use of robotics and automation in nuclear decommissioning. In 22nd International Symposium on Automation and Robotics in Construction,
Ferrara, Italy, 2005.
[31] Sun Microsystems, Inc. Math (Java Platform SE 6). Available from:
http://java.sun.com/javase/6/docs/api/java/lang/Math.html#random(), 2008.
[32] J. Walker. HotBits: Genuine random numbers. Available from:
http://www.fourmilab.ch/hotbits/, Sept. 2006.
[33] J. Walker. How HotBits works. Available from:
http://www.fourmilab.ch/hotbits/how3.html, Sept. 2009.
[34] J. Walker. HotBits statistical testing. Available from:
http://www.fourmilab.ch/hotbits/statistical_testing/stattest.html, Sept. 2006.
[35] J. Walker. Re: [feedback] (bulk request). Personal Communication, 2009.
[36] E. W. Weisstein. BBP-type formula. From MathWorld–A Wolfram Web Resource.
http://mathworld.wolfram.com/BBP-TypeFormula.html, 2009.
[37] E. W. Weisstein. Chi-squared test. From MathWorld–A Wolfram Web Resource.
http://mathworld.wolfram.com/Chi-SquaredTest.html, 2009.
[38] E. W. Weisstein. Irrational number. From MathWorld–A Wolfram Web Resource.
http://mathworld.wolfram.com/IrrationalNumber.html, 2005.
[39] E. W. Weisstein. Normal number. From MathWorld–A Wolfram Web Resource.
http://mathworld.wolfram.com/NormalNumber.html, 2005.
[40] Wikipedia. Linear congruential generator — Wikipedia, the free encyclopedia. Available from:
http://en.wikipedia.org/w/index.php?title=Linear_congruential_generator&oldid=302017530, 2009.
[41] P. W. H. Wong. Project specification feedback. Personal Communication, 2009.
[42] P. W. H. Wong. Project design feedback. Personal Communication, 2009.