UNIVERSITY OF LIVERPOOL

2D Graph Exploration Using Irrational Numbers

by Andrew Paul Collins

A dissertation submitted in partial fulfilment for the degree of Master of Science in Advanced Computer Science

in the Faculty of Science, Department of Computer Science

September 2009

Abstract

Irrational numbers such as Pi are often suggested as being random. This project studies whether Pi and log 2 are sufficiently random to be suitable for use within random number generators for the purposes of performing random walks. To do this, Pi and log 2 were generated to 600,000 places using the Bailey-Borwein-Plouffe (BBP) algorithms, and random walks were performed in 2-dimensional graphs of various sizes. In addition to performing the walks, the Chi Square distribution of the sequences is calculated to measure whether randomness truly affects the walk. To benchmark the significance of the results, both pseudo-random generators (Linear Congruential Generator and Mersenne Twister) and true-random generators (HotBits and RANDOM.ORG) are also used in the experiments to provide a measurement of how well the irrational numbers really perform.

The results show that log 2 and Pi are interesting alternatives to traditional generators; however, further work is needed. Specifically, the deterministic nature of irrational numbers, along with their known good randomness qualities, means that regardless of the seed the quality of the sequence is more likely to be good. Further, a series of interesting observations unrelated to the goals of the project were also made, such as an indication that the irrational numbers may become more random as the start position increases and, additionally, some evidence that the movement controlled by the digits of Pi within a 2D graph is circular.
Finally, a potentially interesting application for BBP is shown in cryptography due to its relatively low time and space complexity.

Acknowledgements

I would like to acknowledge and thank those who have provided advice, guidance and support throughout this project:

Prof. Leszek A. Gąsieniec (Supervisor) for his help and guidance throughout this project.

Dr. Russell Martin (Second Supervisor) for his positive and constructive feedback throughout the project assessment stages, in particular that relating to the usage of true random number generators.

Dr. Prudence Wong (Assessor) for her positive and constructive feedback throughout the project assessment stages.

Finally, I would like to thank family and friends who have tolerated the many hours I have dedicated purely to this project over the previous months.

Contents

Abstract
Acknowledgements
List of Figures
List of Tables
Abbreviations
Symbols

1 Introduction
  1.1 Scope
  1.2 Problem Statement
  1.3 Approach
  1.4 Outcome
  1.5 Outline
2 Background
  2.1 Random Walk in Robotics
  2.2 Random Number Generators
    2.2.1 Pseudo Random Number Generators
    2.2.2 True Random Number Generators
  2.3 Irrational Numbers
    2.3.1 Generation of Irrational Numbers
  2.4 Chi Square Test
3 Methodology
  3.1 Proposed Solution
  3.2 Pre-Processing
    3.2.1 Storage of Data
    3.2.2 Generation of Digits
  3.3 Processing
    3.3.1 2D Walks
    3.3.2 Chi Square Test
    3.3.3 Standard Deviation
  3.4 Post-Processing
    3.4.1 Log Analyser
    3.4.2 Heat Maps
  3.5 Testing
4 Experimental Work
  4.1 Generation of Digits
  4.2 Performance of Walks
    4.2.1 Experiment One
    4.2.2 Experiment Two
    4.2.3 Experiment Three
    4.2.4 Overall Analysis
  4.3 Further Observations
5 Evaluation
  5.1 Strengths
  5.2 Weaknesses
  5.3 Recommendations
6 Professional Issues
  6.1 Code of Conduct
  6.2 Code of Good Practice
7 Conclusions
  7.1 Main Findings
  7.2 Future Work
A Project Specification
B Project Specification
C Project Implementation
Bibliography

List of Figures

3.1 Example of the Log Analyser output
3.2 Summary of walks where x = y from 10 to 125
3.3 Heat map of 85x85 grid using Pi
3.4 Heat map colour scale
4.1 Percentage of 50x50 grids with a good Chi Square distribution at various starting points
4.2 Percentage of 25x25 grids with a good Chi Square distribution at various starting points
4.3 Heat map of 122x122 grid using Pi

List of Tables

2.1 Chi Square Distribution Table
4.1 Summary of experiment one – Walk speed
4.2 Summary of experiment one – Chi Square Test
4.3 Summary of experiment one – Standard Deviation
4.4 Summary of experiment two – Walk speed
4.5 Summary of experiment two – Chi Square Test
4.6 Summary of experiment two – Standard Deviation
4.7 Summary of experiment three – Walk speed
4.8 Summary of experiment three – Chi Square Test
4.9 Summary of experiment three – Standard Deviation
Abbreviations

BBP     Bailey Borwein Plouffe
¹³⁷Cs   Caesium-137
LCG     Linear Congruential Generator
OOP     Object Orientated Programming
PRNG    Pseudo Random Number Generator
PHP     PHP: Hypertext Preprocessor
TRNG    True Random Number Generator

Symbols

χ²      Chi Square Test
∞       Infinity
π       Pi
log 2   Logarithm of 2 in base-10
φ       Golden Ratio
e       Euler's Number

Chapter 1

Introduction

"The generation of random numbers is too important to be left to chance."
Robert R. Coveyou (1915-1996)

1.1 Scope

Exploration using robots is a rapidly growing topic due to its interesting and sometimes vital applications. Exploration robots are currently used in many settings, such as the exploration of other planets [22], the exploration of heavily polluted or hazardous environments [30], or simply domestic vacuum cleaning [23]. In the majority of these cases the robots are remotely controlled by a human, either because the complexity of the task is beyond current technological boundaries (e.g. explosive defusal) or because of social concerns (e.g. unmanned combat air vehicles, UCAVs). However, there are many applications where an autonomous robot could be critical, such as the exploration of areas where the terrain is simply unsuitable for wired or even wireless contact (e.g. exploration of Lunar or Martian lava tubes).

In this project, the objective is to explore the concept of autonomous robots, in particular a new method for robots to make decisions when faced with multiple choices, such as which of several doors to traverse. While this project focuses purely on the concept of a robot being placed within an initial room and then autonomously visiting all linked rooms (i.e. exploration), the concepts shown could be used in other decision-making processes, such as which object to interact with first.
Further, the key aspect of this project is to investigate memoryless decision making, where decisions are made without any knowledge of previous actions. Specifically, the decision-making process of the robots in this project will mimic that of a probabilistic method. A probabilistic robot is not only memory efficient, it can also enable interesting applications because its actions become unpredictable. A potential application of an unpredictable robot is a sentry that patrols a building checking each room for intruders but has no specific pattern as to which rooms are checked in which order. Memory efficiency also becomes important when a robot must explore something so vast that its memory requirements could become a burden.

1.2 Problem Statement

The problem with memoryless robots is that they require a method of generating decisions, such as random walks, which in turn require random number generators. However, regardless of the random number source used, the speed at which a robot completes the process of visiting each room can be slow in comparison to a deterministic robot. While speed is not an issue in all cases, the distribution of visits to rooms can be: for a sentry or vacuum cleaning robot, for example, it would be preferred that each room is visited an equal or near-equal number of times. Therefore there are two factors explored in this project in the search for an alternative source of random numbers:

• The method must be faster in completing the walk.
• The distribution of visits must be more equal even at the sacrifice of speed.

Preferably, a new method would have both properties, but this is not strictly necessary.

1.3 Approach

In this project, the approach used to solve this problem is a method that is deterministic but could be viewed as probabilistic.
In various prior works, attempts have been made to prove that the expansions of various irrational numbers such as π are random, making them in essence an infinite series of pre-generated random numbers. In this project the expansions of irrational numbers are used as a random number generator in place of a pseudo random number generator (PRNG) or true random number generator (TRNG).

To test the performance of irrational numbers, a series of experiments using the expansions of irrational numbers is performed and, in addition, a comparison using both poor quality and good quality PRNGs is included. Further, a comparison with two TRNG services is also performed.

For the experiments, 2D graph (G = (V, E)) exploration is used, whereby the set of vertices, V, are the rooms and the set of edges, E, are doorways leading between rooms. The exploration of the graphs is performed using a form of random walk [13], or a deterministic counterpart, whereby an agent makes decisions probabilistically using a random number generator.

To validate against the PRNGs and TRNGs, walks are performed on various graph sizes using each irrational number generator, and measures of speed and distribution are then taken to identify which performed best. Further, a measure of the randomness of the sequence used to complete a walk is taken, to see whether randomness is a factor in the quality of a walk.

For the purposes of this work, while the graph model is used, all graphs are treated as 2D grids, which could be viewed as the grid model being used.

1.4 Outcome

To summarise, the project was completed successfully; however, a number of weaknesses were discovered, particularly due to the use of incorrect tools for the task, simply because suitable alternatives were not known to exist.
While some evidence has been shown that the expansions of irrational numbers do provide an alternative, there is insufficient data to support the number of experiments needed to establish that this would be worthwhile.

What has been discovered in this project are interesting observations as to the randomness of the expansions of irrational numbers. First and foremost, it is shown that log 2 appears to become more random as the starting position of a sequence increases. Secondly, it is shown that the movement of π within a random walk is not evenly distributed, and instead is circular in that it starts from a central point and slowly moves outwards, frequently visiting its starting area as it explores.

Finally, in the conclusion and evaluation, a number of ideas are presented for future projects, as well as new applications for the BBP algorithms that are not known to have been discussed previously.

1.5 Outline

This work provides a background introduction to the types of random number generators available, as well as the best known methods of generating irrational numbers in binary bases.

In the third chapter, the method by which the experiments were performed is explained, including the creation of the software and the reasoning for why the software was required to be broken up into three components.

In chapter four, the results of the three main experiments performed in this project are discussed, along with the results of the additional work performed, including the interesting observations made that were unrelated to the scope of this project.

In later chapters a critical evaluation of the work is provided, with a discussion of the weaknesses and strengths of the project, along with suggestions for how to improve this project should a repeat of the experiments be of interest.
At the end of the document, the conclusion is provided with an overview of the experimental work and a discussion of the potential applications that may be available should further work be done in this area.

Chapter 2

Background

"Exploring pi is like exploring the universe."
David Chudnovsky (1947–Present)

2.1 Random Walk in Robotics

The concept of random walks in robotics as a method of movement has been shown to be viable in [23], which used a random number generator to perform random walks in an autonomous vacuum cleaner. While the focus of that work was on the technical challenges of implementing such a robot, the results (202 m covered in 1 hour 32 minutes) make it clear that the speed of such a method is very slow. However, [23] differs from the work performed in this project in that floor coverage is not of interest here; instead the focus is purely on the speed of visiting all rooms within a set of rooms.

Further work that uses the concept of a random walk is the iRobot Roomba, which utilises a random walk as a fallback algorithm should the robot come into difficulty (i.e. lost or stuck) or when a floor plan is not known [21].

What both of these examples highlight is that random walks can be implemented successfully within robotics. While the speed of room coverage for [23] was slow in comparison to a human doing the same task, the speed is not realistically an issue if the task can be completed without human interaction. The cost-benefit of this becomes more significant when viewed using examples of autonomous robots in other-world environments, where a human may not necessarily be able to interact with the robot.

2.2 Random Number Generators

In both of the previous robotic vacuum cleaner examples, random number generators were used as the probabilistic element of the random walks. In neither case was it stated exactly which random number generator was used, and there are many possibilities.
Random number generators can be classified into two groups: true random number generators (TRNG), whereby the random numbers are gathered from a random (e.g. quantum) source, and pseudo-random number generators (PRNG), whereby the random numbers are generated using a deterministic procedure.

A TRNG is often not available within standard computing devices due to the lack of perceived need for one to be included [11]. While TRNG hardware is available to those that specifically require this functionality, it is not standard equipment and is often highly expensive. Because a TRNG is not available as standard, and because the quality of a TRNG is rarely necessary, an alternative is commonly used: the PRNG.

2.2.1 Pseudo Random Number Generators

A PRNG does not use a true random source like a TRNG; instead its random numbers are generated through the execution of an algorithm. Perhaps the most well known and most widely implemented PRNG algorithm is the Linear Congruential Generator (LCG), which uses the algorithm shown in [20]. The LCG algorithm is fast and memory efficient, making it highly suitable for applications where simple random numbers are required. However, it has been shown in many cases that the quality of the random numbers generated is poor [26], and as such the LCG should not be used in any applications that depend on good random numbers, e.g. lotteries. Further, in the majority of LCG implementations, the period size is $2^{32}$ [40].

While periodicity is a concern, not all PRNGs have a small period. A prominent example of a good PRNG is the Mersenne Twister [24]. The Mersenne Twister is designed to pass tests for randomness, such that it can be used within applications that depend on randomness (e.g. Monte Carlo methods). Not only does it perform well in randomness testing, it also has a period of $2^{19937} - 1$ [24], far larger than that of the LCG.
Unfortunately, while it performs well in randomness tests and has a period far beyond any possible need¹, it is not without its disadvantages. While the Mersenne Twister is faster in more recent work [29], it is still slow at starting up in comparison to the lightweight LCG. It also requires a much greater amount of memory: whereas the LCG requires only one word, the Mersenne Twister requires 624 words (i.e. 2,496 bytes if using 32 bit integers, compared to 4 bytes if using the LCG).

¹ $2^{19937} - 1$ is approximately $4.3 \times 10^{6001}$, which is far greater than the predicted number of atoms (approximately $10^{80}$) in the current observable universe.

The LCG and the Mersenne Twister show two extremes, where one is fast yet does not provide good random numbers, and the other is slow yet provides good random numbers. For the purpose of this project, it is of interest to see how well irrational numbers compare to both of these PRNGs and, as a subtask, whether good random numbers are any better than bad random numbers in the case of random walks, or whether quality has no effect.

2.2.2 True Random Number Generators

As mentioned previously, for a TRNG to be used, a source of randomness is required. While there are multiple possible sources within most personal computers [11], none are standardised or even natively available to the system for usage.

Peripheral hardware capable of generating true random numbers does exist; however, it is often expensive. An inexpensive alternative is to use Internet based services available on the WWW. There are two potential services that are perceived to be TRNGs, RANDOM.ORG² and HotBits³. RANDOM.ORG generates random bits using atmospheric noise.
The atmospheric noise is provided by tuning a very inexpensive and old radio (such that it does not have noise filters) into an unused frequency so that static is generated, which can be used to generate bits [16]. HotBits differs in that it instead uses the radioactive decay of Caesium-137 (¹³⁷Cs) as a means of generating bits [33].

² RANDOM.ORG: True Random Number Service, http://www.random.org
³ HotBits: Genuine Random Numbers, http://www.fourmilab.ch/hotbits/

2.3 Irrational Numbers

Irrational numbers are numbers whose expansions are infinite and non-periodic. Possibly the most popular irrational number is π, which is the ratio of a circle's circumference to its diameter. However, there are many others such as log 2, √2, e (Euler's Number), and φ (Golden Ratio). Further, these can also be combined or manipulated while remaining irrational. For example, π² and π√2 remain irrational.

Definition 2.1. "An irrational number is a number that cannot be expressed as a fraction p/q for any integers p and q." [38]

Further, while many authors show that irrational numbers, such as π, are normal over many known intervals of their expansions, there is as yet no formal proof that the numbers are truly normal [6].

Definition 2.2. "A normal number is an irrational number for which any finite pattern of numbers occurs with the expected limiting frequency in the expansion in a given base (or all bases)." [39]

However, what can often be agreed is that the property of normality holds for the digits discovered so far for many irrational numbers; whether this will remain true as more digits are discovered can only be assumed. This creates an interesting prospect: not only do the expansions of irrational numbers lack periodicity, they are also so far believed to be normal, which indicates that certain irrational numbers may in fact be random [5].
This indicates that the expansions of irrational numbers could be utilised in a random number generator, assuming the property of normality holds. In a recent work, Mitsui [25] shows that π has the potential to be utilised as a random number generator and, in the tests undertaken, in fact outperforms the LCG and Mersenne Twister.

2.3.1 Generation of Irrational Numbers

For the generation of the expansions of various irrational numbers, the Bailey–Borwein–Plouffe (BBP) algorithm [7] is one of the most suitable methods. It was first shown to be capable of generating π (eq. 2.1) in base-16 with very low memory and processing requirements.

\[
\pi = \sum_{k=0}^{\infty} \frac{1}{16^{k}} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right) \tag{2.1}
\]

Since the initial work on BBP, the binary expansions of many other irrational numbers have been shown to be capable of being generated using BBP-style algorithms, giving rise to P-notation [5] as shown in eq. 2.2, where A = (a_1, ..., a_n) is a vector of integer coefficients.

\[
P(s, b, n, A) = \sum_{k=0}^{\infty} \frac{1}{b^{k}} \sum_{j=1}^{n} \frac{a_{j}}{(kn+j)^{s}} \tag{2.2}
\]

π and many other irrational numbers have been written in this form [1, 4, 36]. This makes BBP of great interest to the project, as it would allow the binary expansions of many irrational numbers to be added to the software quickly once initial support for P-notation is added; however, this is not a necessary objective of the project. While not expected to be necessary to this project, BBP has also been shown to be capable of generating digits in non-binary bases such as decimal [14, 28].

Further to this, because the BBP algorithm generates binary digits, irrational numbers can be generated in any power-of-two base, such as base-4. Due to the grid model being used in the graph exploration, this allows the irrational numbers to be generated natively in base-4 and each digit to be mapped directly to an individual port. This removes any unfairness that would arise if, for example, base-10 digits needed to be mapped to four possible edges.
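As an illustration of how eq. 2.1 can yield base-4 digits, the following is a minimal Java sketch of BBP digit extraction for π. It is not the project's implementation: the class and method names (BbpPiSketch, series, hexDigit, base4Digits) are hypothetical, and double-precision arithmetic is assumed, which keeps the sketch short but limits how far into the expansion the digits can be trusted. Each hexadecimal digit is split into two base-4 digits, high pair first, which is also an assumption of this sketch.

```java
/** Illustrative sketch only; names are hypothetical, not the project's code. */
final class BbpPiSketch {

    /** Computes (base^exp) mod m by repeated squaring. */
    static long powMod(long base, long exp, long m) {
        long result = 1 % m;
        base %= m;
        while (exp > 0) {
            if ((exp & 1L) == 1L) {
                result = (result * base) % m;
            }
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }

    /** Fractional part of the sum over k of 16^(d-k) / (8k + j). */
    static double series(int j, int d) {
        double sum = 0.0;
        // Terms with k <= d: reduce the numerator modulo the denominator
        // so that only the fractional contribution is accumulated.
        for (int k = 0; k <= d; k++) {
            long denom = 8L * k + j;
            sum += (double) powMod(16, d - k, denom) / denom;
            sum -= Math.floor(sum);
        }
        // A few terms with k > d; they shrink by a factor of 16 each step.
        double weight = 1.0 / 16.0;
        for (int k = d + 1; k <= d + 12; k++) {
            sum += weight / (8.0 * k + j);
            weight /= 16.0;
        }
        return sum - Math.floor(sum);
    }

    /** Hexadecimal digit of pi at 0-based position d after the point. */
    static int hexDigit(int d) {
        double x = 4.0 * series(1, d) - 2.0 * series(4, d)
                 - series(5, d) - series(6, d);
        x -= Math.floor(x);
        return (int) (16.0 * x);
    }

    /** First 'count' base-4 digits of pi after the point (two per hex digit). */
    static int[] base4Digits(int count) {
        int[] digits = new int[count];
        for (int i = 0; i < count; i++) {
            int hex = hexDigit(i / 2);
            digits[i] = (i % 2 == 0) ? (hex >> 2) : (hex & 3);
        }
        return digits;
    }
}
```

Since π is 3.243F6A88... in base-16, calling BbpPiSketch.base4Digits(8) gives 0, 2, 1, 0, 0, 3, 3, 3, matching the base-4 expansion 3.02100333.... Each of these digits could then select one of four ports directly.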
2.4 Chi Square Test

While the performance of the random walks themselves will be a major factor within the outcome of the results, it is also of interest to test the randomness of the sequences that completed the walks. This would assist in understanding whether a good sequence is better than a bad sequence.

The Chi Square (χ²) Test provides an indication of the probability of a given sequence occurring. The χ² test in this usage does not necessarily indicate how random a sequence is, because for a source to be random every possible sequence must be able to occur, even several hundred consecutive zeros. What χ² does provide is an indication of whether a sequence should be viewed with some suspicion, given the probability of such a case occurring. In particular, it is of interest which performs best: those sequences which arouse suspicion or those that do not.

The χ² value of given expected and observed data is calculated as shown in eq. 2.3, where E is the expected frequency and O is the observed frequency.

\[
\chi^{2} = \sum_{i=0}^{n} \frac{(E_{i} - O_{i})^{2}}{E_{i}} \tag{2.3}
\]

In this project E = (Number of Digits) / b, where b is the base that the digits were generated in, which in this case is base-4.

        p = 1%    p = 5%    p = 25%   p = 50%   p = 75%   p = 95%   p = 99%
v = 1   0.00016   0.00393   0.1015    0.4549    1.323     3.841     6.635
v = 2   0.02010   0.1026    0.5754    1.386     2.773     5.991     9.210
v = 3   0.1148    0.3518    1.213     2.366     4.108     7.815     11.34
v = 4   0.2974    0.7107    1.923     3.357     5.385     9.488     13.28
v = 5   0.5543    1.1455    2.675     4.351     6.626     11.07     15.09

Table 2.1: χ² distribution table [19, pg. 44]

On calculating the χ² distribution, a decimal value is generated. To interpret the decimal value, a look-up table (such as table 2.1) is required that converts the value to a probability. Each row within a χ² table corresponds to a given number of degrees of freedom v; in the case of this project the degrees of freedom will be v = 3.
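The calculation in eq. 2.3 can be sketched for base-4 digits as follows. This is an illustrative Java sketch rather than the project's code: the class and method names are hypothetical, and the acceptance band used in looksUnsuspicious (the p = 25% and p = 75% entries of the v = 3 row of table 2.1) is hard-coded as an assumption for sequences of base-4 digits.

```java
/** Illustrative sketch only; names are hypothetical, not the project's code. */
final class ChiSquareSketch {

    // p = 25% and p = 75% entries for v = 3 in table 2.1.
    static final double LOWER_25 = 1.213;
    static final double UPPER_75 = 4.108;

    /** Chi Square value of a sequence of base-4 digits (values 0..3). */
    static double statistic(int[] digits) {
        long[] observed = new long[4];
        for (int digit : digits) {
            observed[digit]++;
        }
        // Every digit is expected equally often: E = (number of digits) / 4.
        double expected = digits.length / 4.0;
        double chi = 0.0;
        for (long o : observed) {
            double diff = expected - o;
            chi += diff * diff / expected;
        }
        return chi;
    }

    /** True if the value lies between the 25% and 75% points for v = 3. */
    static boolean looksUnsuspicious(int[] digits) {
        double chi = statistic(digits);
        return chi >= LOWER_25 && chi <= UPPER_75;
    }
}
```

For example, the eight digits 0, 0, 0, 0, 1, 1, 2, 3 have observed counts 4, 2, 1, 1 against an expected count of 2, giving (4 + 0 + 1 + 1) / 2 = 3.0, which lies inside the band; four consecutive zeros give 12.0, which lies well outside it.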
The degrees of freedom are v = 3 because all sequences are in base-4, which gives k = 4 possible values, and v = k − 1. Through using these look-up tables, the probability of a given sequence occurring can be established. Realistically, for a sequence to be considered random its χ² value should fall between the p = 25% and p = 75% points, i.e. between 1.213 and 4.108 for v = 3.

Chapter 3

Methodology

"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin."
John von Neumann (1903–1957)

3.1 Proposed Solution

To identify whether irrational numbers are a viable solution to the problem of performing memory efficient walks, it was proposed that a software solution be created that was capable not only of generating the expansions of irrational numbers, but also of performing the 2D random walks as a means of testing whether irrational numbers are a viable alternative.

In this chapter, it is identified how the software solution was constructed to generate the required results. The solution consists of three stages: a pre-processing stage, a processing stage, and a post-processing stage, each of which is discussed in its respective section. Further, in this chapter it is shown how the software was tested.

In the proposed solution, the BBP algorithm is used for the generation of base-4 digits, as it was shown capable of doing so in §2.3.1. Specifically, because two PRNGs (LCG and Mersenne Twister) and two TRNGs (HotBits and RANDOM.ORG) were being implemented, two BBP algorithms were proposed to be used: a constant, π, and a logarithm, log 2.

This chapter contains a combination of the design and implementation phases of the project. For the initial project proposal please see appendix A. For further information on the separate design and implementation stages please see the post-stage documentation in appendix B and appendix C respectively.
3.2 Pre-Processing

A requirement of the software was that the expansions of the irrational numbers must be generated in base-4 such that they can be fairly mapped to a specific movement direction. Due to the large number of grids that were expected to be used, it would be inefficient to generate digits on demand for each grid, especially as the time complexity of generating an individual digit using the BBP algorithm is linear.

The solution to this was to perform a stage of pre-processing whereby databases for each digit generator are created prior to the walks. This allowed digit generation to be performed once for all walks. As a side effect, disk space is required to store the many digits which are generated; however, the processing time savings easily make this a negligible cost.

3.2.1 Storage of Data

One potential concern was that the digits are in base-4 (0, 1, 2, 3), having a storage requirement of 2 bits (00, 01, 10, and 11 respectively), which could cause overheads as it is common for programming languages to operate natively in bytes. If one digit is stored per byte, this creates an overhead of 6 bits per digit stored.

To highlight the problem, if the software were to generate up to its potential internal limits (assuming 32 bit unsigned integers) then it would be generating up to $n = 2^{32}$ digits, resulting in a disk space usage of $O(n)$ bytes (4 GB) per generator. Assuming that there are multiple irrational number generators, PRNGs, and TRNGs, the disk space requirements could become an issue.

A simple yet effective solution designed to resolve this was to utilise bit packing, whereby multiple digits are packed into a single byte. This allows four digits to be stored in every byte with no overhead, reducing the disk space requirement to $O(n/4)$ bytes (1 GB).
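The bit packing described above can be sketched in Java as follows. This is a minimal illustration, not the project's code: the class and method names are hypothetical, and placing the first digit in the two most significant bits of each byte is an assumed ordering, since the layout used by the project databases is not stated here.

```java
/** Illustrative sketch only; names and bit order are assumptions. */
final class DigitPackingSketch {

    /** Packs base-4 digits four to a byte, first digit in the high bits. */
    static byte[] pack(int[] digits) {
        byte[] packed = new byte[(digits.length + 3) / 4];
        for (int i = 0; i < digits.length; i++) {
            int shift = 6 - 2 * (i % 4);          // 6, 4, 2, 0
            packed[i / 4] |= (byte) ((digits[i] & 3) << shift);
        }
        return packed;
    }

    /** Reads back the digit stored at the given 0-based digit index. */
    static int digitAt(byte[] packed, long index) {
        int shift = 6 - 2 * (int) (index % 4);
        return (packed[(int) (index / 4)] >> shift) & 3;
    }
}
```

With this layout the digits 0, 1, 2, 3 occupy a single byte with bit pattern 00011011, and $2^{32}$ digits occupy $2^{30}$ bytes, i.e. the 1 GB figure given above.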
While realistically it was not expected that $2^{32}$ digits would be generated, for the purposes of future-proofing the software it was reasonable that efficient storage methods should be utilised.

3.2.2 Generation of Digits

To ensure the software is expandable it utilises concepts of Object Orientated Programming (OOP) which are natively available within the Java programming language.

One of the first features used was an interface. An interface provides a means of stating exactly which methods must exist within any class that implements it. Effectively this standardises the methods available within a class. This becomes advantageous at later stages, as functionality that intends to use these classes only needs to be written to handle one set of methods rather than many.

Secondly, the work required for each generator was reduced by increasing the amount of re-usable code. To do this a superclass was created. A superclass is a fully or partially written class that can be extended at a later time. In this case the superclass provides the majority of the functionality that would be required by each generator, such as the ability to write to a log file or to perform efficient modular exponentiation. However, the superclass does not itself contain any functionality for generating digits.

The final feature used is subclasses. A subclass is a class that inherits the functionality of a superclass. In the software, the subclasses inherit the features of the previously mentioned superclass, giving them all the core functionality natively without the need to re-write it. This has allowed the core functionality to be centralised such that only the features that make a generator unique need to be written, such as the implementation of a BBP algorithm or the usage of a PRNG.
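The interface, superclass and subclass arrangement described above might look roughly like the following Java sketch. All type and method names here (DigitGenerator, AbstractDigitGenerator, LcgDigitGenerator) are hypothetical and chosen for illustration; only java.util.Random, which the Java documentation describes as a linear congruential generator, is a real library class.

```java
/** Hypothetical interface: the one method every generator must offer. */
interface DigitGenerator {
    /** Returns the next base-4 digit, a value in 0..3. */
    int nextDigit();
}

/** Hypothetical superclass holding functionality shared by all generators. */
abstract class AbstractDigitGenerator {
    private final String name;

    protected AbstractDigitGenerator(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    /** Modular exponentiation by repeated squaring, as needed by BBP. */
    protected static long modPow(long base, long exp, long m) {
        long result = 1 % m;
        base %= m;
        while (exp > 0) {
            if ((exp & 1L) == 1L) {
                result = (result * base) % m;
            }
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }
}

/** Hypothetical subclass: only the digit source itself is written here. */
final class LcgDigitGenerator extends AbstractDigitGenerator
        implements DigitGenerator {
    private final java.util.Random random;

    LcgDigitGenerator(long seed) {
        super("LCG");
        this.random = new java.util.Random(seed);
    }

    @Override
    public int nextDigit() {
        return random.nextInt(4);
    }
}
```

Code that performs walks can then be written once against DigitGenerator, and a new source of digits only requires one further subclass.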
Further, all subclasses not only inherit the superclass, they also implement the interface, allowing the generators to be accessed uniformly.

As stated previously, π and log 2 were implemented using the BBP algorithms, [7] and [1] respectively. The LCG was implemented with relative ease due to support being natively available within the Java libraries [31], and the Mersenne Twister was implemented with similar ease due to an existing implementation available within the Colt Project libraries [10].

The two TRNGs did however provide some initial difficulties. Both TRNGs are Internet services and as such have been designed to prevent abuse by limiting the number of digits generated. While RANDOM.ORG provides an allowance of 1,000,000 digits per day, this limit created difficulty when trying to debug the project software or generate various test cases. Fortunately, RANDOM.ORG seems to have foreseen the problem of its users requiring high volumes of data, as it provides pre-generated files which are updated every 24 hours [17]. Each pre-generated file is a binary file containing 1 MB of data (4,194,304 base-4 digits). In addition, RANDOM.ORG maintains an archive of all previously generated files, providing a huge amount of data to work with. Conveniently, the data files are bit packed in the same way as the project software's own databases, so they are suitable for use without any modification. The project software has been designed to automatically download the latest of these files and use it as a digit database. In cases where this functionality is not required, databases can be manually downloaded and added to the software.

The second TRNG implemented, HotBits, suffered from the same issue, as its daily allowance is 16,384 bits, making it completely unsuitable for use in the software. Further, HotBits does not provide pre-generated files as was the case with RANDOM.ORG.
However, after contact with the HotBits author [35], two 16 MB (67,108,864 base-4 digits) data files became available to the project, one file per HotBits hardware generator, which had been pre-generated for previous statistical testing [34]. Once again, as the data files are bit packed they can simply be loaded into the software.

3.3 Processing

Once the digit databases have been created, the required walks can be performed. Within the main processing stage the previously built databases are used to perform the walks. Further, because the number sequences produced by the generators are used directly in this stage, the additional tests, such as the χ² test and the calculation of standard deviation, are performed immediately after the walks complete. As both tests depend on the results of the walks, it is sensible to compute these values before the result data is released from memory, reducing additional processing later.

3.3.1 2D Walks

The 2D walks are a core aspect of this project, as it is in these tests that the performance of the expansions of irrational numbers is expected to be visible. The walks traverse either squares (x × y where x = y) or rectangles (x × y where x ≠ y)¹. Further, while the graphs are rectangular or square, their edges are transparent, as each graph is treated as a torus.

¹ For all cases x and y are always assumed to be integers greater than 0.

Because the graphs are treated as grids, the most efficient method of implementation is a multi-dimensional array, a feature available in most modern programming languages. Further, by using an integer array rather than a boolean array, it is possible to record the number of visits the agent makes to any given vertex.
By using an array, the memory required to perform the walks is O(x × y), plus two bytes that act as a pointer to the current vertex the agent is positioned in. A potential alternative is to use linked lists; however, this would require substantially more memory, as each vertex would be required to store a reference to all four of its edges. If the graphs were not square or rectangular in shape, then linked lists could possibly be more memory efficient than the grid method that is used.

Performing the walks uses the previously generated databases. For each walk the database is opened and read sequentially, unpacking each byte into four digits as it is read. Each digit extracted is mapped to a direction (x ± 1 ∨ y ± 1) and the agent performs one step in that direction. In a physical environment this could be envisioned as a robot going through one of four doors into a different room.

In the initial state all vertices are set to 0 in the array, indicating that the vertex has not yet been visited. Upon the agent visiting a vertex, the value of the vertex is incremented by 1, thus recording how many visits occurred at that vertex. Once all vertices are greater than 0 the walk is considered complete, as all vertices have been visited.

The end result is a matrix of size x × y holding the number of visits at each vertex. These matrices are stored to disk, as a particular interest is how well irrational numbers perform in generating a relatively equal number of visits to each vertex. This is discussed further in §3.3.3. These matrices can also be used to generate a heat map, which is covered in §3.4.2.

3.3.2 Chi Square Test

During the walks, a frequency list is also maintained of the number of times each digit occurs within the sequence.
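The walk procedure of §3.3.1, together with the digit frequency list just mentioned, might be sketched in Python as follows (the project software was written in Java). The function name and the particular digit-to-direction mapping are assumptions, as is the decision to count the start vertex (0,0) as visited before the first step; the text only states that each base-4 digit maps to one of the four directions and that the grid wraps as a torus.

```python
def walk_until_covered(width, height, digits):
    """Walk a width x height torus until every vertex has been visited.

    Returns (steps, visits, frequency), where visits[y][x] is the visit
    count of each vertex and frequency[d] counts how often digit d was used.

    Assumptions for illustration: 0 = x+1, 1 = x-1, 2 = y+1, 3 = y-1,
    and the start vertex (0, 0) counts as visited at the outset.
    """
    moves = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}
    visits = [[0] * width for _ in range(height)]
    frequency = [0, 0, 0, 0]
    x = y = 0
    visits[y][x] = 1
    unvisited = width * height - 1
    steps = 0
    for d in digits:
        if unvisited == 0:
            break  # every vertex has a count greater than 0
        dx, dy = moves[d]
        x = (x + dx) % width   # modulo arithmetic gives the torus wrap-around
        y = (y + dy) % height
        frequency[d] += 1
        steps += 1
        if visits[y][x] == 0:
            unvisited -= 1
        visits[y][x] += 1
    if unvisited:
        raise ValueError("digit sequence exhausted before the walk completed")
    return steps, visits, frequency
```

Tracking the number of unvisited vertices avoids rescanning the whole array after every step to decide whether the walk is complete.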
The list contains no more than four values, one for every base-4 digit, and as such is implemented as a 1D array with each digit acting as an index. In terms of the χ² test, this list is the observed data. With both the observed values and the expected values² that are required by the χ² test, the χ² distribution can be calculated as per eq. 2.3.

² See §2.4 for how the expected data value is calculated.

3.3.3 Standard Deviation

To calculate how even the distribution of visits is across the vertices, the standard deviation of each matrix produced in the walks is calculated. The standard deviation provides a measure of how far a given series of numbers deviates from its mean. In the best case the result will be 0; however, it is unlikely this will ever occur.

3.4 Post-Processing

In the processing stage, an individual walk (or multiple unrelated walks) is performed and the additional statistics relating to it are calculated. After each walk, the results are recorded to a log file so that a collection of walks can be analysed at once. It is the analysis of these log files which occurs in the post-processing stage. By reviewing the log files together, the performance of the different generators at each grid size can be compared to see how well the irrational numbers perform in the walks.

3.4.1 Log Analyser

To analyse the log files for several walks, a PHP (PHP: Hypertext Preprocessor) based web front end was created. The PHP script reads the log files together and reports which digit generator performed the best in each measurement per grid size on the basis of walk speed. Further, for each grid and generator, the χ² distribution is colour coded to indicate whether the value was rejected, suspect, or almost suspect, so that it can be seen whether the randomness of a sequence affects the speed.
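The two statistics of §3.3.2 and §3.3.3 that the log analyser reports could be computed along the following lines. This Python sketch rests on assumptions: eq. 2.3 is not reproduced in this chapter, so the usual Pearson form with equal expected counts for the four digits is used, and the population (rather than sample) standard deviation is taken over all vertices. The function names are hypothetical.

```python
import math


def chi_square(observed):
    """Pearson chi-square statistic, assuming equal expected counts."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def visit_std_dev(visits):
    """Population standard deviation of every visit count in the matrix."""
    values = [v for row in visits for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
```

As a worked example, digit counts of 30, 20, 25 and 25 have an expected count of 25 each, giving (25 + 25 + 0 + 0) / 25 = 2.0, while a perfectly even visit matrix gives a standard deviation of 0.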
An example of the log analyser output can be found in fig. 3.1.

Figure 3.1: Example of the Log Analyser output

A further feature of the log analyser is that it provides a summary of how well each generator performed overall in the areas of speed, χ² and standard deviation for every grid size tested in a given session. An example of this output can be seen in fig. 3.2.

Figure 3.2: Summary of walks where x = y from 10 to 125

3.4.2 Heat Maps

In addition, the log analyser provides the functionality to view the matrices that were produced in the processing stage. These matrices are rendered as heat maps. A heat map is a visual representation of a matrix with all of the values normalised to colour values. In the heat map generator, each 10th percentile band is mapped to a particular colour. The colours selected were generated by ColorBrewer³ [18].

³ ColorBrewer: Color Advice for Maps, http://www.colorbrewer2.org

Using these maps it is possible to see the variance of the distribution of visits within a walk. While the standard deviation provides an indication of the evenness of a walk, the heat maps show any interesting patterns that may appear. An example heat map generated by the software is in fig. 3.3 and the colour scale used is in fig. 3.4.

Figure 3.3: Heat map of 85x85 grid using Pi

Figure 3.4: Heat map colour scale

3.5 Testing

The primary function to test in the software was the generation of the irrational number expansions. To verify that the generators were working, the 10⁶th, 10⁷th, and 10⁸th digits of both π and log 2 were generated and compared to the results provided within [7]. Once the digits matched, it was believed the generators were performing as designed.

Chapter 4

Experimental Work

I am ashamed to tell you to how many places of figures I carried these computations, having no other business at the time.
Sir Isaac Newton (1643-1727)

4.1 Generation of Digits

While the generation of digits for both the PRNGs and the TRNGs was trivial, due to the external data available or the speed of their algorithms as discussed earlier, the case was not the same for the BBP algorithms. The issue that arose with the BBP algorithm is that, because it operates as a spigot algorithm, generating a sequence requires generating each digit from 1 to n independently. The time complexity of doing this becomes O(n(n+1)/2) (i.e. that of a triangular number), as each digit must be generated independently:

\[ O\left(\frac{n(n+1)}{2}\right) = \sum_{d=0}^{n} O(d) \qquad (4.1) \]

This disadvantage of the BBP algorithm results in an incredibly slow processing time when generating large sequences. Due to this, and due to the limited resources available to the project, 600,000 digits were generated for both π and log 2. While a considerable number, it required 3 weeks of processing time using the author's system to generate this amount.

4.2 Performance of Walks

Using the 600,000 digits, random walks were performed on grids of size 10 × 10 up to 100 × 100. For every x from 10 up to 100, y was set from 10 up to 100 (see algorithm 1).

Algorithm 1 Generation of grids
    for x = 10 to 100 do
        for y = 10 to 100 do
            doWalk(x, y)
        end for
    end for

This created 8,281 independent walks that were performed by each of the six generators. To make the experiments fair for the PRNGs and TRNGs, the experiments were performed three times. The aim of this was to balance out the chance of an extremely good or extremely bad sequence being generated by a PRNG and skewing the experiment.

The next three subsections give a brief analysis of noteworthy results within each of the three experiments. Each of these sections reproduces the summary tables that were generated in the post-processing stage.
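Two figures quoted in §4.1 and §4.2 can be checked with a short Python sketch: the number of grids produced by Algorithm 1 and the triangular-number cost of generating n digits one at a time. The function names are hypothetical, and the cost model (digit d costs d units of work) is simply the linear per-digit assumption behind eq. 4.1, not a measurement of the project software.

```python
def grid_sizes(lo=10, hi=100):
    """Enumerate every (x, y) grid visited by Algorithm 1."""
    return [(x, y) for x in range(lo, hi + 1) for y in range(lo, hi + 1)]


def sequence_cost(n):
    """Work units to generate digits 1..n independently, if digit d costs d."""
    return sum(range(n + 1))  # the triangular number n(n+1)/2


if __name__ == "__main__":
    print(len(grid_sizes()))       # 91 * 91 grids
    print(sequence_cost(600_000))  # equals 600000 * 600001 // 2
```

Since x and y each take 91 values, the loop yields 91 × 91 = 8,281 grids, matching the number of walks reported. Under the same cost model, doubling the sequence length roughly quadruples the total work, which is why the digit count was limited.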
When the post-processing stage is performed, every generator is ranked against each other generator on a grid-by-grid basis. In the case of walk speed and standard deviation, rankings are ordered with the smallest value being best (i.e. shortest walk, or most even distribution), with the best generator being ranked 1st and the worst being ranked 6th for those tests. The summary tables are frequency matrices of how many times each generator was ranked at each position across all tests performed. A generator that performs well would therefore hope to score highly in the first three rows and poorly in the remaining three in comparison to other generators. Overall, the sum of the percentages for one generator (i.e. the sum of all percentages in one column) should equal 100%. Equally, the sum of all percentages in an individual row should also equal 100%, as a generator cannot be ranked at multiple positions and a rank will generally not be shared; shared ranks are possible, but only in very rare cases.

The tables of grid-by-grid comparisons from which the summaries are taken are incredibly large, as they contain 8,281 records; as such they are not included within this document and are instead available on the attached CD.

4.2.1 Experiment One

The summary results of the first experiment are available in tables 4.1, 4.2, and 4.3. In the first test, for walk speed, RANDOM.ORG is a clear winner: not only is it the fastest random number generator in 24.37% of walks, it is also the second fastest in a further 19.86% of walks. Further, it also has the lowest number of walks ranked in 4th, 5th, and 6th place. This positive result for TRNGs is also reflected in the second TRNG tested: HotBits also dominates the upper rankings and also has the least number of walks in the lowest ranks.
This is clearly a positive result for TRNGs, as in this first experiment they appear to be the recommended choice when random walks must be completed quickly.

The next prominent result is in the PRNGs. The LCG is known for its poor randomness and that is possibly reflected in this experiment. The LCG had the least number of walks ranked within the top three positions (it only achieved 1st place in 11.25% of walks), making it the worst generator to use in terms of speed. However, regardless of the LCG's result, the Mersenne Twister performed much better, as it holds more positions within the top three ranks. Further, the number of tests which were ranked in the top three is similar to that of the TRNG HotBits.

Finally, looking at the generators of most interest to this project, the irrational numbers, the results vary. First and foremost, π is the fastest generator in only 12.49% of walks, the second-lowest share after the LCG, a difference of just over 1%. Overall, π performs better than the LCG, but is far slower than the TRNGs or the Mersenne Twister in the majority of tests. Interestingly though, log 2 seems to perform much better and even outperforms the Mersenne Twister in terms of speed in slightly more tests. However, it is still far from the level of performance of the TRNGs.
Rank  Log2            Pi              LCG             MT              HotBits         RandomOrg
1     1,412 17.05%    1,034 12.49%    932 11.25%      1,320 15.94%    1,565 18.90%    2,018 24.37%
2     1,423 17.18%    1,156 13.96%    1,136 13.72%    1,413 17.06%    1,509 18.22%    1,645 19.86%
3     1,471 17.76%    1,484 17.92%    1,278 15.43%    1,429 17.26%    1,306 15.77%    1,313 15.86%
4     1,420 17.15%    1,618 19.54%    1,418 17.12%    1,395 16.85%    1,268 15.31%    1,162 14.03%
5     1,392 16.81%    1,481 17.88%    1,648 19.90%    1,416 17.10%    1,293 15.61%    1,050 12.68%
6     1,163 14.04%    1,508 18.21%    1,869 22.57%    1,308 15.80%    1,340 16.18%    1,093 13.20%

Table 4.1: Summary of experiment one – Walk speed

Outcome         Log2            Pi              LCG             MT              HotBits         RandomOrg
Rejects         1,182 14.27%    111 1.34%       3 0.04%         13 0.16%        711 8.59%       0 0.00%
Suspects        1,833 22.14%    841 10.16%      506 6.11%       87 1.05%        2,779 33.56%    106 1.28%
Almost Suspect  1,198 14.47%    829 10.01%      1,443 17.43%    155 1.87%       1,248 15.07%    602 7.27%
Normal          4,068 49.12%    6,500 78.49%    6,329 76.43%    8,026 96.92%    3,543 42.78%    7,573 91.45%
Average         0.7907          1.7882          1.2621          1.7912          6.6407          1.7835
Average (Norm)  0.6667          1.7033          1.1644          1.7739          1.5085          1.7451

Table 4.2: Summary of experiment one – Chi Square Test

Rank  Log2            Pi              LCG             MT              HotBits         RandomOrg
1     1,588 19.18%    853 10.30%      816 9.85%       1,173 14.16%    1,904 22.99%    1,947 23.51%
2     1,221 14.74%    1,128 13.62%    950 11.47%      1,446 17.46%    1,531 18.49%    2,005 24.21%
3     1,473 17.79%    1,443 17.43%    1,288 15.55%    1,536 18.55%    1,237 14.94%    1,304 15.75%
4     1,625 19.62%    1,645 19.86%    1,401 16.92%    1,530 18.48%    1,134 13.69%    946 11.42%
5     1,331 16.07%    1,885 22.76%    1,662 20.07%    1,415 17.09%    1,130 13.65%    858 10.36%
6     1,043 12.60%    1,327 16.02%    2,164 26.13%    1,181 14.26%    1,345 16.24%    1,221 14.74%

Table 4.3: Summary of experiment one – Standard Deviation

4.2.2 Experiment Two

The summary results of the second experiment are available in tables 4.4, 4.5, and 4.6. The overall results for walk speed in this experiment have changed dramatically from those in the first experiment. In this experiment RANDOM.ORG has performed very badly, ranking first in only 8.51% of walks.
Similarly, its results for the other top three ranks were also weak, and this is further reinforced by the fact that RANDOM.ORG was in 6th position 29.28% of the time. HotBits has, however, maintained its position to an extent, as almost 49% of its walks are in the top three positions.

The Mersenne Twister is shown to be the best generator overall, with the most walks in both the first and second positions, outperforming the TRNGs. While the sequence generated was an improvement over the sequence used in the last experiment, it is not leading by far. Foremost however, the LCG has shown considerable improvement, as its performance is now almost level with that of HotBits, and it also easily outperforms RANDOM.ORG.

Finally, the irrational numbers have also shown considerable improvement in this experiment, possibly due to the very weak TRNGs. While π has performed reasonably well, better than both HotBits and the LCG, it is log 2 that has shown the greatest improvement. While log 2 performed well in the first experiment, in this experiment it is second best to the Mersenne Twister, with a similar number of walks gaining the top ranks.
Rank  Log2            Pi              LCG             MT              HotBits         RandomOrg
1     1,815 21.92%    1,232 14.88%    1,521 18.37%    1,851 22.35%    1,160 14.01%    705 8.51%
2     1,497 18.08%    1,450 17.51%    1,375 16.60%    1,566 18.91%    1,350 16.30%    1,040 12.56%
3     1,342 16.21%    1,598 19.30%    1,299 15.69%    1,363 16.46%    1,531 18.49%    1,149 13.88%
4     1,358 16.40%    1,480 17.87%    1,384 16.71%    1,261 15.23%    1,461 17.64%    1,336 16.13%
5     1,188 14.35%    1,412 17.05%    1,397 16.87%    1,151 13.90%    1,508 18.21%    1,626 19.64%
6     1,081 13.05%    1,109 13.39%    1,305 15.76%    1,089 13.15%    1,271 15.35%    2,425 29.28%

Table 4.4: Summary of experiment two – Walk speed

Outcome         Log2            Pi              LCG             MT              HotBits         RandomOrg
Rejects         1,182 14.27%    111 1.34%       15 0.18%        244 2.95%       10 0.12%        748 9.03%
Suspects        1,833 22.14%    841 10.16%      6 0.07%         1,333 16.10%    681 8.22%       2,096 25.31%
Almost Suspect  1,198 14.47%    829 10.01%      31 0.37%        1,367 16.51%    1,159 14.00%    742 8.96%
Normal          4,068 49.12%    6,500 78.49%    8,229 99.37%    5,337 64.45%    6,431 77.66%    4,695 56.70%
Average         0.7907          1.7882          3.0389          5.7617          3.8435          6.5047
Average (Norm)  0.6667          1.7033          2.9838          2.7693          2.1108          2.3320

Table 4.5: Summary of experiment two – Chi Square Test

Rank  Log2            Pi              LCG             MT              HotBits         RandomOrg
1     2,036 24.59%    1,148 13.86%    1,650 19.93%    1,718 20.75%    1,280 15.46%    449 5.42%
2     1,284 15.51%    1,443 17.43%    1,637 19.77%    1,805 21.80%    1,428 17.24%    684 8.26%
3     1,376 16.62%    1,542 18.62%    1,418 17.12%    1,489 17.98%    1,505 18.17%    951 11.48%
4     1,493 18.03%    1,650 19.93%    1,306 15.77%    1,172 14.15%    1,530 18.48%    1,130 13.65%
5     1,197 14.45%    1,508 18.21%    1,342 16.21%    1,169 14.12%    1,472 17.78%    1,593 19.24%
6     895 10.81%      990 11.96%      928 11.21%      928 11.21%      1,066 12.87%    3,474 41.95%

Table 4.6: Summary of experiment two – Standard Deviation

4.2.3 Experiment Three

The summary results of the third experiment are available in tables 4.7, 4.8, and 4.9. Once again, the overall best performing generator has changed. In this experiment the LCG far outperforms any of the other generators, with 25.91% of its walks being ranked the best (1st) and 62.73% in total ranked within the top three. This result is incredibly surprising, especially as it is a clear winner over not only the Mersenne Twister, but also both TRNGs. Finally, π performs relatively poorly again in this experiment, possibly due to the extremely good sequence generated by the LCG. However, log 2 has only a mediocre performance in this experiment, outperforming only π and RANDOM.ORG.

Rank  Log2            Pi              LCG             MT              HotBits         RandomOrg
1     1,224 14.78%    868 10.48%      2,146 25.91%    1,409 17.01%    1,462 17.65%    1,173 14.16%
2     1,248 15.07%    1,121 13.54%    1,680 20.29%    1,491 18.01%    1,511 18.25%    1,231 14.87%
3     1,347 16.27%    1,328 16.04%    1,369 16.53%    1,502 18.14%    1,468 17.73%    1,265 15.28%
4     1,495 18.05%    1,478 17.85%    1,243 15.01%    1,359 16.41%    1,396 16.86%    1,313 15.86%
5     1,509 18.22%    1,663 20.08%    992 11.98%      1,311 15.83%    1,299 15.69%    1,504 18.16%
6     1,458 17.61%    1,823 22.01%    851 10.28%      1,209 14.60%    1,145 13.83%    1,795 21.68%

Table 4.7: Summary of experiment three – Walk speed

Outcome         Log2            Pi              LCG             MT              HotBits         RandomOrg
Rejects         1,182 14.27%    111 1.34%       974 11.76%      695 8.39%       28 0.34%        154 1.86%
Suspects        1,833 22.14%    841 10.16%      2,128 25.70%    2,326 28.09%    406 4.90%       532 6.42%
Almost Suspect  1,198 14.47%    829 10.01%      2,316 27.97%    554 6.69%       1,123 13.56%    487 5.88%
Normal          4,068 49.12%    6,500 78.49%    2,863 34.57%    4,706 56.83%    6,724 81.20%    7,108 85.84%
Average         0.7907          1.7882          7.7648          6.1291          4.1831          2.9223
Average (Norm)  0.6667          1.7033          1.7849          2.0553          2.9859          2.3816

Table 4.8: Summary of experiment three – Chi Square Test

Rank  Log2            Pi              LCG             MT              HotBits         RandomOrg
1     1,264 15.26%    672 8.11%       2,481 29.96%    1,268 15.31%    1,617 19.53%    979 11.82%
2     1,186 14.32%    935 11.29%      1,698 20.50%    1,593 19.24%    1,644 19.85%    1,225 14.79%
3     1,192 14.39%    1,160 14.01%    1,491 18.01%    1,557 18.80%    1,523 18.39%    1,358 16.40%
4     1,291 15.59%    1,383 16.70%    1,124 13.57%    1,618 19.54%    1,456 17.58%    1,409 17.01%
5     1,648 19.90%    1,857 22.42%    822 9.93%       1,262 15.24%    1,205 14.55%    1,487 17.96%
6     1,700 20.53%    2,274 27.46%    665 8.03%       983 11.87%      836 10.10%      1,823 22.01%

Table 4.9: Summary of experiment three – Standard Deviation

4.2.4 Overall Analysis

The results of the three experiments show the difficulty in trying to distinguish the best random number generator. The problem is that in one instance a random number generator can perform badly, yet in the same experiment with a different seed the generator can perform impressively. A perfect example of this effect is the LCG in experiments one and three: in experiment one it appears to perform terribly, yet in experiment three, with a different seed, it performs the best.

This effect also causes problems as it makes it harder to distinguish whether other generators performed consistently or whether they also had performance changes. However, this effect is slightly negated by looking at the summary tables for both the χ² test and the standard deviation. As the values differ per experiment, it is clear that the sequence qualities vary.

An observation that can be made, though, concerns the irrational number generators. Throughout all these experiments the same irrational number sequences have been used and, regardless of how well the other generators have performed, the performance of the irrational numbers has remained consistent. This is interesting because, while the irrational numbers are not proving to be the best, nor are they always the worst; in the case of log 2, it is proving a strong candidate in every experiment. This also held true in further experiments that simply repeated experiment one with the starting point of the sequences increased by 50,000 digits. While all other generators seemed to vary in performance, the performance of log 2 remained consistent.

A further observation made within these experiments is in the standard deviation tests.
In these tests it was hoped that the generator providing the best distribution of visits would become visible. Unfortunately, this test appears to be flawed, as the standard deviation has become related to the speed of the walks: the fewer digits required to complete a walk, the lower the probability of a high deviation. Due to this, the standard deviation has become an inaccurate measure. However, it should be noted that the connection does not always hold, and in some cases the fastest generator is not the one with the best standard deviation.

In §2.4, it was stated that it would be of interest to know whether the randomness of a sequence affects the performance of a walk. The answer to this question lies with the walk speed and χ² summary tables. At first, from reviewing the results of RANDOM.ORG, it appears as though it does, as RANDOM.ORG is the best generator in experiment one with 91.45% of walks having a good sequence distribution, and is the worst generator in experiment two with only 56.70%. However, this correlation clearly does not appear with the LCG: in the first experiment, where it scored the worst, it had 76.43% of walks with good sequences, yet when it clearly outperformed all other generators in experiment three it had only 34.57% of walks with good distributions. This seems to be further reinforced by the Mersenne Twister, which in experiment two performed the best in terms of speed yet had only 64.45% of walks with good sequences. While the summary tables can be misleading, further analysis of the full tables themselves shows that the different χ² distributions have no effect on whether a walk will be fast or slow.

4.3 Further Observations

Using the software produced for this project, it was possible to perform walks not only from the starting digit but also from any arbitrary digit.
By performing multiple smaller sets of walks, such as up to 50 × 50, it was possible to observe the effects this had on the irrational numbers. Due to the limited processing time available, these side experiments were limited to the irrational numbers only.

In the first series of these experiments, walks were performed on grids of up to 50 × 50, at every 50,000-digit interval. In this experiment, the number of digits tested is variable, because the digits are tested only after the walk completes. The number of tests which scored well in the χ² test was then totalled and plotted to a chart, giving the number of tests with good χ² distributions for both π and log 2 at each starting point (see fig. 4.1).

In this chart it became noticeable that as the start point increased, the number of walks with good χ² distributions also increased steadily, with a peak of 100% at 300,000 digits, indicating that regardless of the length of the sequence, any sequence from the 300,000th digit up to the 350,000th digit appears to hold good randomness qualities. Strangely, immediately after this point the percentage drops sharply, indicating the opposite.

To see if this same pattern occurs at other intervals, a similar series of tests was conducted using grids of up to 25 × 25 at each interval of 10,000 digits, and the results were once again plotted to a chart (fig. 4.2). While the same effect cannot be observed in this graph, what can be observed is that the overall trend in the number of sequences with good distributions seems to increase. By adding a trend line to the first chart, the same can also be seen to occur. Further, from viewing the second chart, it is possible to see that the overall number of good sequences appears to be converging towards the trend line.
Unfortunately, due to the limit of only 600,000 digits generated, it is hard to see if this is a trend that would continue as the starting point increases.

Figure 4.1: Percentage of 50x50 grids with a good Chi Square distribution at various starting points (series: Log2, Pi, Linear (Log2), Linear (Pi); starting points from 0 to 500,000 digits)

Figure 4.2: Percentage of 25x25 grids with a good Chi Square distribution at various starting points (series: Log2, Pi, Linear (Log2), Linear (Pi); vertical axis from 0 to 16,000)

A further interesting observation was made when making some of the largest possible grids with the digits available. In these grids for π it is possible to observe a pattern appearing, as in the centre mass of the image the distribution of visits is quite low, while in the corners the distribution is higher, forming a cross shape within the middle of the image. As the agent is placed in the upper-left corner (0,0), and as the grid is treated as a torus, this indicates that the growth of the exploration of π is that of a circle. A possible reason for this is that the agent starts at the centre of the circle and progressively explores outward, moving back and forth across the starting area and building the circle outwards. While this has not been verified, the pattern seems to occur in most grids but becomes progressively more distinct as the number of digits increases. The most distinct of the heat maps showing this is in fig. 4.3. This effect has not been observed in any of the TRNGs or PRNGs.

Figure 4.3: Heat map of 122x122 grid using Pi

Chapter 5

Evaluation

Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.
Albert Einstein (1879-1955)

5.1 Strengths

In terms of meeting the objectives of the project, the software has been successful, as it is capable of generating digits to user-defined limits and storing them very efficiently (zero overhead), performing 2D walks, performing the χ² test, and calculating the standard deviation. In addition, the generated matrix of a walk is saved along with a log file entry so that further analysis can be undertaken outside of the software.

Looking specifically at the strengths, this project makes an interesting contribution that the author has not previously seen in other works: it tests variable-length sequences for randomness based on a rule rather than testing fixed-length intervals. In the case of this project the rule is the walks. The effect is that the sequences are tested, a further few digits are added, and the sequences are tested again. This process is repeated 8,281 times using sequences from a starting length of approximately 500 digits to a finishing length of approximately 600,000 digits. This concept has proved to be interesting, as it has shown, particularly in the small experiments which were completed, that the expansions of irrational numbers appear to become more random. Further, it also provides a more realistic perspective, as it tests sequences of a range of sizes rather than of an arbitrary fixed length.

A further strength of the project is the addition of heat maps, which have shown movement patterns of the digits that would not normally be seen by using tests such as the standard deviation alone. In particular, this functionality resulted in the observation reported in §4.3.

5.2 Weaknesses

While all targets have been met, the project has not been able to tackle sufficiently large data sets to see if any of the observations hold in much larger sequences.
In particular, the problem is the time complexity of the BBP algorithm for generating sequences which, as was mentioned, is O(n²) where n is the length of the sequence required. While there are various other algorithms available for generating the expansions of irrational numbers, none appear as yet to exist that operate natively in binary bases. Due to this, the size of the sequences was limited, resulting in only relatively small grids being suitable for use in walks. While the walks performed have created interesting insights, it would be interesting to see if the observations that were made, such as that of the irrational number π exploring from a central point outwards, can also be observed in much larger grids.

A further weakness of the project is its usage of the standard deviation as a measure of visit distribution. While the standard deviation algorithm performed as expected, it did not create a fully accurate measure, because the length of the walks varies across generators, making it unsuitable for comparing generators on grids of the same size.

5.3 Recommendations

While the software development has so far been successful, one issue that has arisen is the large amount of processing power needed to generate large sequences using the BBP algorithm. While there are some known faster BBP formulas for π [8], this does not improve the performance of other BBP formulas such as that for log 2. The issue is simply that BBP is not designed for generating digit sequences of this scale; instead it is most suited to generating individual digits. If the project were to be repeated, it would highly benefit from a similar algorithm that is optimised towards generating binary digit sequences but, as was mentioned previously, none are yet known to exist.

However, the BBP algorithm does have a strength which unfortunately could not be exploited in this project.
First and foremost, because BBP is a spigot algorithm, it is possible to distribute the processing across several processors and merge the completed work, such that each processor is assigned only one digit or a small portion of the digits to generate. In addition to distributing the processing of individual digits, it is also possible to distribute the processing of an individual digit itself, owing to the design of the algorithm. The algorithm makes use of a series that is evaluated multiple times with differing variables; this function itself can be distributed to allow multiple processors to work on one digit. For example, a future project could greatly increase digit generation speed by first distributing the generation of digits across multiple processor systems, each of which may in turn distribute the generation of its assigned digit across multiple processing cores. Even then, the process of calculating an individual series can be broken down further so that only a small range of data is calculated, a method which proved successful in earlier projects that used BBP formulas [27].

Additionally, a feature of the BBP algorithm is that it does not generate only a single digit; it generates several, up to the precision of the floating-point arithmetic used. One potential method of reducing some processing is to avoid floating-point arithmetic and instead use integer arithmetic of a very large word size, if available [2], such that many digits can be generated in one attempt rather than only a few; however, this will come at the cost of memory consumption.

Further, a new, adequate method of calculating the standard deviation is recommended, i.e. one that provides a more accurate ratio to the sum of the list it is performed upon. This would enable the standard deviations to be measured across walks of various lengths.
While the author did experiment with using the standard deviation as a ratio to the walk length, it did not provide any useful result.

Finally, with the project in its current form, it would be interesting to see how the different generators perform using graphs of shapes other than square or rectangular, for example triangular shapes. As a further expansion to this, it would also be interesting to see how the generators perform when the graphs are not treated as a torus.

Chapter 6 Professional Issues

When it comes to professionalism, it makes sense to talk about being professional in IT. Standards are vital so that IT professionals can provide systems that last.
Sir Tim Berners-Lee (1955–Present)

6.1 Code of Conduct

This project complies with the British Computer Society (BCS) Code of Conduct, as at no stage did the project deviate from these standards. With respect to the public interest, the project has no interaction with the public or the environment and thus cannot harm them. Further, the rights of any third parties which were interacted with, such as the TRNG services, were respected throughout. In addition, all known national, regional and international laws were complied with. For example, no personal data is stored by this project, and any interaction with third parties was carried out in compliance with their rules. This project was also not shaped in any way by any form of coercion.

The practices of the authorities were respected throughout the development of this project. Further, all deadlines were met, and indications were given at the relevant stages of any deviations from the previously specified project schedules. Finally, at no stage within the project have the results been manipulated or withheld for any reason.
Throughout the project, the author's skills were greatly refined; in particular, the author became competent in his understanding of BBP algorithms and further increased his skills in the Java programming language. The author also took care to monitor this specific research area to ensure that any new works were utilised at the most convenient possible time. In addition, the author made the extent of his skills clear early in the project, which was reflected in decisions made later in the project, such as the lack of a Java GUI.

6.2 Code of Good Practice

This project also complies with the BCS Code of Good Practice, as once again the project complies with all standards within the document. Throughout the project, the author ensured that his technical competence within the subject area was maintained through careful monitoring of publications within the area. In terms of the software implementation, the tools required were also monitored to ensure that new methods or tools released during the project were identified and utilised. Specifically, multiple Java Development Kit updates were released throughout this project and were adopted by the author; however, it was ensured that backward compatibility was maintained. Further, the standards and regulations of the organisation, and the laws of the country, were complied with.

The author ensured that the level of his ability in specific areas was indicated before proceeding at any point, and the likely actions required as a reflection of that level were also indicated. Further to this, the author ensured that the workload undertaken was reasonable for his ability and that he had the necessary resources such that all work could be completed within the time scales given.

Specific to this project, the project complies with the requirement that the research is beneficial and not damaging to society. Further, the potential applications of this project have been made clear, as have its pitfalls, of which so far none are known. Further, as no biological material (animal or human) or personal data is used within this project, all ethical and legal concerns in these areas comply with the standards expected.

Finally, work by other authors, such as that on BBP, has been evaluated by this project and the results are made available within this document. Further to this, the results produced by this project itself are also made publicly available within this document.

Chapter 7 Conclusions

The most exciting phrase to hear in science, the one that heralds the most discoveries, is not 'Eureka!', but 'That's funny...'
Isaac Asimov (1920-1992)

7.1 Main Findings

The two factors that it was hoped would be observed within this project were:

• The method must be faster in completing the walk.
• The distribution of visits must be more equal, even at the sacrifice of speed.

While this project has not found any direct evidence for either of these factors, it has found something of interest related to them. First and foremost, due to the deterministic nature of the expansions of irrational numbers, together with the good quality of randomness that the digits hold, an interesting effect is observed. For the TRNGs, the experiments showed that the results tended to vary drastically from one sequence to another: in one experiment a sequence would perform exceptionally well, while in another experiment the sequence generated would be extremely poor. This effect also occurs in the PRNGs, where the quality of the series generated appears to depend very much on the seed used. However, as the expansions of irrational numbers do not change in this way, this flaw does not occur.
By using the expansion of an irrational number as a random number generator, the sequence can always be of good quality, rather than running the risk of a very poor sequence, even though the other generators also offer the possibility of a very good one.

While π did not perform well in the experiments, or at least not as well as was hoped, log 2 shows potential, as 15%-20% of its walks performed the best. Further, this observation seemed to hold when the starting point for the sequences was incremented in intervals of 50,000 from 0 up to 250,000. Additionally, as the starting point increased, the performance of π within the results also improved in some cases.

With regard to the second objective, this was complex to observe with the methods used. However, through analysis using the heat maps, the irrational number π did show a pattern emerging as to where the digits become distributed. Further, while no generator produced a consistently smooth surface, log 2 did generate a pattern that has a relatively good distribution. In comparison to π, the standard deviation of identical grids always tended to be lower for log 2, even when log 2 required more digits to complete a walk.

Overall, π and log 2 have shown interesting results in the walks; while they have not shown any noteworthy improvement over existing random number generators, they do indicate that potential applications may exist in this area. The memory requirement of the BBP algorithm is lower than that of the Mersenne Twister, yet the performance is almost equivalent in many experiments, and it also reduces financial cost compared with implementing a TRNG. However, due to the linear nature of the BBP algorithm, it becomes increasingly slow as a sequence progresses. If this were to be applied within a real-world agent such as a physical robot, the robot may become increasingly slow in its actions over time, whereas one using a PRNG or TRNG would not.
Until the time complexity of the BBP algorithms can be reduced, this may reduce the time for which a BBP-powered robot can be deployed, or require it to reset or switch algorithms after an arbitrary position.

7.2 Future Work

As mentioned, a number of interesting observations have been made, in particular related to the increasing randomness of log 2 as the starting point increases. Unfortunately, due to the limit on the number of digits, little work was completed in studying this; however, it is certainly an aspect that should be studied more in future. Further, due to the patterns emerging in π, it would be interesting to see if there are certain graphs that would benefit from this movement pattern. However, for either of these two concepts to be realised, it would be of immense interest to study whether a more efficient generator for the expansions of irrational numbers can be found, such as one which is logarithmic.

The possibility of an irrational-number random number generator does seem realistic, as the quality of the sequences is shown to be random and the performance in experiments has been good in general. Further, these qualities were shown to hold, or even improve, if the random number generator were to be seeded (i.e. started from an arbitrary position).

Finally, a further interesting application that does not appear to have been discussed previously is the use of the BBP algorithm in cryptography. While irrational number generators have been discussed in the past as bad choices for cryptography due to their predictable nature [25], what has not been realised is that calculating an individual digit is only linear. This would allow two nodes to select an arbitrary position and use the digits at these points for encrypted communication, incrementing the position used by some value for each message.
This would allow a small sub-sequence to be generated with relative ease even for large values of n; however, finding the sub-sequence would require work of O(n²) complexity. Further, due to the number of BBP algorithms available, this would become O(m × n²), where m is the number of available BBP algorithms. If a secret increment value is used, it could be incredibly complex to eavesdrop even if the key is broken for one message in the conversation.

Appendix A Project Specification

Graph Exploration in 2D-Grids Using Irrational Numbers
Project Specification
Andrew Collins

Project Description

In this project we will perform the exploration of 2D grids using the graph model [13], whereby we have a set of nodes, each connected by a set of undirected edges (G = (V, E)) to the four closest nodes. While in principle we will treat this as a graph model, it may also be seen as a geometric model [13]. For the exploration of the graphs we will use a deterministic counterpart of a random walk [13], using the expansions of irrational numbers (e.g. π, π², log 2), which are believed to have no periodicity [12], in base-4. From performing these walks we hope to see the randomness of consecutive sequences of each irrational number. In addition, we will further verify this using the Chi Square (χ²) Test, which provides a measure of the deviation of a given sample [37].

By using the expansions of multiple irrational numbers we will provide a benchmark as to which number performs best in given graphs, and further provide an indication of the randomness of the generated expansions.
From these tests we will provide an indication of how random the consecutive digits of the expansions of irrational numbers are in comparison to existing pseudo-random number generators such as the Linear Congruential Generators [26], Mersenne Twister [24], and Blum–Blum–Shub [9], which are known to have a periodicity after some number of digits [12].

To complete this project we must be able to generate the expansions of irrational numbers efficiently, in terms of both memory and processing, as there is a possibility that some walks may require many digits to be produced before a graph can be covered. The generated expansions must also be in base-4 so that each digit can be easily, and fairly, mapped to a direction. One of the most efficient ways to generate expansions of specific irrational numbers is the Bailey–Borwein–Plouffe (BBP) Algorithm [7], which has been shown to be capable of generating the expansions of many irrational numbers [1] with very low memory and processing requirements. The originally proposed BBP Algorithm for π is shown in eq. A.1; however, more recent work has improved this formula by 47% [8].

\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)    (A.1)

While the BBP Algorithm is designed for the generation of digits in base-16, it is not restricted to this: any power-of-two base, such as base-4 or even base-2, can also be produced from this formula. For the generation of other bases such as base-10, different BBP-style algorithms are required [14, 28].

In this project, we shall produce software capable of generating the expansions of a selection of irrational numbers, which can then be used to perform walks in 2D grids. The software will output the number of moves required for each irrational number to complete the walks within the grid such that comparisons can be made. Further, the software will also be capable of verifying the randomness of the expansions which were generated.
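As a hedged illustration of how digit extraction with the BBP formula for π (eq. A.1) can work, the following minimal Python sketch computes individual base-16 digits of π and splits each one into two base-4 digits. It is not the project's Java implementation: the function names are hypothetical, and double-precision floating-point arithmetic is assumed, which limits how far along the expansion the results stay reliable.

```python
def _bbp_series(j, n):
    """Fractional part of the sum over k of 16^(n-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerator small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink by a factor of 16 each step; stop when negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digit(n):
    """Base-16 digit of pi at position n after the point (n = 0 is the first)."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return int(x * 16)


def pi_base4_digits(count):
    """First `count` base-4 digits of the fractional part of pi."""
    digits = []
    n = 0
    while len(digits) < count:
        h = pi_hex_digit(n)
        digits.extend((h >> 2, h & 0b11))  # one base-16 digit = two base-4 digits
        n += 1
    return digits[:count]
```

Because each digit is computed from scratch with a loop whose length grows with the position, producing a whole sequence this way costs roughly quadratic work in the sequence length, which is the limitation discussed in Chapter 5.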
From this we will produce documentation stating the performance of each irrational number in terms of both speed (the number of expansions required to be generated to complete a walk, rather than a measure of time) and randomness.

From the initial proposal we have refined the problem to using the expansions of irrational numbers as a means of performing walks through grids, as an alternative to pseudo-random numbers. Further, we have specified that we would like to test for randomness in the numbers that were used to complete the walks.

Conduct of The Project

In preparation for the project, background research has been required and completed in the study of the BBP algorithm, such that it can be understood and modified to generate base-4 expansions in the most optimal manner. As the project continues we may require further research in finding suitable methods for proving the correctness of the base-4 expansions of the selected irrational numbers that we implement.

For the implementation of the software required by the project, the Java programming language will be used. The student feels confident in his ability to develop the implementation sufficiently. However, additional skills will be required in the "Swing" library to be able to develop a GUI for the implementation. It should be noted, however, that while a GUI is of interest, it may be unsuitable due to the limit on the size of grids displayable. As such, a GUI will be seen as a value-added extra and not a functional requirement. Finally, the student must also develop an extremely competent understanding of the BBP algorithm so as to be capable of using it to generate the expansions of many irrational numbers.

As the software used to develop this project will be bespoke, no additional software will be required. However, additional software will be used for the development of the software itself. In particular, the student intends to use the Eclipse IDE (available from http://www.eclipse.org/) so as to increase the ease and speed with which the software is developed. In addition, the Sun Microsystems Java Development Kit (JDK) (available from http://java.sun.com/javase/) will be used for the compilation of the source code. The compiled byte-code can be executed using any available Java Virtual Machine (JVM), such as that provided within the JDK and Java Runtime Environment (JRE).

Statement of Deliverables

From this project we will present four deliverables by the completion of the project:

• Documentation – By completion of the project we will create multiple documents such as the specification, design and dissertation. In particular, the latter will contain details as to how the expansions of irrational numbers in base-4 were generated. Further, it will contain the results of the experiments as well as an analysis of these results. Finally, we hope to produce additional separate documentation should we discover any new BBP-type algorithms or improvements.
• Software – To perform the experiments required, a bespoke software solution will be developed by the student. The aim of the solution will be to generate the expansions of multiple irrational numbers and then perform walks through grids of differing sizes. Further, the software will analyse, at the completion of each walk, the randomness of the expansions that were used to complete the walk.
• Experiments – Through using the software we will benchmark a range of irrational numbers. Firstly, the winner of each benchmark will be determined through the speed at which the irrational numbers manage to complete a walk through a given grid. In addition to this test we will also perform the χ² test on the generated sequences to gather a further measure of randomness.
• Evaluation Methods – To evaluate the speed aspect of the random walks we will simply compare the number of expansions required to complete the walk, to find which irrational number completed the walk the quickest. Further, we will evaluate the randomness of the walks both through the results of the walks within the grids and also through using the χ² test.

Plan

A plan of how the project will be completed is available in table A.1. While the project should follow this plan strictly, the project does contain some risk. Primarily, the experimentation phase is scheduled to run for approximately 2 weeks; however, as the experiments performed within this project may become CPU intensive as the size of the grids grows, this may run over schedule. We expect to mitigate this risk by reducing the scale of the experiments or, preferably, by finishing the software development phase early. As mentioned previously, the student feels that this aspect should not be too troublesome and, as some algorithmic aspects have already been prototyped in the "Background Research" phase, this is a realistic possibility. For a more detailed version of this plan, please see the attached Gantt Chart.

Start Date  End Date  Title                     Deliverables
25 May      3 June    Background Research       Understanding of req. algorithms
4 June      18 June   Project Specification     Specification document
19 June     19 June   Specification Submission  Submission of Specification
20 June     14 July   Design Documentation      Project Design document
15 July     19 July   Presentation Preparation  Presentation Slides
20 July     20 July   Design Presentation       Design Presentation
21 July     11 Aug.   Software Implementation   Software to be used in the project
12 Aug.     16 Aug.   Presentation Preparation  Presentation Slides and sample experiments
17 Aug.     17 Aug.   Software Presentation     Software Presentation
18 Aug.     27 Aug.   Experiments & Analysis    Experiment results and analysis documentation
28 Aug.     17 Sept.  Dissertation              Dissertation document
18 Sept.    18 Sept.  Project Completion        Submission of Dissertation

Table A.1: Plan as to how the project will be completed including milestones in bold and deliverables

Appendix B Project Specification

Graph Exploration in 2D-Grids Using Irrational Numbers
Project Design
Andrew Collins

Summary of Proposal

In this project we have proposed to perform the exploration of 2D grids using the graph model [13], whereby we have a set of nodes, each connected by a set of undirected edges (G = (V, E)) to the four closest nodes. For the exploration of the graphs we will use a deterministic counterpart of a random walk [13], using the expansions of various irrational numbers (e.g. π, π², log 2) in base-4. From performing these walks we hope to see the randomness of the consecutive sequences of each irrational number. In addition, we will further verify this using the Chi Square Test (χ²) [37].

Through performing these walks we shall identify which irrational number performs the best in various grid sizes, and additionally we shall identify which irrational number produces the most random sequence for completing each grid size. Further, we will also compare the performance of the expansions of various irrational numbers against Pseudo-Random Number Generators (PRNG), such as the commonly available Linear Congruential Generators [26] and the Mersenne Twister [24], as well as against True Random Number Generators (TRNG) such as RANDOM.ORG [15] and HotBits [32].

For the generation of the expansions of various irrational numbers we intend to use the Bailey–Borwein–Plouffe (BBP) Algorithm [7], which was first shown to be capable of generating π (eq.
B.1) in base-16 with very low memory and processing requirements.

\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)    (B.1)

Since the initial work on BBP, the binary expansions of many other irrational numbers have been shown to be capable of being generated using BBP-style algorithms, giving rise to P-notation [5] as shown in eq. B.2.

P(s, b, n, A) = \sum_{k=0}^{\infty} \frac{1}{b^k} \sum_{j=1}^{n} \frac{a_j}{(kn+j)^s}    (B.2)

π and many other irrational numbers have been written in this form [1, 4, 36]. This makes BBP of great interest to the project, as it would allow the binary expansions of many irrational numbers to be added to the software quickly once initial support for P-notation is added. While not expected to be necessary to this project, BBP has also been shown to be capable of generating non-binary bases such as decimal [14, 28].

Design

To complete this project, a software solution is required that will be capable of generating the expansions of various irrational numbers and performing walks to measure randomness. As this project is less focused on the software itself and more on the actual output of the software, we will use an incremental methodology. In this methodology, the design process is treated as being cyclic, with relatively simple objectives being specified for completion on each cycle. This allows the software to be built gradually through refinement based upon what is learned in the previous iteration.

As the complexity of implementing various BBP algorithms is unknown to us, this allows us in the first cycle to focus solely upon implementing the BBP algorithm for π. From here we can re-use what has been learnt to refine the software such that P-notation is supported. Should this be successful, other irrational numbers can be added as necessary. This iterative process of refinement can be repeated until the software reaches what is expected of it.
A further advantage of this method is that the development process also becomes very responsive to change, as new insights or issues can be factored into the software at the start of the next cycle. Finally, through using this method we create the functional aspects relatively early, allowing us to start generating some results before the software is fully complete, effectively increasing the experimental period of the project.

In terms of the expected modules or components, the software will be required to be capable of pre-generating and storing the base-4 expansions of various irrational numbers, preferably using a space-efficient method, as the number of generated digits may be quite large: using one byte to store one digit would have an overhead of 6 bits per digit. As each digit only requires 2 bits, it would be possible to perform bit packing using bitwise operators so that 4 digits can be stored per byte. Further, the database will be required to be read from as well as written or appended to. The software will then be required to use these databases to perform walks.

To perform the walks, the software will need to be able to generate 2D grids of various sizes. As the grids are strictly 2D and either square or rectangular in shape, a simple 2D array can be used. If a 2D integer array is used, then it becomes possible to store whether a grid square has been visited and, if it has, how often. From this, charts showing the frequency of visits can be generated (e.g. heat maps).

Finally, the last component of the software is the χ² test. Due to the simplicity of the algorithm to compute this, it will be performed by the software itself. To perform this test the software will be required to record the frequency with which each digit occurred during the walk, which can be compared against the expected occurrences of each digit to generate the result.
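The storage and test components described in this design can be sketched as follows. This is a minimal Python illustration rather than the project's Java code: the function names are hypothetical, the choice of placing the first digit in the high bits of each byte is an assumption, and the χ² statistic assumes that each of the four digits is expected equally often.

```python
def pack_digits(digits):
    """Pack base-4 digits (0..3) four to a byte, first digit in the high bits.

    A final partial byte is padded with zero bits (assumed layout).
    """
    packed = bytearray((len(digits) + 3) // 4)
    for i, d in enumerate(digits):
        if not 0 <= d <= 3:
            raise ValueError("base-4 digits must be in the range 0..3")
        shift = 6 - 2 * (i % 4)          # 6, 4, 2, 0 within each byte
        packed[i // 4] |= d << shift
    return bytes(packed)


def unpack_digit(packed, i):
    """Read the i-th base-4 digit back out of a packed byte string."""
    shift = 6 - 2 * (i % 4)
    return (packed[i // 4] >> shift) & 0b11


def chi_square(digits):
    """Chi-square statistic of base-4 digit counts against a uniform expectation."""
    counts = [0, 0, 0, 0]
    for d in digits:
        counts[d] += 1
    expected = len(digits) / 4
    return sum((c - expected) ** 2 / expected for c in counts)
```

With this layout, four digits occupy exactly one byte, so a store of n digits needs about n/4 bytes, and a digit at any index can be read without unpacking the rest.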
Review Against Plan

The plan which was produced for the specification stage of the project (see table B.1) has so far been followed and has remained on schedule. In particular, the background research stage has identified the work on BBP as shown in the proposal summary.

A minor change has been made to the design stage, as the "Design Presentation" has been moved to the 24th July from the previous 20th July. Irrespective of this, the previous design stages have not been extended; instead they have been maintained so as not to reduce the time allocated to the software implementation.

The remainder of the project plan will continue as is; however, it is still expected that the software implementation stage may finish early due to some success already being made in the prototypes of the BBP algorithms within the earlier and current stages. However, in response to feedback [41], additional subtasks have been added to the remaining stages.

Start Date  End Date  Title                     Deliverables
25 May      3 June    Background Research       Understanding of req. algorithms
4 June      18 June   Project Specification     Specification document
19 June     19 June   Specification Submission  Submission of Specification
20 June     14 July   Design Documentation      Project Design document
15 July     19 July   Presentation Preparation  Presentation Slides
24 July     24 July   Design Presentation       Design Presentation
21 July     11 Aug.   Software Implementation   Software to be used in the project
21 July     23 July   ↪                         BBP for Pi
24 July     26 July   ↪                         BBP for Log2
27 July     31 July   ↪                         P-Notation
1 Aug.      2 Aug.    ↪                         PRNG
3 Aug.      4 Aug.    ↪                         TRNG
5 Aug.      6 Aug.    ↪                         χ² Test
7 Aug.      11 Aug.   ↪                         Testing of software
12 Aug.     16 Aug.   Presentation Preparation  Presentation Slides and sample experiments
17 Aug.     17 Aug.   Software Presentation     Software Presentation
18 Aug.     27 Aug.   Experiments & Analysis    Experiment results and analysis documentation
28 Aug.     17 Sept.  Dissertation              Dissertation document
28 Aug.     30 Aug.   ↪                         Abstract & Introduction
31 Aug.     2 Sept.   ↪                         Background
3 Sept.     6 Sept.   ↪                         Design
7 Sept.     10 Sept.  ↪                         Realisation
11 Sept.    14 Sept.  ↪                         Evaluation & Conclusion
15 Sept.    17 Sept.  ↪                         Clean-up for submission
18 Sept.    18 Sept.  Project Completion        Submission of Dissertation

Table B.1: Plan as to how the project will be completed including milestones in bold and deliverables

Appendix C Project Implementation

Graph Exploration in 2D-Grids Using Irrational Numbers
Software Presentation
Andrew Collins

Summary of Proposal

In this project we have proposed to perform the exploration of 2D grids using the graph model [13], whereby we have a set of nodes, each connected by a set of undirected edges (G = (V, E)) to the four closest nodes. For the exploration of the graphs we will use a deterministic counterpart of a random walk [13], using the expansions of irrational numbers (i.e. π and log 2) in base-4. From performing these walks we hope to see the randomness of the consecutive sequences of each irrational number. In addition, we will further verify this using the Chi Square Test (χ²) [19].

Through performing these walks we shall identify which irrational number performs the best in various grid sizes, and additionally we shall identify which irrational number produces the most random sequence for completing each grid size. Further, we will also compare the performance of the expansions of various irrational numbers against Pseudo-Random Number Generators (PRNG), such as the commonly available Linear Congruential Generators (LCG) [20] and the Mersenne Twister [24], as well as against True Random Number Generators (TRNG) such as RANDOM.ORG [15] and HotBits [32].
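The exploration summarised above can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the project's Java code: the mapping of digits to directions, the central starting node, counting the starting node as visited, and the use of the population standard deviation are all hypothetical choices. Wrap-around at the edges reflects the treatment of the grids as a torus mentioned in Chapter 5.

```python
from statistics import pstdev

# Assumed mapping of base-4 digits to moves: 0 = up, 1 = right, 2 = down, 3 = left.
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def explore(digits, rows, cols):
    """Walk a rows x cols torus using base-4 digits until every node is visited.

    Returns (digits_used, visit_counts, completed).
    """
    grid = [[0] * cols for _ in range(rows)]
    r, c = rows // 2, cols // 2      # assumption: start at the central node
    grid[r][c] = 1                   # assumption: the start counts as a visit
    unvisited = rows * cols - 1
    used = 0
    for d in digits:
        if unvisited == 0:
            break
        dr, dc = MOVES[d]
        r = (r + dr) % rows          # wrap around: the grid is treated as a torus
        c = (c + dc) % cols
        if grid[r][c] == 0:
            unvisited -= 1
        grid[r][c] += 1
        used += 1
    return used, grid, unvisited == 0


def visit_spread(grid):
    """Population standard deviation of the visit counts (assumed measure)."""
    return pstdev(count for row in grid for count in row)
```

The number of digits consumed gives the speed measure used to compare generators, while the matrix of visit counts is the data from which a heat map or a standard deviation can be derived.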
Summary of Design

The project design stated that the software is to comprise three components, of which first and foremost is the generation of the digits necessary for performing the walks. For the generation of irrational numbers it was decided that the BBP algorithm [7] is to be used, as not only is it very efficient in generating various different irrational numbers, it also works in binary bases. In addition to the requirement of generating irrational numbers, it was also a requirement for the software to be capable of generating both pseudo-random and true-random number sequences.

In the second component, the software is required to produce 2D grids and use the previously generated digits to perform graph exploration, in a form similar to a random walk. Thirdly, the software must use the results of the prior walks to perform further tests for randomness; in this case we intended to use the χ² test.

Produced Software

Of the many BBP formulas available, two numbers have been selected, a constant (π) and a logarithm (log 2), and have been successfully implemented using the BBP formulas for π [7] and log 2 [1]. In addition to generating irrational numbers, at this stage we were also required to implement both PRNGs (LCG and Mersenne Twister) and TRNGs (HotBits and RANDOM.ORG). The LCG was implemented with relative ease due to support being natively available within the Java libraries [31], and the Mersenne Twister was implemented with similar ease due to an existing implementation available within the Colt Project libraries [10]. The two TRNGs did, however, provide some initial difficulties.
First and foremost, both TRNGs are online services and as such have been designed to prevent abuse by limiting the number of digits generated. While RANDOM.ORG is very generous in providing a daily allowance of 1,000,000 digits, this did create difficulty when trying to debug the project software or generate various test cases. Fortunately, RANDOM.ORG seems to have foreseen the problem of its users requiring high volumes of data, as it provides pre-generated files which are updated every 24 hours [17]. Each pre-generated file is simply a binary file containing 1 MB of data (4,194,304 base-4 digits). In addition, the service holds an archive of all previously generated databases, providing a huge amount of data to work with. Conveniently, the data files also utilise bit packing, just as the project software does for its own databases, meaning that the files can be used without any modification. The project software has been designed to automatically download the latest of these files and use them as a digit database. In cases where this functionality is not required, databases can be manually downloaded and added to the software.

The second TRNG implemented, HotBits, also suffered from this same issue, as its daily allowance is only 16,384 bits, making it completely unsuitable for use in the project. Further, HotBits does not provide any pre-generated files as was the case with RANDOM.ORG. However, after contact with the HotBits author [35], two 16 MB (67,108,864 base-4 digits) data files became available to the project, one file per HotBits generation hardware, which had been pre-generated for previous statistical testing [34]. As we are testing for randomness and not uniqueness, these databases should prove more than sufficient. Once again, as the data files are bit packed, they can simply be loaded into the software.

The 2D graph generation is, as was planned, a 2D integer array generated to the requested size.
By default, all values in the integer array start at zero. When the walk entity moves into a node, the value at that position is incremented by one. When all grid squares are greater than zero, the walk is complete. Throughout the walk, a record is kept of the frequency of each base-4 digit so that the χ² distribution can be calculated and logged. In addition, an analysis of the generated matrix (2D array) is performed. While earlier in the project there was a lack of clarity on how this was to be performed [42], this has now been resolved: the software calculates the standard deviation of the matrix at the end of the walk as a measure of the smoothness of the walk. On completion of the walk and the calculation of both the χ² distribution and the standard deviation, the matrix and calculations are output to log files, which can themselves be analysed further at a later point.

To analyse the log files for several walks, a PHP-based web front end has been created. The PHP script reads the log files together and shows the viewer which digit generator performed best in each measurement per grid size. The interface also provides a summary of all the results, showing which generator performed best overall. While at this stage both the software and the log analyser are deemed complete, it is expected that the web front end will receive further modifications throughout the experimentation period as new and improved methods of analysing the results are found.

Evaluation

In terms of meeting the objectives of the project, the software has been successful: as described previously, it is capable of generating digits to user-defined limits and storing them very efficiently (zero overheads), performing 2D walks, performing the χ² test, and calculating the standard deviation.
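The walk and the two measurements listed above can be sketched as follows, in Python. This is a minimal illustration and not the project's implementation. The mapping of base-4 digits to directions, the start position, the treatment of moves that would leave the grid (discarded here), and all function names are assumptions, since the text does not specify them.

```python
import statistics

# Assumed mapping of base-4 digit to a (row, column) step: up, right, down, left.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def walk(digits, size):
    """Walk a size x size grid until every cell is visited or digits run out."""
    grid = [[0] * size for _ in range(size)]   # all cells start at zero
    counts = [0, 0, 0, 0]                      # frequency of each base-4 digit
    row = col = 0                              # assumed start: top-left corner
    grid[row][col] = 1                         # assumed: start cell counts as visited
    unvisited = size * size - 1
    for digit in digits:
        if unvisited == 0:
            break
        counts[digit] += 1
        d_row, d_col = MOVES[digit]
        new_row, new_col = row + d_row, col + d_col
        # Assumed: a step that would leave the grid is discarded.
        if 0 <= new_row < size and 0 <= new_col < size:
            row, col = new_row, new_col
            if grid[row][col] == 0:
                unvisited -= 1
            grid[row][col] += 1                # increment the node just entered
    return grid, counts, unvisited == 0


def chi_square(counts):
    """Chi-square statistic against a uniform expectation over the digits."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no digits were consumed")
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def grid_std_dev(grid):
    """Population standard deviation of the visit counts (walk smoothness)."""
    return statistics.pstdev([value for row in grid for value in row])
```

A lower standard deviation indicates that visits were spread more evenly over the grid, and a χ² value near zero indicates that the four digits were consumed in nearly equal proportions.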
In addition, the generated matrix is saved along with a log file entry so that further analysis can be undertaken outside of the software.

While all targets have been met, one element of the original plan has been removed. In later work by the BBP authors, new BBP formulas have been written in P-notation form; the initial hope was that P-notation could be supported by the project so that many other documented formulas could be added to the software. Unfortunately, conflicts were identified in the formulas (π: k = 0; log 2: k = 1; P-notation: k = 0), which made verifying more exotic formulas (such as π√2) complex, and communications with the author of P-notation failed to provide any resolution [3]. Further, while prototypes did have some success, the P-notation implementations suffered in performance due to the additional logic required. Given the already limited processing power available to the project, this stage was cancelled.

Future Suggestions

While the software development has so far been successful, one potential issue that appears to be arising is the large amount of processing power needed to generate large sequences using the BBP algorithm. While there are some known faster BBP formulas for π [8], these do not improve the performance of other BBP formulas such as that for log 2. The issue is simply that BBP is not designed for generating digit sequences of this scale; instead it is best suited to generating individual digits. If the project were to be repeated, it would benefit greatly from a similar algorithm that is optimised for generating binary digit sequences.

However, the BBP algorithm does have a strength which unfortunately could not be exploited in this project. Because BBP is a spigot algorithm, it is possible to distribute the processing across several processors and merge the results, such that each processor is assigned only one digit, or a small portion of the digits, to generate.
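As a rough illustration of why digits can be assigned to processors independently, the following Python sketch extracts a single hexadecimal digit of π at an arbitrary position using the BBP formula for π [7]. It is not the project's implementation. The function names are hypothetical, double-precision floating-point arithmetic is assumed (so the result is only reliable for modest positions), and splitting each hexadecimal digit into two base-4 digits is an assumption about how the output would feed the walks.

```python
def _series(j, n):
    """Fractional part of the sum over k of 16^(n-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerators small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink by a factor of 16 each; stop once negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digit(n):
    """Hexadecimal digit of pi at position n after the point (n = 0 is the first)."""
    x = (4 * _series(1, n) - 2 * _series(4, n)
         - _series(5, n) - _series(6, n)) % 1.0
    return int(x * 16)


def pi_base4_digits(start, hex_count):
    """Base-4 digits of pi from hex_count hexadecimal digits beginning at start.

    Assumption: each hexadecimal digit is split into two base-4 digits,
    high pair first.
    """
    digits = []
    for n in range(start, start + hex_count):
        h = pi_hex_digit(n)
        digits += [h >> 2, h & 0b11]
    return digits
```

Since pi_hex_digit(n) depends only on n, disjoint ranges of positions could be handed to separate processors and the results concatenated afterwards.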
In addition to distributing the processing of individual digits, it is also possible to distribute the processing of an individual digit itself, due to the design of the algorithm. The algorithm makes use of a series that is evaluated multiple times with differing variables; this function can itself be distributed to allow multiple processors to work on one digit. For example, a future project could greatly increase digit generation speed by first distributing the generation of digits across multiple systems, each of which may distribute the generation of its assigned digit across multiple processing cores. Even then, the calculation of an individual series can be broken down further so that only a small range of data is calculated, a method which proved successful in earlier projects that used BBP formulas [27].

Finally, a feature of the BBP algorithm is that it does not generate only a lone digit; it generates several, up to the precision of the floating-point arithmetic used. One potential method of reducing processing is to avoid floating-point arithmetic and instead use integer arithmetic of a very large size, if available [2], so that many digits can be generated in one attempt rather than only a few; however, this will come at the cost of memory.

Bibliography

[1] D. H. Bailey. A compendium of BBP-type formulas for mathematical constants. Available from: http://crd.lbl.gov/~dhbailey/dhbpapers/bbp-formulas.pdf, 2009.
[2] D. H. Bailey. The BBP algorithm for pi. Available from: http://crd.lbl.gov/~dhbailey/dhbpapers/bbp-alg.pdf, 2006.
[3] D. H. Bailey. Re: P-notation parser. Personal Communication, 2009.
[4] D. H. Bailey and J. M. Borwein. Mathematics by Experiment: Plausible Reasoning in the 21st Century, chapter 3.6, pages 127–131. Wellesley, MA: A K Peters, 2003.
[5] D. H. Bailey and R. E. Crandall. On the random character of fundamental constant expansions. Experimental Mathematics, 10(2):175–190, 2000.
[6] D. H. Bailey, J. M. Borwein, P. B. Borwein, and S. Plouffe. The quest for pi. Mathematical Intelligencer, 19(1):50–57, 1997.
[7] D. H. Bailey, P. Borwein, and S. Plouffe. On the rapid computation of various polylogarithmic constants. Mathematics of Computation, 66(218):903–913, 1997. ISSN 0025-5718. doi: http://dx.doi.org/10.1090/S0025-5718-97-00856-9.
[8] F. Bellard. A new formula to compute the n’th binary digit of pi. Available from: http://fabrice.bellard.free.fr/pi/pi_bin.pdf, 1997.
[9] L. Blum, M. Blum, and M. Shub. A simple unpredictable pseudo random number generator. SIAM Journal on Computing, 15(2):364–383, 1986. ISSN 0097-5397. doi: http://dx.doi.org/10.1137/0215025.
[10] CERN - European Organization for Nuclear Research. Colt project. Available from: http://acs.lbl.gov/~hoschek/colt/, 2004.
[11] D. Eastlake 3rd, J. Schiller, and S. Crocker. Randomness Requirements for Security. RFC 4086 (Best Current Practice), June 2005. URL http://www.ietf.org/rfc/rfc4086.txt.
[12] H. Ghodosi, C. Charnes, J. Pieprzyk, and R. Safavi-Naini. Pseudorandom sequences obtained from expansions of irrational numbers. In Pre-Proceedings of Cryptography Policy and Algorithms Conference, pages 165–177, Brisbane, Australia, July 3-5 1995.
[13] L. Gąsieniec and T. Radzik. Memory efficient anonymous graph exploration. Graph-Theoretic Concepts in Computer Science: 34th International Workshop, WG 2008, Durham, UK, June 30–July 2, 2008. Revised Papers, pages 14–29, 2008. doi: http://dx.doi.org/10.1007/978-3-540-92248-3_2.
[14] X. Gourdon. Computation of the n-th decimal digit of π with low memory. Available from: http://numbers.computation.free.fr/Constants/Algorithms/nthdecimaldigit.pdf, 2003.
[15] M. Haahr. RANDOM.ORG - true random number service. Available from: http://www.random.org, 2009.
[16] M. Haahr. RANDOM.ORG - the history of random.org. Available from: http://www.random.org/history/, 2009.
[17] M. Haahr. RANDOM.ORG - pregenerated random numbers. Available from: http://random.org/files/, 2009.
[18] M. Harrower and C. A. Brewer. Colorbrewer.org: An online tool for selecting color schemes for maps. The Cartographic Journal, 40(1):27–37, 2003.
[19] D. E. Knuth. Seminumerical Algorithms, volume 2 of The art of computer programming, chapter 3.3.1, pages 42–48. Addison-Wesley, third edition, 1998.
[20] D. E. Knuth. Seminumerical Algorithms, volume 2 of The art of computer programming, chapter 3.2.1, pages 10–26. Addison-Wesley, third edition, 1998.
[21] T. E. Kurt. Hacking Roomba, chapter 1, page 5. ExtremeTech. John Wiley & Sons, 2006.
[22] M. W. Maimone, P. C. Leger, and J. J. Biesiadecki. Overview of the Mars exploration rovers’ autonomous mobility and vision capabilities. In IEEE International Conference on Robotics and Automation (ICRA) Space Robotics Workshop, Roma, Italy, Apr. 2007.
[23] E. Maningat, B. Monterola, E. Obrero, R. Samante, and J. Villafuerte. Random walk application for autonomous vacuum cleaner robot. Available from: http://www.electronicslab.ph/projects/autonomous-vacuum-cleaner-robot.pdf, 2007.
[24] M. Matsumoto and T. Nishimura. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation, 8(1):3–30, 1998. ISSN 1049-3301. doi: http://doi.acm.org/10.1145/272991.272995.
[25] T. Mitsui. The number π as a pseudo-random number generator. The science and engineering review of Doshisha University, 49(3):160–168, 2008.
[26] S. K. Park and K. W. Miller. Random number generators: good ones are hard to find. Communications of the ACM, 31(10):1192–1201, 1988. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/63039.63042.
[27] C. Percival. The quadrillionth bit of pi is ’0’. Available from: http://oldweb.cecm.sfu.ca/projects/pihex/announce1q.html, Dec. 2001.
[28] S. Plouffe. On the computation of the n’th decimal digit of various transcendental numbers. Available from: http://pictor.math.uqam.ca/~plouffe/Simon/articlepi.html, 1996, Revised 2003.
[29] M. Saito and M. Matsumoto. SIMD-oriented fast Mersenne twister: a 128-bit pseudorandom number generator. In Monte Carlo and Quasi-Monte Carlo Methods 2006, pages 607–622. Springer Berlin Heidelberg, 2008.
[30] D. W. Seward and M. J. Bakari. The use of robotics and automation in nuclear decommissioning. In 22nd International Symposium on Automation and Robotics in Construction, Ferrara, Italy, 2005.
[31] Sun Microsystems, Inc. Math (java platform se 6). Available from: http://java.sun.com/javase/6/docs/api/java/lang/Math.html#random(), 2008.
[32] J. Walker. HotBits: Genuine random numbers. Available from: http://www.fourmilab.ch/hotbits/, Sept. 2006.
[33] J. Walker. How hotbits works. Available from: http://www.fourmilab.ch/hotbits/how3.html, Sept. 2009.
[34] J. Walker. Hotbits statistical testing. Available from: http://www.fourmilab.ch/hotbits/statistical_testing/stattest.html, Sept. 2006.
[35] J. Walker. Re: [feedback] (bulk request). Personal Communication, 2009.
[36] E. W. Weisstein. BBP-type formula. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/BBP-TypeFormula.html, 2009.
[37] E. W. Weisstein. Chi-squared test. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/Chi-SquaredTest.html, 2009.
[38] E. W. Weisstein. Irrational number. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/IrrationalNumber.html, 2005.
[39] E. W. Weisstein. Normal number. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/NormalNumber.html, 2005.
[40] Wikipedia. Linear congruential generator. Wikipedia, the free encyclopedia. Available from: http://en.wikipedia.org/w/index.php?title=Linear_congruential_generator&oldid=302017530, 2009.
[41] P. W. H. Wong. Project specification feedback. Personal Communication, 2009.
[42] P. W. H. Wong. Project design feedback. Personal Communication, 2009.