DEGREE PROJECT IN COMPUTER SCIENCE, FIRST LEVEL
STOCKHOLM, SWEDEN 2015

Measuring Improvements and the Effects of Multiple and
Unique Solution Puzzles on Sudoku Solving Algorithms

JONATHAN GOLAN
JOEL KALLIN KILMAN

KTH ROYAL INSTITUTE OF TECHNOLOGY, CSC SCHOOL
Degree Project in Computer Science, DD143X
Supervisor: Vahid Mosavat
Examiner: Örjan Ekeberg
May 10, 2015
Abstract
In this paper we compare various Sudoku solving algorithms in order to determine what kind of run-time improvements different optimizations can give. We also examine what kind of effect the existence of multiple solutions in the puzzles has on our results.
Referat

Measuring improvements and the effects of unique and multiple puzzle solutions on Sudoku solving algorithms

In this report we compare different Sudoku solving algorithms with the intention of showing what kinds of run-time improvements different optimizations can give us. We also intend to examine how the fact that the puzzles can have multiple solutions affects our results.
Contents

0.1 Terminology . . . 1

1 Introduction . . . 2
  1.1 Background . . . 4
    1.1.1 Origins . . . 4
    1.1.2 NP-completeness . . . 5
    1.1.3 SAT Problem . . . 5
    1.1.4 Other Research . . . 5
  1.2 Purpose . . . 5
  1.3 Problem Definition . . . 6

2 Methods . . . 7
  2.1 Puzzle Generation . . . 7
  2.2 Backtracking-based Algorithms . . . 8
    2.2.1 Backtracking . . . 8
    2.2.2 Forward Checking . . . 9
    2.2.3 Constraint Propagation . . . 10
    2.2.4 Minimum Remaining Value . . . 10
  2.3 Reduction to Exact Cover . . . 11
  2.4 Comparing Unique Puzzles to Multiple Solution Puzzles . . . 11
  2.5 Measuring Time . . . 11
  2.6 Hardware . . . 12
  2.7 Conditions for Testing . . . 12

3 Results . . . 14
  3.1 Multiple Solutions versus Single Solutions . . . 14
  3.2 Algorithm Comparisons . . . 14

4 Discussion . . . 18
  4.1 Larger Puzzles . . . 18
  4.2 Outliers . . . 18
  4.3 Multiple Solutions versus Single Solutions . . . 19
  4.4 Algorithm Comparisons . . . 21
  4.5 Possible Improvements . . . 21

5 Conclusions . . . 22
  5.1 Findings . . . 22
  5.2 Possible Sources of Error . . . 23
    5.2.1 Implementation . . . 23
    5.2.2 Measuring time . . . 23
  5.3 Further Research . . . 23

Bibliography . . . 24
0.1 Terminology

cell - any of the n⁴ squares which contain or could contain an integer
column - any of the n² columns which make up the field
decision - whenever an integer is put in a cell we have made a decision
field - the n²·n² collection of cells which makes up the puzzle
hint - a number already placed in the field from the beginning
row - any of the n² rows which make up the field
state - a snapshot of the current field along with whatever numbers have been filled in thus far
subgrid - any of the n² squares of size n·n which make up the n²·n² field
Chapter 1
Introduction
Sudoku, Japanese for "number place", is a popular game designed to engage the player in various degrees of brain gymnastics, and is commonly found in magazines among other puzzles. The game is played on a 9·9 grid made up of nine 3·3 subgrids, as demonstrated in figure 1.1 on page 2.
The objective of the game is to fill each row and column with nine integers ranging from 1 to 9, the same integer occurring only once per row and once per column. At the same time each subgrid should also contain nine integers ranging from 1 to 9, each number occurring only once per subgrid.
Figure 1.1: Empty Sudoku field

As this is an empty grid, various solutions can be found, but the easiest is to fill the first row with the sequence (1,2,3,4,5,6,7,8,9) and then, for each following row, shift the sequence three positions to the right, giving (4,5,6,7,8,9,1,2,3) for the second row: the same sequence, only rotated three positions. At the first row of every new band of subgrids we instead shift the sequence only one position to the right relative to the top row of the preceding band. It is worth noting that this rotation is modular arithmetic, and it can therefore easily be described in pseudocode:

Result: Completed empty Sudoku puzzle of size 9·9
foreach row do
    if row = firstRow then
        fill with the sequence (1,2,3,4,5,6,7,8,9);
    else if row = firstRowOfSubgridBand then
        fill with previousFirstRowOfSubgridBand shifted one position to the right;
    else
        fill with previousRow shifted three positions to the right;
    end
end
Algorithm 1: Code to build the simplest Sudoku solution possible on an empty field

The result is as shown in figure 1.2 on page 3.

1 2 3 | 4 5 6 | 7 8 9
4 5 6 | 7 8 9 | 1 2 3
7 8 9 | 1 2 3 | 4 5 6
------+-------+------
9 1 2 | 3 4 5 | 6 7 8
3 4 5 | 6 7 8 | 9 1 2
6 7 8 | 9 1 2 | 3 4 5
------+-------+------
8 9 1 | 2 3 4 | 5 6 7
2 3 4 | 5 6 7 | 8 9 1
5 6 7 | 8 9 1 | 2 3 4

Figure 1.2: Solution for Sudoku field without clues
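The rotations in Algorithm 1 can equivalently be written as one modular expression per cell. A sketch in Java (class and method names are ours, not the thesis code):

```java
// Sketch (ours, not the thesis implementation): builds the "simplest" completed
// 9x9 field described by Algorithm 1. Each row is the base sequence 1..9 rotated
// three steps per row inside a band of subgrids, and one extra step per band.
public class TrivialSudoku {
    // Value for cell (row, col): band = row / 3, offset inside band = row % 3.
    static int value(int row, int col) {
        int band = row / 3, offset = row % 3;
        int shift = ((3 * offset - band) % 9 + 9) % 9; // rotation of the base sequence
        return (shift + col) % 9 + 1;
    }

    static int[][] build() {
        int[][] field = new int[9][9];
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                field[r][c] = value(r, c);
        return field;
    }

    // Checks the three Sudoku rules: rows, columns and subgrids each hold 1..9.
    static boolean isValid(int[][] f) {
        for (int i = 0; i < 9; i++) {
            boolean[] row = new boolean[10], col = new boolean[10], box = new boolean[10];
            for (int j = 0; j < 9; j++) {
                if (row[f[i][j]] || col[f[j][i]]) return false;
                row[f[i][j]] = col[f[j][i]] = true;
                int v = f[3 * (i / 3) + j / 3][3 * (i % 3) + j % 3];
                if (box[v]) return false;
                box[v] = true;
            }
        }
        return true;
    }
}
```

Row r starts with 1, 4, 7, 9, 3, 6, 8, 2, 5 for r = 0..8, reproducing figure 1.2.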
However, as this would not be a very hard game, a number of cells on a real Sudoku field already have predetermined values, limiting the available solutions. This is shown in figure 1.3 on page 4. These prefilled numbers, or hints, are how the difficulty of the game is regulated; generally, more hints means the game is easier to solve.
[Figure omitted: a 9·9 Sudoku field partially filled with hints]
Figure 1.3: Sudoku field with clues
1 2 3 4
3 4 1 2
2 1 4 3
4 3 2 1

Figure 1.4: Latin Square
The astute reader has however already deduced that Sudoku can be played on a field greater than 9·9. It can in fact be played on any field whose side length is a perfect square n², for a positive integer n. Since 9 has the positive root 3, it can be used as a Sudoku side length; 4 has the positive root 2 and could likewise be used; 144 is the square of 12 and could also be used. A field of side length m = n² has m rows and m columns as well as m subgrids, all of which are to be filled with the integers 1, 2, 3, ..., m.
1.1 Background

1.1.1 Origins

The origins of Sudoku can be traced back to Latin Squares. A Latin Square of order n is an n·n square in which each of the integers 1 to n appears exactly once in every row and every column (and thus n times in total). An example is the Latin Square in figure 1.4 on page 4.
In 1956 W. U. Behrens made a special type of Latin Square called a gerechte design (German: "fair" or "just"). He divided the square of size n·n into n subgrids of size √n·√n, adding the rules of what would become Sudoku. The actual Sudoku puzzle was invented by Howard Garns in 1979, who named it "number place".[6]
1.1.2 NP-completeness

Since then much research has been done on the subject of Sudoku. According to Takayuki Yato and Takahiro Seta of the University of Tokyo, Sudoku can be proved to be NP-complete, since Sudoku is an example of a partial Latin Square completion, which in turn can be reduced to the ASP problem, which has been proven to be NP-complete.[5]
1.1.3 SAT Problem

Another well-known reduction for Sudoku is to the SAT problem. The SAT problem, also known as the Boolean Satisfiability Problem, asks whether a satisfying assignment exists for a given set of clauses. One such formula is A ∧ ¬B, which is satisfied by setting A to true and B to false.[11]
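As a toy illustration (ours, not from the paper), satisfiability of the example formula can be checked by simply enumerating all assignments:

```java
// Tiny illustration (ours): brute-force satisfiability check of the example
// formula A AND (NOT B) by enumerating all four assignments of (A, B).
public class SatExample {
    static boolean formula(boolean a, boolean b) {
        return a && !b; // A ∧ ¬B
    }

    // Number of satisfying assignments out of the four possible ones.
    static int countSatisfying() {
        int count = 0;
        for (boolean a : new boolean[]{false, true})
            for (boolean b : new boolean[]{false, true})
                if (formula(a, b)) count++;
        return count;
    }
}
```

Real SAT solvers of course avoid this exponential enumeration by searching cleverly, which is what makes the reduction practical.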
1.1.4 Other Research

We were unable to find much recent research into the Sudoku puzzle. The largest recent discovery was that there are no Sudoku puzzles with unique solutions and fewer than 17 hints [9]. Other than that, many comparisons have been done, for example "A study of Sudoku solving algorithms" by Berggren and Nilsson [7]. However, we have been unable to find any reports focusing on our chosen algorithms, or on the differences between puzzles with unique and multiple solutions.
1.2 Purpose

As only the general problem is NP-complete, brute-force algorithms can be expected to work well for small instances of the Sudoku problem, unless P=NP, while approximations and heuristics have to be used for larger instances. On Sudoku fields of size 9·9, exhaustive search algorithms can be used, and they can be improved upon using heuristics.
Since the exhaustive search explores a tree of possible solutions to one Sudoku field (some branches of which could be correct), our heuristic improvements will focus on pruning this tree, thereby minimizing the exhaustive search in order to make it more effective.
It should also be mentioned that the Sudoku problem can be reduced to the SAT problem, also known as the Boolean Satisfiability Problem. It is our intention to use this reduction and known SAT solvers in order to solve Sudoku fields.
A proper Sudoku puzzle should also have a unique solution, and is then known as a well-posed puzzle, but creating such a field takes many times longer than creating a puzzle with multiple allowed solutions. Since we want to test our algorithms on as many different puzzles as possible, the faster creation method would be preferred, but it is uncertain in what way that would affect our results. The question is also of interest because it has sometimes been assumed that there is no difference, an assumption for which we have been unable to find supporting evidence; the report "A search based Sudoku solver" makes it, for instance.[4] We will therefore investigate whether our algorithms perform differently on Sudoku fields with multiple solutions as compared to proper Sudoku fields with unique solutions.
To reiterate, our paper will compare the efficiency of different Sudoku solving algorithms and examine whether the results differ when multiple solutions exist.
1.3 Problem Definition

• Examine what kind of effect multiple allowed solutions have on solving time as compared to unique solutions.
• Compare how effective some common algorithms for solving Sudoku are.
Chapter 2

Methods

In this chapter we explain the different approaches and algorithms we used to solve our Sudoku puzzles, as well as how our puzzles were generated. In general, two distinct approaches can be discerned: the first is a backtracking solution, essentially an exhaustive search testing every possible solution, improved with the addition of some optimizations; the second is a reduction to the exact cover problem, using existing algorithms to solve the original problem.
2.1 Puzzle Generation

Our initial intention was to test our implementations on different sizes of Sudoku fields. However, as it became clear that generating and solving puzzles of size 16 or 25 was more or less hopeless, the scope of our research was limited to traditional Sudoku of size 9.
The easiest way to generate puzzles is to let an algorithm used for solving Sudoku solve an empty field, that is, a field without any hints. The algorithm runs until it has solved the puzzle, after which values are removed from random cells until a chosen number of hints remains. However, this method only guarantees that there is at least one solution; there could be multiple, which is something the traditional Sudoku puzzle does not allow.
In order to generate puzzles with unique solutions, the previously mentioned method can first be used to create a solved field. After that, hints are similarly removed in succession, but between every removal an algorithm is run which checks for multiple solutions. If there is still only one solution, another hint is removed. If the removed hint enabled multiple solutions, it is put back and another hint is tested as suitable for removal. This continues until all cells have been tried.
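The uniqueness check described above can be sketched as a solver that counts solutions but stops as soon as it finds a second one. This is our own illustrative code, not the generator actually used in the project:

```java
// Sketch (ours): a plain backtracking solver that counts solutions of a field,
// capped at `cap` so the search can stop early once two solutions are known.
public class UniquenessCheck {
    static boolean allowed(int[][] f, int r, int c, int v) {
        for (int i = 0; i < 9; i++)
            if (f[r][i] == v || f[i][c] == v
                    || f[3 * (r / 3) + i / 3][3 * (c / 3) + i % 3] == v)
                return false;
        return true;
    }

    static int countSolutions(int[][] f, int cap) {
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                if (f[r][c] == 0) {          // first empty cell
                    int count = 0;
                    for (int v = 1; v <= 9 && count < cap; v++)
                        if (allowed(f, r, c, v)) {
                            f[r][c] = v;
                            count += countSolutions(f, cap - count);
                            f[r][c] = 0;     // undo the decision
                        }
                    return count;
                }
        return 1; // no empty cell left: the field is one complete solution
    }

    // The completed field from figure 1.2, used as a convenient test fixture.
    static int[][] pattern() {
        int[][] f = new int[9][9];
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                f[r][c] = (((3 * (r % 3) - r / 3) % 9 + 9) % 9 + c) % 9 + 1;
        return f;
    }
}
```

A generated puzzle is unique exactly when countSolutions(field, 2) returns 1; removing a hint and re-running this check is the loop described above.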
The issue with this method is that it requires running a Sudoku solver many times for every puzzle generated, a minimum of 81 times, while the first method only requires running it once. It also disallows choosing the number of hints, but it has been proved that the minimum number of hints for which a unique solution is still possible is 17.[9] So if possible, it would be preferable to allow multiple solutions to exist.
Generating 100,000 fields with unique solutions gave an average of 24 hints. It was therefore decided that when generating multiple-solution fields the number of hints should be 24 as well.

[Figure omitted: two generated example fields]
(a) Generated Multiple Solution Sudoku Field  (b) Generated Unique Solution Field
Figure 2.1: Examples of two Generated Sudoku fields
2.2 Backtracking-based Algorithms

This section describes the backtracking-based algorithms used to solve the Sudoku problem. A basic backtracking algorithm was used as a building block; subsequent algorithms were added as improvements, each building upon the previous in order to optimize results [1].
2.2.1 Backtracking

As mentioned, the basis for all backtracking algorithms in this chapter is plain backtracking. In a Sudoku puzzle, backtracking tries every possible number in every cell, in some order; this is also known as an exhaustive search. If a contradiction occurs and there is no possible number which can be used in a cell, the algorithm backtracks to the last cell where it made a decision and makes a new decision using some other number, until it has tried every possible number in that cell. This continues until the whole field has been filled or every possibility has been tried without finding a solution.
Backtracking(puzzleField)
    pos = findUnassignedLocation();
    if pos = none then
        return solved;
    end
    foreach possible number do
        pos.value = number;
        if len(contradictions) = 0 ∧ Backtracking(puzzleField) = solved then
            return solved;
        end
        pos.value = 0;
    end
    return unsolvable;
Algorithm 2: Basic representation of how the backtracking was implemented.
Since backtracking tries every possible number, it is generally very slow, but it has the advantage of being largely unaffected by the apparent difficulty of the puzzle; it is generally only affected by the number of hints provided. For fields of this size, backtracking should work quite well, but on a 16·16 field or larger it would do a lot worse. The time complexity is around O(9ⁿ), where n is the number of empty cells. The actual work will of course always be less, since the rules disallow many combinations.
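Algorithm 2 can be made explicit as runnable code. The following is our own Java sketch of plain backtracking, not the thesis implementation:

```java
// Sketch (ours) of the plain backtracking solver from Algorithm 2:
// returns true when the field has been completely and legally filled.
public class Backtracking {
    static boolean allowed(int[][] f, int r, int c, int v) {
        for (int i = 0; i < 9; i++)
            if (f[r][i] == v || f[i][c] == v
                    || f[3 * (r / 3) + i / 3][3 * (c / 3) + i % 3] == v)
                return false;
        return true;
    }

    static boolean solve(int[][] f) {
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                if (f[r][c] == 0) {                // findUnassignedLocation()
                    for (int v = 1; v <= 9; v++)
                        if (allowed(f, r, c, v)) { // len(contradictions) = 0
                            f[r][c] = v;           // make a decision
                            if (solve(f)) return true;
                            f[r][c] = 0;           // undo and try the next number
                        }
                    return false;                  // dead end: backtrack
                }
        return true; // no empty cell left: solved
    }
}
```

Hints simply stay in the array as non-zero cells; the solver only touches cells that are 0.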
2.2.2 Forward Checking

The first optimization, forward checking, keeps lists of all possible values for all cells. For instance, if the number nine already exists as a hint in cell [1,1] (first row, first column), no cell in row 1 or column 1 can contain a nine in the solved field. In addition, the subgrid containing cell [1,1] cannot contain another nine. Each time the algorithm adds or removes a number, it updates these lists for all cells in the same row, column and subgrid.
So when the algorithm attempts to decide a number for a cell, it chooses one from the cell's list of possible values, instead of trying an arbitrary number and then checking for contradictions. If there are no possible values, the algorithm can backtrack right away instead of attempting nine different integers before backtracking. Unfortunately, this alone is unlikely to improve run-times considerably (it might even be slower), as it takes a lot of bookkeeping to keep track of all the possible values. However, it is necessary for the following optimizations, which should be quite significant improvements. For this and the coming algorithms it is very hard to estimate the run-time, since it is hard to know when these rules will take effect. The worst case should still be around the same size as for plain backtracking, reached only if the rules never help, which is highly unlikely to ever happen, if not impossible.
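The bookkeeping behind forward checking can be sketched with bitmasks, where bit (v-1) of a cell's mask says whether value v is still possible. This representation is ours; the thesis implementation keeps explicit lists:

```java
// Sketch (ours) of the candidate sets maintained by forward checking:
// a 9-bit mask per cell, derived from the cell's row, column and subgrid.
public class ForwardChecking {
    // Bit (v-1) of the result is set iff value v is still possible in (r, c).
    static int candidates(int[][] f, int r, int c) {
        int used = 0;
        for (int i = 0; i < 9; i++) {
            if (f[r][i] != 0) used |= 1 << (f[r][i] - 1);        // row
            if (f[i][c] != 0) used |= 1 << (f[i][c] - 1);        // column
            int v = f[3 * (r / 3) + i / 3][3 * (c / 3) + i % 3]; // subgrid
            if (v != 0) used |= 1 << (v - 1);
        }
        return ~used & 0x1FF;
    }

    static int count(int mask) {
        return Integer.bitCount(mask);
    }
}
```

An incremental implementation would update these masks on every decision and undo instead of recomputing them, which is exactly the cost the paragraph above refers to.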
2.2.3 Constraint Propagation

Sometimes when the backtracking algorithm adds a number somewhere in the field, another cell in the field becomes impossible to fill. So far the algorithms have ignored this, but using the lists of possible values implemented for forward checking, it can be detected. Every time the algorithm decides a number for a cell, it checks all other cells. If it finds an empty cell with no possible numbers, it knows that the last guessed value was incorrect (at least in the context of the current field) and it undoes everything done since the backtracking guessed that number. If it finds an empty cell with only one possible value, it chooses that value for the cell and checks the whole field again.
foreach row do
    foreach column do
        if puzzleField[row, column].possibleValues.size = 1
           ∧ puzzleField[row, column].value = 0 then
            puzzleField[row, column].value = puzzleField[row, column].possibleValues[0];
        else if puzzleField[row, column].possibleValues.size = 0
           ∧ puzzleField[row, column].value = 0 then
            undoUntilLastBacktrack();
        end
    end
end
Algorithm 3: Basic representation of the code handling constraint propagation. It was inserted into the backtracking code.
This should improve run-times substantially, since many unnecessary computations are now avoided. Say that we put a number in cell [1,1], and this causes [1,9] to become impossible to fill. Without constraint propagation, we would just move on to [1,2] or some other empty cell, and might not notice the already created contradiction until we reach [1,9]. Now the algorithm instead looks through the whole puzzle, finds the contradiction and backtracks immediately, cutting off that entire useless part of the search.
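The propagation step of Algorithm 3 can be sketched as a loop that repeatedly fills forced cells and reports a contradiction for empty cells with no candidates. This is our own code; for brevity it recomputes the candidate sets on each pass, whereas the thesis implementation keeps them updated incrementally:

```java
// Sketch (ours) of naked-single constraint propagation: fill every empty cell
// with exactly one candidate, and fail on any empty cell with none.
public class Propagation {
    static int candidates(int[][] f, int r, int c) {
        int used = 0;
        for (int i = 0; i < 9; i++) {
            if (f[r][i] != 0) used |= 1 << (f[r][i] - 1);
            if (f[i][c] != 0) used |= 1 << (f[i][c] - 1);
            int v = f[3 * (r / 3) + i / 3][3 * (c / 3) + i % 3];
            if (v != 0) used |= 1 << (v - 1);
        }
        return ~used & 0x1FF;
    }

    // Returns false iff some empty cell has no possible value (a contradiction).
    static boolean propagate(int[][] f) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int r = 0; r < 9; r++)
                for (int c = 0; c < 9; c++)
                    if (f[r][c] == 0) {
                        int m = candidates(f, r, c);
                        if (m == 0) return false;       // impossible cell: undo
                        if (Integer.bitCount(m) == 1) { // forced value
                            f[r][c] = Integer.numberOfTrailingZeros(m) + 1;
                            changed = true;
                        }
                    }
        }
        return true;
    }

    // The completed field from figure 1.2, used as a test fixture.
    static int[][] pattern() {
        int[][] f = new int[9][9];
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                f[r][c] = (((3 * (r % 3) - r / 3) % 9 + 9) % 9 + c) % 9 + 1;
        return f;
    }
}
```

In the full solver, a false return from propagate corresponds to the undoUntilLastBacktrack() call in Algorithm 3.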
2.2.4 Minimum Remaining Value

The last addition is known as the minimum remaining value heuristic. It is very simple compared to the run-time improvement it is expected to give. When the backtracking algorithm looks for an empty cell to decide a number for (findUnassignedLocation() in the pseudocode), it has so far been taking the cells in order (top left to bottom right). Now it instead chooses the cell with the fewest possible values remaining. This reduces the branching factor of the search, and should result in a significant run-time improvement.
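The cell-selection step can be sketched as a scan over the empty cells that keeps the one with the smallest candidate set. Again our own code, reusing the bitmask representation from above:

```java
// Sketch (ours) of the minimum remaining value heuristic: pick the empty cell
// with the fewest candidates instead of the first one in reading order.
public class Mrv {
    static int candidates(int[][] f, int r, int c) {
        int used = 0;
        for (int i = 0; i < 9; i++) {
            if (f[r][i] != 0) used |= 1 << (f[r][i] - 1);
            if (f[i][c] != 0) used |= 1 << (f[i][c] - 1);
            int v = f[3 * (r / 3) + i / 3][3 * (c / 3) + i % 3];
            if (v != 0) used |= 1 << (v - 1);
        }
        return ~used & 0x1FF;
    }

    // Returns {row, col} of the empty cell with the fewest candidates,
    // or null if the field is already full.
    static int[] pick(int[][] f) {
        int[] best = null;
        int bestCount = 10;
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                if (f[r][c] == 0) {
                    int n = Integer.bitCount(candidates(f, r, c));
                    if (n < bestCount) {
                        bestCount = n;
                        best = new int[]{r, c};
                    }
                }
        return best;
    }

    // The completed field from figure 1.2, used as a test fixture.
    static int[][] pattern() {
        int[][] f = new int[9][9];
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                f[r][c] = (((3 * (r % 3) - r / 3) % 9 + 9) % 9 + c) % 9 + 1;
        return f;
    }
}
```

A cell with exactly one candidate is always chosen first, so MRV subsumes the "forced value" case and keeps the branching factor as low as the current field allows.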
2.3 Reduction to Exact Cover

In order to solve Sudoku using a reduction, a SAT solver would be needed. As SAT is one of the most complicated NP problems, it was not feasible to build an efficient enough solver in the time frame given. Fortunately, as SAT is one of the oldest NP problems, there are open-source SAT solvers available.
The SAT solver we used was written by Paul Varoutsos [8]. The surrounding program was designed to reduce Sudoku to a Boolean query which could be solved by the SAT solver using the Davis-Putnam algorithm. In order to do this, the rules of Sudoku needed to be formalized as Boolean clauses: every cell must contain a number in the range 1 to 9; every row must contain every number in the range 1 to 9; every column must contain every number in the range 1 to 9; and every subgrid must contain every number in the range 1 to 9. A parser would read the Sudoku field, formulate these queries and feed them to the SAT solver; the SAT solver would find a satisfying interpretation if one existed, and a separate parser would interpret the answer and format it as a Sudoku field.[3]
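To make the four rules concrete, here is a sketch (ours, not the encoder actually used) of how they can be emitted as CNF clauses over variables x(r,c,v), meaning "cell (r,c) holds value v", numbered 1..729 as SAT solvers expect:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (ours) of a CNF encoding of the Sudoku rules: "at least one value
// per cell" clauses plus pairwise "at most one occurrence of each value per
// row, column and subgrid" clauses. Negative literals mean negated variables.
public class SudokuCnf {
    static int var(int r, int c, int v) {
        return r * 81 + c * 9 + v + 1; // variables 1..729
    }

    static List<int[]> clauses() {
        List<int[]> out = new ArrayList<>();
        // Every cell contains at least one value in 1..9: 81 clauses.
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++) {
                int[] clause = new int[9];
                for (int v = 0; v < 9; v++) clause[v] = var(r, c, v);
                out.add(clause);
            }
        // No value occurs twice in any row, column or subgrid.
        for (int v = 0; v < 9; v++)
            for (int g = 0; g < 9; g++)
                for (int a = 0; a < 9; a++)
                    for (int b = a + 1; b < 9; b++) {
                        out.add(new int[]{-var(g, a, v), -var(g, b, v)});     // row g
                        out.add(new int[]{-var(a, g, v), -var(b, g, v)});     // column g
                        int r1 = 3 * (g / 3) + a / 3, c1 = 3 * (g % 3) + a % 3;
                        int r2 = 3 * (g / 3) + b / 3, c2 = 3 * (g % 3) + b % 3;
                        out.add(new int[]{-var(r1, c1, v), -var(r2, c2, v)}); // subgrid g
                    }
        return out;
    }
}
```

This minimal encoding produces 81 + 3·9·9·36 = 8829 clauses before any hints are added; hints become additional unit clauses. By a counting argument these clauses already force exactly one value per cell, though practical encoders often add redundant clauses to help the solver.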
2.4 Comparing Unique Puzzles to Multiple Solution Puzzles

In order to properly test the algorithms, 100,000 puzzles with unique solutions and another 100,000 puzzles with multiple solutions were generated. All algorithms were allowed to attempt to solve each of the generated puzzles. The tests were run one after another: the most basic algorithm ran first, backtracking with forward checking ran subsequently, and so on. The time required to solve each of the 100,000 puzzles was written to a file in nanoseconds. The file was imported into MATLAB, where it was loaded as a vector using the MATLAB command load('File.txt').
In order to calculate the average running time for each algorithm, MATLAB commands were used to compute different averages of every vector of running times. The results are presented in table 3.1. The different averages used were the mean, the median, the mode and the geometric mean.
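For reference, the same averages can be computed outside MATLAB. A sketch in Java (class name ours), with the mode omitted since its value on continuous nanosecond data depends on how the values are binned:

```java
import java.util.Arrays;

// Sketch (ours) of the averages used on the run-time vectors:
// arithmetic mean, median and geometric mean.
public class Averages {
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    static double median(double[] xs) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Computed via logarithms so the product of 100,000 values cannot overflow.
    static double geometricMean(double[] xs) {
        double logSum = 0;
        for (double x : xs) logSum += Math.log(x);
        return Math.exp(logSum / xs.length);
    }
}
```

The log-based geometric mean matters in practice: multiplying 100,000 nanosecond-scale values directly would overflow any fixed-precision type.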
2.5 Measuring Time

Measuring was done with Java's built-in method System.nanoTime(). While it cannot be relied on to resolve differences of a few nanoseconds, anything close to 1 millisecond should return accurate results. Furthermore, it was the most accurate method we could find.
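A single solver run can be timed by differencing two System.nanoTime() readings; this is our own illustration, not the thesis harness:

```java
// Sketch (ours): time one task with System.nanoTime(), which is monotonic
// and therefore safe for elapsed-time measurements (unlike wall-clock time).
public class Timing {
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000; // nanoseconds -> ms
    }
}
```

Because nanoTime() is monotonic, the difference is always non-negative even if the system clock is adjusted during the run.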
2.6 Hardware

All tests were performed on an Asus G751JT laptop.

Operating system:
    Windows 8.1 64-bit
Processor:
    Intel Core i7 4th gen. 4710HQ / 2.5 GHz
    Maximum turbo speed 3.5 GHz
    Four cores
    6 MB cache
Motherboard:
    Mobile Intel HM87 Express
Memory:
    RAM 8 GB (1 x 8 GB)
    DDR3L SDRAM
Storage:
    Main 128 GB SSD
    Secondary 1 TB HDD / 7200 rpm - SATA 3 Gb/s
GPU:
    NVIDIA GeForce GTX 970M - 3 GB GDDR5 SDRAM
2.7 Conditions for Testing

In order to properly perform our intended tests, it is important to discuss whether the conditions for using our method can be met. For our tests it is necessary that the computer performing the calculations can be expected to perform very similarly for each algorithm; for instance, no processor-demanding tasks can run at the same time as one of the Sudoku algorithms. This is the first condition. Secondly, we need to be sure that our way of measuring time is accurate; much time and many resources have been spent investigating the accuracy of clocks in computers. Our second condition is therefore that our method for measuring time is accurate enough to differentiate between run-times, so that the individual Sudoku fields do not all receive the same recorded running time due to inaccurate timing.
As for the first condition, we will perform all tests at night, when the computer is not otherwise used. Since we run our tests one after another in one particular order, thermal throttling could pose a problem. As the processor performs calculations it creates excess heat, which must be efficiently transferred and dissipated; once the processor becomes too hot it is throttled in order not to damage any components, including the processor itself. As our testing computer is a laptop, heat issues are more of a concern than they would be on a desktop computer. However, other than minimizing background tasks and ensuring that the computer is placed in a way that does not block any vents, there is not much we can do. We will assume that the conditions for even performance can and will be met with the aforementioned measures.
The second condition, the accuracy of the timing method, is more difficult to evaluate. Test runs with single Sudoku fields indicate that no solution is found faster than one microsecond. These tests are not representative of any 'average' and can thus not be used as proof that this condition is met, but they can be used as an indication, and in the absence of more accurate methods that will have to suffice.
Chapter 3

Results

3.1 Multiple Solutions versus Single Solutions

When observing the mean running times for each algorithm in table 3.1, no clear pattern is discernible: for the first three algorithms the mean running time for the 100,000 multiple-solution puzzles is greater than that of the 100,000 unique-solution puzzles, while for the last algorithm the reverse is true. Table 3.1 shows different averages of the running time for all algorithms on both sets of puzzles.
The Relation column states the relation between the average running time to solve the 100,000 puzzles with a unique solution and the average running time to solve the 100,000 puzzles with multiple solutions, for every algorithm. To clarify, if the sign in the Relation column is "<", then the average solving time for the unique puzzles was smaller (faster) than for the multiple-solution puzzles.
3.2 Algorithm Comparisons

When running the SAT-based algorithm it was clear that it was substantially slower than the backtracking-based solvers: while the backtracking algorithms would run in the order of milliseconds, the SAT-based solver would run in the order of minutes. Furthermore, the SAT-based solver would not properly solve Sudoku fields, providing solutions that were not correct. The reasons for this are discussed in the following sections, but because of the substantial difference in running time, coupled with the inability to provide correct solutions, it was decided to omit the results of the SAT-based algorithm and focus on the backtracking algorithms.
In order to compare running times for the different algorithms, histograms were used along with the averages from table 3.1. The histograms were normalized so that they could be compared, and some outliers were cut off to make the histograms easier to understand. Important to note about figures 3.1, 3.2, 3.3 and 3.4 is that the y-axis is logarithmic.

Algorithm                Type of Average   Unique       Multiple     Relation
Backtracking             Mean              1.7684e+08   3.0977e+08   <
                         Median            3.3780e+07   1.8055e+07   >
                         Mode              6.6467e+05   5.9611e+05   >
                         Geometric Mean    3.3245e+07   2.2096e+07   >
Forward Checking         Mean              3.1949e+08   5.0548e+08   <
                         Median            6.2525e+07   3.0050e+07   >
                         Mode              1.0159e+07   8.1781e+05   >
                         Geometric Mean    6.2882e+07   3.8862e+07   >
Constraint Propagation   Mean              4.5782e+06   5.5281e+06   <
                         Median            1.5864e+06   5.7641e+05   >
                         Mode              3.5266e+05   3.9987e+05   <
                         Geometric Mean    1.8360e+06   8.3901e+05   >
Minimum Remaining Value  Mean              2.2279e+06   1.0280e+06   >
                         Median            1.0342e+06   4.0603e+05   >
                         Mode              3.5553e+05   3.7811e+05   <
                         Geometric Mean    1.1979e+06   5.0218e+05   >

Table 3.1: Different running-time averages (in nanoseconds) for all algorithms with unique and multiple solution Sudoku fields. The Relation column states whether the value in Unique is lesser or greater than the value in Multiple.
[Histograms omitted: normalized frequency on a logarithmic y-axis against running-time in ms; legends distinguish Backtracking, Forward checking, Constraint propagation and Minimum remaining value.]

Figure 3.1: The same 100,000 24-hint multi-solution puzzles solved with all four backtracking-based algorithms. The 40 largest results were omitted from the histogram to make it easier to compare the results.

Figure 3.2: The same 100,000 single-solution puzzles solved with all four backtracking-based algorithms. The 40 largest results were omitted from the histogram to make it easier to compare the results.

Figure 3.3: The same 100,000 single-solution puzzles solved with constraint propagation and minimum remaining value. The 20 largest results were omitted from the histogram to make it easier to compare the results.

Figure 3.4: 100,000 single-solution puzzles solved with minimum remaining value and 100,000 multiple-solution puzzles solved with minimum remaining value. The 15 largest results were omitted from the histogram to make it easier to compare the results.
Chapter 4

Discussion

4.1 Larger Puzzles

Originally, the goal was to also compare how well the different algorithms handled puzzles of larger sizes (16 and 25). Generating puzzles of size 16 was not a problem, since the average time was well below a second, but the same was not true of puzzles of size 25: those puzzles took far longer to generate, making them useless for testing, where large quantities are required.
Solving puzzles of size 16, however, also proved to be impossible in practice, as most puzzles took many seconds to solve and some even ran for hours without ever returning an answer. This shows how quickly the problem grows with increased field size. Unfortunately this made testing larger puzzles impossible, and our focus shifted to puzzles of size 9.
4.2 Outliers

When a human attempts Sudoku, he or she will fill in cells until a 'crash' occurs, that is, until some cell cannot be filled with any number because of the rules of Sudoku. In this case most people would identify the problematic cells and try to resolve the 'crash' by changing a minimum number of cells. Our algorithms, while good at predicting crashes, do not behave in the same way: when a crash is encountered, they backtrack one decision at a time in order to resolve the conflict. The algorithms, much like humans, guess a number for a cell rather than calculate the correct one. This might seem obvious to the reader, but the consequences differ. If an early guess proves disastrous, a human can, with enough logic applied, find the problematic cell directly, whereas the algorithms must backtrack their way to it; an early incorrect guess can therefore be quite disastrous for the time it takes to find a solution.
An interesting observation is that our algorithms search for a solution; they do not calculate it. The process of finding a satisfying solution to a Sudoku field can therefore be described as a search tree, in which each 'guess' made by the algorithm spawns a number of new branches; in the case of the most basic algorithm, plain backtracking, up to nine branches can be produced for every guess. As the average Sudoku field in the test data has 24 clues, 57 cells remain for guessing. If every cell spawns nine subtrees, a worst-case tree could have up to 1 + 9 + 9² + ... + 9⁵⁷ = (9⁵⁸ − 1)/8 ≈ 2.7732 · 10⁵⁴ nodes. This number is of course well above the actual worst case, as the given hints rule out many options, but it is a decent indication of how large the Sudoku problem is. As was evident in our data, many outliers were produced when running the algorithms, and the reason is just that: the problem grows very quickly.
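The geometric series above can be checked exactly with arbitrary-precision arithmetic:

```java
import java.math.BigInteger;

// Verifies the worst-case search tree size quoted in the text:
// 1 + 9 + 9^2 + ... + 9^57 = (9^58 - 1) / 8, a 55-digit number.
public class TreeSize {
    static BigInteger worstCaseNodes() {
        return BigInteger.valueOf(9).pow(58)
                .subtract(BigInteger.ONE)
                .divide(BigInteger.valueOf(8));
    }
}
```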
4.3 Multiple Solutions versus Single Solutions

When calculating averages we used multiple methods. The reason is that the mean, while a popular average, is not well suited for data with many and/or large outliers, such as ours. It is evident from table 3.1 that the mean was not the best method to gauge the expected running time. When we asked whether there is any obvious difference between solving a single-solution Sudoku puzzle and a multiple-solution Sudoku puzzle, we expected that the puzzles with multiple solutions allowed would, if anything, be easier, because their search trees in general contain more correct solutions (as mentioned, more than one solution is not guaranteed, only possible). But the mean would suggest that the multiple-solution Sudoku was harder. This is probably due to the many outliers, so a different average was needed to properly compare expected running times. The mean is also much more susceptible to unlucky guesses, which can result in rare but extremely long run-times. It is however still of interest that the multiple-solution puzzles seemed to generate more outliers.
The second method used to find an average running time was the median. This method is obviously not affected by the magnitude of the outliers, though it could be affected by their quantity; but if the outliers were numerous enough to significantly affect the median, they would hardly be outliers anymore. The median would suggest that the single-solution puzzles are harder to solve than the multiple-solution puzzles, which seems intuitive. For the more basic algorithms the difference in running time is not very large, but it grows noticeably as the algorithms become more complex.
The mode average is simply the value that occurs the most times. In order to gauge the validity of this method it can be compared to the other averages. It does seem to indicate that multiple-solution puzzles are harder for the two more advanced algorithms; however, the difference between the values is small enough to disregard. For all but the first algorithm the mode returns an average in the vicinity of both the median and the geometric mean. We can therefore consider the mode an acceptable way of determining the expected time to solve a Sudoku puzzle, but there are better methods.
The geometric mean is the product of all run-times raised to the power of one divided by the number of run-times; in other words, for n run-times it is the nth root of the product of all the run-times. The geometric mean consistently indicates that the unique puzzles are harder than the multiple-solution puzzles, and, just like with the median, the difference seems to grow as the algorithms become more complex. It is also worth noting that for some datasets outliers represent misreadings or incorrect values, which is not true in our case, as our outliers are the result of poor guesses by the algorithms. But when comparing algorithms and their running times for Sudoku, we are more interested in expected run-times, in which case an average that tones down the impact of outliers is preferred. Thus the median is the most relevant value for us.
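To illustrate how these averages behave on outlier-heavy data, the sketch below computes the mean, median, and geometric mean for a small set of invented run-times; the single large value stands in for an unlucky early guess:

```java
import java.util.Arrays;

public class Averages {
    static double mean(long[] t) {
        double sum = 0;
        for (long v : t) sum += v;
        return sum / t.length;
    }

    static double median(long[] t) {
        long[] sorted = t.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2]
                          : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Geometric mean: the nth root of the product of n run-times.
    // Summing logarithms avoids overflowing the product.
    static double geometricMean(long[] t) {
        double logSum = 0;
        for (long v : t) logSum += Math.log(v);
        return Math.exp(logSum / t.length);
    }

    public static void main(String[] args) {
        // Hypothetical run-times in microseconds, with one large outlier.
        long[] runTimes = {90, 100, 110, 120, 100_000};
        System.out.println(mean(runTimes));          // 20084.0, dominated by the outlier
        System.out.println(median(runTimes));        // 110.0, barely moved
        System.out.println(geometricMean(runTimes)); // roughly 412, in between
    }
}
```

Note how the outlier dominates the mean but barely moves the median, mirroring the behaviour discussed above.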
Looking at all our averages, it seems likely that the difference in difficulty between the two types of puzzles grows as the algorithms become more complex. However, for the more basic algorithms it would seem that other factors, such as early incorrect guesses, impact the run-time more than whether or not multiple solutions are allowed. Looking at figure 3.4, we can see that, contrary to most other curves, these two cross. The curve for single-solution puzzles is weighted more to the left side, but still has larger mean and median values. This suggests that we cannot assume that using puzzles with multiple allowed solutions will yield the same results as using single-solution puzzles. Thus only comparisons done on single-solution puzzles will be of interest to us.
As mentioned, we expected the multiple-solution puzzles to be easier than the unique-solution puzzles, and we chose our average to reflect that hypothesis. In the interest of a proper scientific report, however, it must be mentioned that our hypothesis could be wrong, and it is thus possible that we chose results motivating a faulty hypothesis. It is likely that a multiple-solution puzzle is easier; what we set out to determine was whether the difference was great enough to matter for running times. Our results must therefore be weighed against the accuracy of our timing method: the various results seen could be directly related to improper timing, and could therefore wrongly indicate that unique puzzles are easier than multiple-solution puzzles. In the end we chose to disregard smaller differences in averages because of this possibility. We did choose the median partly because it fit the hypothesis, but we feel this choice has been properly motivated, so that we are not simply fitting our results to our hypothesis.
4.4 Algorithm Comparisons
Although our results did not support using multiple-solution puzzles as an equivalent substitute for single-solution puzzles, we can still see, when comparing figures 3.1 and 3.2 on page 16, that they are quite similar. If, for some reason, generating puzzles with unique solutions takes too long, it would not be unreasonable to allow multiple solutions, as you could still acquire relevant, though less precise, results. For our comparisons, however, we will only be looking at the single-solution puzzles.
Looking at figure 3.2, the run-times are mostly as expected. It is somewhat surprising just how much slower the solver was with forward checking. The problem is that forward checking requires many calculations and data structures that basic backtracking has no need for. The forward-checking pruning itself, not counting those calculations, is likely to have improved the run-time; however, it was not enough to make up for the apparently very large cost of keeping track of possible values.
In the same figure, we can see that constraint propagation more than made up for these costs. Looking at table 3.1, we can see that its mean is almost 1/100th of the mean for forward checking. This is easily explained by the new ability to exit incorrect branches much earlier in many cases, and is not very surprising.
How much of an improvement the addition of the minimum remaining value heuristic was becomes clear in figure 3.3. While not as impressive as the addition of constraint propagation, the relative difference is still substantial. The most impressive part is how small a change this optimization was to the code.
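As a rough illustration of the two ideas working together, the sketch below prunes a branch as soon as some cell has no remaining candidates (the forward-checking effect) and always branches on the cell with the fewest candidates (minimum remaining value). It recomputes candidate sets on every call rather than maintaining the bookkeeping structures of our measured implementation [10], so it is a conceptual sketch only:

```java
public class MrvSolver {
    // Candidate sets as 9-bit masks; bit d-1 set means digit d is still possible.
    static boolean solve(int[][] grid) {
        int bestR = -1, bestC = -1, bestMask = 0, bestCount = 10;
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                if (grid[r][c] == 0) {
                    int mask = candidates(grid, r, c);
                    int count = Integer.bitCount(mask);
                    if (count == 0) return false; // dead branch: prune immediately
                    if (count < bestCount) {
                        bestCount = count; bestR = r; bestC = c; bestMask = mask;
                    }
                }
        if (bestR == -1) return true; // no empty cells left: solved
        for (int d = 1; d <= 9; d++)
            if ((bestMask & (1 << (d - 1))) != 0) {
                grid[bestR][bestC] = d;
                if (solve(grid)) return true;
                grid[bestR][bestC] = 0; // backtrack
            }
        return false;
    }

    // Digits not yet used in the cell's row, column, or 3x3 box.
    static int candidates(int[][] grid, int r, int c) {
        int used = 0;
        for (int i = 0; i < 9; i++) {
            used |= bit(grid[r][i]);
            used |= bit(grid[i][c]);
            used |= bit(grid[r / 3 * 3 + i / 3][c / 3 * 3 + i % 3]);
        }
        return ~used & 0x1FF;
    }

    static int bit(int digit) { return digit == 0 ? 0 : 1 << (digit - 1); }

    public static void main(String[] args) {
        int[][] grid = new int[9][9]; // empty grid: any valid completion will do
        System.out.println(solve(grid));
    }
}
```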
4.5 Possible Improvements
When solving other games such as Checkers or Chess, algorithms examine every possible move simultaneously. It is possible that implementing this on top of our fastest algorithm would improve running times even more. Our fastest algorithm is very good at guessing, but it is still guessing, and wrong guesses mean the algorithm has to backtrack. If all guesses were investigated simultaneously, however, guesses that lead to conflicts could simply be discarded as obviously wrong. This would, however, require greater memory resources.
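A minimal sketch of this idea is a breadth-first search over board states: every legal guess for the next empty cell spawns a copy of the board, and conflicting guesses are simply never enqueued. All names here are hypothetical, and the frontier of boards is exactly the extra memory cost mentioned above:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BreadthFirstSketch {
    // Explore all guesses for the next empty cell "simultaneously" by
    // keeping a frontier of candidate boards instead of backtracking.
    static int[][] solve(int[][] start) {
        Deque<int[][]> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            int[][] board = frontier.poll();
            int[] cell = firstEmpty(board);
            if (cell == null) return board; // no empty cells left: solved
            for (int d = 1; d <= 9; d++)
                if (legal(board, cell[0], cell[1], d)) {
                    int[][] next = copy(board);
                    next[cell[0]][cell[1]] = d;
                    frontier.add(next); // conflicting guesses are never enqueued
                }
        }
        return null; // unsolvable
    }

    static int[] firstEmpty(int[][] b) {
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                if (b[r][c] == 0) return new int[]{r, c};
        return null;
    }

    // True if digit d conflicts with no cell in the row, column, or box.
    static boolean legal(int[][] b, int r, int c, int d) {
        for (int i = 0; i < 9; i++)
            if (b[r][i] == d || b[i][c] == d
                    || b[r / 3 * 3 + i / 3][c / 3 * 3 + i % 3] == d)
                return false;
        return true;
    }

    static int[][] copy(int[][] b) {
        int[][] n = new int[9][];
        for (int r = 0; r < 9; r++) n[r] = b[r].clone();
        return n;
    }
}
```

On hard puzzles the frontier can grow very large, which is why this approach trades memory for the avoided backtracking.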
Chapter 5
Conclusions
5.1 Findings
As is evident in figure 3.1, a significant difference between unique and multiple-solution puzzles could be found when using appropriate methods to measure the average. For the more rudimentary algorithms, however, the difference was smaller, most likely due to the effects of an early wrong guess.
As we have shown, results derived from tests on puzzles with multiple allowed solutions cannot be assumed to hold for traditional Sudoku puzzles. However, general results are still likely to apply, as we have seen that the broader patterns are similar in both cases (for example the approximate differences between algorithms, and which of them is faster).
Concerning the different algorithms, it was regrettable that we were unable to implement the Sudoku solver using SAT, as it would have been interesting to compare something more distinct from our quite simplistic backtracking-based solutions. Some interesting results were still acquired, however.
Basic backtracking alone proved to be sufficient for solving Sudoku puzzles of size 9 within reasonable time-frames. The algorithm is also very simple and easy to implement.
The final algorithm, however, proved to be an enormous improvement, with the only downside being the larger memory usage required to store all possible values, which is close to negligible on the computers of today.
5.2 Possible Sources of Error
5.2.1 Implementation
It is of course impossible to be certain that our implementations did not give different algorithms unfair advantages or disadvantages. To prevent this, we did our best to implement the simplest possible versions. All our code is also publicly available, if you wish to take a closer look or use it for projects of your own [10].
The SAT-based solver we used was ranked quite highly by Google, which would indicate that the reason our algorithm produced incorrect solutions was an incorrect implementation on our part. However, the substantial differences in running times would still make any comparison pointless, unless those also changed with a correct implementation. A faster SAT solver would need to be used in order to make a meaningful comparison. One such solver we found was SAT4J; unfortunately, due to poor documentation and lack of time, we were not able to implement it.
5.2.2 Measuring Time
As mentioned earlier, the nanosecond measurements do not really have nanosecond precision, but rather something closer to millisecond precision. This problem should be somewhat alleviated by the large number of tests, and by the fact that only the different types of averages are of interest. Moreover, the smallest mean we measured was approximately 1 millisecond, for the minimum remaining values algorithm used on multiple-solution puzzles. This number should be fairly accurate, and thus all other averages as well. The larger issue is most likely that run-times vary a lot from puzzle to puzzle. Once again, our large sample size should have done a lot to counteract this, but some effects are likely to remain. Our general results should be quite accurate. To further improve our results, more tests could have been run, but unfortunately this would be very impractical: solving 100,000 puzzles with the basic backtracking algorithm took approximately six hours, so any larger amounts would be hard to test.
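The effective resolution of System.nanoTime() can be estimated empirically by spinning until the returned value changes; a small sketch (the class and method names are our own):

```java
public class TimerGranularity {
    // Estimate the smallest observable difference between consecutive
    // System.nanoTime() readings. On some platforms the reported value
    // only changes in steps much coarser than one nanosecond.
    static long smallestTick(int samples) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long t0 = System.nanoTime();
            long t1;
            do { t1 = System.nanoTime(); } while (t1 == t0); // spin until it changes
            best = Math.min(best, t1 - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("smallest observed tick: " + smallestTick(1000) + " ns");
    }
}
```

The reported tick size is platform dependent, which is why timing conclusions should rest on averages over many runs rather than individual measurements.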
5.3 Further Research
While we have obtained some interesting results, there is still a lot of room for further research in the area. For example, one very interesting subject is the number of solutions in our multiple-solution puzzles. It is quite trivial to compute, except that it would take a very long time, since a customized solver would have to find perhaps thousands of solutions per puzzle. Different algorithms for generating multiple-solution puzzles might lead to different average numbers of solutions, so it would be very interesting to see how this would affect run-times.
Bibliography
[1] Mike Schermerhorn. A Sudoku Solver. [Internet]. [cited 14 February 2015]. Available from: http://www.cs.rochester.edu/~brown/242/assts/termprojs/Sudoku09.pdf
[2] Oracle. Java 7 documentation for System.nanoTime(). [Internet]. [cited 31 March 2015]. Available from: http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#nanoTime%28%29
[3] Mattias Harrysson, Hjalmar Laestander. Solving Sudoku efficiently with Dancing Links. [Internet]. [cited 14 February 2015]. Available from: https://www.kth.se/social/files/54bda0d3f276541354ec0425/HLaestanderMHarrysson_dkand14.pdf
[4] Tristan Cazenave. A search based Sudoku solver. [Internet]. [cited 7 May 2015]. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.459&rep=rep1&type=pdf
[5] Takayuki Yato, Takahiro Seta. Complexity and Completeness of Finding Another Solution and Its Application to Puzzles. [Internet]. [cited 26 February 2015]. Available from: http://www-imai.is.s.u-tokyo.ac.jp/~yato/data2/SIGAL87-2.pdf
[6] Roberto Fontana. Random Latin squares and Sudoku designs generation. [Internet]. [cited 5 February 2015]. Available from: http://arxiv.org/pdf/1305.3697.pdf
[7] Patrik Berggren, David Nilsson. A study of Sudoku solving algorithms. [Internet]. [cited 14 February 2015]. Available from: http://www.csc.kth.se/utbildning/kth/kurser/DD143X/dkand12/Group6Alexander/final/Patrik_Berggren_David_Nilsson.report.pdf
[8] Paul Varoutsos. A SAT-solver based Sudoku puzzle solver. [Internet]. [cited 30 March 2015]. Available from: https://github.com/PaulVaroutsos/SudokuSolver
[9] Gary McGuire, Bastian Tugemann, Gilles Civario. There is no 16-Clue Sudoku: Solving the Sudoku Minimum Number of Clues Problem. [Internet]. [cited 8 May 2015]. Available from: http://arxiv.org/abs/1201.0749
[10] Jonathan Golan, Joel Kallin. Source code for all solvers and the testing. Available from: https://github.com/Jogol/SudokuProject
[11] Jon Kleinberg, Éva Tardos. Algorithm Design. London: Pearson; 2006.