
Automated Maze Generation and Solving Over Epochs
Chad Harris
Abstract
This project set out to categorize maze generation algorithms by a function of both the time needed to generate a maze and the effectiveness of that maze, as measured by the time needed to solve it.
Keywords
Maze Generation, Randomized Prim’s Algorithm,
Randomized Kruskal’s Algorithm, Depth First Search,
Adjacency Lists
Introduction
Classifying the effectiveness of maze generation algorithms provides a justification for choosing one approach to maze creation over another, based on the trade-off between the time needed to generate a maze and the time needed to solve it. The ratio of generation time to solve time gives each algorithm a rank, and the resulting ranking shows which algorithm is the best choice when computation time is the primary concern and the difficulty of the maze is a deciding factor.
Problem Description
Informal Definition
For each maze generation algorithm, find the average generation time over a predefined number of trials and pair it with the average solve time for the mazes that algorithm produced, yielding a ratio for each algorithm over a set number of epochs.
Constraints
Constraints for Measurements
1.) Recorded generation times are strictly tied to the core generation algorithm.
2.) Recorded solve times are strictly tied to the universal solve algorithm (DFS).
3.) Starting and finishing points are set at (0,0) and (n - 1, n - 1) respectively (zero-based indexing).
    a. Measurements are yielded for a maze of n x n size.
    b. Ranking is based on the ratio of average generation time / average solve time.
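The measurement constraints above can be sketched in Python as follows. This is a minimal illustration, not the timing code used for the paper: the helper names `time_call` and `endpoints` are hypothetical, and the choice of `time.perf_counter` as the clock is an assumption.

```python
import time


def time_call(fn, *args):
    """Time a single call to fn, excluding any setup done by the caller.

    Hypothetical helper: only the call itself sits between the two clock
    reads, so the measurement stays tied to the core algorithm.
    """
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return elapsed, result


def endpoints(n):
    """Start and finish cells of an n x n maze, using zero-based indexing."""
    return (0, 0), (n - 1, n - 1)
```

Anything that is not part of the algorithm being measured, such as allocating the grid or drawing the maze, would be done outside `time_call` so that it does not inflate the recorded time.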
Formal description
The first step in ranking each algorithm is to determine its average generation time: the sum of the generation times over all trial runs divided by the number of trials n (here n counts trials, not the maze dimension). This can be written as:
(Where DFS is Depth First Search, KRUS is Randomized Kruskal’s Algorithm, and PRIM is Randomized Prim’s Algorithm)
DFS(avg) = Summation[DFS(1), DFS(2), DFS(3),…DFS(n)] / n
KRUS(avg) = Summation[KRUS(1), KRUS(2), KRUS(3),…KRUS(n)] / n
PRIM(avg) = Summation[PRIM(1), PRIM(2), PRIM(3),…PRIM(n)] / n
Each generation average is then paired with the average solve time for the mazes produced by that generation algorithm over the same number of trials. The solve-time averages are computed in much the same way, as shown below.
(Where DFS is Depth First Search, KRUS is Randomized Kruskal’s Algorithm, and PRIM is Randomized Prim’s Algorithm, and “_solve” denotes the time taken by the DFS solving algorithm on a maze produced by that generator)
DFS_solve(avg) = Summation[DFS_solve(1), DFS_solve(2), DFS_solve(3),…DFS_solve(n)] / n
KRUS_solve(avg) = Summation[KRUS_solve(1), KRUS_solve(2), KRUS_solve(3),…KRUS_solve(n)] / n
PRIM_solve(avg) = Summation[PRIM_solve(1), PRIM_solve(2), PRIM_solve(3),…PRIM_solve(n)] / n
From these averages, the rank of each algorithm is calculated as the quotient of its average generation time over its average solve time:
DFS(rank) = [DFS(avg)/DFS_solve(avg)]
KRUS(rank) = [KRUS(avg)/KRUS_solve(avg)]
PRIM(rank) = [PRIM(avg)/PRIM_solve(avg)]
Finally, from the set of algorithm ranks, the optimal algorithm is the one whose rank is strictly less than the rank of every other algorithm:
ALGO(rank) = {DFS(rank), KRUS(rank), PRIM(rank)}
ALGO(rank_optimal)(1) < ALGO(rank)(2) && ALGO(rank_optimal)(1) < ALGO(rank)(3)
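The averaging, ratio, and comparison steps of the formal description can be sketched as below. The function names `rank` and `optimal` are hypothetical, and returning `None` when no rank is strictly smallest is an assumption about how a tie would be handled; the paper does not say.

```python
from statistics import mean


def rank(generation_times, solve_times):
    """Average generation time divided by average solve time."""
    return mean(generation_times) / mean(solve_times)


def optimal(ranks):
    """Name of the algorithm whose rank is strictly less than all others.

    `ranks` maps an algorithm name to its rank. Returns None if the
    smallest rank is shared, since no rank is then strictly smallest.
    """
    best = min(ranks, key=ranks.get)
    others = [value for name, value in ranks.items() if name != best]
    return best if all(ranks[best] < value for value in others) else None
```

A caller would build the `ranks` dictionary with one entry per generator, for example keyed by "DFS", "KRUS", and "PRIM", each computed from that generator's list of trial timings.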
Data Representation
Each maze is represented by essentially two data sets. The first is an adjacency list that associates the index of each cell with the neighboring cell in each cardinal direction; each entry identifies the neighboring cell and carries a Boolean indicating whether the wall between the two adjacent cells has been removed. The second is a list of cells and their associated open positions, each recorded as a 1 or a 0 to indicate whether a barrier is present. Each cell also has a property recording whether it has been visited: 1 if the cell has not been touched, or 2 if it has already been visited (the depth first solving algorithm increments this property when it visits the cell).
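One way the adjacency list and the visited property could look in Python is sketched here. The layout is an assumption chosen for illustration: the direction keys "N", "S", "W", "E", the row-major cell index `row * n + col`, and the names `build_grid`, `remove_wall`, and `new_visit_marks` are all hypothetical and not taken from the paper's implementation.

```python
UNVISITED, VISITED = 1, 2  # the 1 / 2 visited values described above
OFFSETS = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}
OPPOSITE = {"N": "S", "S": "N", "W": "E", "E": "W"}


def build_grid(n):
    """Adjacency list for an n x n grid with every wall still in place.

    Maps cell index -> {direction: [neighbour index, wall_removed]}.
    """
    adjacency = {}
    for row in range(n):
        for col in range(n):
            links = {}
            for direction, (d_row, d_col) in OFFSETS.items():
                r, c = row + d_row, col + d_col
                if 0 <= r < n and 0 <= c < n:
                    links[direction] = [r * n + c, False]
            adjacency[row * n + col] = links
    return adjacency


def remove_wall(adjacency, cell, direction):
    """Open the wall on both sides and return the neighbouring cell."""
    neighbour = adjacency[cell][direction][0]
    adjacency[cell][direction][1] = True
    adjacency[neighbour][OPPOSITE[direction]][1] = True
    return neighbour


def new_visit_marks(n):
    """One visited marker per cell, all initially untouched."""
    return [UNVISITED] * (n * n)
```

Storing the Boolean on both sides of a wall keeps lookups symmetric, at the cost of having to update two entries whenever a wall is removed.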
Depth First Search
This algorithm traverses a graph depth-wise, using a stack to remember the path taken: when a dead end is reached, that is, when the current node has no unvisited adjacent nodes, the node is popped off the stack and the search resumes from the previous position. The example illustrated below shows the essence of depth first searching. At the crux of the algorithm, B is the last node visited but has no unvisited adjacent nodes. As a result B is popped off the stack and the search returns to the previous node, D. D has an unvisited adjacent node, C, which therefore becomes the last and final node in the sequence.
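The stack behaviour described above can be sketched as an iterative depth first solver. This is an illustrative version, not the paper's implementation: the function name `dfs_solve` and the `passages` input (a mapping from each cell to the cells reachable through removed walls) are assumptions.

```python
def dfs_solve(passages, start, goal):
    """Return a path from start to goal as a list of cells, or None.

    The stack always holds the path from start to the current cell, so
    popping it is the "remember the last position" step at a dead end.
    """
    visited = {start}
    stack = [start]
    while stack:
        current = stack[-1]
        if current == goal:
            return list(stack)
        unvisited = [cell for cell in passages[current] if cell not in visited]
        if unvisited:
            next_cell = unvisited[0]
            visited.add(next_cell)
            stack.append(next_cell)
        else:
            stack.pop()  # dead end: back up to the previous cell
    return None
```

Because the stack mirrors the current path, the solution can be read off directly when the goal is reached, with no separate parent table.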
Randomized Kruskal’s Algorithm
For this algorithm, an edge (the wall between two adjacent cells) is selected at random, and the two cells on either side of it are joined if they are not already connected by a path.
Kruskal’s finishes when there are no more edges to draw from, at which point every cell belongs to a single set.
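A minimal sketch of this procedure follows. It assumes a disjoint-set (union-find) structure for the "already connected" test, which the paper does not specify; the name `kruskal_maze`, the `seed` parameter, and the row-major cell index are likewise hypothetical.

```python
import random


def kruskal_maze(n, seed=None):
    """Return the walls removed, as (cell, cell) pairs, for an n x n maze."""
    rng = random.Random(seed)
    parent = list(range(n * n))  # each cell starts in its own set

    def find(cell):
        # Follow parent links to the set representative, halving the path.
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]
            cell = parent[cell]
        return cell

    # Every wall between horizontally or vertically adjacent cells.
    edges = []
    for row in range(n):
        for col in range(n):
            cell = row * n + col
            if col + 1 < n:
                edges.append((cell, cell + 1))
            if row + 1 < n:
                edges.append((cell, cell + n))
    rng.shuffle(edges)  # drawing edges at random

    removed = []
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # not yet connected by a path
            parent[root_a] = root_b
            removed.append((a, b))
    return removed
```

Shuffling the edge list once and walking it in order is equivalent to repeatedly drawing a random remaining edge, and it makes the "no more edges to draw from" stopping condition the end of the loop.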
Randomized Prim’s Algorithm
Unlike Kruskal’s algorithm, which builds the maze from randomly chosen edges scattered across the entire grid, Prim’s algorithm grows the maze outward from a single point.
A cell is selected at random and its adjacent cells are identified as the frontier. A random frontier cell is then selected and the wall between it and the active cell is broken, making the frontier cell part of the maze. The process continues with the neighbors of the newly added cell joining the frontier. Because the algorithm is randomized, it does not matter which neighbor is picked, so long as the frontier cell is adjacent to a cell already within the maze.
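The growth process can be sketched as below. Note that this is the wall-list variant of randomized Prim’s, which keeps a frontier of (maze cell, outside cell) pairs rather than a frontier of cells; it is offered as an illustration under that assumption and may differ from the paper's implementation. The name `prim_maze` and the `seed` parameter are hypothetical.

```python
import random


def prim_maze(n, seed=None):
    """Return the walls removed, as (inside, outside) pairs, for an n x n maze."""
    rng = random.Random(seed)

    def neighbours(cell):
        row, col = divmod(cell, n)
        if row > 0:
            yield cell - n
        if row + 1 < n:
            yield cell + n
        if col > 0:
            yield cell - 1
        if col + 1 < n:
            yield cell + 1

    start = rng.randrange(n * n)
    in_maze = {start}
    frontier = [(start, nb) for nb in neighbours(start)]
    removed = []
    while frontier:
        # Pick a random frontier wall; swap it to the end for an O(1) pop.
        pick = rng.randrange(len(frontier))
        frontier[pick], frontier[-1] = frontier[-1], frontier[pick]
        inside, outside = frontier.pop()
        if outside in in_maze:
            continue  # the cell was reached through another wall already
        in_maze.add(outside)
        removed.append((inside, outside))
        frontier.extend(
            (outside, nb) for nb in neighbours(outside) if nb not in in_maze
        )
    return removed
```

Every removed wall joins a cell already in the maze to one that was outside it, which is the condition stated above for picking a frontier cell.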
Analysis
Runtimes for each implementation are estimated below:
V = Vertices (cells)
E = Edges
Generating mazes using Depth First Search has a time complexity that is a function of the number of nodes (cells) and the number of edges in the maze. Since every cell is visited and every edge of each visited cell has to be considered, the runtime is O(|V| + |E|). The Randomized Kruskal’s implementation, on the other hand, considers each edge at most once and checks whether the two cells it separates are already connected. Because |V| - 1 edges must be joined before every cell is connected, the runtime cannot be less than linear; assuming a disjoint-set structure for the connectivity check, it is at most O(|E| log |V|). Finally, the Randomized Prim’s implementation is on the order of O(|V|^2), since for each cell added the frontier cells must be generated and one of them selected and added to the maze.
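To make V and E concrete for the n x n grids used here, the counts can be computed directly. The helper name `grid_counts` is hypothetical, and the number of walls removed assumes each generator produces a maze in which every pair of cells is joined by exactly one path (a spanning tree of the grid).

```python
def grid_counts(n):
    """Return (vertices, edges, walls_removed) for an n x n grid maze."""
    vertices = n * n
    # n rows with n - 1 horizontal walls, plus n columns with n - 1 vertical walls.
    edges = 2 * n * (n - 1)
    # A spanning tree on V cells always has V - 1 edges.
    walls_removed = vertices - 1
    return vertices, edges, walls_removed
```

Since E grows as roughly 2V for a grid, a bound stated in edges and a bound stated in vertices differ only by a constant factor in this setting.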
Results
Kruskal’s algorithm proved to be the overall optimal solution when generation and solve times are compared together. As illustrated in the graphics below, from the seemingly insignificant example of a 2x2 maze through the overall comparison of maze generation and solving for sizes from 2x2 up to 20x20, Kruskal’s algorithm consistently performed the best. Solve times unfortunately had so small an influence on the overall ranking that generation times ended up being the deciding factor in which algorithm had the best ranking.
2x2 Maze Sample Data
2x2 Maze Sample Solving Data
Overall Results of Maze Rankings Based on Averages of 100 Trials Each
[Chart: Algorithm Ranking vs. Maze Size. Series: DFS rank, KRUS rank, PRIM rank. Vertical axis: rank, 0 to 350. Horizontal axis: maze size, 2x2 through 20x20.]
Why 380?
Designing the experiment needed for maze generation and solving, implementing it, and examining its results all draw on elements studied in a course such as CSC 380. For this reason CSC 380 is deemed an appropriate environment for examining such an approach.