Ira Dunn

Programming Assignment 1
Computer Science 447: Artificial Intelligence
Genetic Algorithm Report
Due September 23, 2002
By Ira Dunn
Introduction
For NP-complete problems, genetic algorithms and other evolutionary computation methods
have been shown to supply acceptable solutions even when the search space is very large.
They are most useful where the evolutionary algorithm (EA) approach shows a clear
advantage over sequential or random search. In this paper, I outline an approach to the
Conjunctive Normal Form satisfiability problem using a simple genetic algorithm with
single-point crossover and 3-bit mutation. I first attempt a methodology prone to
inbreeding and show its shortcomings. I then run the algorithm using a method that
keeps genetic information diffuse in the population via a probabilistic ranked selection of
two vectors to mate for offspring.
Problem
Given a Boolean expression in Conjunctive Normal Form (CNF) [Figure 1], an assignment
of variable values may exist that makes the expression evaluate to “true.” The question at
hand is that of satisfying an expression of 275 clauses and-ed together, each clause made
of three or-ed terms: thus the 3-SAT designation. Since the space of possible solutions has
2^50 members (about 1.1 x 10^15), solving with a methodical or random search is quite time
consuming and processor intensive. Because the clauses share variables, it is also not
possible to find segments of a solution that can be held constant.
CNF(eval) = (x12 | x43 | ~x6)&(x20 | x38 | x2)&…
Figure 1: an example section of the CNF satisfiability equation.
For this problem, the expression was given in signed integer format in the file 3sat.dat,
which has been omitted to save space. The expression contains 275 three-variable
clauses, in which negative integers represent NOT-ed variables. Population members are
Boolean vectors of length 50 (for variable values x1 through x50).
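To make the signed integer format concrete, the following is a minimal sketch of how one clause could be checked against a 50-element Boolean vector. The names Clause, literal_value, and clause_satisfied are hypothetical and do not appear in the attached code, and the sketch assumes variables are numbered from 1, so that x_k is stored at index k-1.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// One 3-SAT clause in signed-integer form,
// e.g. {12, 43, -6} stands for (x12 | x43 | ~x6).
struct Clause {
    int lit[3];
};

// Truth value of one signed literal under an assignment.
// Assumption: variables are numbered from 1, so x_k lives at index k-1.
bool literal_value(int lit, const std::vector<bool>& assignment) {
    assert(lit != 0);  // zero cannot carry a sign, so it is not a valid literal
    std::size_t index = static_cast<std::size_t>(std::abs(lit)) - 1;
    assert(index < assignment.size());
    bool value = assignment[index];
    return lit < 0 ? !value : value;  // a negative integer means NOT
}

// A clause is satisfied when at least one of its three literals is true.
bool clause_satisfied(const Clause& c, const std::vector<bool>& assignment) {
    return literal_value(c.lit[0], assignment) ||
           literal_value(c.lit[1], assignment) ||
           literal_value(c.lit[2], assignment);
}
```

With every variable false, the first clause of Figure 1 is satisfied through its negated term ~x6.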
Solution
Our answer to this problem is to implement a
Genetic Algorithm to select through a
population of random solutions and reproduce
them through several generations, biasing the
more-fit vectors. Two methodologies were
implemented and compared. The first is
performed as seen in Figure 2. After
determining the first, second, and last ranked
vector members, the top two are combined via
probabilistic Crossover with bit-wise Mutation
of up to 8 mutations. Then, the new child
vector is compared with the worst ranked
vector in the population, the winner taking the
bottom ranked vector’s place in the
population. This method was chosen to
preserve superior features of the highest
ranking population members. This author
posits that for large populations (N>100), this
method would be a reasonable system for
removing very poor performers. However, for
the sake of processing time, a smaller
population has typically been chosen.
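As a rough illustration of this first methodology, the sketch below shows single-point crossover and the replacement of the worst member by a better child. It is not the attached implementation: mutation is omitted, the names Genome, single_point_crossover, and replace_worst_if_better are hypothetical, and the child is assumed to win only on a strictly higher evaluation, whereas the attached Fight function also applies an Experience factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<bool> Genome;

// Single-point crossover: genes [0, point] come from parent b,
// the remaining genes come from parent a.
Genome single_point_crossover(const Genome& a, const Genome& b, std::size_t point) {
    assert(a.size() == b.size() && point < a.size());
    Genome child(a);
    for (std::size_t j = 0; j <= point; ++j) child[j] = b[j];
    return child;
}

// The child takes the worst member's slot only if it scores strictly higher.
// Returns true when a replacement happened.
bool replace_worst_if_better(std::vector<Genome>& population,
                             std::vector<double>& fitness,
                             const Genome& child, double child_fitness) {
    assert(!population.empty() && population.size() == fitness.size());
    std::size_t worst = 0;
    for (std::size_t i = 1; i < fitness.size(); ++i)
        if (fitness[i] < fitness[worst]) worst = i;
    if (child_fitness <= fitness[worst]) return false;
    population[worst] = child;
    fitness[worst] = child_fitness;
    return true;
}
```

Because the two top-ranked members are always the parents, every child in this scheme inherits from the same small set of vectors, which is the source of the inbreeding discussed later.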
Figure 2: Initial genetic methodology for the CNF_3SAT problem.
[Diagram labels: V_rank_1, V_rank_2, V_rank_3, V_rank_last; Crossover; Mutation]
Figure 3: Secondary genetic methodology for the CNF_3SAT problem.
[Diagram labels: V_rank_1, V_rank_2, V_rank_3, V_rank_rand, V_rank_last; Crossover; Mutation (twice)]
Second, a different method of solving the CNF_3SAT problem was applied. In this model,
shown in Figure 3, a ranked probabilistic function selects two vectors for possible
Crossover and Mutation. The ranking function multiplies the evaluation of each vector by
a freshly drawn random float in the range [0, 1]. Vectors with higher evaluations have a
statistically higher chance of being ranked above those with lower evaluations. The top
two of these randomized vectors are chosen for crossover/mutation/child generation.
Also, a random mutation operator picks one population member and flips a small random
number of bits between generations. This adds random mutation to the regeneration
process in a processor-friendly manner (fewer calls to random), but requires a high
probability in the Bit_Mutation variable.
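The ranked probabilistic selection described here can be sketched as follows. This is one possible reading of the description rather than the code in Attachment 3: the function name select_parents is hypothetical, std::mt19937 is swapped in for random(), and evaluations are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Picks two distinct parent indices. Each member's score is its evaluation
// multiplied by a fresh uniform draw in [0, 1); the two highest scores win.
// Assumption: every evaluation is >= 0, so every score is >= 0.
std::pair<std::size_t, std::size_t> select_parents(const std::vector<double>& evals,
                                                   std::mt19937& rng) {
    assert(evals.size() >= 2);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::size_t first = 0, second = 0;
    double first_score = -1.0, second_score = -1.0;  // below any possible score
    for (std::size_t j = 0; j < evals.size(); ++j) {
        double score = unit(rng) * evals[j];
        if (score > first_score) {
            // New leader: the old leader drops to second place.
            second = first;
            second_score = first_score;
            first = j;
            first_score = score;
        } else if (score > second_score) {
            second = j;
            second_score = score;
        }
    }
    return std::make_pair(first, second);
}
```

Storing the randomized score, not the raw evaluation, keeps later comparisons on the same scale, so a weaker member with a lucky draw can still outrank a stronger one.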
Experimental Methodology
The file “hw1.cpp” [Attachment 1] is the central code for running this problem. Please
note that all vectors are initialized to random values using the long-integer random()
library call. Subsequent requests for random values take random() modulo some range
value X to return a random integer in [0, X-1].
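The modulo idiom can be sketched as below. The helper names rand_below, rand_unit, and happens are hypothetical, and std::mt19937 stands in for random() so that the sketch does not depend on a POSIX library; the modulo step itself follows the description above and carries the usual slight bias toward small values.

```cpp
#include <cassert>
#include <random>

// Random integer in [0, range-1], taken as the raw generator output modulo range.
unsigned rand_below(std::mt19937& rng, unsigned range) {
    assert(range > 0);
    return static_cast<unsigned>(rng() % range);
}

// Random float in [0, 1) with five decimal digits of resolution,
// in the style of (random() % 100000) / 100000.0.
double rand_unit(std::mt19937& rng) {
    return static_cast<double>(rng() % 100000) / 100000.0;
}

// True with (approximately) the given probability; used for decisions such as
// whether crossover or bit mutation takes place.
bool happens(std::mt19937& rng, double probability) {
    return rand_unit(rng) < probability;
}
```

For example, rand_below(rng, 50) would pick a bit position in a 50-bit vector, and happens(rng, 0.55) would decide whether a bit mutation occurs under the mutation probability used in most runs.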
The code reads in parameters from the file “params.dat” [Attachment 4], then initializes
the member vectors. The CGA class member function eval(vector) returns a floating point
number between 0 and 1 equal to the fraction of terms that evaluate to “TRUE.” That is,
if 250 of the 275 terms evaluate true for a particular vector m_x, then the evaluation
function returns 250/275 = .909091.
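A minimal sketch of such an evaluation function follows. Unlike the attached eval, which reads 3sat.dat on every call, this version assumes the clauses have already been loaded into memory; the names Clause3 and satisfied_fraction are hypothetical, and variables are assumed to be numbered from 1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

typedef std::array<int, 3> Clause3;  // three signed literals, e.g. {12, 43, -6}

// Fraction of clauses satisfied by the assignment, in [0, 1].
// Assumption: variable x_k is stored at index k-1.
double satisfied_fraction(const std::vector<Clause3>& clauses,
                          const std::vector<bool>& assignment) {
    assert(!clauses.empty());
    std::size_t satisfied = 0;
    for (const Clause3& clause : clauses) {
        bool clause_true = false;
        for (int lit : clause) {
            assert(lit != 0);
            std::size_t index = static_cast<std::size_t>(std::abs(lit)) - 1;
            assert(index < assignment.size());
            bool value = assignment[index];
            if (lit < 0) value = !value;  // negative literal means NOT
            clause_true = clause_true || value;
        }
        if (clause_true) ++satisfied;
    }
    // Divide by the full clause count so a complete solution scores exactly 1.
    return static_cast<double>(satisfied) / clauses.size();
}
```

Dividing by the full number of clauses matters for the exit test: a vector satisfying all 275 clauses then evaluates to exactly 1, which is the value the main loop checks for.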
After evaluating the members, the population and its corresponding evaluation vectors
are sent to the Spawn function, which performs all of the ranking, crossover, and bit
mutation for the vectors.
Once the code has been compiled, appropriate parameter values can be chosen and saved
into params.dat. The screen output shows the generation number, the maximum evaluation
for that generation, and the average evaluation for the same. It also indicates when bit
mutation and crossover have taken place in a child vector. As the code shows, many
“cout” statements have been commented out. To view the population, re-enable the
commented-out statements in hw1.cpp that print each member to the screen via the
“vomit(vector)” function.
Should the program exit before Gens generations have been reached, a solution has been
found and the solution vector is printed to the screen last.
Experimental Results
As of this report, no solution has been found for this CNF_3SAT problem using the
methods applied above. Very near solutions have been repeatedly found (99.3%).
Several different parameter manipulations were used, and the results of these are listed
below in Figures 4 and 5. Figure 4 is a chart outlining the parameters for each run.
Run  Population  Crossover    Bit Mutation  Experience  Notes
                 Probability  Probability   Factor
1    15          0.02         0.45          -0.01       Probabilistic Ranked Selection
2    15          0.01         0.55          -0.01       Probabilistic Ranked Selection
3    15          0.01         0.55          -0.01       Deterministic Ranked Selection
4    15          0.01         0.55          -0.01       Deterministic Ranked Selection
5    15          0.01         0.55          -0.01       Added Nuke to increase randomness
6    20          0.01         0.55          -0.01       Child Randomly Placed
Figure 4: Table of Parameter values
The runs presented below in Figure 5 were generally indicative of system performance
for the same values. This chart shows the progression of maximum evaluation values
every 10 generations in bold, with the corresponding average values in a thin line of
the same color. It is important to note that the superior performance of the 6th run owes
something to its starting evaluation of over .94.
[Chart: Evaluation (vertical axis, 0.86 to 1) against Generations (x10) (horizontal axis, 1 to 97); series Eval[max] and Eval[avg] for each of runs 1 through 6]
Figure 5: Evaluation of 6 runs of the genetic algorithm.
The most striking feature of this chart is the difference in variance among the runs.
Run 6 showed the greatest variance among the population members, owing to a larger
member set and the switch to random placement of the child vector rather than
bottom-replacement.
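The report does not state how variance was measured, and it appears to have been judged from the gap between the maximum and average curves. If a numeric measure were wanted, one simple option is the mean and population variance of a generation's evaluations, sketched below with the hypothetical names Spread and evaluation_spread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Spread {
    double mean;
    double variance;  // population variance (divides by N, not N-1)
};

// Mean and variance of the evaluation values of one generation.
Spread evaluation_spread(const std::vector<double>& evals) {
    assert(!evals.empty());
    double sum = 0.0;
    for (double e : evals) sum += e;
    double mean = sum / evals.size();
    double squared = 0.0;
    for (double e : evals) squared += (e - mean) * (e - mean);
    Spread s = { mean, squared / evals.size() };
    return s;
}
```

A variance near zero would correspond to a population whose members all score about the same, which is the converged state seen in runs 1 through 5.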
Conclusion
Though my algorithm found no satisfying assignment for the CNF satisfiability problem,
it showed great potential and a definite movement in the proper direction. The genetic
information of more successful population members was passed on and mutated to
generate new, more-fit populations.
The most significant shortcoming of my methodology is the inbreeding caused by genetic
inheritance. With the exception of run 6, all runs showed convergence and loss of
variance after 150 generations. The random child placement prevented “loser lose all”
without compromising “winner take most-all.” I therefore conclude that future programs
should implement a more traditional genetic algorithm selection scheme.
Attachment 1
hw1.cpp
/* hw1.cpp
Ira Dunn
CS328, FS2002
*/
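#include <fstream>
#include <vector>
#include <iostream>
#include <time.h>
#include <stdlib.h>
using namespace std; //standard headers replace the pre-standard <fstream.h>, <vector.h>, <iostream.h>
#include "ga.h"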
#define LEN 50
void new_line(ifstream&); //searches for end of lines in input files
void vomit(vector<bool>&); //outputs to screen a vector
int main()
{
//**********Allocations***********//
unsigned Gens; //constants from files
unsigned Gen_per_log, generations; //constant
unsigned Pop_size; //dynamic
float Cross_Prob;
float BitMutation_Prob;
float Experience;
ifstream file_in;
ofstream file_out;
unsigned i, j, tmp_rnd;
//vector<bool> m_y(LEN);
vector<float> m_eval;
bool exit_flag=0;
float avg_eval,max_eval; //working parameters.
vector<vector <bool> > pop_x; //population of vectors
vector<bool> m_x(LEN); //member vector
CGA CNF_3SAT; //instantiation of the class
//**********Program***********//
//1* Assign Rand values to all 50 x_vars
//2* Eval each member (if eval=1, then finish)
//3* Spawn offspring via mutation
//4* Place child in population (currently via rand)
//5* Return to 2
//open file and read in parameter values
file_in.open("params.dat");
file_out.open("gens.log");
file_in>>Gens;
new_line(file_in);
file_in>>Gen_per_log; new_line(file_in);
file_in>>Pop_size;
new_line(file_in);
file_in>>Cross_Prob;
new_line(file_in);
file_in>>BitMutation_Prob; new_line(file_in);
file_in>>Experience;
//1* Initialization
CNF_3SAT.set_params(Pop_size, Cross_Prob, BitMutation_Prob, Experience);
//CNF_3SAT.init();
srandom(time(NULL));
for(j=1; j<=Pop_size; j++)
{
for(i=0; i<LEN; i++)
{
m_x[i]=random()&01;
}
pop_x.push_back(m_x);
m_eval.push_back(0.0);
}
cout<<("\n\n\n\n\n ****RUN****\n\n\n\n\n");
generations=0; //Generation counter for logging
while(generations<Gens)
{
avg_eval=0; max_eval=0;
//2* Eval population member (if eval=1, then finish)
cout<<("Generation ")<<generations+1<<("   ");
for(i=0; i<Pop_size; i++)
{
m_x=pop_x[i];
m_eval[i]=CNF_3SAT.eval(m_x);
avg_eval+=m_eval[i];
if(m_eval[i]>max_eval)
max_eval=m_eval[i];
//cout<<("Member ")<<i+1<<("\n");
//vomit(m_x);
//cout<<("\n")<<("evaluation =")<<m_eval[i];
if(m_eval[i]==1.0)
exit_flag = 1; //Flags a correct solution
}
cout<<("Average Eval is ")<<(avg_eval/Pop_size)<<("   ");
cout<<("Max Eval is ")<<max_eval<<("\n");
//Output data to file "gen.log"
if((generations%Gen_per_log)==0)
{
//send data to file
/*file_out<<("Generation ")<<generations<<("\n");
for(i=0; i<Pop_size; i++)
{
m_x=pop_x[i];
for(j=0; j<LEN; j++)
file_out<<m_x[j]<<("");
file_out<<(" ")<<m_eval[i]<<("\n");
}*/
file_out<<max_eval<<(",")<<(avg_eval/Pop_size)<<("\n");
}
if(!exit_flag)
{
//3* Spawn offspring via mutation
m_x=CNF_3SAT.Spawn(pop_x, m_eval);
CNF_3SAT.Nuke(pop_x); //mutates one random population member
//4* Competition between child and old worst vector in Spawn
}
else
{
generations = Gens;
//will exit loop prematurely due to solution
cout<<("A solution has been found: ")<<("\n");
for(i=0; i<Pop_size; i++)
{
m_x=pop_x[i];
if(m_eval[i]==1)
vomit(m_x);
}
}
generations++;
}
file_out.close();
file_in.close();
return 0; //zero reports normal termination to the shell
}
void new_line(ifstream& in_stream)
{
char symbol;
do
{
in_stream.get(symbol);
}while(symbol !='\n');
} //Taken from Savitch, "Problem Solving with C++," Second Edition
void vomit(vector<bool>& mem)
{
for(unsigned j=0; j<LEN; j++)
cout<<mem[j]<<(" ");
}
Attachment 2
ga.h
/* ga.h
Ira Dunn
CS328, FS2002
*/
//ifndef#
#include <vector>
using std::vector; //standard <vector> replaces the pre-standard <vector.h>
#define FILELEN 275
class CGA
{
//constructors
public:
void set_params(int, float, float, float);
void init();
float eval(vector<bool>&);
vector<bool> Spawn(vector <vector<bool> >&, vector<float>);
vector<bool> Fight(vector <bool>, vector <bool>);
void Nuke(vector<vector<bool> >&);
private:
int Pop_size; //Size of population
float Crs_prob; //Probability of Crossover
float Bit_mut_prob; //Probability of three-bit mutation
float Expr; //Experience factor for inter-generational fight
};
//endif#
Attachment 3
ga.cpp
/* ga.cpp
Ira Dunn
CS328, FS2002
*/
#include <fstream>
#include <stdlib.h>
#include <vector>
#include <iostream>
#include <time.h>
using namespace std; //standard headers replace the pre-standard <fstream.h>, <vector.h>, <iostream.h>
#include "ga.h"
#define LEN 50
#define TERMS 275
//unused--initialization and data are handled inline in main()
void CGA::init()
{
}
//Returns floating point between 0 and 1 for evaluation value
float CGA::eval(vector<bool>& v1)
{
//Evaluate POS terms
fstream fin;
int par1, par2, par3;
bool p1, p2, p3;
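float temp=0; //count of satisfied terms; must start at zero
fin.open("3sat.dat");
for(unsigned i=0; i<TERMS; i++)
{
fin>>par1>>par2>>par3;
//variables are numbered x1..x50, so x_k is stored at index k-1
p1=v1[abs(par1)-1];
p2=v1[abs(par2)-1];
p3=v1[abs(par3)-1];
//a negative integer marks a NOT-ed variable
if(par1<0)
p1 = !p1;
if(par2<0)
p2 = !p2;
if(par3<0)
p3 = !p3;
if(p1||p2||p3)
temp+=1.0;
}
temp/=TERMS; //fraction of all 275 terms satisfied, so a full solution returns exactly 1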
fin.close();
return temp;
}
//Returns vector (unutilized) and mutates, crosses, and performs various
//functions on the members of the population according to the evals
vector<bool> CGA::Spawn(vector<vector<bool> >& pop_v, vector<float> evals)
{
vector<bool> v1_temp(LEN, 0), v2_temp(LEN, 0);
vector<bool> child(LEN, 0);
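unsigned b, j;
unsigned v1i=0, v2i=0, vboti=0; //indices of best, second-best, and worst members
unsigned prob_win=0, prob_sec=0; //indices chosen by the probabilistic ranking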
float deter1_temp=0;
float deter2_temp=0;
float deter3_temp=1;
float prob1_temp=0;
float prob2_temp=0;
float f_rand;
//Select two best members, worst member, and ranked random member
for(j=0; j<Pop_size; j++)
{
//Find top two vectors
if((evals[j])>=deter1_temp)
{
v2i = v1i;
v1i = j;
v2_temp = v1_temp;
v1_temp = pop_v[j];
deter2_temp = deter1_temp;
deter1_temp = evals[j];
}
else if(evals[j]>deter2_temp)
{
v2i = j;
v2_temp = pop_v[j];
deter2_temp = evals[j];
}
//Find lowest vector and index
if(evals[j]<=deter3_temp)
{
vboti=j;
deter3_temp = evals[j];
}
//select 2 ranked best from rand*eval (high eval has advantage)
f_rand=(random()%100000)/100000.0;
if((f_rand*(evals[j]))>=prob1_temp)
{
prob_sec = prob_win;
prob_win = j;
prob2_temp = prob1_temp;
prob1_temp = f_rand*evals[j]; //keep the randomized score so later comparisons use the same scale
}
else if(f_rand*evals[j]>prob2_temp)
{
prob_sec = j;
prob2_temp = f_rand*evals[j];
}
}
//when used, these lines use a probabilistic approach to
//determining the parents v1 and v2 for crossover
//v1_temp = pop_v[prob_win];
//v2_temp = pop_v[prob_sec];
//Variable point crossover (single point)
//Produces sexual offspring in crossover swap
if((float)(.0000001*(random()%10000000))<Crs_prob)
{
b=(random()%LEN);
cout<<("Single Point Crossover at ")<<b<<("\n");
for(j=0; j<LEN; j++)
{
if(j>b)
{
child[j]=v1_temp[j];
//v1_temp[j]=v2_temp[j];
//v2_temp[j]=child[j];
}
else
{
child[j]=v2_temp[j];
//v2_temp[j]=v1_temp[j];
//v1_temp[j]=child[j];
}
}
}
else
child=v1_temp;
//Random number of random mutations (0-9) per mut_option
if((float)(.0000001*(random()%10000000))<Bit_mut_prob)
{
j=(random()%10);
cout<<("Swap Bits ");
for(unsigned k=0; k<j; k++)
{
b=(random()%LEN);
cout<<b<<(" ");
if(child[b])
child[b]=0;
else
child[b]=1;
}
cout<<("\n");
}
pop_v[random()%Pop_size]=child;
//Rather than the above line, setting pop_v[vboti] = the following replaces the least-fit member
//Fight(pop_v[vboti], child); //worst member replaced by the young generation
return pop_v[vboti];
}
vector<bool> CGA::Fight(vector<bool> _old, vector<bool> _new)
{
//Experienced vector has a small advantage through expr parameter
float ev1, t1, t2;
/*In the following algorithm, if the new and old vectors have very
close evaluation values, then the old vector will be at a disadvantage
for negative Expr values, but at an advantage for positive values*/
t1=eval(_old);
t2=eval(_new)-t1;
ev1 = eval(_old)*(1+Expr)*(1-t1);
if(ev1>eval(_new))
{
cout<<("(old)")<<("\n");
return _old;
}
else
return _new;
}
void CGA::Nuke(vector<vector<bool> >& vec_v)
{
//mutates one random member of the population
if((float)(.0000001*(random()%10000000))<Bit_mut_prob)
{
unsigned i, j, k, b; //working parameters
vector<bool> _temp(LEN);
j=(random()%8); //number of mutations
i=(random()%Pop_size); //member of population
_temp = vec_v[i];
for(k=0; k<j; k++)
{
b=(random()%LEN);
if(_temp[b])
_temp[b]=0;
else
_temp[b]=1;
}
vec_v[i]=_temp;
}
}
void CGA::set_params(int i1, float f2, float f3, float f4)
{
Pop_size = i1;
Crs_prob = f2;
Bit_mut_prob = f3;
Expr = f4;
}
Attachment 4
params.dat
1000 //generations
10 //generations per log
15 //population size
0.010 //cross over prob
0.55 //mutation prob.
-.01 //experience