LEC 6: Generating New Strategies: Genetic Algorithm vs Alpha-Go Deep Learning plus Goals

National Prayer Breakfast, Canberra: "Up to 500 people attend on 50 tables. There is a wide range of protocols, as well as a mix of guests to be considered, and each table is carefully arranged with designated seating. For example, we aim for two Members of Parliament at each table, from the same party and seated opposite each other but not with their backs to the stage, and where possible a male and a female. The top tables are significant, with diplomats, guest speakers and heads of churches and government. Each table is a mix of denominations and gender and special requests. The Prime Minister and VIPs have additional requirements and maybe a bodyguard or two. Attendance often changes at the last minute; this year we didn't know if the Prime Minister would arrive as he was coming from overseas. This meant building in flexibility to fill spaces, as it is not nice to have a space on the top table. We arranged a possible stand-in and strategically placed students in positions where they could be juggled if necessary.
Add to this the security requirement that the Table Seating list had to be published and printed on the wall. In past
years we have done this manually and it has been really difficult. This year we used PerfectTablePlan and it made it so
much easier. Thank you PerfectTablePlan for making our function so much easier to organise."
The Problem: Given a rugged landscape, what is a good search algorithm?
You search for a good search algorithm by seeing what works on different types of landscapes. The Genetic Algorithm (GA) evolves new strategies to find better solutions by mimicking evolutionary processes. It uses cross-over and mutation to create new strategies in a code where elements are viewed as genes on a chromosome. It infers which factors give high values from outcomes and increases the share of the population with those factors. It lives on good solutions being neighbors, and thus on correlated landscapes. But the GA can fail even on a "royal road" landscape set up to make it work well, when it converges too quickly or when the optimum is far from other good solutions. Consider it an external-system or market-style solution.
The alternative is deep learning via many-layered neural nets. It has succeeded in visual recognition and in AlphaGo beating humans at Go. A multi-layered neural net finds solutions by loosely mimicking the human mind, with neurons and lots of connections. The algorithm is presented with many objects many times, together with some supervised goal – classification in visual recognition; winning the Go game or some other goal in another case – and changes its connection weights until it gets a good result.
Genetic Algorithm
The GA EVOLVES a population of strategies into a new population with higher average profitability, using trial-and-error improvements and reproduction/survival of the fittest via Proportional Fitness Reproduction (PFR). To do a GA search you:
CODE strategies as 0/1 for different attributes.
Determine the profits of current strategies.
Create a new population using some form of PFR. This is the mating population.
Use "genetic operators" to create new strategies by:
cross-over: take two strings -- 101101 and 001001; split and join to get 101001 and 001101.
random mutation: switch a random 0/1, or randomly change each point with some probability.
inversion: pick two points and switch them -- 10001 becomes 01001.
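A minimal Python sketch of these three operators (the function names and the single random cut point are my own choices, not from the lecture; the "inversion" here follows the lecture's description of swapping two positions):

```python
import random

def crossover(a, b):
    """Single-point crossover: cut two parent strings at one point and swap tails.
    E.g. 101101 x 001001 cut after position 3 gives 101001 and 001101."""
    point = random.randint(1, len(a) - 1)   # cut strictly inside the string
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(s, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return ''.join(('1' if c == '0' else '0') if random.random() < rate else c for c in s)

def invert(s):
    """Pick two positions at random and swap their values, e.g. 10001 -> 01001."""
    i, j = random.sample(range(len(s)), 2)
    chars = list(s)
    chars[i], chars[j] = chars[j], chars[i]
    return ''.join(chars)

print(crossover('101101', '001001'))
print(mutate('101101'))
print(invert('10001'))
```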
EXAMPLE: In choosing a research paper, you decide who to work with, data vs. theory, modes of analysis, and content. You have data on past papers in the course and plan your paper on the basis of their strategies and success.
Which mix of attributes will make your paper a hit? Data on three attributes of previous papers show:
Paper   Alone or with friend   Emp/Theory   Content/Presentation   Binary Rep   Profits
1       Friend                 Empirical    Yes                    011          3
2       Friend                 Theory       Yes                    001          1
3       Alone                  Empirical    No                     110          6
4       Friend                 Empirical    No                     010          2
Why is 110 so profitable? Here are all possible explanations consistent with the data (where * refers to any value / doesn't matter):
It's working alone: 1**
It's empirical: *1*
It's no endorsement: **0
It's working alone and empirical: 11*
It's working alone and no endorsement: 1*0
It's empirical and no endorsement: *10
It's alone, empirical, and no endorsement: 110
Nothing matters: ***
These are schema/schemata, which generalize strings in terms of a common property, with * standing for the generalized property:
*11 is the schema for 011 and 111. It's: anything 11.
*1* is the schema for 010, 011, 110, and 111. It's: anything 1 anything.
A schema is a theory of how the world works. Take the theory that with 1**, profits rise. If 0s and 1s arrive randomly, 1** will occur in half of the cases. It won't require too many cases to determine if it is right: a few 0s in the first spot paired with high profits will kill 1** off. A more specific rule would be 111. This occurs in 1/8th of the cases, so it will be harder to see if it is right. It is easier to learn about general rules than about specific ones.
The # of schemata > # of possible data points. Our problem has 8 possible observables (2^3) but 3^3 = 27 schemata, since there are 3 symbols for each bit in the string instead of 2. The GA gains power because each observed string gives information about many schemata. Observations about "other data points" can help determine why 110 is profitable. If 0** also earns high profits, the 1 in the first space cannot be necessary for profits. But we cannot tell whether it is the 0 at the end that matters, or nothing at all, or whether 110 is a unique configuration.
How many schemata does a single string belong to? 110, the most profitable strategy, belongs to the 8 schemata listed above. A string with 3 bits belongs to 2^3 = 8 schemata; one with L bits belongs to 2^L schemata. Membership in any of these schemata could explain why it is or is not profitable.
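A short illustration (my own helper, not from the lecture) that enumerates the 2^L schemata a string belongs to and confirms that 110 belongs to 8 of them:

```python
from itertools import product

def schemata_of(s):
    """All schemata a string belongs to: each position is either kept or replaced by *."""
    options = [(c, '*') for c in s]
    return [''.join(combo) for combo in product(*options)]

print(schemata_of('110'))       # 110, 11*, 1*0, 1**, *10, *1*, **0, ***
print(len(schemata_of('110')))  # 2**3 = 8
```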
Why not pick the most profitable string and stick with it? Because you learn nothing about the landscape. If you must decide on a strategy today, choose 110. But further search can teach you about the other ways of making profit: 111, 100, 101, 000. Economics says we should compare the costs to the benefits of search. The cost of experimenting is the opportunity cost of not selecting the current best: what we make on the others vs what we would make with 110:
for 111, we make 7, so the "cost" is 7-6 = 1 (it's a benefit, not a cost)
for 100, we make 4, so the cost is 4-6 = -2
for 101, we make 5, so the cost is 5-6 = -1
for 000, we make 0, so the cost is 0-6 = -6
If we randomly selected the possibilities to explore, the cost would average -2. But if we direct our search to more profitable areas, the cost is smaller. The GA searches in areas where we expect more profitable outcomes. As long as the global optimum is near areas of high profitability, the GA directs us to the right areas, since it "infers" that good profits come from strings like 110. If the global max were 000, we would go in the wrong direction.
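As a quick check of that arithmetic (payoffs exactly as given in the text):

```python
best_profit = 6                      # profit of the current best strategy, 110
alternatives = {'111': 7, '100': 4, '101': 5, '000': 0}

# "cost" of trying an alternative = its profit minus what 110 would have earned
costs = {s: p - best_profit for s, p in alternatives.items()}
print(costs)                             # {'111': 1, '100': -2, '101': -1, '000': -6}
print(sum(costs.values()) / len(costs))  # -2.0: average cost of a purely random experiment
```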
In the two-armed bandit problem there are two slot machines, each with a payoff that has variance v. The optimal solution is not to pick the one that has the highest value after your first "experiment", nor the one with the highest value even after a number of tries. Always check the lower-paying one once in a while -- you may simply have had bad luck on it previously. The optimal allocation gives proportionately more trials to the higher-paying arm, with the proportions dependent on the variances. If the environment can change, the best strategy at t' may differ from the one at t, so keep exploring.
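A toy simulation of the "keep checking the other arm" idea; the epsilon-greedy rule, the payoff means, and the 10% exploration rate are my assumptions for illustration, not the optimal bandit policy:

```python
import random

def bandit_run(mu=(1.0, 1.2), sigma=1.0, pulls=2000, explore=0.10, seed=0):
    """Mostly pull the arm with the higher observed mean, but explore the other arm 10% of the time."""
    rng = random.Random(seed)
    totals, counts = [0.0, 0.0], [1, 1]     # counts start at 1 to avoid dividing by zero
    for _ in range(pulls):
        means = [totals[i] / counts[i] for i in range(2)]
        arm = rng.randrange(2) if rng.random() < explore else max(range(2), key=lambda i: means[i])
        reward = rng.gauss(mu[arm], sigma)
        totals[arm] += reward
        counts[arm] += 1
    return counts   # pulls per arm: the better arm ends up with proportionately more trials

print(bandit_run())
```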
The relative fitness/profits of a string determines the probability that it enters the next generation. Total profits/fitness is 12, the worst is 1 and the best is 6. Average profits are 3. A string with a score of 6 has twice the chance of being in the next generation as a string with 3: its probability of being selected on any draw is 0.5. If we pick four strings, a string with selection probability 0.5 would very likely (94%) be selected at least once (the probability it is never selected is 0.5^4 = .06). A string with selection probability .05 would have a probability of NOT being selected at all of .95^4 = 81%.
Drawing from the probability urn differentiates this procedure from the computer tournament, where we stock the next generation with the relevant proportions deterministically. The probabilistic approach increases the chance that a strategy with low return persists. This adds diversity to the population. Allowing some low-return strategies to persist is a common way to keep away from local optima, as in simulated annealing.
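A sketch of fitness-proportionate ("roulette wheel") selection applied to the four example strings, reproducing the probabilities above (the helper name is mine):

```python
import random

strategies = {'011': 3, '001': 1, '110': 6, '010': 2}   # profits from the example
total = sum(strategies.values())                         # 12

def select():
    """Draw one string with probability proportional to its profit."""
    r = random.uniform(0, total)
    cum = 0.0
    for s, fit in strategies.items():
        cum += fit
        if r <= cum:
            return s
    return s

mating_pool = [select() for _ in range(4)]
print(mating_pool)

# Chance of appearing at least once in 4 draws
print(1 - (1 - 6/12) ** 4)   # ~0.94 for the best string, 110
print(1 - (1 - 0.05) ** 4)   # ~0.19 for a string with selection probability 0.05
```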
Four steps
1) Represent the strategy with 0s and 1s on a string. Neighbors are points close together and should be coded that way when they interact. Holland calls these compact building blocks. The GA works best with such blocks, so code them early (but see the discussion at http://en.wikipedia.org/wiki/Genetic_algorithm).
2) Associate profits with the strategies.
3) Make some technical decisions: % of cross-overs; rates of mutation or inversion; size of population; etc.
4) Stop following some rule. The economics rule is to compare the return from putting all resources into the current best strategy vs the likely improvement from searching more of the strategy-profit space.
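Putting the four steps together, a minimal GA loop; the population size, mutation/crossover rates, and number of generations are arbitrary choices, while the profit table is the one from the example (including the 111, 100, 101, 000 values given earlier):

```python
import random

PROFIT = {'000': 0, '001': 1, '010': 2, '011': 3,
          '100': 4, '101': 5, '110': 6, '111': 7}   # values from the lecture's example

def select(pop):
    """Step 3: fitness-proportionate reproduction (small floor so zero-profit strings can still appear)."""
    weights = [PROFIT[s] + 0.01 for s in pop]
    return random.choices(pop, weights=weights, k=len(pop))

def crossover(a, b):
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(s, rate=0.02):
    return ''.join(('1' if c == '0' else '0') if random.random() < rate else c for c in s)

def step(pop, crossover_prob=0.7):
    mating = select(pop)                    # mating population
    random.shuffle(mating)
    nxt = []
    for a, b in zip(mating[::2], mating[1::2]):
        if random.random() < crossover_prob:
            a, b = crossover(a, b)          # step 4: genetic operators
        nxt += [mutate(a), mutate(b)]
    return nxt

pop = [''.join(random.choice('01') for _ in range(3)) for _ in range(20)]
for gen in range(10):
    avg = sum(PROFIT[s] for s in pop) / len(pop)
    print(gen, round(avg, 2))               # average profit drifts upward across generations
    pop = step(pop)
```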
FOR A SHORT TUTORIAL: some simple YouTube videos, 4-10 minutes each: https://www.youtube.com/watch?v=YXMh-iw07w
Two good lectures from IIT India: https://www.youtube.com/watch?v=Z_8MpZeMdD4 and https://www.youtube.com/watch?v=ra13Sv7XZ3M
http://math.hws.edu/xJava/GA/ has a nice Java applet window showing the trend upward, with lots of variation, in a GA search for an optimum.
Whitley, Darrell, is good: http://www.cs.colostate.edu/~genitor/MiscPubs/tutorial.pdf
The Fundamental Theorem of the GA (Wasserman, p. 87) says that a GA using fitness-proportionate reproduction with cross-over and mutation gives proportionate growth to more fit schemata.
If all we had was fitness-proportionate reproduction, the best first-generation strategy would dominate. In the GA this is captured by f(S,t)/F(P,t), the reproductive part. If the ratio is constant, n(t+1)/n(t) ~ 1+r, where r is S's value relative to the average; so r is the growth rate for profitable strategies and the rate of decline for unprofitable ones. We need cross-over/mutation to search more widely and explore more profitable spaces.
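For reference, the usual textbook statement of Holland's Schema Theorem, in notation slightly different from the f(S,t)/F(P,t) shorthand above: with m(H,t) copies of schema H at time t, schema average fitness f(H), population average fitness f-bar, crossover probability p_c, per-bit mutation probability p_m, defining length δ(H), order o(H), and string length L,

\[
\mathbb{E}\big[m(H,t+1)\big] \;\ge\; m(H,t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{L-1} - o(H)\,p_m\right]
\]

The first factor is the fitness-proportionate growth; the bracket is the probability the schema survives cross-over and mutation.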
The GA works by partitioning the variable space into areas of higher and lower profitability and focusing search in the higher profitability/fitness areas. The left graph partitions the space by the first element, 0 to the left and 1 to the right. If strings in the 0** partition /// have higher average profit than those in the 1** area, search the hatched area. Say a 0 in the 2nd position has higher average value, denoted by \\\. Take the two together and the algorithm says search in the double-hatched areas.
Now take a third point, where XXX denotes the profitable areas. Most of your search would be in the two disjoint areas that combine the profitable places for the three "search mes".
The GA divides the space into the areas where the three schemata (0**, *0* and **0) say profits are promising. This is largely unaffected by local optima, since the partition is for above-average outcomes. (But if the highest value was to the right, you would miss it.) Like simulated annealing, the GA sacrifices some value to look for new strategies.
The extent to which cross-over damages a profitable strategy depends on the defining length of the schema, d, which is the distance between the furthest-apart specific (0 or 1) positions -- the nearness of the well-specified parts. The 7-element schema 1**0*** has defining length 3 because you move 3 spaces to get from the 1 to the 0; whereas *001*00 has defining length 5 and **011** has defining length 2.
Crossover is more likely to lose schemata with long defining lengths than short ones. With L bits, there are L-1 places to do the cross-over. Take the schema *001*00. There is a 1/6 chance you will KEEP 001*00: the only safe break is between the 1st and 2nd positions, *|001*00. But **011** will be preserved in 4 out of the 6 places where you can make the cut.
The implication is that you want to code linked things next to each other. Short defining lengths survive; long ones don't. If you profit from endorsement, write your representation as **11**, where the middle two bits refer to S and I. Don't represent this situation as 1****1. The dependence of the GA on defining length shows that how you code the problem affects algorithm performance.
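A small helper (mine, not the lecture's) computing defining length and the chance a schema survives a single-point cross-over, reproducing the 1/6 and 4/6 figures above:

```python
def defining_length(schema):
    """Distance between the first and last specified (non-*) positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0]

def survival_prob(schema):
    """Probability a random single-point cut does not fall between the outermost fixed positions."""
    L = len(schema)
    return 1 - defining_length(schema) / (L - 1)

for s in ('1**0***', '*001*00', '**011**'):
    print(s, defining_length(s), round(survival_prob(s), 3))
# 1**0*** -> d=3, survives 3/6 of cuts; *001*00 -> d=5, survives 1/6; **011** -> d=2, survives 4/6
```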
The Genetic Algorithm, simulated annealing, and related search models ASSUME correlated landscapes. If you find high scorers in one area, you are likely to find high scorers nearby. If we have a 2-dimensional lattice, NS and EW, then the trick for the GA is to find high values in the North and high values in the West and combine them to get the potentially highest values in the North-West. There is an implicit cost to search, since there are so many points to examine that you cannot check the whole universe. But you can go back and choose a higher point, so you are not locked into "pick the final point". Given that you are going to use early random searches to decide which areas/attributes look best, might you do better if you randomly searched to get some reservation value for different areas and then did your combinations? In simulated annealing this would amount to making the "temperature" depend on early information.
Problems with the GA: the Royal Road Clunker – the GA works well because it samples spaces – hyperplanes – efficiently by combining good building blocks: obtain small groups that work well together and combine them.
The Royal Road Function is a seemingly perfect set-up for the GA. The maximum is 1111 1111 1111, a combination
of three blocks of four units: A = 1111 **** ****
B = **** 1111 ****
C = **** **** 1111
Since the GA lives on connecting correlated bundles, analysts expected it to do great on this problem. Mitchell, Forrest and Holland compared the GA with 3 types of hill-climbing over this landscape: SAHC, steepest-ascent hill climbing -- check all neighbors and pick the biggest; NAHC, next-ascent hill climbing -- choose the first point with an increase; RMHC, random-mutation hill climbing -- randomly flip a bit and keep it if it does at least as well. Here are the results: the GA beats SAHC and NAHC but loses to RMHC.
What goes wrong? The GA GETS CAUGHT ON A LOCAL MAX because it is not sampling independently in each region. By contrast, RMHC does a slow (tortoise) search over the entire space. The GA focuses search in areas where, by chance, it got an early good return.
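A sketch of a Royal Road-style fitness function on the lecture's 12-bit version, plus a random-mutation hill climber; the per-block scoring here is a simplification I chose, not Mitchell, Forrest and Holland's exact R1:

```python
import random

BLOCKS = [(0, 4), (4, 8), (8, 12)]      # the three building blocks A, B, C from the lecture

def royal_road(s):
    """Fitness = number of bits in completed all-1 blocks (simplified scoring)."""
    return sum(4 for lo, hi in BLOCKS if s[lo:hi] == '1111')

def rmhc(steps=5000, seed=1):
    """Random-mutation hill climbing: flip one random bit, keep it if fitness does not fall."""
    rng = random.Random(seed)
    s = ''.join(rng.choice('01') for _ in range(12))
    for _ in range(steps):
        i = rng.randrange(12)
        t = s[:i] + ('1' if s[i] == '0' else '0') + s[i+1:]
        if royal_road(t) >= royal_road(s):
            s = t
        if royal_road(s) == 12:
            break
    return s, royal_road(s)

print(rmhc())   # typically reaches 111111111111 by patient bit-flipping
```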
A way to see what happens is in terms of the fitness function in Mitchell (p. 122):
if x = 111***, f = 2
if x = 0*****, f = 1
if x = 100***, 110***, or 101***, f = 0
The max is at 111***, but the algorithm says you should choose 0 on the first element, because a 1 in the first position gives you 2 in 1/4 of the cases and 0 otherwise, so the average is 1/2, while you get 1 whenever you start with 0, no matter what.
0***** is a local maximum, since it beats its neighbor 1*****.
The GA gets CAUGHT in the local maximum.
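A quick brute-force check of those averages over all 6-bit strings (my own code, using the fitness rule above):

```python
from itertools import product

def f(x):
    if x.startswith('111'):
        return 2
    if x.startswith('0'):
        return 1
    return 0                     # 100..., 101..., 110...

strings = [''.join(bits) for bits in product('01', repeat=6)]
start1 = [f(x) for x in strings if x[0] == '1']
start0 = [f(x) for x in strings if x[0] == '0']
print(sum(start1) / len(start1))   # 0.5 : average payoff when the first bit is 1
print(sum(start0) / len(start0))   # 1.0 : average payoff when the first bit is 0
```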
But this occurs because you are seeking the best algorithm for the "final outcome". If the GA is judged on on-line performance, the question would be "what is the PV of outcomes from the GA vs the PV of outcomes from RMHC?" With a modest discount rate, the GA would beat RMHC, since the GA spends most of its time in good areas while RMHC covers the entire space.
If caught in a local optimum, what do you do? Increase the rate of MUTATION to widen search, or modify cross-over. If the GA destroys good groups too readily, you want to strengthen its proportionate fitness reproduction: Genitor replaces the worst-performing strategy with the offspring (via cross-over) of better-performing strategies. Another variant is to use ranks rather than profits in proportionate fitness reproduction. Island models run multiple genetic algorithms separately and then allow for migration. The key is to weigh properly "exploration" of the broad space vs "exploitation".
Example of a GA in an economics model: J. Arifovic, "Genetic Algorithm Learning and the Cobweb Model", Journal of Economic Dynamics and Control, vol. 18, issue 1 (January 1994), 3-28. Firms' quantity production decisions are initially random, but they learn each period and converge to equilibrium under different modes of learning. In social learning, each firm uses a single string as its quantity production decision, copying the best performer last period; it then compares this string against other firms' strings. In individual learning, agents are endowed with a pool of strings, which are compared against other strings within the agent's own pool -- competing ideas within the firm. In either case, with identical cost functions, agents' production decisions are identical. However, if the cost functions are not identical, the result is a heterogeneous solution in which firms produce different quantities. After agents decide on quantities, the quantities are aggregated and run through the demand function to get a price. Each firm's profit is then calculated, and fitness values are calculated as a function of profits. After the offspring pool is generated, hypothetical fitness values are calculated; these are based on an estimate of the price level, often just the previous price level.
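A highly simplified sketch of the social-learning mechanics; the linear demand, the cost function, and the bit-string coding of quantities are my assumptions for illustration, not Arifovic's exact specification:

```python
import random

A, B = 10.0, 0.1            # assumed inverse demand: P = A - B * total quantity
COST = lambda q: 0.5 * q    # assumed per-unit cost, identical across firms
N_FIRMS, BITS = 20, 6       # each firm's quantity is a 6-bit string scaled to [0, 10]

def decode(s):
    return 10.0 * int(s, 2) / (2**BITS - 1)

def profits(pop):
    qs = [decode(s) for s in pop]
    price = max(A - B * sum(qs), 0.0)
    return [price * q - COST(q) * q for q in qs]   # revenue minus cost

def mutate(s, rate=0.02):
    return ''.join(('1' if c == '0' else '0') if random.random() < rate else c for c in s)

def next_generation(pop):
    fit = profits(pop)
    weights = [f - min(fit) + 1e-6 for f in fit]    # shift so selection weights are positive
    mating = random.choices(pop, weights=weights, k=len(pop))
    out = []
    for a, b in zip(mating[::2], mating[1::2]):
        cut = random.randint(1, BITS - 1)
        a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        out += [mutate(a), mutate(b)]
    return out

pop = [''.join(random.choice('01') for _ in range(BITS)) for _ in range(N_FIRMS)]
for t in range(30):
    qs = [decode(s) for s in pop]
    print(t, round(A - B * sum(qs), 2), round(sum(qs) / len(qs), 2))   # price and average quantity
    pop = next_generation(pop)
```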
MODELING TAX EVASION WITH GENETIC ALGORITHMS: O'Reilly, Warner, Wijesinghe, Marques, Badar, Rosen and Hemberg, Economics of Governance 16(2): 1-14 (November 2014).
ALPHA-GO – neural nets (to be dealt with in some detail in class xxx)
Search algorithms that explore the space widely are needed in complex landscapes where CONFIGURATIONS matter, because complexity grows rapidly with the number of choices and attributes. In a Kauffman landscape with N factors, K = N-1 (the number of other elements affecting each factor's contribution to profitability) gives maximal interactions. With N = 10, and each attribute (measured as 0/1) having its profitability dependent on all the others, total profitability depends on 2^10 = 1,024 configurations. If each attribute had 4 choices, you'd have 4^10 = 1,048,576 combinations and 5^10 = 9,765,625 schemata. But these are small compared to the complexity of board games.
AlphaGo combines neural-network algorithms and machine-learning techniques to reduce the depth and breadth of the search, so the algorithm searches over a smaller space.
It reduces depth with a value network that estimates how likely a given board position is to lead to a win, and it reduces the breadth of the game with a policy network that learns to choose the best moves for that position. These networks are "deep learning" because they have many layers, trained by a novel combination of supervised learning from human expert games and learning from games of self-play. The policy network generates possible moves that the value network then judges on their likelihood to vanquish the opponent.
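A toy numpy illustration of the two roles with random, untrained weights; the single dense layer and all sizes are placeholders of my own, nothing like AlphaGo's actual convolutional networks:

```python
import numpy as np

rng = np.random.default_rng(0)
BOARD = 19 * 19

# Toy "policy network": board features in, a probability over the 361 possible moves out.
W_policy = rng.normal(scale=0.01, size=(BOARD, BOARD))
# Toy "value network": board features in, a single win-probability out.
w_value = rng.normal(scale=0.01, size=BOARD)

def policy(board):
    """Softmax over moves, masking points that are already occupied."""
    logits = board.flatten() @ W_policy
    logits[board.flatten() != 0] = -np.inf          # can't play on an occupied point
    p = np.exp(logits - logits[np.isfinite(logits)].max())
    p[~np.isfinite(logits)] = 0.0
    return p / p.sum()

def value(board):
    """Sigmoid giving a (meaningless, untrained) estimate that the side to move wins."""
    return 1.0 / (1.0 + np.exp(-(board.flatten() @ w_value)))

board = np.zeros((19, 19))     # 0 empty, +1 black, -1 white
board[3, 3] = 1                # one black stone
p = policy(board)
print(p.argmax(), round(value(board), 3))   # index of most probable move, crude win estimate
```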
In the first stage, a 13-layer policy neural network was trained on 30 million board positions from 160,000 real-life games taken from a Go database. The bottom layer of the network has a 19-by-19 array of neurons that basically takes a snapshot of the state of the board and uses it as an input. The top layer consists of a similar array that shows all the possible locations for placing the next stone and the probability of making each of those moves. In between lie 11 more layers. By adjusting the connections in the intervening layers, which come to subtly encode all the "knowledge" in the data, the neural network was able to predict the human move 57 percent of the time.
They then let the program further teach itself by playing against itself over and over again. Thus, AlphaGo learned from
experience to tell a better move from a poorer one. "The way we've developed the system, it plays more like a human
does," Hassabis says.
In the second stage, the "value network" would evaluate whether black or white holds the advantage by estimating the probability that one side or the other would eventually win the game. To train it, the researchers fed the network configurations and outcomes from games AlphaGo played against itself. The value network helped AlphaGo play faster, Silver says. Instead of playing out many scenarios to the very end, as in a Monte Carlo tree search, AlphaGo could play the game forward a few moves and use the network to estimate the final result. The key is that the program mimics human learning by training, rather than by being programmed.
Demis Hassabis, CEO of Google DeepMind:
"The most significant aspect of all this for us is that AlphaGo isn't just an 'expert' system built with hand-crafted rules; instead it uses general machine learning techniques to figure out for itself how to win at Go. While games are the perfect platform for developing and testing AI algorithms quickly and efficiently, ultimately we want to apply these techniques to important real-world problems. Because the methods we've used are general-purpose, our hope is that one day they could be extended to help us address some of society's toughest and most pressing problems, from climate modelling to complex disease analysis. We're excited to see what we can use this technology to tackle next!"
One response to the victory of the Go machine: "It's flawed. It can't do everything we humans can do. In fact, it can't even come close. It can't carry on a conversation. It can't play charades. It can't pass an eighth-grade science test. It can't account for God's Touch." (Lee's move in game 4 that won the only human victory in the 5 matches.)
But another machine can carry on a conversation – cleverbot.com or Google's chatbot.
Machines can play charades – the Allen Institute for AI … is working on it (https://arxiv.org/pdf/1604.01753.pdf); see also Google's Quickdraw.
Machines can pass an 8th-grade science test.
AlphaGo already pulled off its own "God's Touch" in game 2 of the match.
Use different machines for different purposes, just the way we use different body parts or tools.