LEC 6: Generating New Strategies: Genetic Algorithm vs AlphaGo Deep Learning plus Goals

National Prayer Breakfast, Canberra: "Up to 500 people attend on 50 tables. There are a wide range of protocols, as well as a mix of guests to be considered, and each table is carefully arranged with designated seating. For example, we aim for two Members of Parliament at each table, from the same party and seated opposite each other but not with their backs to the stage, and where possible a male and a female. The top tables are significant, with diplomats, guest speakers and heads of churches and government. Each table is a mix of denominations and gender and special requests. The Prime Minister and VIPs have additional requirements and maybe a bodyguard or two. Attendance often changes at the last minute; this year we didn't know if the Prime Minister would arrive as he was coming from overseas. This meant building in flexibility to fill spaces, as it is not nice to have a space on the top table. We arranged a possible stand-in and strategically placed students in positions where they could be juggled if necessary. Add to this the security requirement that the Table Seating list had to be published and printed on the wall. In past years we have done this manually and it has been really difficult. This year we used PerfectTablePlan and it made it so much easier. Thank you PerfectTablePlan for making our function so much easier to organise." A seating plan with this many interacting constraints is exactly the kind of combinatorial search problem this lecture is about.

The Problem: Given a rugged landscape, what is a good search algorithm? You search for a good search algorithm by seeing what works on different types of landscapes.

A Genetic Algorithm (GA) evolves new strategies to find better solutions by mimicking evolutionary processes. It uses cross-over and mutation to create new strategies in a code where elements are viewed as genes on a chromosome. It infers which factors give high values from outcomes and increases the share of the population with those factors. It lives on good solutions being neighbors, and thus on correlated landscapes. But a GA can fail on a "royal road" set up to make it work well, when it converges too quickly or when the optimum is far from other good solutions. Think of it as an external-system or market solution.

The alternative is deep learning via many-layered neural nets. It has succeeded in visual recognition and in AlphaGo beating humans at Go. A multi-layered neural net finds solutions by mimicking the human mind, with neurons and lots of connections. The algorithm is presented with many objects many times, together with some supervised goal -- classification in visual recognition; winning the Go game or some other goal in other cases -- and changes its connection weights until it gets a good result.

Genetic Algorithm

The GA EVOLVES a population of strategies into a new population with higher average profitability, using trial-and-error improvements and reproduction/survival of the fittest via Proportional Fitness Reproduction (PFR). To do a GA search you:
CODE strategies as 0/1 for different attributes.
Determine the profits of current strategies.
Create a new population using some form of PFR. This is the mating population.
Use "genetic operators" to create new strategies by:
cross-over: take two strings -- 101101 and 001001 -- split and join to get 101001 and 001101;
random mutation: switch a random 0/1, or randomly change each position with some probability;
inversion: pick two points and switch them, so 10001 becomes 01001.
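A minimal Python sketch of the three operators on 0/1 strings (illustrative only; the function names and the mutation rate are my choices, not part of any standard GA library):

import random

def crossover(a, b):
    # One-point cross-over: cut both strings at the same random point and swap the tails.
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(s, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return ''.join(('1' if c == '0' else '0') if random.random() < rate else c for c in s)

def invert(s):
    # Pick two positions at random and swap their values.
    i, j = random.sample(range(len(s)), 2)
    chars = list(s)
    chars[i], chars[j] = chars[j], chars[i]
    return ''.join(chars)

print(crossover('101101', '001001'))  # gives ('101001', '001101') when the cut falls after position 3
print(mutate('101101'))               # usually unchanged; occasionally a bit flips
print(invert('10001'))                # gives '01001' when positions 1 and 2 are picked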
EXAMPLE: In choosing a research paper, you decide who to work with, data vs theory, modes of analysis, content. You have data on past papers in the course and plan your paper on the basis of their strategies and success. Which mix of attributes will make your paper a hit? Data on three attributes of previous papers show:

Paper  Alone or with friend  Emp/Theory  Content/Presentation (endorsement)  Binary rep  Profits
1      Friend                Empirical   Yes                                 011         3
2      Friend                Theory      Yes                                 001         1
3      Alone                 Empirical   No                                  110         6
4      Friend                Empirical   No                                  010         2

Why is 110 so profitable? Here are all the possible explanations consistent with the data:
It's working alone: 1** (where * refers to any value / doesn't matter)
It's empirical: *1*
It's no endorsement: **0
It's working alone and empirical: 11*
It's working alone and no endorsement: 1*0
It's empirical and no endorsement: *10
It's alone, empirical, and no endorsement: 110
Nothing matters: ***

These are schemata, which generalize strings in terms of a common property, with * standing for the generalized position: *11 is the schema for 011 and 111 -- it's "anything 1 1". *1* is the schema for 010, 011, 110 and 111 -- it's "anything 1 anything". A schema is a theory of how the world works. Take the theory 1**: profits rise with a 1 in the first position. If 0s and 1s arrive randomly, 1** will occur in half of the cases, so it won't require too many cases to determine whether it is right: a few 0s in the first spot together with high profits will kill 1** off. A more specific rule would be 111. This occurs in 1/8 of the cases, so it is harder to see whether it is right. It is easier to learn about general rules than about specific ones.

The number of schemata exceeds the number of possible data points. Our problem has 8 possible observables (2^3) but 3^3 = 27 schemata, since there are 3 symbols (0, 1, *) for each position in the string instead of 2. The GA gains power because each observed string gives information about many schemata (see the small sketch at the end of this example). Observations about "other data points" can help determine why 110 is profitable. If 0** has profits, then the 1 in the first position cannot be necessary for profits. But we cannot tell whether it is the 0 at the end that matters, or nothing that matters, or whether 110 is a unique configuration. How many schemata does a single string belong to? 110, the most profitable strategy, belongs to the 8 schemata listed above. A string with 3 bits contributes to 2^3 = 8 schemata; one with L bits belongs to 2^L schemata. Membership in any of these schemata could explain why it is or is not profitable.

Why not pick the most profitable string and stick with it? Because you learn nothing about the landscape. If you must decide on a strategy today, choose 110. But further search can teach you about the other ways of making profit: 111, 100, 101, 000. Economics says we should compare the costs and benefits of search. The cost of experimenting is the opportunity cost relative to playing the current best: what we make on the others minus what we would make with 110:
for 111, we make 7, so the "cost" is 7 - 6 = 1 (it's a benefit, not a cost)
for 100, we make 4, so the cost is 4 - 6 = -2
for 101, we make 5, so the cost is 5 - 6 = -1
for 000, we make 0, so the cost is 0 - 6 = -6
If we randomly selected which possibility to explore, the cost would average -2. But if we direct our search to more profitable areas, the cost is smaller. The GA searches in areas where we expect more profitable outcomes. As long as the global optimum is near areas of high profitability, the GA directs us to the right areas, since it "infers" that good profits come from strings like 110. If the global max were 000, we would go in the wrong direction.
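A small sketch of this schema bookkeeping (the helper names are mine): every L-bit string belongs to 2^L schemata, there are 3^L schemata in all, and the observed data can be used to rank schemata by average profit, which is how the GA "infers" where good profits come from.

from itertools import product

def schemata_of(s):
    # Every schema a string belongs to: each position either kept or replaced by '*'.
    return [''.join(p) for p in product(*[(c, '*') for c in s])]

print(len(schemata_of('110')), schemata_of('110'))   # 8 schemata, including '***'
print(3 ** 3)                                        # 27 possible schemata over 3 positions

data = {'011': 3, '001': 1, '110': 6, '010': 2}      # the four observed papers

def matches(schema, s):
    return all(a == '*' or a == b for a, b in zip(schema, s))

for schema in ['1**', '*1*', '**0']:
    obs = [profit for s, profit in data.items() if matches(schema, s)]
    print(schema, obs, sum(obs) / len(obs))          # e.g. '1**' is observed only once, with profit 6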
In the two-armed bandit problem there are two slot machines, each with a payoff that has variance v. The optimal solution is not to pick the one that has the highest value after your first "experiment", nor the one with the highest value even after a number of tries. Check the lower-paying one once in a while: you may simply have had bad luck on it previously. The optimal allocation gives proportionately more trials to the higher-paying arm, with the proportions depending on the variances. If the environment can change, the best strategy at t' may differ from the one at t, so keep exploring.

The relative fitness/profits of a string determines the probability that it enters the next generation. In the example, total profits/fitness is 12, the worst is 1 and the best is 6; average profits are 3. A string with a score of 6 has twice the chance of being in the next generation as a string with a score of 3: its probability of being selected on any draw is 0.5. If we pick four strings, a string with selection probability 0.5 would very likely (94%) be selected at least once (the probability it is never selected is 0.5^4 = 0.06). A string with selection probability 0.05 would fail to be selected at all with probability 0.95^4 = 81%. Drawing from the probability urn differentiates this procedure from the computer tournament, where we stock the next generation with the relevant proportions. The probabilistic approach increases the chance that a strategy with a low return persists, which adds diversity to the population. Allowing some low-return strategies to persist is a common way to try to keep away from local optima, as in simulated annealing.

Four steps:
1) Represent strategies with 0s and 1s on a string. Neighbors are points close together, and interacting elements should be coded close together. Holland calls these compact building blocks. The GA works best with such blocks, so code them in early (but see the discussion at http://en.wikipedia.org/wiki/Genetic_algorithm).
2) Associate profits with the strategies.
3) Make some technical decisions: % of cross-overs; rates of mutation or inversion; size of population; etc.
4) Stop following some rule. The economics rule is to evaluate the return from putting all resources into the current best strategy versus the likely improvement from learning more about the strategy-profit space.

FOR A SHORT TUTORIAL: Some simple YouTube videos, 4-10 minutes: https://www.youtube.com/watch?v=YXMh-iw07w . Two good lectures from IIT India: https://www.youtube.com/watch?v=Z_8MpZeMdD4 and https://www.youtube.com/watch?v=ra13Sv7XZ3M . http://math.hws.edu/xJava/GA/ has a nice Java applet window showing the upward trend, with lots of variation, in a GA search for an optimum. Whitley, Darrell, is good: http://www.cs.colostate.edu/~genitor/MiscPubs/tutorial.pdf

The Fundamental Theorem of GAs (Wasserman, p. 87) says that a GA using fitness-proportionate reproduction together with cross-over and mutation gives proportionate growth to fitter schemata. If all we had was fitness-proportionate reproduction, the best first-generation strategy would dominate. In the GA this is captured by the reproductive term f(S,t)/F(P,t), the fitness of schema S relative to the population average. If that ratio is roughly constant, n(S,t+1)/n(S,t) ~ 1 + r, where r is S's value relative to the average, so r is the growth rate for profitable strategies and the rate of decline for unprofitable ones. You need cross-over and mutation to search more widely and explore more profitable spaces.
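A minimal sketch of fitness-proportionate (roulette-wheel) selection on the example data; the function name and the cumulative-sum implementation are mine:

import random

profits = {'011': 3, '001': 1, '110': 6, '010': 2}
total = sum(profits.values())                    # 12

def pfr_draw():
    # One roulette-wheel draw: each string is chosen with probability profit/total.
    r = random.uniform(0, total)
    cumulative = 0.0
    for s, p in profits.items():
        cumulative += p
        if r <= cumulative:
            return s

mating_pool = [pfr_draw() for _ in range(4)]
print(mating_pool)
# 110 has selection probability 6/12 = 0.5 per draw, so it misses all four draws
# with probability 0.5**4 = 0.0625, i.e. it appears at least once about 94% of the time.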
The GA works by partitioning the variable space into areas of higher and lower profitability and focusing search in the higher-profitability/fitness areas. The left graph partitions the space by the first element, 0 to the left and 1 to the right. If strings in the 0** partition (///) have higher average profit than those in the 1** area, search the hatched area. Say 0 in the 2nd position has higher average value, denoted by \\\. Take the two together and the algorithm says to search in the double-hatched areas. Now take a third position, where XXX denotes the profitable areas. Most of your search would be in the two disjoint areas that combine the profitable places for the three "search mes". The GA divides the space into the areas where the three schemata (0**, *0* and **0) say things are promising. This is largely unaffected by local optima, since the partition is based on above-average outcomes. (But if the highest value were off to the right, you would miss it.) Like simulated annealing, the GA sacrifices some value to look for new strategies.

The extent to which cross-over damages a profitable schema depends on the defining length of the schema, d, which is the distance between the furthest specific (0 or 1) positions -- the nearness of the well-specified parts. The 7-element schema 1**0*** has defining length 3, because you move 3 spaces to get from the 1 to the 0; *001*00 has defining length 5; and **011** has defining length 2. Cross-over is more likely to lose schemata with long defining lengths than short ones. With L bits there are L-1 places to make the cut. Take the schema *001*00: there is only a 1/6 chance you will KEEP 001*00, because the only safe break is between the leading * and the first 0. But **011** is preserved in 4 of the 6 places where you can make the cut. The implication is that you want to code linked things next to each other. Short defining lengths survive; long ones don't. If you profit when two linked attributes S and I are both present, write your representation as **11**, with the middle two bits referring to S and I; don't represent the situation as 1****1. The dependence of the GA on defining length shows that how you code the problem affects algorithm performance.

The Genetic Algorithm, simulated annealing and related search models ASSUME correlated landscapes: if you find high scorers in one area, you are likely to find high scorers nearby. If we have a 2-dimensional lattice, NS and EW, the trick for the GA is to find high values in the North and high values in the West and combine them to get the potentially highest values in the North-West. There is an implicit cost to search, since there are so many points that you cannot check the whole universe. But you can go back and choose a higher point, so you are not locked into "pick the final point". Given that you are going to use early random searches to decide which areas/attributes look best, might you do better if you randomly search to get some reservation value for different areas and then do your combinations? In simulated annealing this would amount to making the "temperature" depend on early information.

Problems with the GA: the Royal Road clunker. The GA works well because it samples spaces -- hyperplanes -- efficiently by combining good building blocks: obtain small groups that work well together and combine them. The Royal Road function is a seemingly perfect set-up for the GA. The maximum is 1111 1111 1111, a combination of three blocks of four units:
A = 1111 **** ****
B = **** 1111 ****
C = **** **** 1111
Since the GA lives on connecting correlated bundles, analysts expected it to do great on this problem. Mitchell, Forrest and Holland compared the GA with 3 types of hill-climbing over this landscape: SAHC, steepest-ascent hill climbing -- check all neighbors and pick the biggest improvement; NAHC, next-ascent hill climbing -- take the first neighbor that shows an increase; RMHC, random-mutation hill climbing -- flip a random bit and keep the change if it does at least as well. The result: the GA beats SAHC and NAHC but loses to RMHC.
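A toy version of this 12-bit Royal Road function and of RMHC (the reward of 4 points per completed block is my simplification of the usual block-weighted scoring, and the step count is arbitrary):

import random

BLOCKS = [range(0, 4), range(4, 8), range(8, 12)]    # the three 4-bit blocks A, B, C

def royal_road(bits):
    # Each block that is all 1s contributes 4; the maximum of 12 is reached only at 1111 1111 1111.
    return sum(4 for block in BLOCKS if all(bits[i] == 1 for i in block))

def rmhc(steps=2000):
    # Random-mutation hill climbing: flip one random bit, keep the flip if fitness does not fall.
    x = [random.randint(0, 1) for _ in range(12)]
    for _ in range(steps):
        y = x[:]
        i = random.randrange(12)
        y[i] = 1 - y[i]
        if royal_road(y) >= royal_road(x):
            x = y
    return x, royal_road(x)

print(rmhc())    # typically ends at all 1s with fitness 12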
What goes wrong? The GA GETS CAUGHT ON A LOCAL MAX, because it is not sampling independently in each region. By contrast, RMHC does a slow (tortoise) search over the entire space, while the GA focuses its search in areas where, by chance, it got an early good return. A way to see what happens is in terms of the fitness function in Mitchell (p. 122):
if x = 111***, f = 2
if x = 0*****, f = 1
but if x = 100***, 110*** or 101***, f = 0
The max is at 111***, but the algorithm says you should choose 0 on the first element: a 1 in the first position gives you 2 in 1/4 of the cases and 0 otherwise, so the average is 1/2, while you get 1 whenever you start with 0, no matter what. So 0***** is a local maximum, since it beats the 1-prefixed strings on average, and the GA gets CAUGHT in the local maximum. But this verdict comes from seeking the best algorithm for the "final outcome". If the GA is judged on on-line performance, the question would be: what is the present value of the stream of outcomes from the GA versus from RMHC? With a modest discount rate the GA would beat RMHC, since the GA spends most of its time in good areas while RMHC covers the entire space.

If caught in a local optimum, what do you do? Increase the rate of MUTATION to widen the search, or modify the cross-over. If the GA destroys good groups too readily, you want to strengthen its proportionate fitness reproduction: Genitor replaces the worst-performing strategy with the offspring (via cross-over) of better-performing strategies. Another variant is to use ranks rather than profits in proportionate fitness reproduction. Island models run multiple genetic algorithms separately and then allow for migration between them. The key is to weigh exploring the broad space properly against "exploitation".

Example of a GA economics model: J. Arifovic, 'Genetic Algorithm Learning and the Cobweb Model', Journal of Economic Dynamics and Control, vol. 18, issue 1 (January 1994), 3-28. Firms' quantity production decisions are initially random, but firms learn each period and converge to equilibrium under different modes of learning. In social learning, each firm uses a single string as its quantity decision, copying the best performer from last period; the string is then compared against other firms' strings. In individual learning, each agent is endowed with a pool of strings, which are compared against the other strings within that agent's own pool -- competing ideas within the firm. In either case, with identical cost functions, agents' production decisions end up identical. If the cost functions are not identical, the result is a heterogeneous solution in which firms produce different quantities. After agents decide on quantities, the quantities are aggregated through the demand function to get a price, and each firm's profit is then calculated. Fitness values are calculated as a function of profits. After the offspring pool is generated, hypothetical fitness values are calculated, based on an estimate of the price level, often just the previous period's price. (A rough sketch of the social-learning version appears below.)

MODELING TAX EVASION WITH GENETIC ALGORITHMS: O'Reilly, Warner, Wijesinghe, Marques, Badar, Rosen and Hemberg (Economics of Governance 16(2):1-14, November 2014).
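A rough sketch of the social-learning cobweb story told above. Everything numerical here is an assumption of mine, not Arifovic's calibration: linear demand P = a - bQ, quadratic cost cq^2, 20 firms, 8-bit quantity strings, and copy-the-best-then-mutate learning.

import random

A, B, C = 10.0, 1.0, 1.0            # assumed demand intercept/slope and cost parameter
N_FIRMS, BITS, Q_MAX = 20, 8, 5.0

def decode(s):
    # Map an 8-bit string to a quantity in [0, Q_MAX].
    return int(s, 2) / (2 ** BITS - 1) * Q_MAX

def profit(q, price):
    return price * q - C * q * q

population = [''.join(random.choice('01') for _ in range(BITS)) for _ in range(N_FIRMS)]
for t in range(50):
    quantities = [decode(s) for s in population]
    price = max(A - B * sum(quantities), 0.0)            # quantities aggregate into a market price
    fitness = [profit(q, price) for q in quantities]
    best = population[fitness.index(max(fitness))]       # best performer last period
    # Social learning: each firm copies the best performer, with a little mutation.
    population = [''.join(c if random.random() > 0.05 else random.choice('01') for c in best)
                  for _ in range(N_FIRMS)]

print(round(price, 2), round(sum(quantities) / N_FIRMS, 2))   # last period's price and average quantity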
ALPHA-GO -- neural nets (to be dealt with in some detail in class xxx). Search algorithms that explore the space widely are needed in complex landscapes where CONFIGURATIONS matter, because complexity grows rapidly with the number of choices and attributes. In a Kauffman landscape with N factors, setting K (the number of other elements affecting each factor's contribution to profitability) to N-1 gives the maximum interaction. With N = 10, and each attribute (measured as 0/1) having its profitability depend on all the others, total profitability depends on 2^10 = 1,024 configurations. If each attribute had 4 choices, you'd have 4^10 = 1,048,576 combinations and 5^10 = 9,765,625 schemata. But these numbers are small compared to the complexity of board games.

AlphaGo combines neural-network algorithms and machine-learning techniques to reduce the depth and breadth of the search, so the algorithm searches over a smaller space. It reduces depth with a value network that estimates how likely a given board position is to lead to a win, and it reduces the breadth of the game with a policy network that learns to choose the best moves for a given position. These networks are "deep learning" networks because they have many layers, trained by a novel combination of supervised learning from human expert games and learning from games of self-play. The policy network generates possible moves that the value network then judges on their likelihood of vanquishing the opponent (a toy sketch of this two-headed structure appears at the end of this section).

In the first stage, a 13-layer policy neural network was trained on 30 million board positions from 160,000 real-life games taken from a Go database. The bottom layer of the network has a 19-by-19 array of neurons that basically takes a snapshot of the state of the board and uses it as input. The top layer consists of a similar array that gives all the possible locations for laying the next stone and the probability of making each of those moves. In between lie 11 more layers. By adjusting the connections in the intervening layers, which subtly encode all the "knowledge" in the data, the network was able to predict the human move 57 percent of the time. The program was then allowed to teach itself further by playing against itself over and over again. Thus AlphaGo learned from experience to tell a better move from a poorer one. "The way we've developed the system, it plays more like a human does," Hassabis says.

In the second stage, the "value network" evaluates whether black or white holds the advantage by estimating the probability that one side or the other will eventually win the game. To train it, the researchers fed the network configurations and outcomes from games AlphaGo played against itself. The value network helped AlphaGo play faster, Silver says: instead of playing out many scenarios to the very end, as in a Monte Carlo tree search, AlphaGo could play the game forward a few moves and use the network to estimate the final result. The key is that the program mimics human learning by training, rather than by being programmed.

Demis Hassabis, CEO of Google DeepMind: "the most significant aspect of all this for us is that AlphaGo isn't just an 'expert' system built with hand-crafted rules; instead it uses general machine learning techniques to figure out for itself how to win at Go. While games are the perfect platform for developing and testing AI algorithms quickly and efficiently, ultimately we want to apply these techniques to important real-world problems. Because the methods we've used are general-purpose, our hope is that one day they could be extended to help us address some of society's toughest and most pressing problems, from climate modelling to complex disease analysis. We're excited to see what we can use this technology to tackle next!"
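For intuition only, a toy two-headed network echoing the policy/value split described above. The architecture (one shared hidden layer, random untrained weights) and every name below are mine; nothing here reflects AlphaGo's actual networks or training.

import numpy as np

rng = np.random.default_rng(0)
BOARD = 19 * 19                                 # flattened 19x19 board "snapshot"

W_shared = rng.normal(0, 0.1, (BOARD, 128))     # one shared hidden layer
W_policy = rng.normal(0, 0.1, (128, BOARD))     # policy head: a score for each board point
W_value = rng.normal(0, 0.1, (128, 1))          # value head: a single win-probability estimate

def forward(board_vector):
    hidden = np.tanh(board_vector @ W_shared)
    logits = hidden @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                                  # softmax: probabilities over the 361 points
    value = 1 / (1 + np.exp(-(hidden @ W_value).item()))    # sigmoid: estimated chance of winning
    return policy, value

board = rng.choice([-1.0, 0.0, 1.0], size=BOARD)   # -1 / 0 / 1 for white / empty / black
policy, value = forward(board)
print(int(policy.argmax()), round(value, 3))       # most favored point and the (untrained) win estimate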
One response to the victory of the Go machine: "it's flawed. It can't do everything we humans can do. In fact, it can't even come close. It can't carry on a conversation. It can't play charades. It can't pass an eighth-grade science test. It can't account for God's Touch" (the Lee Sedol move in game 4 that won the only human victory in the 5 matches). But another machine can carry on a conversation: cleverbot.com or Google's chatbot. Another can play charades: the Allen Institute for AI is working on it (https://arxiv.org/pdf/1604.01753.pdf); see also Google's Quick, Draw!. Another can pass an 8th-grade science test. And AlphaGo already pulled off a "God's Touch" of its own in game 2. Use different machines for different purposes, just the way we use different body parts or tools.