|
|
Evolutionary Computation and Artificial Life (202-1-5171)
Exercise 3
Bord Jonathan Iwanir Elad
| ||||||
|
Question 1: Theory of Genetic Algorithms
a. Given are the following two schemata: S1=*0**11***0** and S2=*****0*1****. Answer me these questions five:
i. Give the order and the defining length of S1 and S2.
Answer: The order of the S1 is 4, because there are 4 defining bits. The order of S2 is 2, for the same reason. The defining length of S1 is 8, the defining length of S2 is 2.
ii. What is the probability for one-point crossover with crossover rate pc that crossover breaks S1? What about S2?
Answer: Let d(S1) be the defining length
of S1. Let l be the length of the population’s genome. According to
the formula studied in class, the probability that a single point crossover
breaks S1 is the probability that the crossover point falls between
two defining bits, hence: For
S2, we’ll have:
iii. What is the probability that a mutation with mutation rate pm breaks S1? What about S2?
Answer: Let o(S1) be the order of S1.
According to what was studied in class, the probability that a simple mutation
breaks S1 is the probability that we mutate one of the defining
bits, hence: For
S2:
iv. What is the probability that S1 survives the application both of crossover and mutation? What about S2?
Answer:
The probability S1
survives cross-over: Similarly
for S2:
v. Is it correct to call any of these two schemata a "building block"? Explain why, or why not.
Answer: according to Goldberg, building blocks are “short, low order, highly fit schemata”. While much debate goes on around what “highly fit schemata” means[1], what is pretty clear in this question is that we have no knowledge of the fitness of the given schemata. Hence, we are in no position to decide whether those are building blocks or not. In terms of order and defining length, only S2 can be considered to be a building block in our minds, as S1, while being of moderate order, is very long, relative to the genome length. In conclusion: If we had to pick yes or no, we are going to say that we can’t call either of the two schemata building blocks, because we have no information about their fitness.
b. While optimizing a 3-bit problem, you notice that your population, of size 100, consists of 25 copies each of strings 100 (with fitness 10), 111 (with fitness 20), 011 (with fitness 15), and 010 (with fitness 15).
i. What is the estimated fitness of schema 1**?
Answer: The estimated fitness of schema 1** is
ii. Assuming fitness-proportionate selection with no crossover or mutation, show one way by which you could calculate the estimated fitness of that schema in the next generation.
Answer: Observation: Without mutation or crossover, the only thing that can change in the population is the number of copies of each genome. We have 4 genomes, 2 of which belong to “our” schema. Meaning that for all generations, our schema can only be represented by these two genomes (i.e. no other genomes of that schema can appear in the genetic pool.)
We’ll treat each of the two genomes in our schema, S1=100 and S2=111, as schemata, and we’ll calculate their estimated number in the next generation using the formula from the Eiben & Smith lecture: Chapter 11: Theory, slide 7. Let
m(S1, t)=25 be the number of instances of S1 at generation t, and similarly
m(S2, t)=25. The average fitness of the entire population is m(S2,
t+1)= Hence,
the estimated fitness of the schema 1** at t+1 is
c. A non-binary genome of length l, over an alphabet of size p (p>2), is an instance of how many different schemata? Why?
Answer: First, let us define two concepts: the genome alphabet, and the schema alphabet. The genome alphabet is all the symbols a gene can receive in our representation and the schema alphabet are all the letters which are needed to represent schemata for our representation. To represent a schema, we must use symbols that represent more than one possible “gene”. For example, in the case of a binary string genome, the genome alphabet is {0,1}, but to represent a schema we needed 3 characters: 1, 0, and *, each character representing a non-empty subset of the genome alphabet. However, if our genomes are represented using an alphabet with more than 2 letters, then the number of non-empty subsets grows exponentially, in fact, if a genome alphabet has P letters, then the schema alphabet has 2p-1 symbols (one for each of the subsets, minus the empty set). For instance, for a genome represented on an alphabet of 4 characters {a, b, c, d}, we need 24-1 symbols to represent a schema of this genome:
Symbol: Representing: a a b b c c d d ab a or b ac a or c ad a or d bc b or c bd b or d cd c or d abc a or b or c acd a or c or d abd a or b or d bcd b or c or d abcd either one of {a,b,c,d}
For example, a 5 gene genome <a,d,c,a,d> is a member of the following schemata: <a, abcd, cd, ad, ad>, <a, d, c, a, d>, <abcd, abcd, abcd, abcd, abcd>… etc.
Hence,
for a given genome I, in a GA where each genome is represented by a string with
the length of l, with p>2 characters each gene in I can be a represented by
2(p-1) symbols in the schemata alphabet (all the subsets of the
genome alphabet that contain that gene), meaning there are
However, after speaking with Prof. Sipper, apparently we need only one character in the schemata alphabet to represent “all the characters” (and not a character for each non-empty subset), hence to represent a schema for an alphabet of p characters we need p+1 characters in the schema alphabet: all the letters in p plus the joker character. All in all, each gene in a genome of length l can be represented by 2 characters: the original character, or the * character, hence, a he’s member of 2l schemata.
References: [1] S. N. Sivanandam, S. N. Deepa: Introduction to Genetic Algorithms, chapter 3.12.1 – Building Block Hypothesis | ||||||
|
Question 2: Symbolic Regression.
In this question we were supplied with 3 functions, represented by 201 points each, and we were supposed to find the original function these points represent.
Representation:
In this assignment, we represented an individual as a lisp expression in a form of a tree.
Terminal set: {1,2,3,4,5,X} Function set: {DIV, MULT, ADD, SUB, ABS, ACOS, ASIN, ATAN, COS, EXP, LOG, MAX, MIN, POW, SIN, SQRT, TAN, ADF, CEIL, ROUND, REM, FLOOR} Mutation: 1. Point Mutation: goes over all the nodes in the tree, and changes each node with probability Ppm. 2. Chop Mutation: With probability Pcm, removes the tree’s head and replaces it with one of its children. 3. End Mutation: With probability Pem, changes an internal node into a terminal from the terminal set. 4. Ordinary mutation: Picks an internal node and with probability Pm regenerates one of its children randomly. Recombination: Crossover: with probability Pc, randomly selects a node from tree A and replaces a node in tree B with it. Population initialization: 400 randomly generated trees, with height no bigger than 2. Fitness (we presented 3 different methods): 1. Hits: fitness is equal to the number of hits (points that have been matched exactly) ranges from 0-201. 2. Distance: fitness=1/distance, where distance is the sum of distances between what we should get and what we got for each X in our fitness cases. 3. Hits*Distance: a combination of method 1 and 2, we multiply the number of hits with the 1/distance. Termination: The run terminates after 1000 generations or after discovering a function with more than 195 hits.
Selection: In this question we used elitism, as well as roulette wheel selection. We also implemented and used tournament selection however we did not see any difference in performance between the two.
Bloat Control: Bloat was a big problem for us in this problem, and we attempted several methods to control bloat:
1. Dynamic size restriction: As outlined in [1], we removed any individual whose size (number of nodes in tree) was larger than C*(Size of best individual). Where C is a constant, usually in the range [1, 1.5]. This method proved very useful in eliminating bloat problems, and made it possible for us to run the algorithm for more generations and larger populations. Although we fear that this operation reduced overall diversity. Individuals who were deemed too large we either removed by having their fitness reduced to 0, or replaced by a randomly generated individual, both of these approaches produced similar results, but the latter resulted in worse performance. 2. Hill Climbing: As outlined in [1]. Using size altering operators only if their outcome results in a smaller individual, or an individual with higher fitness. We tried this approach, however it caused many of our runs to converge early to sub-optimal solutions, and contributed little to the speed of the algorithm if used in conjunction with the first operator. These is close to the results in [1], so we weren’t surprised. 3. Parsimony pressure: We also attempted to include a size parameter in the fitness function so that large individuals have a lesser chance of being selected. This method too caused early convergence, and was proven extremely ineffective. 4. Size reducing operators: Such as chop mutation and end mutation as presented in the representation section. These operators, while effective at destroying large individuals, are also likely to reduce diversity by damaging good solutions and removing them from the gene-pool. For that reason, their use was extremely limited. 5. Hard limiting sizes: in contrast to the first approach, here we limit the size of each individual to a fixed number. This method failed to produce good results when the limit was too low, and failed to help the algorithm perform faster when the limit was too high.
Results (before hint):
All results produced with the following run parameters: Pm=0.1 Pc=0.9 Pcm=0.1 Ppm=0.03 Pem=0.1 C=1.1
Fitness as hits:
Figure 2.1: f1.
Best Individual: Fitness: 4 Distance: 806610.5528
Expression: (MULT(MULT(X)(X))(MULT(TWO)(MAX(ADD(MULT(X)(ADD(X)(TWO)))(TWO))(DIV(ONE)(MULT(FIVE)(X)))))
Figure 2.2: f2.
Best individual: Fitness: 194 Distance: 1.111111111127272E-4
Expression: (MULT(MIN(MULT(TWO)(MAX(ASIN(ASIN(X)))(X)))(MIN(X)(TWO)))(MAX(X)(ACOS(MAX(SIN(FIVE))(THREE)))))
Figure 2.3: f3
Best individual: Fitness: 109 Distance: 6.555866960411549E-13
Expression:
(ABS(SUB(SUB(MULT(X)(X))(X))(X)))
Discussion: As we can see, we have extremely mixed results. Please note these are the best runs out of 20 for each graph, so for instance for f1, we have that our very best run we only found 4 hits out of 201, and the sum of the distances is more than 80,000! That’s incredibly bad performance. However, for f2 we see that we found 194 hits our of 201, and our distance from the fitness cases is a mere 0.000111111111112727. As for f3, we have found a function that covers a bit more than half of all fitness cases, with 109 hits. On the surface, it’s no comparison to f2 where we found 194 of all cases, however, looking at the distance we see another story: our function’s distance from the fitness cases is 6.555866960411549X1013, which for most practical applications is a 100% hit.
We can also note that in figure 2.3, the median fitness suffered a sharp drop around generation 325, from which it never revived. We’re not sure how to interpret this, however we think this is a very bad indicator for the variability of the population, indicating that it’s mostly comprised of low fitness individuals.
Fitness as 1/distance:
Figure 2.4: f1. Please note it’s a logarithmic scale.
Best individual: Fitness: 16.1845882292796 Distance: 0.061787175912878405 Hits: 41 Expression: (ADD(X)(ADD(ADD(MULT(POW(X)(FOUR))(FOUR))(MULT(MULT(X)(X))(MIN(FLOOR(TWO))(ABS(ADD(ADD(SQRT(ONE))(COS(DIV(ONE)(ADD(CEIL(FLOOR (ADD(FLOOR(ABS(FIVE)))(DIV(TWO)(TWO)))))(TWO)))))(ABS(MULT(MIN(ADD(ADD(MULT(POW(X)(FOUR))(FOUR))(MULT(MULT(X)(X))(MIN(FLOOR(TWO)) (ABS(ADD(ADD(SQRT(ONE))(COS(DIV(X)(ASIN(ASIN(FIVE))))))(ABS(MULT(MIN(FOUR)(X))(DIV(ABS(X))(MULT(ADD(CEIL(FLOOR(X)))(TWO))(THREE))) )))))))(MULT(MULT(MAX(MULT(X)(X))(SUB(SIN(X))(THREE)))(X))(THREE)))(X))(DIV(ABS(ABS(X)))(MULT(ADD(CEIL(FLOOR(FIVE)))(FIVE))(THREE) )))))))))(MULT(MULT(MAX(MULT(X)(X))(SUB(SIN(MAX(THREE)(FOUR)))(FIVE)))(X))(THREE))))
Figure 2.5: f2. Please note it’s a logarithmic scale. There are only ~340 generations because a solution was found.
Fitness: 9002.385 Distance: 1.11E-04 Hits: 196 Expression:
(MULT(X)(MIN(MAX(ACOS(MULT(X)(DIV(MAX(MAX(SIN(LOG(X)))(ONE))(MAX(ATAN(X))(DIV(COS(SIN(X)))(ABS(X)))))(MAX(ASIN(FOUR))(MULT(X)(ONE) )))))(MAX(ACOS(THREE))(POW(POW(X)(X))(ONE))))(MIN(MAX(MAX(X)(DIV(MAX(DIV(ONE)(ABS(X)))(DIV(ONE)(ABS(DIV(MAX(DIV(ONE)(ABS(X)))(LOG (SQRT(THREE))))(ABS(X))))))(ABS(X))))(ONE))(TWO))))
Figure 2.6: f3. Please note it’s a logarithmic scale.
Fitness: 1.8689074083911177E12 Distance: 5.350719867180942E-13 Hits: 97 Expression: (SUB(ADD(SUB(ADD(ABS(ABS(SUB(POW(SUB(X)(ASIN(TWO)))(TWO))(TAN(ATAN(ONE))))))(FOUR))(POW(ONE)(ADD(SUB(ADD(ABS(ASIN(SQRT(TWO))))(ONE) )(COS(TWO)))(FIVE))))(TWO))(FIVE))
Discussion: As we can see, we had much better results with this method of generating fitness across the board. f1 improved from 4 hits to 41, and the distance was ~0.06, which is good performance. Also we found a match for f1 within 350 generations, which is also not bad performance. As for f3, we start to notice a disturbing pattern, which is evident from both the graph and the end result: Each run converges on a particular set of functions that has a distance of around 10-13 and roughly 100 hits, which seems to smother all other individuals, as can be seen from the median fitness graph, which turns flat the moment the “super function” emerges.
Fitness as hits*1/distance:
Figure 2.7: f1.
Best individual: Distance: 49.08399999999398 Hits: 21 Fitness: 0.4482112297286835 Expression:
(ROUND(MULT(MULT(MULT(ADD(X)(ATAN(SQRT(SIN(ATAN(SQRT(THREE)))))))(X))(X))(MULT(ADD(X)(TAN(DIV(TWO)(MULT(FOUR)(ADD(X)(POW(DIV(SIN(SQRT (ROUND(FOUR))))(POW(THREE)(TWO)))(ATAN(MAX(SIN(ATAN(SQRT(ATAN(MULT(ADD(X)(TAN(DIV(TWO)(MULT(FOUR)(ADD(X)(FIVE))))))(ABS(FOUR))))))) (COS(POW(DIV(FIVE)(X))(MAX(TWO)(ADD(CEIL(SIN(FIVE)))(THREE)))))))))))))(ABS(FOUR)))))
Figure 2.8: f2.
Best Individual:
Distance: 5.186073792629031E-12 Hits: 195 fitness: 3.7793523161697945E13 Expression: (MAX(ADD(X)(MIN(SUB(POW(X)(TWO))(X))(X)))(MIN(DIV(POW(ADD(FLOOR(FIVE))(COS(POW(ADD(FIVE)(COS(MULT(MAX(FIVE)(X))(MULT(TWO)(TWO)))))(ABS (FOUR)))))(MULT(SUB(POW(ADD(FIVE)(MIN(FOUR)(SQRT(CEIL(FOUR)))))(MULT(SUB(ONE)(POW(ADD(X)(MIN(CEIL(MAX(THREE)(FIVE)))(MIN(SUB(ROUND(ONE) )(X))(CEIL(FOUR)))))(POW(ADD(X)(MIN(X)(FOUR)))(FIVE))))(POW(FIVE)(FIVE))))(POW(ADD(X)(MIN(FOUR)(SUB(ONE)(X))))(POW(ADD(X)(MIN(SIN (ACOS(FOUR)))(TAN(LOG(SUB(CEIL(FOUR))(X))))))(FIVE))))(POW(FOUR)(FIVE))))(X))(X)))
Figure 2.9: f3.
Best Individual: Distance: 4.959921362512887E-13 Hits: 108 Fitness: 2.197615486887002E14
Expression: (ABS(ABS(MULT(X)(SUB(SUB(THREE)(X))(ONE)))))
Discussion: As we can see, introducing a mixed formula of hits and distance did not achieve results that are any better than just using distance, though these results are still better than using hits alone as a criterion. We can also note that f3 continues the behavior we witnessed in its other graphs, and continues to exhibit an invariably low median line throughout the run.
Results (with hint):
Followed are the results after the hint was given. All runs were run with the 1/distance fitness calculation.
Figure 2.10: f1.
Best individual: Fitness: 3.45206989E9 Distance: 2.90E-10 Hits: 66 Expression: (ADD(MULT(FOUR)(POW(X)(FOUR)))(ADD(ADD(ADD(POW(X)(TWO))(POW(X)(TWO)))(POW(X)(ROUND(ONE))))(ADD(DIV(FIVE)(ROUND(MULT(MULT(ONE)(FOUR))(SUB (MULT(MULT(TWO)(POW(SUB(FIVE)(POW(THREE)(THREE)))(MULT(ADD(MULT(FIVE)(FIVE))(X))(FOUR))))(DIV(LOG(FIVE))(ROUND(ADD(EXP(FIVE))(FIVE))))) (POW(FIVE)(FIVE))))))(MULT(POW(X)(THREE))(THREE)))))
Figure 2.11: f2.
Best individual: Distance: 0 Fitness: infinity Expression: (MAX(MIN(X)(MIN(X)(DIV(SUB(X)(SUB(X)(MIN(ONE)(FOUR))))(X))))(ADD(X)(MIN(X)(MULT(SUB(X)(MIN(SQRT(SUB(X)(SUB(ADD(X)(THREE))(MAX(FOUR)(TWO) ))))(ONE)))(X)))))
Figure 2.12: f3.
Best individual:
Fitness: 2.28E+12 Distance: 4.38E-13 Hits: 112 Expression: (ABS(SUB(ADD(FIVE)(SUB(ADD(CEIL(FIVE))(MULT(SUB(TWO)(SUB(FIVE)(X)))(SUB(ADD(TWO)(X))(TWO))))(SUB(FIVE)(X))))(FIVE)))
Discussion and conclusions: Here we see an improvement mainly in f1. Both f2 and f3 performed similarly to what they did without the hint. What bugged us the most is that even with the hint we were unable to develop a function that has 100% hits in f1 and f3. However, with f3’s distance being 4.38X10-13 and f1’s distance being 2.90X10-10, we can safely say that for all practical purposes, these are indeed the original functions.
Bonus: ADFs.
For this part we introduced a single ADF into our implementation. Our ADF has a single child node (to simulate a single variable function), and our ADF is evaluated by first evaluating the child node, and then evaluating the ADF with the value of the child node and input. The ADF can have any node from the function/terminal set, except for the “ADF” function, to prevent looping.
Number of ADFs: 1. ADF Crossover: 1. Two individuals swap their ADFs with probability Pas. 2. Two ADFs from two individuals can crossover like individuals can with probability Pac. ADF Mutation: a random node is selected in the ADF and regenerated randomly (along with its children sub-nodes) with probability Pam. Structural Mutation: None.
Bloat: To limit bloat in ADFs, we limited their size to 20 nodes.
Run parameters: As above, in addition: Pac=0.6 Pas=0.1 Pam=0.2
Results:
Figure 2.13: f1
Best individual:
Hits: 41 Fitness: 15.625019 Distance: 0.063999922 Expression:
(ADD(MULT(THREE)(POW(X)(THREE)))(ADD(ADD(MULT(X)(X))(ADD(X)(MULT(X)(X))))(MULT(FOUR)(POW(X)(FOUR)))))
ADF: (EXP(FIVE))
41 15.625019
Figure 2.14: f2.
Best individual: Hits: 197 Fitness: 9000 Distance 1/9000 Expression: (MAX(MIN(DIV(ONE)(X))(X))(ADD(MIN(MULT(DIV(ONE)(FIVE))(MULT(X)(MULT(SUB(X)(DIV(X)(X)))(FIVE))))(X))(X)))
ADF: (TAN(ABS(TWO)))
Figure 2.15: f3.
Best individual: Hits: 111 Distance: 4.81E-13 Fitness: 2.08090549E12 Expression: (ABS(MULT(SUB(ADD(X)(TWO))(TWO))(SUB(SUB(FOUR)(X))(TWO))))
ADF: (MULT(LOG(LOG(MULT(LOG(FOUR))(ABS(POW(ONE)(X))))))(ABS(X)))
Discussion and conclusions: As we can see from the final emergent best individuals, none of them use ADFs in their trees. This is not a bug (trust us, we checked), many individuals did use ADFs in the population, however, the ones who emerged as the best did not have ADFs in them, which tells us that ADFs are not a good addition to the implementation. Also, the overall performance of the GP algorithm was slower (because of the added overhead of keeping ADFs) and provided worse or comparable results to the results we achieved without them. The reason for this is because we think the functions we were given do not use complex expressions that appear many times in the formula, which is something an ADF could help with, as it’s a mean to copy/paste fixed “code” in various places. If the original function which created the fitness cases does not use “reusable code” itself, there is little chance that an ADF would be of any help, on the contrary. Keeping an ADF as a possible node actually increases the size of the search-space, making it harder to come up with a solution.
Final thoughts and conclusions: Although we did not find a perfect fit for the given functions, we believe that for each of the functions, we found an adequate approximation. What’s noteworthy is that with the exception of f2, we managed to achieve roughly the same results with and without the hints, which surprised us, as we expected the hint to improve our performance greatly by reducing the search space of the algorithm.
References: For this question we reviewed a few academic sources, some of which we used, and some we filed under “good to know”. Mainly: [1] P.Monsieurs, E.Flerackers: Reducing bloat in genetic programming. [2] H. Xie: Diversity control in GP with ADF for Regression Tasks [3] A.Ekart, S.Z.Nemeth: Maintaining the Diversity of Genetic Programs | ||||||
|
Question 3: The Robotic problem Description of the problem:
In the robotic problem we are required to "program" the behavior of a "robot". The robot walks on a board and collects the cans that were left on the board. A "board" is a square of N*N, where each cell can be empty, contain a wall or contain a can. Our robot has limited moves allowed per board, so clearly we want our robot to collect as many cans he can, to do so we want the GP to find a good "program" for our robot.
Finally we will discuss the differences between the GP version to the GA version.
Representation: Every genome in our population was represented as a tree as normally is done with Genetic Programming, however, we opted to use a game tree. Each Genome holds one Tree object and every Tree object holds a root which is a Node object.
Keeping in mind the robot question from Exercise 2 we began thinking about a representation which we will be able to use in the same manner as the representation in Ex 2, that is to say, we wanted to "tell" the genome what is our current situation and it will return an answer on what to do with the robot.
Terminal Set: {'Move North', 'Move South', 'Move East', 'Move West', 'Pick Can', 'Stay Put' and 'Make a random Move'}
Explanation: In Ex-2 we had 7 possible moves: 'Move North', 'Move South', 'Move East', 'Move West', 'Pick Can', 'Stay Put' and 'Make a random Move', naturally we set those 7 possible movements to be our terminal set in the representation (actual values: [0,1,2,3,4,5,6] respectively).
Function Set: {Check if X==Y, Check Last Move} Where X is one of 5 possible values: North, South, East, West or Current, and Y is either: Can, Empty or Wall.
Explanation: When evaluating the function node we check if in the neighbor cell from the North/South/East/West/Current (depends on the direction value we set on creation) is the value (also, set on creation), if so, we return the evaluation of the right child, otherwise, the left child. Evaluating our tree is made top to bottom. We start with the root node, if it’s a Terminal we return it, otherwise we evaluate the function, if true we return the Terminal from the right sub tree, if false from the left. This process is done via recursion until we reach a terminal, which is the action that the robot will take.
While this function set produced nice results we decided to improve on them and introduced another function: 'Check Last Move', this function gets either one of the following values when constructed: 'Move North' ,'Move South' ,'Move East' ,'Move West' ,'Stay Put' ,'Pick Can' Or 'Make a random Move', and checks if that value was the last move the robot made, if so the right child was evaluated, otherwise the left child was.
This function allows our robot to make smarter decisions based on the decision it preformed last time. For instance: if we wanted our robot to go left until it reaches a wall, and then move one down, and then go right until it reaches the opposite wall, we could not represent this strategy without a function that asks what the previous move was.
Fitness: For every robot and every board, we evaluate its tree 200 times, each time we send the evaluation method the following parameters: what do the cells around him contain, what the cell where the robot stands contains and what was his last move, we then perform the action specified by the game tree, and evaluate the tree again to get the next action. We ran each robot on 100 boards each generation, and calculated the fitness as follows.
We tried several approaches to calculate fitness: 1. First we tried a similar method to what we did in the last exercise: we gave reward for each can picked and a penalty for each 'bad' move such as slamming into the walls or picking non existing cans. The results were good, in the first generations (about 200) the fitness rose steadily, however then we noticed our runs converge on a sub-optimal solution and the algorithm gets stuck. 2. In the second attempt to define fitness, we decided that we don't really care if the robot keeps slamming into the walls, as long as he picks up as much cans as he can, we've calculated for each robot how many cans he picked (from all the boards) and divided that number with the total cans (from all the boards), the result is a number, denote by 'p', in the range [0,1] which is the percent of cans that were picked. Thus,
the fitness was calculated as follows: This calculation was designed to allow us to give much better fitness to values which get close to 1.0, and helps us a little with the exploitation of good results.
Board Generation: As with the previous robotic problem, we decided to try to approaches at generating boards. 1. Using fixed boards throughout the run of the GP. 2. Using randomly generated boards each generation.
Crossover: We've defined the crossover operator to select a random node in each of the two parents genomes and swap them with each other. This crossover was discussed in class and was easy to implement under the design of the genome we chose.
Mutation: 1. Branch Mutation: this mutation chooses a node from the genome and changes it with probability 0.25 to "Function", with probability 0.25 to "check last move" and with probability 0.5 to Terminal. If a terminal node is changed to a non-terminal node, we generate random trees as its children.
2. Chop-Mutation: This mutation also picked a random node from the tree and chop all the sub tree below (putting a Terminal instead of that Node).
3. Pick mutation: just as chop mutation, but instead of replacing a random node with a random terminal, it places the “Pick a Can” terminal. This mutation was introduced to compel the individual to pick up more cans.
4. Greedy-Mutation: this mutation created a backup of the genome, made a random change in it and checked if the fitness got better or not. If it did we saved the change, otherwise we restored the backup.
Mutation parameters: To get the most benefit from each mutation, we set the probability of each mutation and crossover to be self adapting. Meaning, we set for start the branch mutation, chop-mutation and pick mutation to be some probability while the greedy mutation got zero probability. Then when we've reached to the point where the best individual picks up 60% of the cans, we reduced the 3 mutations to zero probability and gave more probability to the Greedy-Mutation. In this manner we've managed to push our success rate higher (as shown bellow).
Population: 100.
Termination: After 3500 generations have passed.
Plots: Run parameters are specified in the analysis section.
The following graph (Figure 1.1) shows the percent of picked cans vs generations.
Figure 3.1: percent of picked cans, best 3 individuals out of 10, static boards set.
Figure 3.2: Fitness vs. Generations, best 3 individuals out of 10, static boards set.
We've created a mechanism that allows us to control whether the boards set will change for every generation or not. Then we've ran the experiment 10 more times with changing boards set. (Figures 2.1, 2.2)
Figure 3.3: percent of picked cans, best 3 individuals out of 10, allowing the boards set to change for each generation.
Figure 3.4: Fitness vs. Generations, best 3 individuals out of 10, allowing the boards set to change for each generation.
Analysis: Our first plots (Figures 1.1, 1.2) show the results when we performed the 10 experiments (chose best 3) where all the robots were evolving on a specific boards set. The fact that GP tends to bloat genomes made the job a little more complicated, we got nice results very fast but it seemed almost impossible to reach a "perfect" result, that is a robot which picks every can in the given boards set. The fact that our population suffered from bloating made the fitness evaluation very difficult plus each mutation action barely helped (in the exploitation phase), then we've introduced the 3 extra mutations operators, to help fight the bloating phenomena and to produce more exploitation. The process included a lot of trial and error. Our observation was that when getting above ~60% picked cans the fitness growth got slower and slower while the genome got bigger(Aka BLOATING). A switch was created, when reaching 60% the probabilities of the crossover and mutations operators changed. Below 60% the crossover operator probability was 0.6 , the regular mutation probability was 0.1, chop-mutation probability was 0.1 and Chop-Mutation-Pick probability was 0.1 (the probability for an individual not to change was and stayed 0.1). When passing the 60% we reduced the crossover probability to 0.4, the regular mutation probability to 0.0, chop-mutation probability to 0.0 , Chop-Mutation-Pick probability to 0.0 but set a high probability for the greedy-mutation : 0.5. Finally with this fine tuning we got satisfying results. Then when we had a working GP with good results we tried making the boards set change after every generation. From our conclusions in the GA version we've expected better results, but that wasn't the case, the results were nice but less than those of our original configuration.
Comparison of the GA version vs. the GP version: A link for the GA version can be found here!.
Static boards set: When we look at the fitness vs. generations graphs we can see that in the GP we reach almost the best fitness at ~700 generations while in the GA it is done by ~1000. It seems like the GP has a slight advantage over the GA.
Changing boards set: When we look again at the fitness vs. generations graphs we can see again that in the GP version the max fitness was achieved much faster (~400 generations) while in the GA it remained ~1000 generations. By those two differences we might say that the GP is much better, but we need to keep several things in mind:
1. The complexity of the GP is greater than that of the GA, computation, time-wise. 2. The design of the GP version was a lot more complex, needed to create several mutations to fight bloating and it was very hard to get an excellent individual.
3. In the GA we found a robot which picked up nearly 100% of the cans in any board set presented to him, while in the GP we could only find a robot which picks roughly 90% of the cans (figure 3.3).
4. If we were to choose from the two representations according to the readability of the two best robots from both versions clearly we'll chose the genome from the GA version which is a simple table of 243 rows over the GP version which is a huge, unreadable tree.
We've noticed another interesting difference between the two versions: In the GP the static boards set got better results than the changing boards set. While in the GA it was exactly the opposite.
Our Conclusions: One can find a good programming for a robot, using each of the versions. We prefer the GA version, its better in almost every aspect (for this problem).
Possible future improvements: · We thought about creating a new mutation, the new mutation will scan a tree top to bottom. And will eliminate nodes that are in the inactive part of the tree (introns). A small example: let the root node (denote by Nr) be the function "if the cell from the north contains a Can" ('Func' Object…) and let the left child node (denote by Nl) be the also a function node with exactly the same parameters. Then we can safely eliminate the right child node of Nl (because if in this evaluation of the tree one function was once false it cannot be true a moment later..) so the right child node of Nl is considered to be in the inactive part. As we believe this mutation will demand a lot of computation time it will probably be worth it in terms of shorter genomes.
· Introducing dynamic tree size restrictions as presented in the second question.
Best Robot:
Link for our best robot genome
| ||||||
|
Question 4: Tic Tac Toe Problem description: In this question we are asked to provide a GP program that evolves a Tic Tac Toe (henceforth – TTT) playing strategy that wins. TTT is a game that is played on a 3X3 board, by two players, each playing O or X. Each player in his turn scribes X or O in one of the empty cells of the board. The first player to reach a row, column or diagonal of X’s or O’s, wins the game.
Representation: We represent the playing board as an array of 9 cells, each cell represents a cell on the board, like so:
Figure 1.2: each cell number and what it represents.
We denote the cell C with [C], meaning Cell numbered 7 is [7]. All cells are initially 0, and each player, 1 or 2, makes a move by placing 1 or 2 in a cell that has 0. A game is over once all cells are non-zero, or a winner has been detected. We denote a play by player n on cell k as [k]=n.
Figure 4.2: A board after player one and two each made a single move.
For the board in figure 4.2: [2]==1, [3]==2, [8]==0, and so on…
We represent each individual as a game tree, comprised of internal nodes (conditional “if”) with two children, and terminals (an action to take) with no children.
Example individual:
(if [2]=0) ((if [8]=1) ((if [0]=0) (<Play cell: 2>) (<Play cell: 2>)) ((if [1]=0) ((if [1]=1) (<Play cell: 5>) (<Play cell: 4>)) (<Play cell: 1>))) ((if [0]=0) ((if [1]=2) (<Play cell: 4>) ((if [1]=1) (<Play cell: 4>) (<Play cell: 7>))) ((if [4]=2) (<Play cell: 0>) (<Play cell: 1>)))
Mutation: In this representation we defined mutation as choosing a random internal node (cond), and changing one of its children as a randomly generated tree with the height of 3.
Crossover: When we perform crossover between trees A and B, we randomly pick a subtree in A, and insert it as a child in a randomly selected internal node in B.
Other mutations: Our variations of point mutations that only change a single node/leaf:
Terminal Point Mutation: We select a random terminal in the tree and change its value to a random value.
Conditional Point Mutation: We select a conditional node and change its condition to a random condition.
With this question, we tried 3 different approaches, hoping to get different results.
First GP: Relative evaluation.
In this approach, we tried to build a GP that measures the fitness of an individual relative to the other individuals in the generation pool.
1. Terminal Set: T={‘play [0]’, ‘play [1]’,…,‘play [8]’} Every terminal node will have an action of the type “Make a move by marking cell X”. 2.
Function
Set: F = 3. Fitness: Each tree plays two games of TTT against every other tree. In one it plays first, and in the second it plays second. For each victory the fitness is raised by one, for each tie, the fitness is raised by 0.5. 4. Parameters: Population size=100, Pc=0.4, Pm=0.1. Generations: 1000. 5. Termination: When 1000 generations have passed.
We reviewed two distinct cases:
1. In the first Case, once a tree finishes and evaluation, and the move it evaluates to is illegal, the player plays the first available tile. 2. Once a tree finishes and evaluation, and the move it evaluates to is illegal, the player plays a random tile from all available plays.
The difference between these two approaches is that one of them provides a deterministic evaluation for a given strategy, and the other can provide different results, even when all other factors remain the same.
Figure 4.3: Generations versus fitness, Random.
Figure 4.4: Generations versus fitness, Deterministic.
Example of a best individual for random method: Fitness: 153.5 Wins when first:78 Wins when second:62 Ties when first: 13 Ties when second: 14
Discussion of the first approach: As we can clearly see from the charts, the deterministic variant provides better fitness, hinting perhaps that tactics have a difficulty evolving if they are inherently random. Maybe that’s because a random tactic can provide, at times, good fitness, which causes the randomness to survive from generation to generation, and propagate the gene pool. Out of 200 games, the best solution of each generation manages to score roughly 185 fitness points against its brethren, which is close to optimum (200 fitness). Is it a sign that it’s a good individual? Or are the rest simply very bad? We felt that we can achieve more relevant results with other approaches.
Second GP: Absolute evaluation.
In this approach we decided to generate 200 random tactics, and test our evolving tactics against them for the entire run of the program. This method allowed us to measure fitness on an absolute scale, and enable us to pick a best tactic.
1. Terminal Set: T={‘play [0]’, ‘play [1]’,…,‘play [8]’} Every terminal node will have an action of the type “Make a move by marking cell X”. 2.
Function
Set: F = 3. Fitness: Each tree plays two games of TTT against a set of 200 fixed trees. In one it plays first, and in the second it plays second. For each victory the fitness is raised by one, for each tie, the fitness is raised by 0.5. 4. Parameters: Population size=100, Pc=0.4, Pm=0.1. Generations: 1000. 5. Termination: When 1000 generations have passed.
Figure 4.4: Generations versus fitness, Deterministic.
Best individual: Fitness: 192.5 Wins when first: 100 Wins when second: 91 Ties when first: 0 Ties when second: 3
Figure 4.5: Generations versus fitness, Random.
Best individual: Fitness: 170.0 Wins when starting first: 94 Wins when starting second: 71 Ties when first: 4 Ties when second: 6
Discussion and Conclusions: Once again we see that the random method brings worse fitness than the deterministic method, and converges on a local minimum as soon as generation 150. Also not surprising is that the fact that the fitness we achieved with this approach is higher than the one we achieved with the former approach, since the population has much time to adapt to the fixed tactics.
However, will our evolved best individual stay good when presented a player that he hasn’t seen?
In our third attempt we tried to develop an individual that is more likely to be good against any tactic, and not just a tactic that it knows.
Third GP: Co-Evolutionary Approach (Yes, we know the first approach was also technically co-evolutionary)
In this representation, we decided that maybe the tactics of playing TTT when you start first and when you start second are different, requiring two different tactics for each case. Hence, we split the population in two groups of 100 tactics: A and B. Each time we evaluate fitness for a member in group A, we have him play a game against each of the members of group B where he starts first. When we want to measure fitness of someone who is from group B, we match him against all the members in group A, where he starts second.
1. Terminal Set: T={‘play [0]’, ‘play [1]’,…,‘play [8]’} Every terminal node will have an action of the type “Make a move by marking cell X”. 2.
Function
Set: F = 3. Fitness: Each tree plays against all the other trees in the other group. The different scoring is outlined ahead. 4. Parameters: Population size=100+100, Pc=0.4, Pm=0.1. Generations: 600. 5. Termination: When 600 generations have passed.
We tried 3 different point approaches: 1. Equal points: each member of A and B gets one point for victory, and half a point for tie. 2. A strives to win, B strives to tie: seeing the an optimal TTT tactic only wins when it’s playing the first turn, we want to evolve a good offensive strategy for when it starts first, and a good defensive strategy when it starts second. 3. A strives to win, B strives to win or tie: each member of A gets a point for victory, members of B get a point for victory or tie.
Results (in all charts, group A was marked as 1 and group B as 2):
Figure 4.6: The first point approach Best Tactic for group A: Fitness: 93.5 Wins:90 Ties: 7 Best Tactic for group B: Fitness: 46.0 Wins:40 Ties: 12
Figure 4.7: The second point approach
Best Tactic for group A: Fitness: 93.0 Wins:93 Ties: 2 Best Tactic for group B: Fitness: 35.0 Wins:26 Ties: 35
Figure 4.8: The third point approach Best Tactic for group A: Fitness: 80.0 Wins:80 Ties: 8 Best Tactic for group B: Fitness: 65.0 Wins:56 Ties: 9
Discussion and conclusions: If the performance of group A members is any indication, then the best method we tried evolving second turn players in group B is awarding them points for victories and ties, as shown in figure 4.8. Our intuition that a good “tie” player can be evolved by preferring members that score ties was proven wrong by the poor performance of the B population in figure 4.7. Also, giving equal scoring to both groups may have helped develop a better “best” individual in group B, the performance of group A remained more or less the same throughout the three runs.
While intuitively, a player playing first has a great advantage over the player who plays second, we hoped we would see the emergence of a “perfect” defense strategy, that can score a tie against any other player (such a tactic exists in the game), as well as a “perfect” offense strategy, that either ties or wins against any strategy when playing first. And although we did develop tactics that could win 93 out of 100 games when playing first, we think it’s more of a sign of how bad the tactics in group B were.
Which Method is best?
While some of these methods seemed to generate more fitness than others, we however feel that the third, co-evolutionary method, when used in the 3rd configuration presented here is the best way tackle is issue. This method provides us with individuals who can win close to 90% of the time when playing first against a population adapt at playing second, as well as individuals who can play second and win more than 50% of the time, against individuals who are adapt at playing first.
For these reasons, we feel, though we have not tested it on any benchmark, that the individuals we get from this method are the best.
Future improvements:
Attempting a GA representation - The number of possible board positions is less than 39 (don’t know how less, this calculation takes more time than we’re willing to give it), meaning it’s possible (maybe a bit costly) to do a programming table for each possible position and try to evolve it as a GA. This will eliminate bloat and other inherent problems in GPs, however, it remains to be seen how well strategies evolved so will perform. | ||||||
