Page 40 - 49A Field Guide to Genetic Programming
P. 40
26 3 Getting Ready to Run Genetic Programming
In some problems we are interested in the output produced by a program,
namely the value returned when we evaluate the tree starting at the root
node. In other problems we are interested in the actions performed by a
program composed of functions with side effects. In either case the fitness
of a program typically depends on the results produced by its execution on
many different inputs or under a variety of different conditions. For example
the program might be tested on all possible combinations of inputs x1, x2,
..., xN. Alternatively, a robot control program might be tested with the
robot in a number of starting locations. These different test cases typically
contribute to the fitness value of a program incrementally, and for this reason
are called fitness cases.
Another common feature of GP fitness measures is that, for many prac-
tical problems, they are multi-objective, i.e., they combine two or more dif-
ferent elements that are often in competition with one another. The area of
multi-objective optimisation is a complex and active area of research in GP
and machine learning in general. See Chapter 9 and also (Deb, 2001).
3.4 Step 4: GP Parameters
The fourth preparatory step specifies the control parameters for the run.
The most important control parameter is the population size. Other control
parameters include the probabilities of performing the genetic operations, the
maximum size for programs and other details of the run.
It is impossible to make general recommendations for setting optimal
parameter values, as these depend too much on the details of the application.
However, genetic programming is in practice robust, and it is likely that
many different parameter values will work. As a consequence, one need not
typically spend a long time tuning GP for it to work adequately.
It is common to create the initial population randomly using ramped
half-and-half (Section 2.2) with a depth range of 2–6. The initial tree sizes
will depend upon the number of the functions, the number of terminals
and the arities of the functions. However, evolution will quickly move the
population away from its initial distribution.
Traditionally, 90% of children are created by subtree crossover. How-
ever, the use of a 50-50 mixture of crossover and a variety of mutations (cf.
Chapter 5) also appears to work well.
In many cases, the main limitation on the population size is the time
taken to evaluate the fitnesses, not the space required to store the individ-
uals. As a rule one prefers to have the largest population size that your
system can handle gracefully; normally, the population size should be at
4
least 500, and people often use much larger populations. Often, to a first
4 There are, however, GP systems that frequently use much smaller populations. These