Page 120 - 49A Field Guide to Genetic Programming
P. 120
106 11 GP Theory and its Applications
on average the same size as the code it replaces. In Hoist mutation (Kinnear,
1994a) the new subtree is selected from the subtree being removed from the
parent, guaranteeing that the new program will be smaller than its parent.
Shrink mutation (Angeline, 1996) is a special case of subtree mutation where
the randomly chosen subtree is replaced by a randomly chosen terminal.
McPhee and Poli (2002) provides theoretical analysis and empirical evidence
that combinations of subtree crossover and subtree mutation operators can
control bloat in linear GP systems.
Other methods which control bloat by exploiting the bias of the operators
were discussed in Section 9.4.
Anti-Bloat Selection
As clarified by the size evolution equation discussed in the previous section,
in systems with symmetric operators, bloat can only happen if there are
some longer-than-average programs that are fitter than average or some
shorter-than-average programs that are less fit than average, or both. So,
it stands to reason that in order to control bloat one needs to somehow
modulate the selection probabilities of programs based on their size.
As we have discussed in Section 9.2.1, recent methods also include the
use of multi-objective optimisation to control bloat. This typically involves
the use of a modified selection based on the Pareto criterion.
A recent technique, the Tarpeian method (Poli, 2003), controls bloat
by acting directly on the selection probabilities in Equation (11.2). This is
done by setting the fitness of randomly chosen longer-than-average programs
to 0. This prevents them being parents. By changing how frequently this
is done the anti-bloat intensity of Tarpeian control can be modulated. An
advantage of the method is that the programs whose fitness is zeroed are
never executed, thereby speeding up runs.
The well-known parsimony pressure method (Koza, 1992; Zhang and
M¨uhlenbein, 1993, 1995; Zhang et al., 1997) changes the selection probabili-
ties by subtracting a value based on the size of each program from its fitness.
Bigger programs have more subtracted and, so, have lower fitness and tend
to have fewer children. That is, the new fitness function is f(x) − c × `(x),
where `(x) is the size of program x, f(x) is its original fitness and c is a con-
stant known as the parsimony coefficient. 2 Zhang and M¨uhlenbein (1995)
showed some benefits of adaptively adjusting the coefficient c at each gen-
eration but most implementations actually keep the parsimony coefficient
constant.
2
While the new fitness is used to guide evolution, one still needs to use the original
fitness function to recognise solutions and stop runs.