Page 118 - 49A Field Guide to Genetic Programming
P. 118
104 11 GP Theory and its Applications
Crossover Bias Theory of Bloat
We conclude this review on theories of bloat with a recent explanation for
bloat called the crossover bias theory (Dignum and Poli, 2007; Poli et al.,
2007). This is based on and is consistent with the size evolution equation
(Equation 11.1).
On average, each application of subtree crossover removes as much ge-
netic material as it inserts; consequently crossover on its own does not pro-
duce growth or shrinkage. While the mean program size is unaffected, how-
ever, higher moments of the distribution are. In particular, crossover pushes
the population towards a particular distribution of program sizes, known
as a Lagrange distribution of the second kind, where small programs have a
much higher frequency than longer ones. For example, crossover generates a
very high proportion of single-node individuals. In virtually all problems of
practical interest, however, very small programs have no chance of solving
the problem. As a result, programs of above average size have a selective
advantage over programs of below average size, and the mean program size
increases.
Because crossover will continue to create small programs, which will then
be ignored by selection (in favour of the larger programs), the increase in
average size will continue generation by generation.
11.3.2 Bloat Control in Practice
Numerous empirical techniques have been proposed to control bloat (Lang-
don et al., 1999; Soule and Foster, 1998b). We cannot look at them all.
However, we briefly review some of the most important.
Size and Depth Limits
Rather naturally, the first and simplest method to control code growth is the
use of hard limits on the size or depth of the offspring programs generated
by the genetic operators.
Many implementations of this idea (e.g., (Koza, 1992)) apply a genetic
operator and then check whether the offspring is beyond the size or depth
limit. If it isn’t, the offspring enters the population. If, instead, the off-
spring exceeds the limit, one of the parents is returned. Obviously, this
implementation does not allow programs to grow too large. However, there
is a serious problem with this way of applying size limits, or more generally,
constraints to programs: parent programs that are more likely to violate a
constraint will tend to be copied (unaltered) more often than programs that
don’t. That is, the population will tend to be filled up with programs that
nearly infringe the constraint, which is typically not what is desired.