Page 118 - 49A Field Guide to Genetic Programming
P. 118

104                                 11 GP Theory and its Applications


            Crossover Bias Theory of Bloat
            We conclude this review on theories of bloat with a recent explanation for
            bloat called the crossover bias theory (Dignum and Poli, 2007; Poli et al.,
            2007). This is based on and is consistent with the size evolution equation
            (Equation 11.1).
               On average, each application of subtree crossover removes as much ge-
            netic material as it inserts; consequently crossover on its own does not pro-
            duce growth or shrinkage. While the mean program size is unaffected, how-
            ever, higher moments of the distribution are. In particular, crossover pushes
            the population towards a particular distribution of program sizes, known
            as a Lagrange distribution of the second kind, where small programs have a
            much higher frequency than longer ones. For example, crossover generates a
            very high proportion of single-node individuals. In virtually all problems of
            practical interest, however, very small programs have no chance of solving
            the problem. As a result, programs of above average size have a selective
            advantage over programs of below average size, and the mean program size
            increases.
               Because crossover will continue to create small programs, which will then
            be ignored by selection (in favour of the larger programs), the increase in
            average size will continue generation by generation.



            11.3.2   Bloat Control in Practice
            Numerous empirical techniques have been proposed to control bloat (Lang-
            don et al., 1999; Soule and Foster, 1998b). We cannot look at them all.
            However, we briefly review some of the most important.


            Size and Depth Limits

            Rather naturally, the first and simplest method to control code growth is the
            use of hard limits on the size or depth of the offspring programs generated
            by the genetic operators.
               Many implementations of this idea (e.g., (Koza, 1992)) apply a genetic
            operator and then check whether the offspring is beyond the size or depth
            limit. If it isn’t, the offspring enters the population. If, instead, the off-
            spring exceeds the limit, one of the parents is returned. Obviously, this
            implementation does not allow programs to grow too large. However, there
            is a serious problem with this way of applying size limits, or more generally,
            constraints to programs: parent programs that are more likely to violate a
            constraint will tend to be copied (unaltered) more often than programs that
            don’t. That is, the population will tend to be filled up with programs that
            nearly infringe the constraint, which is typically not what is desired.
   113   114   115   116   117   118   119   120   121   122   123