Page 116 - 49A Field Guide to Genetic Programming
P. 116

102                                 11 GP Theory and its Applications


            Three Classic Explanations for Bloat

            The replication accuracy theory (McPhee and Miller, 1995) states that the
            success of a GP individual depends on its ability to have offspring that are
            functionally similar to the parent. As a consequence, GP evolves towards
            (bloated) representations that increase replication accuracy.
               The nodes in a GP tree can often be crudely categorised into two classes:
            active code and inactive code. Roughly speaking, inactive code is code
            that is not executed, or is executed but its output is then discarded. All
            remaining code is active code. The removal bias theory (Soule and Foster,
            1998a) observes that inactive code in a GP tree tends to be low in the tree,
            residing, therefore, in smaller-than-average-size subtrees. Crossover events
            excising inactive subtrees produce offspring with the same fitness as their
            parents. On average the inserted subtree is bigger than the excised one, so
            such offspring are bigger than average while retaining the fitness of their
            parent, leading ultimately to growth in the average program size.
               Finally, the nature of program search spaces theory (Langdon and Poli,
            1997; Langdon, Soule, Poli, and Foster, 1999) predicts that above a certain
            size, the distribution of fitnesses does not vary with size. Since there are more
            long programs, the number of long programs of a given fitness is greater than
            the number of short programs of the same fitness. Over time GP samples
            longer and longer programs simply because there are more of them.




            Executable Models of Bloat

            The explanations for bloat provided by these three theories are largely qual-
            itative. There have, however, been some efforts to mathematically formalise
            and verify these theories. For example, Banzhaf and Langdon (2002) defined
            an executable model of bloat where only the fitness, the size of active code and
            the size of inactive code were represented (i.e., there was no representation of
            program structures). Fitnesses of individuals were drawn from a bell-shaped
            distribution, while active and inactive code lengths were modified by a size-
            unbiased mutation operator. Various interesting effects were reported which
            are very similar to corresponding effects found in GP runs. Rosca (2003) pro-
            posed a similar, but slightly more sophisticated model which also included
            an analogue of crossover. This provided further interesting evidence.
               A strength of these executable models is their simplicity. A weakness is
            that they suppress or remove many details of the representation and opera-
            tors typically used in GP. This makes it difficult to verify if all the phenom-
            ena observed in the model have analogues in GP runs, and if all important
            behaviours of GP in relation to bloat are captured by the model.
   111   112   113   114   115   116   117   118   119   120   121