Page 119 - 49A Field Guide to Genetic Programming
P. 119

11.3 Bloat                                                    105


               It is well known, for example, that depth thresholds lead to the popu-
            lation filling up with very bushy programs where most branches reach the
            depth limit (being effectively full trees). On the contrary, size limits produce
            populations of stringy programs which tend to all approach the size limit.
            See (Crane and McPhee, 2005; McPhee, Jarvis, and Crane, 2004) for more
            on the impact of size and depth limits, and the differences between them.
               The problem can be fixed by not returning parents if the offspring violates
            a constraint. This can be realised with two different strategies. Firstly, one
            can just return the oversize offspring, but give it a fitness of 0, so that
            selection will get rid of it at the next generation. Secondly, one can simply
            declare the genetic operation failed, and try again. This can be done in two
            alternative ways: a) the same parent or parents are used again, but new
            mutation or crossover points are randomly chosen (which can be done up
            to a certain number of times before giving up on those parents), or b) new
            parents are selected and the genetic operation is attempted again.
               If a limit is used, programs must not be so tightly constrained that they
            cannot express any solution to the problem. As a rule of thumb, one should
            try to estimate the size of the minimum possible solution (using the terminals
            and functions given to GP) and add some percentage (e.g., 50-200%) as a
            safety margin. In general, however, it may be hard to heuristically come up
            with good limits, so some trial and error may be required. Alternatively,
            one can use one of the many techniques that have been proposed to adjust
            size limits during runs. These can be both at the level of individuals and the
            population. See for example the work by Silva and Almeida (2003); Silva
            and Costa (2004, 2005a,b); Silva, Silva, and Costa (2005).

            Anti-bloat Genetic Operators
            One can control bloat by using genetic operators which directly or indirectly
            have an anti-bloat effect.
               Among the most recent bloat-control methods are size fair crossover
            and size fair mutation (Crawford-Marks and Spector, 2002; Langdon, 2000).
            These work by constraining the choices made during the execution of a
            genetic operation so as to actively prevent growth. In size-fair crossover, for
            example, the crossover point in the first parent is selected randomly, as in
            standard crossover. Then the size of the subtree to be excised is calculated.
            This is used to constrain the choice of the second crossover point so as
            to guarantee that the subtree chosen from the second parent will not be
            “unfairly” big.
               Older methods include several mutation operators that may help control
            the average tree size in the population while still introducing new genetic
            material. Kinnear (1993) proposes a mutation operator which prevents the
            offspring’s depth being more than 15% larger than its parent. Langdon
            (1998) proposes two mutation operators in which the new random subtree is
   114   115   116   117   118   119   120   121   122   123   124