Page 121 - 49A Field Guide to Genetic Programming
P. 121

11.3 Bloat                                                    107


               The parsimony pressure method can be seen as a way to address the
            generalisation–accuracy tradeoff common in machine learning (Rosca and
            Ballard, 1996b; Zhang and M¨uhlenbein, 1995). There are also connections
            between this method and the Minimum Description Length (MDL) principle
            used to control bloat in (Iba, 1997; Iba et al., 1994; Iba, de Garis, and
            Sato, 1995a). The MDL approach uses a fitness function which combines
            program complexity (expressed as the number of bits necessary to encode
            the program’s tree) and classification error (expressed as the number of
            bits necessary to encode the errors on all fitness cases). Rosca also linked
            the parsimony pressure method to his approximate evolution equations for
            rooted-tree schemata (Rosca, 1996, 1997; Rosca and Ballard, 1996b, 1999).
               Controlling bloat while at the same time maximising fitness turns the
            evolution of programs into either a multi-objective optimisation problem or,
            at least, into a constrained optimisation problem. The parsimony pressure
            method effectively treats the minimisation of size as a soft constraint and
            attempts to enforce this constraint using the penalty method, i.e., by decreas-
            ing the fitness of programs by an amount that depends on their size. The
            penalty is typically simply proportional to program size. The intensity with
            which bloat is controlled is, therefore, determined by the parsimony coeffi-
            cient. The value of this coefficient is very important: too small a value and
            runs will still bloat wildly; too large a value and GP will take the minimisa-
            tion of size as its main target and will almost ignore fitness, thus converging
            towards extremely small but useless programs (Soule, 1998). However, good
            values of the parsimony coefficient are highly dependent on particulars such
            as the problem being solved, the choice of functions and terminals, and vari-
            ous parameter settings. Furthermore, with a constant parsimony coefficient
            the method can only achieve partial control over the dynamics of the average
            program size over time.
               Recently, a theoretically sound method for setting the parsimony coeffi-
            cient in a principled manner has been proposed (Poli and McPhee, 2008b).
            The covariant parsimony pressure method is based on an analysis of the size
            evolution Equation (11.1), and is easy to implement. It recalculates the par-
            simony coefficient c at each generation using c = Cov(`, f)/ Var(`), where
            Cov(`, f) is the covariance between program size ` and program fitness f
            in the population, and Var(`) is the variance of program sizes. Note that c
            needs to be recalculated each generation because both Cov(`, f) and Var(`)
            change from generation to generation. As shown in Figure 11.2 (in the por-
            tion labelled “Local”), using this equation ensures that the mean program
            size remains at the value set by the initialisation procedure (although there
            can be a small amount of drift). There is a variant of the method that allows
            the user to even decide what function the mean program size should follow
            over time. As shown in the figure this provides complete control over the
            population size dynamics.
   116   117   118   119   120   121   122   123   124   125   126