Page 121 - 49A Field Guide to Genetic Programming

P. 121

11.3 Bloat 107

The parsimony pressure method can be seen as a way to address the
generalisation–accuracy tradeoﬀ common in machine learning (Rosca and
Ballard, 1996b; Zhang and M¨uhlenbein, 1995). There are also connections
between this method and the Minimum Description Length (MDL) principle
used to control bloat in (Iba, 1997; Iba et al., 1994; Iba, de Garis, and
Sato, 1995a). The MDL approach uses a ﬁtness function which combines
program complexity (expressed as the number of bits necessary to encode
the program’s tree) and classiﬁcation error (expressed as the number of
bits necessary to encode the errors on all ﬁtness cases). Rosca also linked
the parsimony pressure method to his approximate evolution equations for
rooted-tree schemata (Rosca, 1996, 1997; Rosca and Ballard, 1996b, 1999).
Controlling bloat while at the same time maximising ﬁtness turns the
evolution of programs into either a multi-objective optimisation problem or,
at least, into a constrained optimisation problem. The parsimony pressure
method eﬀectively treats the minimisation of size as a soft constraint and
attempts to enforce this constraint using the penalty method, i.e., by decreas-
ing the ﬁtness of programs by an amount that depends on their size. The
penalty is typically simply proportional to program size. The intensity with
which bloat is controlled is, therefore, determined by the parsimony coeﬃ-
cient. The value of this coeﬃcient is very important: too small a value and
runs will still bloat wildly; too large a value and GP will take the minimisa-
tion of size as its main target and will almost ignore ﬁtness, thus converging
towards extremely small but useless programs (Soule, 1998). However, good
values of the parsimony coeﬃcient are highly dependent on particulars such
as the problem being solved, the choice of functions and terminals, and vari-
ous parameter settings. Furthermore, with a constant parsimony coeﬃcient
the method can only achieve partial control over the dynamics of the average
program size over time.
Recently, a theoretically sound method for setting the parsimony coeﬃ-
cient in a principled manner has been proposed (Poli and McPhee, 2008b).
The covariant parsimony pressure method is based on an analysis of the size
evolution Equation (11.1), and is easy to implement. It recalculates the par-
simony coeﬃcient c at each generation using c = Cov(`, f)/ Var(`), where
Cov(`, f) is the covariance between program size ` and program ﬁtness f
in the population, and Var(`) is the variance of program sizes. Note that c
needs to be recalculated each generation because both Cov(`, f) and Var(`)
change from generation to generation. As shown in Figure 11.2 (in the por-
tion labelled “Local”), using this equation ensures that the mean program
size remains at the value set by the initialisation procedure (although there
can be a small amount of drift). There is a variant of the method that allows
the user to even decide what function the mean program size should follow
over time. As shown in the ﬁgure this provides complete control over the
population size dynamics.

116 117 118 119 120 121 122 123 124 125 126