Page 121 - 49A Field Guide to Genetic Programming
P. 121
11.3 Bloat 107
The parsimony pressure method can be seen as a way to address the
generalisation–accuracy tradeoff common in machine learning (Rosca and
Ballard, 1996b; Zhang and M¨uhlenbein, 1995). There are also connections
between this method and the Minimum Description Length (MDL) principle
used to control bloat in (Iba, 1997; Iba et al., 1994; Iba, de Garis, and
Sato, 1995a). The MDL approach uses a fitness function which combines
program complexity (expressed as the number of bits necessary to encode
the program’s tree) and classification error (expressed as the number of
bits necessary to encode the errors on all fitness cases). Rosca also linked
the parsimony pressure method to his approximate evolution equations for
rooted-tree schemata (Rosca, 1996, 1997; Rosca and Ballard, 1996b, 1999).
Controlling bloat while at the same time maximising fitness turns the
evolution of programs into either a multi-objective optimisation problem or,
at least, into a constrained optimisation problem. The parsimony pressure
method effectively treats the minimisation of size as a soft constraint and
attempts to enforce this constraint using the penalty method, i.e., by decreas-
ing the fitness of programs by an amount that depends on their size. The
penalty is typically simply proportional to program size. The intensity with
which bloat is controlled is, therefore, determined by the parsimony coeffi-
cient. The value of this coefficient is very important: too small a value and
runs will still bloat wildly; too large a value and GP will take the minimisa-
tion of size as its main target and will almost ignore fitness, thus converging
towards extremely small but useless programs (Soule, 1998). However, good
values of the parsimony coefficient are highly dependent on particulars such
as the problem being solved, the choice of functions and terminals, and vari-
ous parameter settings. Furthermore, with a constant parsimony coefficient
the method can only achieve partial control over the dynamics of the average
program size over time.
Recently, a theoretically sound method for setting the parsimony coeffi-
cient in a principled manner has been proposed (Poli and McPhee, 2008b).
The covariant parsimony pressure method is based on an analysis of the size
evolution Equation (11.1), and is easy to implement. It recalculates the par-
simony coefficient c at each generation using c = Cov(`, f)/ Var(`), where
Cov(`, f) is the covariance between program size ` and program fitness f
in the population, and Var(`) is the variance of program sizes. Note that c
needs to be recalculated each generation because both Cov(`, f) and Var(`)
change from generation to generation. As shown in Figure 11.2 (in the por-
tion labelled “Local”), using this equation ensures that the mean program
size remains at the value set by the initialisation procedure (although there
can be a small amount of drift). There is a variant of the method that allows
the user to even decide what function the mean program size should follow
over time. As shown in the figure this provides complete control over the
population size dynamics.