A Field Guide to Genetic Programming
13.9 Control Bloat
If you are running out of memory or your execution times seem inordinately
high, look at how your average program size is changing over time. If programs
are growing extremely fast, you may want to implement some form
of bloat control (see Section 11.3). Naturally, long runs may simply be the
result of the population being very large or the fitness evaluation being slow.
In these cases, you may find the techniques described in Chapter 10 helpful.
Controlling bloat is also important if your goal is to find a comprehensible
model, since in practice smaller models are easier to understand. A large
model will not only be difficult to understand but may also over-fit the
training data (Gelly, Teytaud, Bredeche, and Schoenauer, 2006).
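To make the check on average program size concrete, here is a minimal sketch in Python. It assumes programs are stored as nested tuples of the form (function, argument, ...), with anything else counting as a terminal. The names program_size, average_size and growth_alert, and the window and factor thresholds, are hypothetical choices for illustration and are not taken from any particular GP system.

```python
from statistics import mean


def program_size(tree):
    """Count the primitives (functions and terminals) in a nested-tuple tree."""
    if isinstance(tree, tuple):
        # tree[0] is the function symbol; the remaining items are argument subtrees.
        return 1 + sum(program_size(child) for child in tree[1:])
    return 1  # anything that is not a tuple is treated as a terminal


def average_size(population):
    """Mean number of primitives per program in the population."""
    return mean(program_size(program) for program in population)


def growth_alert(size_history, window=5, factor=2.0):
    """Return True if average size grew by more than `factor` in `window` generations.

    `window` and `factor` are arbitrary illustrative thresholds (assumptions).
    """
    if len(size_history) <= window:
        return False  # not enough generations recorded yet
    return size_history[-1] > factor * size_history[-1 - window]


# Record one value per generation, then test the history after each generation.
generation_0 = [("add", "x", 1), ("mul", "x", ("add", "x", 1))]
size_history = [average_size(generation_0)]
```

Appending average_size(population) to size_history once per generation and calling growth_alert(size_history) gives an early warning that bloat control may be worth adding. The right thresholds depend on your problem, so treat the defaults only as a starting point.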
13.10 Checkpoint Results
When GP run times are long, it is important to periodically save the current
state of the run. Should the system crash, the run can be restarted from
part way through rather than from the start. Care should be taken to save the
entire state, so that restarting a run does not introduce any unknown variation.
The bulk of the state to be saved is the current population. This can be
compressed, e.g., using gzip. While compression can add a few percent
to run time, reductions in disk space to less than one bit per primitive
in the population have been achieved. Checkpointing also allows you to
later continue runs that seemed particularly promising when they reached
whatever maximum generation you set initially.
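The following is one possible sketch of such a checkpoint in Python, using only the standard library. It assumes the population can be pickled and that all randomness in the run comes from a single random.Random instance. The function names and the contents of the state dictionary are hypothetical, and a real system will usually need to save more (for example run parameters or the best individuals found so far).

```python
import gzip
import os
import pickle
import random


def save_checkpoint(path, generation, population, rng):
    """Write the run state to a gzip-compressed file.

    The state is written to a temporary file first and then renamed, so a
    crash during writing leaves the previous checkpoint untouched.
    """
    state = {
        "generation": generation,
        "population": population,
        # Saving the generator state lets a restarted run make the same
        # random choices as the original run would have made.
        "rng_state": rng.getstate(),
    }
    tmp_path = path + ".tmp"
    with gzip.open(tmp_path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)


def load_checkpoint(path, rng):
    """Restore the generator state into `rng`; return (generation, population)."""
    with gzip.open(path, "rb") as f:
        state = pickle.load(f)
    rng.setstate(state["rng_state"])
    return state["generation"], state["population"]
```

A run would call save_checkpoint every few generations and, on start-up, call load_checkpoint if a checkpoint file already exists. If your system draws random numbers from more than one source, each source's state must be saved as well, otherwise the restarted run will diverge from the original.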
13.11 Report Well
There are many potential reasons why work may be poorly received. Here
are a few: insufficient explanation of methods and algorithms, insufficient
experimental evidence, insufficient analysis, lack of statistical significance,
lack of replicability, reading too much into one’s results, insufficient novelty,
poor presentation and poor English. In scientific, rather than commercial,
work it is vital to report enough details so that someone else can reproduce
your results. One very useful idea is to publish a table summarising your
GP run. Table 4.1 (page 31) contains an example tableau.
As explained in Section 13.2, it is essential to ensure that results are
statistically significant so that nobody can dismiss them as the consequence
of a lucky fluke. Complex ideas are often best explained by diagrams. When
possible, descriptions of non-trivial algorithms should be accompanied by
pseudocode, along with text describing the most important components of
the algorithm.
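As one illustration of checking whether a difference between two sets of runs could be a lucky fluke, the sketch below implements a simple two-sided permutation test on the difference of means. This is a generic technique chosen for the example, not necessarily the procedure recommended in Section 13.2. The function name, the default number of resamples and the sample values are all assumptions made for illustration.

```python
import random


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for the difference of means of a and b.

    The pooled values are shuffled repeatedly and split into two groups of
    the original sizes; the p-value is the fraction of shuffles whose
    difference of means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the analysis itself is repeatable
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Made-up best-of-run fitness values from independent runs of two
# hypothetical configurations (illustrative numbers only, not real results).
config_a = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92]
config_b = [0.84, 0.86, 0.83, 0.87, 0.85, 0.82]
p_value = permutation_test(config_a, config_b)
```

Each value passed to the test should come from a separate, independently seeded run. Reporting the number of runs, the test used and the resulting p-value alongside your results makes it easier for readers to judge them.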