A Field Guide to Genetic Programming

13.9 Control Bloat

If you are running out of memory or your execution times seem inordinately
high, look at how the average program size is changing over time. If
programs are growing extremely fast, you may want to implement some form
of bloat control (see Section 11.3). Naturally, long run times may instead
simply be the result of a very large population or slow fitness evaluation.
In these cases, you may find the techniques described in Chapter 10 helpful.
Controlling bloat is also important if your goal is to find a comprehensible
model, since in practice smaller models are easier to understand. A large
model will not only be difficult to understand but may also over-fit the
training data (Gelly, Teytaud, Bredeche, and Schoenauer, 2006).
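The monitoring described above can be sketched as follows. This is a minimal sketch that assumes a hypothetical program representation: a terminal is a string, and a function node is a tuple holding the function name followed by its arguments. The `window` and `max_growth` thresholds in `bloat_warning` are arbitrary illustrative values, not recommendations. As one illustration of bloat control, the sketch also implements the Tarpeian method. In this method, each program larger than the population average is, with some probability, given the worst possible fitness. Programs marked this way never need to be evaluated, which also saves run time.

```python
import random
from typing import List, Sequence, Union

# Hypothetical representation: a terminal is a str; a function node is a
# tuple (name, arg1, arg2, ...).
Tree = Union[str, tuple]


def size(tree: Tree) -> int:
    """Number of primitives (functions plus terminals) in a program tree."""
    if isinstance(tree, tuple):
        return 1 + sum(size(arg) for arg in tree[1:])
    return 1


def average_size(population: Sequence[Tree]) -> float:
    return sum(size(t) for t in population) / len(population)


def bloat_warning(history: Sequence[float], window: int = 5,
                  max_growth: float = 2.0) -> bool:
    """True if the average size (one entry per generation in `history`) grew
    by more than a factor of max_growth over the last `window` generations.
    The default thresholds are illustrative only."""
    if len(history) <= window:
        return False
    return history[-1] > max_growth * history[-1 - window]


def tarpeian_mask(population: Sequence[Tree], p_kill: float,
                  rng: random.Random) -> List[bool]:
    """Tarpeian method: each program larger than the population average is
    marked, with probability p_kill, to receive the worst possible fitness.
    Marked programs can skip fitness evaluation entirely."""
    avg = average_size(population)
    return [size(t) > avg and rng.random() < p_kill for t in population]
```

In a generational loop, you would append `average_size(population)` to a history list each generation and check `bloat_warning(history)`. Before evaluation, you would compute `tarpeian_mask` and assign the worst fitness to marked programs instead of evaluating them.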

13.10 Checkpoint Results

When GP runs take a long time, it is important to periodically save the
current state of the run. Should the system crash, the run can then be
restarted from part way through rather than from the start. Care should be
taken to save the entire state (including, for example, the state of the
pseudo-random number generator), so that restarting a run does not introduce
any unknown variation. The bulk of the state to be saved is the current
population. This can be compressed, e.g., using gzip. While compression can
add a few percent to run time, reductions in disk space to less than one bit
per primitive in the population have been achieved. Checkpointing also allows
you to continue, at a later date, runs that seemed particularly promising
when they reached whatever maximum generation you initially set.
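Checkpointing can be sketched with Python's standard `pickle` and `gzip` modules. This minimal sketch assumes that the run state consists only of the generation counter, the population (any picklable objects) and a `random.Random` instance. A real GP system may hold more, such as statistics or an archive of best individuals. The names `save_checkpoint` and `load_checkpoint` are hypothetical. The sketch writes to a temporary file and then renames it, so a crash during the write does not destroy the previous checkpoint.

```python
import gzip
import os
import pickle
import random
import tempfile
from typing import Any, List, Tuple


def save_checkpoint(path: str, generation: int, population: List[Any],
                    rng: random.Random) -> None:
    """Write the run state to `path` as a gzip-compressed pickle."""
    state = {
        "generation": generation,
        "population": population,
        # Without the RNG state, a restarted run would make different
        # random choices than the uninterrupted run would have made.
        "rng_state": rng.getstate(),
    }
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the old checkpoint, so a crash mid-write leaves the old one intact.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            pickle.dump(state, gz)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


def load_checkpoint(path: str, rng: random.Random) -> Tuple[int, List[Any]]:
    """Restore the RNG state in place and return (generation, population)."""
    with gzip.open(path, "rb") as gz:
        state = pickle.load(gz)
    rng.setstate(state["rng_state"])
    return state["generation"], state["population"]
```

A run would call `save_checkpoint` every few generations. On start-up, it would call `load_checkpoint` if a checkpoint file exists. Because unpickling can execute arbitrary code, only load checkpoints that your own runs wrote.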

13.11 Report Well

There are many potential reasons why work may be poorly received. Here
are a few: insufficient explanation of methods and algorithms, insufficient
experimental evidence, insufficient analysis, lack of statistical significance,
lack of replicability, reading too much into one's results, insufficient novelty,
poor presentation and poor English. In scientific (as opposed to commercial)
work, it is vital to report enough detail for someone else to reproduce
your results. One very useful idea is to publish a table summarising your
GP run. Table 4.1 (page 31) contains an example tableau.
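One way to keep such a table in step with the code is to generate it as plain text from the settings a run actually used. The following is a small sketch of that idea. `format_tableau` is a hypothetical helper. The row names and placeholder values in the example are illustrative only and do not reproduce Table 4.1.

```python
import textwrap
from typing import Mapping


def format_tableau(rows: Mapping[str, str], width: int = 50) -> str:
    """Render run settings as a two-column plain-text table, wrapping long values."""
    key_width = max(len(key) for key in rows)
    lines = []
    for key, value in rows.items():
        wrapped = textwrap.wrap(value, width=width) or [""]
        lines.append(f"{key:<{key_width}} | {wrapped[0]}")
        # Continuation lines leave the first column blank.
        for continuation in wrapped[1:]:
            lines.append(f"{'':<{key_width}} | {continuation}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder settings for illustration only.
    settings = {
        "Objective": "<what the evolved program must do>",
        "Function set": "<functions>",
        "Terminal set": "<terminals>",
        "Fitness": "<how fitness is measured>",
        "Selection": "<selection scheme>",
        "Initial pop": "<initialisation method>",
        "Parameters": "<population size, operator rates, size limits>",
        "Termination": "<stopping criterion>",
        "Random seed": "<seed used>",
    }
    print(format_tableau(settings))
```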

As explained in Section 13.2, it is essential to ensure that results are
statistically significant so that nobody can dismiss them as the consequence
of a lucky fluke. Complex ideas are often best explained by diagrams. When
possible, descriptions of non-trivial algorithms should be accompanied by
pseudocode, along with text describing the most important components of
the algorithm.
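Section 13.2 discusses statistical testing in detail. Purely as an illustration, here is a sketch of one simple, distribution-free option: a two-sided permutation test on the difference in mean results between two sets of independent runs. This is a swapped-in example, not necessarily the method Section 13.2 recommends. The name `permutation_test`, the resample count and the seed are illustrative choices.

```python
import random
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Estimate a two-sided p-value for the difference in means of a and b.

    Under the null hypothesis the labels 'a' and 'b' are exchangeable, so the
    pooled results are repeatedly shuffled and split into groups of the
    original sizes. The p-value is the fraction of shuffles whose mean
    difference is at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (extreme + 1) / (n_resamples + 1)
```

For example, `a` and `b` could hold the best fitness from each independent run of two GP configurations. A small p-value (conventionally below 0.05) suggests that the observed difference is unlikely to be a lucky fluke.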
