A Field Guide to Genetic Programming


            13.9     Control Bloat


            If you are running out of memory or your execution times seem inordinately
            high, look at how your average program size is changing over time. If pro-
            grams are growing extremely fast, you may want to implement some form
            of bloat control (see Section 11.3). Naturally, long runs may simply be the
            result of the population being very large or the fitness evaluation being slow.
            In these cases, you may find the techniques described in Chapter 10 helpful.
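A minimal sketch of the size check suggested above. It assumes programs are stored as nested tuples of the form `(function, arg1, arg2, ...)` with bare terminals. The helper names and the `window` and `factor` thresholds are hypothetical choices for illustration, not values prescribed by the text.

```python
def program_size(tree):
    """Count the primitives (nodes) in a program stored as a nested tuple."""
    if isinstance(tree, tuple):
        # tree[0] is the function symbol; the remaining items are subtrees.
        return 1 + sum(program_size(arg) for arg in tree[1:])
    return 1  # a terminal counts as one primitive


def average_size(population):
    """Mean program size over the whole population."""
    return sum(program_size(p) for p in population) / len(population)


def growing_fast(size_history, window=5, factor=1.5):
    """True if the average size grew by more than `factor` over `window` generations.

    `window` and `factor` are illustrative thresholds (an assumption).
    """
    if len(size_history) <= window:
        return False
    return size_history[-1] > factor * size_history[-1 - window]


# Example population: sizes 5, 3 and 1.
population = [("add", "x", ("mul", "x", "y")), ("sub", "x", 1.0), "x"]
size_history = [average_size(population)]
```

Appending `average_size(population)` to `size_history` once per generation and calling `growing_fast(size_history)` gives a cheap signal that a bloat control method may be worth enabling.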
               Controlling bloat is also important if your goal is to find a comprehensible
            model, since in practice smaller models are easier to understand. A large
            model will not only be difficult to understand but also may over-fit the
            training data (Gelly, Teytaud, Bredeche, and Schoenauer, 2006).


            13.10     Checkpoint Results

            Where GP run time is long, it is important to periodically save the current
            state of the run. Should the system crash, the run can be restarted from
            part way through rather than at the start. Care should be taken to save the
            entire state, so restarting a run does not introduce any unknown variation.
            The bulk of the state to be saved is the current population. This can be
            compressed, e.g., using gzip. While compression can add a few percent
            to run time, reductions in disk space to less than one bit per primitive
            in the population have been achieved. Checkpointing also allows you to
            later continue runs that seemed particularly promising when they reached
            whatever maximum generation you set initially.
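The checkpointing advice above might be sketched as follows, assuming the population is picklable and that all randomness comes from a single `random.Random` instance. The function names and the layout of the saved state are assumptions for illustration. A real system would also need to save any other state that affects the run, such as parameters or archives.

```python
import gzip
import os
import pickle
import random


def save_checkpoint(path, generation, population, rng):
    """Write the run state to a gzip-compressed file."""
    state = {
        "generation": generation,
        "population": population,
        # Saving the generator state lets a restart follow the same
        # random sequence as the uninterrupted run would have.
        "rng_state": rng.getstate(),
    }
    tmp_path = path + ".tmp"
    with gzip.open(tmp_path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename only after a complete write, so a crash during saving
    # does not destroy the previous checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path, rng):
    """Restore the generator state and return (generation, population)."""
    with gzip.open(path, "rb") as f:
        state = pickle.load(f)
    rng.setstate(state["rng_state"])
    return state["generation"], state["population"]
```

Calling `save_checkpoint` every few generations, and `load_checkpoint` at start-up when a checkpoint file exists, covers both crash recovery and later continuation of promising runs. Since `pickle` can execute arbitrary code on loading, only checkpoints from a trusted source should be restored.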


            13.11     Report Well

            There are many potential reasons why work may be poorly received. Here
            are a few: insufficient explanation of methods and algorithms, insufficient
            experimental evidence, insufficient analysis, lack of statistical significance,
            lack of replicability, reading too much into one’s results, insufficient novelty,
            poor presentation and poor English. In scientific, rather than commercial,
            work it is vital to report enough details so that someone else can reproduce
            your results. One very useful idea is to publish a table summarising your
            GP run. Table 4.1 (page 31) contains an example tableau.
               As explained in Section 13.2, it is essential to ensure that results are
            statistically significant so that nobody can dismiss them as the consequence
            of a lucky fluke. Complex ideas are often best explained by diagrams. When
            possible, descriptions of non-trivial algorithms should be accompanied by
            pseudocode, along with text describing the most important components of
            the algorithm.
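As one illustration of checking statistical significance before reporting, the sketch below compares the final best fitness of two sets of independent runs with a two-sided permutation test on the difference in means. The choice of test, the function name and the default number of resamples are assumptions made here; this is not necessarily the procedure Section 13.2 recommends.

```python
import random


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Approximate two-sided p-value for a difference in means.

    `a` and `b` hold one result per independent run, e.g. the best
    fitness reached with each of two GP configurations.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so reassign them at random and recompute the statistic.
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_resamples + 1)
```

Reporting the p-value together with the number of runs per configuration and the seed or seeds used also helps others replicate the comparison.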