A Field Guide to Genetic Programming

13.6 Study your Populations


            the system in an undesirable and unexpected way? Similar questions can be
            asked for almost any flavour of GP; think about your goals and expectations,
            and explore your populations to see to what degree those are being met.
               Similarly, it can be valuable to look at the way your population changes
            over time in more detail than that provided by the standard plot of fitness
            vs. time. You might look at the distribution of tree sizes during your run,
            or the distribution of fitness values. The distribution of fitness values might
            suggest things about the structure of the search space as seen by your GP
            system. If the distribution seems to be dominated by disjoint values with
            large gaps between them, then jumping those gaps may be a major challenge
            for your system, and may be the cause of poor performance.
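
               As a minimal sketch of this kind of inspection, the Python below lists the
            gaps between adjacent distinct fitness values and bins tree sizes for a single
            generation. The function names, the gap threshold and the sample data are
            assumptions made for illustration; they are not taken from any particular GP
            system.

```python
from collections import Counter


def fitness_gaps(fitnesses, min_gap):
    """Return (low, high) pairs of adjacent distinct fitness values
    that are separated by more than min_gap."""
    distinct = sorted(set(fitnesses))
    return [(a, b) for a, b in zip(distinct, distinct[1:]) if b - a > min_gap]


def size_histogram(sizes, bin_width):
    """Count trees per size bin; each key is the lower edge of its bin."""
    counts = Counter((s // bin_width) * bin_width for s in sizes)
    return dict(sorted(counts.items()))


# Hypothetical data for one generation (made up for illustration).
fitnesses = [0.0, 0.0, 0.1, 0.1, 0.1, 5.0, 5.0, 5.2, 12.0]
sizes = [3, 7, 7, 12, 15, 21, 22, 40, 41]

gaps = fitness_gaps(fitnesses, min_gap=1.0)
hist = size_histogram(sizes, bin_width=10)
print("fitness gaps:", gaps)
print("tree sizes:", hist)
```

               Recording such a summary once per generation, rather than only the best
            fitness, is one inexpensive way to see whether a run is stuck on one side of
            a gap.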
               While it is important to look inside your populations, the time and
            effort required to do so depend on how much information is recorded.
            Computer algorithms can easily generate enormous amounts of data,
            especially if you produce a detailed log of events and individuals
            generated during your runs. Consequently, processing those results may become a
            challenging data-mining exercise. Finding good ways to visualise those large
            data sets can be extremely valuable. While there are a handful of papers
            that specifically address visualisation, e.g., (Daida, Hilss, Ward, and Long,
            2005; Pohlheim, 1999; Yamashiro, Yoshikawa, and Furuhashi, 2006), and
            even the occasional workshop (Smith, Bullock, and Bird, 2002), most
            visualisation techniques are scattered through the literature and we are
            unaware of any comprehensive review. We can, however, provide a bit more
            guidance on program visualisation.
               An obvious (but easy to forget) advantage of GP is that we create visible
            programs. This need not be the case with other approaches. So, when
            presenting GP results, as a matter of routine one should consider making
            a figure which contains the whole evolved program. The dot component of
            the Graphviz package³ can be particularly helpful in this regard; Figure 6.2
            is an example of a tree diagram generated with a simple dot input file. The
            program lisp2dot⁴ can help with the conversion from Lisp-style expressions
            to dot input files.
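
               To illustrate what such a conversion involves, here is a minimal Python
            sketch that parses a Lisp-style expression and emits dot input. It is not
            lisp2dot and makes no claim to reproduce its behaviour; the function names
            parse_sexpr and to_dot are hypothetical, and the sketch assumes a well-formed,
            non-empty expression whose tokens contain no double quotes.

```python
def parse_sexpr(text):
    """Parse a Lisp-style expression into nested lists of string tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a terminal (variable or constant)
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing parenthesis
        return node

    return read()


def to_dot(tree):
    """Return dot source with one graph node per function or terminal."""
    lines = ["digraph program {"]
    counter = 0

    def visit(node):
        nonlocal counter
        name = f"n{counter}"
        counter += 1
        label = node[0] if isinstance(node, list) else node
        lines.append(f'  {name} [label="{label}"];')
        if isinstance(node, list):
            for child in node[1:]:
                child_name = visit(child)
                lines.append(f"  {name} -> {child_name};")
        return name

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical evolved program, used only as an example.
tree = parse_sexpr("(+ x (* 2.0 y))")
dot_source = to_dot(tree)
print(dot_source)
```

               Saving the printed text to a file such as program.dot (a name chosen here
            for illustration) and running `dot -Tpng program.dot -o program.png` should
            then produce a tree diagram.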
               As the evolved trees can often be very large, it is usually helpful to
            perform at least some basic simplifications such as removing excess significant
            digits in constants and combining constant terms. Naturally, after
            cleaning up the evolved program, one should make sure it still works; you should
            also clearly indicate in any presentation or write-up that the program you’re
            presenting has been cleaned and is not the actual tree generated by GP.
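
               The two simplifications mentioned above, and the check that the cleaned
            program still works, might be sketched in Python as follows. This assumes
            trees stored as nested lists with binary arithmetic functions only; the
            representation, the helper names, the example tree and the tolerance are all
            assumptions made for illustration.

```python
import operator

# Assumed function set: binary arithmetic only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a nested-list tree; strings are variables looked up in env."""
    if isinstance(tree, list):
        left, right = (evaluate(arg, env) for arg in tree[1:])
        return OPS[tree[0]](left, right)
    return env[tree] if isinstance(tree, str) else tree


def fold_constants(tree):
    """Replace any subtree whose arguments are all constants by its value."""
    if not isinstance(tree, list):
        return tree
    args = [fold_constants(arg) for arg in tree[1:]]
    if all(isinstance(arg, float) for arg in args):
        return OPS[tree[0]](*args)
    return [tree[0]] + args


def round_constants(tree, digits):
    """Round every floating-point constant to the given number of decimals."""
    if isinstance(tree, list):
        return [tree[0]] + [round_constants(arg, digits) for arg in tree[1:]]
    return round(tree, digits) if isinstance(tree, float) else tree


# Hypothetical evolved tree, made up for this example.
evolved = ["+", ["*", 1.4142135623, 2.0000000001],
           ["*", "x", ["-", 3.0, 0.49999999]]]
cleaned = round_constants(fold_constants(evolved), digits=3)

# Check that the cleaned program still agrees with the original on test inputs.
max_error = max(abs(evaluate(evolved, {"x": x}) - evaluate(cleaned, {"x": x}))
                for x in [-1.0, 0.0, 1.0, 2.0])
print(cleaned, max_error)
assert max_error < 1e-3, "cleaned program no longer matches the evolved one"
```

               Folding before rounding avoids compounding rounding errors. In practice
            the comparison would be made on the problem's own fitness cases, and the
            acceptable tolerance depends on the application.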
               There are methods to automatically simplify expressions (e.g., in
            Mathematica and Emacs). However, since in general there is an exponentially large
            number of equivalent expressions, automatic simplification is hard. Another
               ³ http://www.graphviz.org/
               ⁴ http://www.cs.ucl.ac.uk/staff/W.Langdon/lisp2dot.html