A Field Guide to Genetic Programming
13.6 Study your Populations
the system in an undesirable and unexpected way? Similar questions can be
asked for almost any flavour of GP; think about your goals and expectations,
and explore your populations to see to what degree those are being met.
Similarly, it can be valuable to look at the way your population changes
over time in more detail than that provided by the standard plot of fitness
vs. time. You might look at the distribution of tree sizes during your run,
or the distribution of fitness values. The distribution of fitness values might
hint at the structure of the search space as seen by your GP
system. If the distribution seems to be dominated by disjoint values with large
gaps between them, then jumping those gaps may be a major challenge for your
system, and may be a cause of poor performance.
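As a minimal sketch of this kind of inspection, the Python below summarises one generation's tree sizes and looks for large gaps between neighbouring distinct fitness values. The function names, the bin width and the `gap_factor` threshold are all illustrative assumptions rather than part of any particular GP system, and the example data are made up.

```python
from collections import Counter


def size_histogram(sizes, bin_width=10):
    """Count trees per size bin, keyed by the lower edge of each bin."""
    return Counter((s // bin_width) * bin_width for s in sizes)


def fitness_gaps(fitnesses, gap_factor=10.0):
    """Return (distinct_values, large_gaps) for one generation.

    A gap between neighbouring distinct fitness values is reported as
    "large" when it exceeds gap_factor times the median gap. The value
    of gap_factor is an arbitrary illustrative threshold.
    """
    distinct = sorted(set(fitnesses))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    if not gaps:
        return distinct, []
    median_gap = sorted(gaps)[len(gaps) // 2]
    large = [(a, b) for a, b in zip(distinct, distinct[1:])
             if b - a > gap_factor * median_gap]
    return distinct, large


# Made-up data: number of fitness cases solved by each individual.
hits = [0, 1, 1, 2, 3, 50, 51, 52]
distinct, large = fitness_gaps(hits)
print("distinct fitness values:", distinct)
print("large gaps:", large)
print("tree sizes:", sorted(size_histogram([3, 7, 12, 15, 48]).items()))
```

Running this once per generation and comparing the results over time gives a rough picture of whether the population is stuck on one side of a gap.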
While it is important to look inside your populations, the time and effort
required to do so depends largely on how much information is recorded.
Computer algorithms can easily generate enormous amounts of data,
especially if you produce a detailed log of the events and individuals
generated during your runs. Consequently, processing those results may become a
challenging data-mining exercise. Finding good ways to visualise those large
data sets can be extremely valuable. While there are a handful of papers
that specifically address visualisation, e.g., (Daida, Hilss, Ward, and Long,
2005; Pohlheim, 1999; Yamashiro, Yoshikawa, and Furuhashi, 2006), and
even the occasional workshop (Smith, Bullock, and Bird, 2002), most
visualisation techniques are scattered through the literature and we are unaware
of any comprehensive review. Where we can provide a bit more guidance is
program visualisation.
An obvious (but easy to forget) advantage of GP is that we create visible
programs. This need not be the case with other approaches. So, when
presenting GP results, as a matter of routine one should consider making
a figure which contains the whole evolved program. The dot component of
the Graphviz package [3] can be particularly helpful in this regard; Figure 6.2
is an example of a tree diagram generated with a simple dot input file. The
program lisp2dot [4] can help with the conversion from Lisp-style expressions
to dot input files.
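To illustrate what such a conversion involves, here is a small Python sketch that parses a Lisp-style expression and emits dot input. It is not the lisp2dot program and makes no claim to match its output; the function names `parse_sexpr` and `to_dot` are hypothetical, and the sketch assumes that the first element of every list is an atom naming the function.

```python
import re


def parse_sexpr(text):
    """Parse a Lisp-style expression into nested lists of string atoms."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = []
            while pos < len(tokens) and tokens[pos] != ")":
                node.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # skip the closing parenthesis
            if not node:
                raise ValueError("empty list has no function symbol")
            return node
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def to_dot(tree):
    """Return dot source with one graph node per tree node."""
    lines = ["digraph program {", "  node [shape=ellipse];"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        is_call = isinstance(node, list)
        label = str(node[0] if is_call else node)
        label = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  {node_id} [label="{label}"];')
        for child in (node[1:] if is_call else []):
            child_id = visit(child)
            lines.append(f"  {node_id} -> {child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


print(to_dot(parse_sexpr("(+ x (* x 2))")))
```

If the printed text is saved to a file such as `program.dot` (a name chosen here for illustration), running `dot -Tpng program.dot -o program.png` should render it as an image.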
As the evolved trees can often be very large, it is usually helpful to
perform at least some basic simplifications, such as removing excess significant
digits in constants and combining constant terms. Naturally, after
cleaning up the evolved program, you should make sure it still works; you should
also clearly indicate in any presentation or write-up that the program you’re
presenting has been cleaned and is not the actual tree generated by GP.
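The two basic clean-ups mentioned above, and the check that the cleaned program still works, might look roughly like the following Python sketch. It assumes a hypothetical representation in which a program is a nested tuple `(op, left, right)`, constants are floats and variables are strings; it only folds subtrees made entirely of constants and does not attempt any further algebraic simplification. The example tree and its constants are made up.

```python
import operator

# Binary functions assumed for this illustration only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tuple tree given a dict of variable values."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree


def fold_constants(tree):
    """Replace every constant-only subtree by its value."""
    if isinstance(tree, tuple):
        op, left, right = tree
        left, right = fold_constants(left), fold_constants(right)
        if isinstance(left, float) and isinstance(right, float):
            return OPS[op](left, right)
        return (op, left, right)
    return tree


def round_constants(tree, digits=3):
    """Round every constant to the given number of significant digits."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return (op, round_constants(left, digits),
                round_constants(right, digits))
    if isinstance(tree, float):
        return float(f"{tree:.{digits}g}")
    return tree


def max_deviation(original, cleaned, cases):
    """Largest absolute output difference over the given test cases."""
    return max(abs(evaluate(original, env) - evaluate(cleaned, env))
               for env in cases)


evolved = ("+", ("*", 1.9999731, "x"), ("-", 3.1400012, 1.1399988))
cleaned = round_constants(fold_constants(evolved))
cases = [{"x": x / 10.0} for x in range(-10, 11)]
print("cleaned program:", cleaned)
print("max deviation:", max_deviation(evolved, cleaned, cases))
```

Folding is done before rounding so that rounding error is introduced only once per constant. Whether the resulting deviation is acceptable depends on the problem, so the comparison should ideally use the same fitness cases as the original run.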
There are methods to automatically simplify expressions (e.g., in
Mathematica and Emacs). However, since in general there is an exponentially large
number of equivalent expressions, automatic simplification is hard. Another
[3] http://www.graphviz.org/
[4] http://www.cs.ucl.ac.uk/staff/W.Langdon/lisp2dot.html