Page 101 - 49A Field Guide to Genetic Programming
P. 101
10.2 Reducing Cost of Fitness with Caches 87
only must the answer be stored, but the interpreter needs to know that the
subtree’s inputs are the same too. The common practices of GP come to our
aid here. Usually every tree in the population is run on exactly the same
inputs for each of the fitness cases. Thus, for a cache to work, the interpreter
does not need to know a tree’s inputs in detail, it need only know which of
the fixed set of test cases was used.
A simple means of implementing this type of cache is to store a vector of
values returned by each subtree for each of the test cases. Whenever a sub-
tree is created (i.e., in the initial generation, by crossover or by mutations)
the interpreter is run and the cache of values for its root node is set. Note
this is recursive, so caches can also be calculated for subtrees within it at
the same time. Now, when the interpreter is run and comes to a subtree’s
root node, it will simply retrieve the value it calculated earlier, using the
test case’s number as an index into the cache vector.
If a subtree is created by mutation, then its cache of values will be
initially empty and will have to be calculated. However, this costs no more
than it would without caches.
When code is inserted into an existing tree, be it by mutation or
crossover, the chance that the new code behaves identically to the old code
is normally very small. This means that the caches of every node between
the new code and the root node may be invalid. The simplest solution is
to re-evaluate them all. This may sound expensive, but the caches in all
the other parts of the individual remain valid and can be used when the
cache above them is re-evaluated. Thus, in effect, if the crossed over code is
inserted at depth d, only d nodes need to be evaluated.
The whole question of monitoring how effective individual caches are,
what their hit-rates are, etc. has been little explored. In practice, impressive
savings have been achieved by simple implementations, with little monitor-
ing and rudimentary garbage collection. Recent analysis (Ciesielski and Li,
2004; Dignum and Poli, 2007; Langdon and Poli, 2002; Poli et al., 2007)
has shown that GP trees tend not to have symmetric shapes, and many
leaves are very close to the root. This provides a theoretical explanation for
why considerable computational saving can be made by using fitness caches.
While it is possible to use hashing schemes to efficiently find common code,
in practice assuming that common code only arises because it was inherited
from the same location (e.g., by crossing over) is sufficient.
As well as the original Directed acyclic graph (DAG) implementation
(Handley, 1994) other work includes (Ciesielski and Li, 2004; Keijzer, 1996;
McPhee, Hopper, and Reierson, 1998; Yangiya, 1995). While so far we have
only considered programs where no side effects take place, there are cases
where caching can be extended outside this domain. For example, Langdon
(1998) used fitness caches in evolved trees with side effects by exploiting
syntax rules about where in the code the side-effects could lie.