Page 41 - 49A Field Guide to Genetic Programming

P. 41

3.5 Step 5: Termination and solution designation 27

approximation, GP runtime can be estimated by the product of: the number
of runs R, the number of generations G, the size of the population P, the
average size of the programs s and the number of ﬁtness cases F.
Typically, the number of generations is limited to between ten and ﬁfty;
the most productive search is usually performed in those early generations,
and if a solution hasn’t been found then, it’s unlikely to be found in a
reasonable amount of time. The folk wisdom on population size is to make
it as large as possible, but there are those who suggest using many runs with
much smaller populations instead. Some implementations do not require
arbitrary limits of tree size. Even so, because of bloat (the uncontrolled
growth of program sizes during GP runs; see Section 11.3), it is common to
impose either a size or a depth limit or both (see Section 11.3.2).
Sometimes the number of ﬁtness cases is limited by the amount of train-
5
ing data available. In this case, the ﬁtness function should use all of it.
(One does not necessarily need to use veriﬁcation or holdout data, since
over-ﬁtting can be avoided by other means, as discussed in Section 13.12,
page 140.) In other cases, e.g. 22-bit even parity, there can almost be too
much training data. Then the ﬁtness function may be reduced to use just a
subset of the training data. This does not necessarily have to be done man-
ually as there are a number of algorithms that dynamically change the test
set as the GP runs. (These and other speedup techniques will be discussed
in Chapter 13, particularly Section 10.1, page 83.)
It is common to record these details in a tableau, such as Table 4.1 on
page 31.

3.5 Step 5: Termination and solution desig-
nation

The ﬁfth preparatory step consists of specifying the termination criterion
and the method of designating the result of the run. The termination cri-
terion may include a maximum number of generations to be run as well as
a problem-speciﬁc success predicate. Typically, the single best-so-far indi-
vidual is then harvested and designated as the result of the run, although
one might wish to return additional individuals and data as necessary or
appropriate for the problem domain.

typically rely more on mutation than crossover for their primary search mechanism.
5
Training data refers to the test cases used to evaluate the ﬁtness of the evolved
individuals.

36 37 38 39 40 41 42 43 44 45 46