Page 98 - 49A Field Guide to Genetic Programming
P. 98
84 10 Fast and Distributed Genetic Programming
tiple training examples. The use of many examples provides an accurate
evaluation of a program’s quality. However, ultimately the point of fitness
evaluation is to make a binary decision — does this individual get a child or
not? The overwhelming proportion of GP’s computational effort (or indeed
the effort in any evolutionary computation technique) goes into adjusting
the probability of this binary decision. However, it is not clear that a high-
precision fitness evaluation is always necessary to decide well. Indeed, even
1
when the fitness evaluation is very accurate, most selection algorithms, be-
ing stochastic, inject noise into the decision of which points in the search
space to proceed from and which to abandon. In these cases, reducing the
number of times a program is evaluated is effectively an additional source
of noise. If a program has already demonstrated it works poorly compared
to the rest of the population on a fraction of the available training data, it
not likely to be chosen as a parent. Conversely, if it has already exceeded
many programs in the population after being tested on only a fraction of
the training set, then it is likely to be chosen as a parent (Langdon, 1998).
In either case, it is apparent that we do not gain much by running it on
the remaining training examples. Teller and Andre (1997) developed these
ideas into a useful algorithm called the rational allocation of trials.
As well as the computational cost, there are other negatives consequences
that come from using all the training data all the time, as doing so gives rise
to a static fitness function. In certain circumstances this may encourage the
population to evolve into a cul-de-sac where it is dominated by offspring of a
single initial program which did well on some fraction of the training cases,
but was unable to fit others. A static fitness function can create conditions
where good programs that perform moderately well on most portions of the
training data have lower fitness than those that do very well in only a few
small regions. With high selection pressure, it takes surprisingly little time
for the best individual to dominate the whole population. 2
Gathercole and Ross (1994, 1997) investigated a number of ways of dy-
3
namically changing training samples, yielding a number of interacting ef-
fects. Firstly, by using only a subset of the available data, the GP fitness
evaluation took less time. Secondly, by changing which examples were being
1 Common selection algorithms include roulette wheel selection (Goldberg, 1989), SUS
(Baker, 1987) and tournament selection.
2 This is called the take over time (Goldberg, 1989). This can be formally analysed
(Blickle, 1996; Droste, Jansen, Rudolph, Schwefel, Tinnefeld, and Wegener, 2003), but
for tournament selection, a simple rule of thumb is often sufficient. If T is the tour-
nament size, roughly log T (Pop size) generations are needed for the whole population to
become descendants of a single individual. If, for example, we use binary tournaments
(T = 2), then “take over” will require about ten generations for a population of 1,024.
6
Alternatively, if we have a population of one million (10 ) and use ten individuals in each
tournament (T = 10), then after about six generations more or less everyone will have
the same great 6 great 5 great 4 great 3 grand 2 mother 1 .
3 Siegel (1994) proposed a rather different implementation.