            tiple training examples. The use of many examples provides an accurate
            evaluation of a program’s quality. However, ultimately the point of fitness
            evaluation is to make a binary decision — does this individual get a child or
            not? The overwhelming proportion of GP’s computational effort (or indeed
            the effort in any evolutionary computation technique) goes into adjusting
            the probability of this binary decision. However, it is not clear that a high-
            precision fitness evaluation is always necessary to decide well. Indeed, even
            when the fitness evaluation is very accurate, most selection algorithms,1 be-
            ing stochastic, inject noise into the decision of which points in the search
            space to proceed from and which to abandon. In these cases, reducing the
            number of times a program is evaluated is effectively an additional source
            of noise. If a program has already demonstrated it works poorly compared
            to the rest of the population on a fraction of the available training data, it
            is not likely to be chosen as a parent. Conversely, if it has already exceeded
            many programs in the population after being tested on only a fraction of
            the training set, then it is likely to be chosen as a parent (Langdon, 1998).
            In either case, it is apparent that we do not gain much by running it on
            the remaining training examples. Teller and Andre (1997) developed these
            ideas into a useful algorithm called the rational allocation of trials.
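
               The idea behind such early stopping can be sketched in a few lines of
            Python. The sketch below only illustrates the general principle (stop testing
            a program on further cases once its standing relative to the rest of the
            population is effectively decided); it is not Teller and Andre's actual
            algorithm, and the function name, thresholds and toy problem are assumptions
            made for the example.

                def truncated_fitness(program, cases, beat_target,
                                      stop_margin=0.15, min_cases=20):
                    """Hit rate of `program` (a callable) on `cases`, with early stopping.

                    `beat_target` is the hit rate the program must exceed to be likely
                    to win its tournaments (e.g. the current population median)."""
                    total = len(cases)
                    hits = 0
                    for i, (inputs, expected) in enumerate(cases, start=1):
                        if program(*inputs) == expected:
                            hits += 1
                        # Even a perfect score on the remaining cases cannot reach the
                        # target: stop, this program is unlikely to become a parent.
                        if (hits + (total - i)) / total < beat_target:
                            return hits / i
                        # Already comfortably above the target on a reasonable sample:
                        # stop, it is very likely to be chosen as a parent anyway.
                        if i >= min_cases and hits / i > beat_target + stop_margin:
                            return hits / i
                    return hits / total

                # Illustrative use on a toy problem: 200 training cases, one input each.
                cases = [((x,), x * x) for x in range(200)]
                candidate = lambda x: x * x if x % 3 else x   # imperfect candidate program
                print(truncated_fitness(candidate, cases, beat_target=0.9))

            In this toy run the evaluation is abandoned after roughly a third of the 200
            cases, once it becomes impossible for the candidate to reach the target hit
            rate.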
               As well as the computational cost, there are other negative consequences
            that come from using all the training data all the time, as doing so gives rise
            to a static fitness function. In certain circumstances this may encourage the
            population to evolve into a cul-de-sac where it is dominated by offspring of a
            single initial program which did well on some fraction of the training cases,
            but was unable to fit others. A static fitness function can create conditions
            where good programs that perform moderately well on most portions of the
            training data have lower fitness than those that do very well in only a few
            small regions. With high selection pressure, it takes surprisingly little time
            for the best individual to dominate the whole population.2
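
               The rule of thumb for the take over time given in footnote 2 is easy to
            check numerically; the few lines of Python below (an illustrative sketch
            using only the standard library) reproduce the two figures quoted there.

                import math

                def takeover_generations(pop_size, tournament_size):
                    """Rule-of-thumb take over time under tournament selection:
                    roughly log_T(pop_size) generations for tournament size T."""
                    return math.log(pop_size) / math.log(tournament_size)

                print(takeover_generations(1024, 2))       # ~10: binary tournaments, pop 1,024
                print(takeover_generations(10 ** 6, 10))   # ~6: T = 10, pop one million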
               Gathercole and Ross (1994, 1997) investigated a number of ways of dy-
            namically changing training samples,3 yielding a number of interacting ef-
            fects. Firstly, by using only a subset of the available data, the GP fitness
            evaluation took less time. Secondly, by changing which examples were being

               1 Common selection algorithms include roulette wheel selection (Goldberg, 1989), SUS
            (Baker, 1987) and tournament selection.
               2 This is called the take over time (Goldberg, 1989). This can be formally analysed
            (Blickle, 1996; Droste, Jansen, Rudolph, Schwefel, Tinnefeld, and Wegener, 2003), but
            for tournament selection, a simple rule of thumb is often sufficient. If T is the tour-
            nament size, roughly log_T(Pop size) generations are needed for the whole population to
            become descendants of a single individual. If, for example, we use binary tournaments
            (T = 2), then “take over” will require about ten generations for a population of 1,024.
            Alternatively, if we have a population of one million (10^6) and use ten individuals in
            each tournament (T = 10), then after about six generations more or less everyone will
            have the same great-great-great-great-grandmother.
               3 Siegel (1994) proposed a rather different implementation.