Page 41 - 49A Field Guide to Genetic Programming
P. 41

3.5 Step 5: Termination and solution designation               27


            approximation, GP runtime can be estimated by the product of: the number
            of runs R, the number of generations G, the size of the population P, the
            average size of the programs s and the number of fitness cases F.
               Typically, the number of generations is limited to between ten and fifty;
            the most productive search is usually performed in those early generations,
            and if a solution hasn’t been found then, it’s unlikely to be found in a
            reasonable amount of time. The folk wisdom on population size is to make
            it as large as possible, but there are those who suggest using many runs with
            much smaller populations instead. Some implementations do not require
            arbitrary limits of tree size. Even so, because of bloat (the uncontrolled
            growth of program sizes during GP runs; see Section 11.3), it is common to
            impose either a size or a depth limit or both (see Section 11.3.2).
               Sometimes the number of fitness cases is limited by the amount of train-
                   5
            ing data available. In this case, the fitness function should use all of it.
            (One does not necessarily need to use verification or holdout data, since
            over-fitting can be avoided by other means, as discussed in Section 13.12,
            page 140.) In other cases, e.g. 22-bit even parity, there can almost be too
            much training data. Then the fitness function may be reduced to use just a
            subset of the training data. This does not necessarily have to be done man-
            ually as there are a number of algorithms that dynamically change the test
            set as the GP runs. (These and other speedup techniques will be discussed
            in Chapter 13, particularly Section 10.1, page 83.)
               It is common to record these details in a tableau, such as Table 4.1 on
            page 31.


            3.5    Step 5: Termination and solution desig-
                   nation

            The fifth preparatory step consists of specifying the termination criterion
            and the method of designating the result of the run. The termination cri-
            terion may include a maximum number of generations to be run as well as
            a problem-specific success predicate. Typically, the single best-so-far indi-
            vidual is then harvested and designated as the result of the run, although
            one might wish to return additional individuals and data as necessary or
            appropriate for the problem domain.








            typically rely more on mutation than crossover for their primary search mechanism.
               5
               Training data refers to the test cases used to evaluate the fitness of the evolved
            individuals.
   36   37   38   39   40   41   42   43   44   45   46