Page 40 - 49A Field Guide to Genetic Programming
P. 40

26                         3 Getting Ready to Run Genetic Programming


               In some problems we are interested in the output produced by a program,
            namely the value returned when we evaluate the tree starting at the root
            node. In other problems we are interested in the actions performed by a
            program composed of functions with side effects. In either case the fitness
            of a program typically depends on the results produced by its execution on
            many different inputs or under a variety of different conditions. For example
            the program might be tested on all possible combinations of inputs x1, x2,
            ..., xN. Alternatively, a robot control program might be tested with the
            robot in a number of starting locations. These different test cases typically
            contribute to the fitness value of a program incrementally, and for this reason
            are called fitness cases.
               Another common feature of GP fitness measures is that, for many prac-
            tical problems, they are multi-objective, i.e., they combine two or more dif-
            ferent elements that are often in competition with one another. The area of
            multi-objective optimisation is a complex and active area of research in GP
            and machine learning in general. See Chapter 9 and also (Deb, 2001).


            3.4    Step 4: GP Parameters

            The fourth preparatory step specifies the control parameters for the run.
            The most important control parameter is the population size. Other control
            parameters include the probabilities of performing the genetic operations, the
            maximum size for programs and other details of the run.
               It is impossible to make general recommendations for setting optimal
            parameter values, as these depend too much on the details of the application.
            However, genetic programming is in practice robust, and it is likely that
            many different parameter values will work. As a consequence, one need not
            typically spend a long time tuning GP for it to work adequately.
               It is common to create the initial population randomly using ramped
            half-and-half (Section 2.2) with a depth range of 2–6. The initial tree sizes
            will depend upon the number of the functions, the number of terminals
            and the arities of the functions. However, evolution will quickly move the
            population away from its initial distribution.
               Traditionally, 90% of children are created by subtree crossover. How-
            ever, the use of a 50-50 mixture of crossover and a variety of mutations (cf.
            Chapter 5) also appears to work well.
               In many cases, the main limitation on the population size is the time
            taken to evaluate the fitnesses, not the space required to store the individ-
            uals. As a rule one prefers to have the largest population size that your
            system can handle gracefully; normally, the population size should be at
                                                             4
            least 500, and people often use much larger populations. Often, to a first

               4 There are, however, GP systems that frequently use much smaller populations. These
   35   36   37   38   39   40   41   42   43   44   45