Page 44 - 49A Field Guide to Genetic Programming
P. 44

30                                4 Example Genetic Programming Run


            Section 3.1. Thus the terminal set, T, is
                                        T = {x, <}.

               The statement of the problem does not specify which functions may be
            employed in the to-be-evolved program. One simple choice for the function
            set is the four ordinary arithmetic functions: addition, subtraction, mul-
            tiplication and division. Most numeric regression problems will require at
            least these operations, sometimes with additional functions such as sin() and
            log(). We will use the simple function set

                                      F = {+, -, *, %},
            where % is protected division as discussed in Section 3.2.1. Note that the
            target polynomial can be expressed exactly using the terminal and function
            sets we have chosen, so these primitives are sufficient (cf. page 23) for the
            quadratic polynomial problem.
               The third preparatory step involves constructing the fitness measure that
            specifies what the user wants. The high-level goal of this problem is to find
            a program whose output is equal to the values of the quadratic polynomial
             2
            x +x+1. Therefore, the fitness assigned to a particular individual in the
            population must reflect how closely the output of an individual program
                                         2
            comes to the target polynomial x + x + 1.
               In principle, the fitness measure could be defined in terms of the mathe-
            matical integral of the difference between the evolved function and the target
            function. However, for most symbolic regression problems, it is not practical
            or even possible to compute the value of the integral analytically. Thus, it
            is common to define the fitness to be the sum of absolute errors measured
            at different values of the independent variable x in the range [−1.0, +1.0].
            In particular, we will measure the errors for x ∈ {−1.0, −0.9, · · · , 0.9, 1.0}.
            A smaller value of fitness (error) is better; a fitness (error) of zero would
            indicate a perfect fit. With this definition, our fitness is (approximately)
                                                        2
            proportional to the area between the parabola x + x + 1 and the curve
            representing the candidate individual (see Figure 4.2 for examples).
               The fourth step is where we set our run parameters. The population size
            in this small illustrative example will be just four. The population size for
            a run of GP typically consists of thousands of individuals, but we will use
            this tiny population size to keep the example manageable. The crossover
            operation is commonly used to generate about 90% of the individuals in
            the population; the reproduction operation (where a fit individual is sim-
            ply copied from one generation to the next) is used to generate about 8%
            of the population; the mutation operation is used to generate about 1% of
            the population; and the architecture-altering operations (see Section 6.1.2)
            are used to generate perhaps 1% of the population. However, because this
            example involves an abnormally small population of only four individuals,
   39   40   41   42   43   44   45   46   47   48   49