Page 44 - 49A Field Guide to Genetic Programming
P. 44
30 4 Example Genetic Programming Run
Section 3.1. Thus the terminal set, T, is
T = {x, <}.
The statement of the problem does not specify which functions may be
employed in the to-be-evolved program. One simple choice for the function
set is the four ordinary arithmetic functions: addition, subtraction, mul-
tiplication and division. Most numeric regression problems will require at
least these operations, sometimes with additional functions such as sin() and
log(). We will use the simple function set
F = {+, -, *, %},
where % is protected division as discussed in Section 3.2.1. Note that the
target polynomial can be expressed exactly using the terminal and function
sets we have chosen, so these primitives are sufficient (cf. page 23) for the
quadratic polynomial problem.
The third preparatory step involves constructing the fitness measure that
specifies what the user wants. The high-level goal of this problem is to find
a program whose output is equal to the values of the quadratic polynomial
2
x +x+1. Therefore, the fitness assigned to a particular individual in the
population must reflect how closely the output of an individual program
2
comes to the target polynomial x + x + 1.
In principle, the fitness measure could be defined in terms of the mathe-
matical integral of the difference between the evolved function and the target
function. However, for most symbolic regression problems, it is not practical
or even possible to compute the value of the integral analytically. Thus, it
is common to define the fitness to be the sum of absolute errors measured
at different values of the independent variable x in the range [−1.0, +1.0].
In particular, we will measure the errors for x ∈ {−1.0, −0.9, · · · , 0.9, 1.0}.
A smaller value of fitness (error) is better; a fitness (error) of zero would
indicate a perfect fit. With this definition, our fitness is (approximately)
2
proportional to the area between the parabola x + x + 1 and the curve
representing the candidate individual (see Figure 4.2 for examples).
The fourth step is where we set our run parameters. The population size
in this small illustrative example will be just four. The population size for
a run of GP typically consists of thousands of individuals, but we will use
this tiny population size to keep the example manageable. The crossover
operation is commonly used to generate about 90% of the individuals in
the population; the reproduction operation (where a fit individual is sim-
ply copied from one generation to the next) is used to generate about 8%
of the population; the mutation operation is used to generate about 1% of
the population; and the architecture-altering operations (see Section 6.1.2)
are used to generate perhaps 1% of the population. However, because this
example involves an abnormally small population of only four individuals,