Page 113 - 49A Field Guide to Genetic Programming
P. 113
11.2 Search Spaces 99
developed as fully as the schema theory model.
Exact mathematical models of GP are probabilistic descriptions of the
operations of selection, reproduction, crossover and mutation. They explic-
itly represent how these operations determine which areas of the program
space will be sampled by GP, and with what probability. These models treat
the fitness function as a black box, however. That is, there is no represen-
tation of the fact that in GP, unlike in other evolutionary techniques, the
fitness function involves the execution of computer programs on a variety of
inputs. In other words, schema theories and Markov chains do not tell us
how fitness is distributed in the search space. Yet, without this information,
we have no way of closing the loop and fully characterising the behaviour of a
GP systems which is always the result of the interaction between the fitness
function and the search biases of the representation and genetic operations
used in the system.
11.2 Search Spaces
The characterisation of the space of computer programs explored by GP has
been another main topic of theoretical research (Langdon and Poli, 2002).
Of course results describing the space of all possible programs are widely
applicable, not only to GP and other search-based automatic programming
techniques, but also to many other areas ranging from software engineering
to theoretical computer science.
In this category are theoretical results showing that the distribution of
functionality of non Turing-complete programs approaches a limit as pro-
gram length increases. That is, although the number of programs of a
particular length grows exponentially with length, beyond a certain thresh-
old the fraction of programs implementing any particular functionality is
effectively constant. For example, in Figure 11.1 we plot the proportion of
binary program trees composed of NAND gates which implement each of the
2 2 3 = 256 Boolean functions of three inputs. Notice how, as the length of
programs increases, the proportion of programs implementing each function
approaches a limit.
This does not happen by accident. There is a very substantial body of
empirical evidence indicating that this happens in a variety of other systems.
In fact, there are also mathematical proofs of these convergence results for
two important forms of programs: Lisp (tree-like) S-expressions (without
side effects) and machine code programs without loops (Langdon, 2002a,b,
2003a,b, 2005b; Langdon and Poli, 2002). That the limiting distribution of
functionality reaches a limit as program length increases was also proven for
a variety of other non-Turing complete computers and languages, including:
a) cyclic (increment, decrement and NOP), b) bit flip computer (flip bit
and NOP), c) any non-reversible computer, d) any reversible computer,