Page 113 - 49A Field Guide to Genetic Programming

P. 113

11.2 Search Spaces 99

developed as fully as the schema theory model.
Exact mathematical models of GP are probabilistic descriptions of the
operations of selection, reproduction, crossover and mutation. They explic-
itly represent how these operations determine which areas of the program
space will be sampled by GP, and with what probability. These models treat
the ﬁtness function as a black box, however. That is, there is no represen-
tation of the fact that in GP, unlike in other evolutionary techniques, the
ﬁtness function involves the execution of computer programs on a variety of
inputs. In other words, schema theories and Markov chains do not tell us
how ﬁtness is distributed in the search space. Yet, without this information,
we have no way of closing the loop and fully characterising the behaviour of a
GP systems which is always the result of the interaction between the ﬁtness
function and the search biases of the representation and genetic operations
used in the system.

11.2 Search Spaces

The characterisation of the space of computer programs explored by GP has
been another main topic of theoretical research (Langdon and Poli, 2002).
Of course results describing the space of all possible programs are widely
applicable, not only to GP and other search-based automatic programming
techniques, but also to many other areas ranging from software engineering
to theoretical computer science.
In this category are theoretical results showing that the distribution of
functionality of non Turing-complete programs approaches a limit as pro-
gram length increases. That is, although the number of programs of a
particular length grows exponentially with length, beyond a certain thresh-
old the fraction of programs implementing any particular functionality is
eﬀectively constant. For example, in Figure 11.1 we plot the proportion of
binary program trees composed of NAND gates which implement each of the
2 2 3 = 256 Boolean functions of three inputs. Notice how, as the length of
programs increases, the proportion of programs implementing each function
approaches a limit.
This does not happen by accident. There is a very substantial body of
empirical evidence indicating that this happens in a variety of other systems.
In fact, there are also mathematical proofs of these convergence results for
two important forms of programs: Lisp (tree-like) S-expressions (without
side eﬀects) and machine code programs without loops (Langdon, 2002a,b,
2003a,b, 2005b; Langdon and Poli, 2002). That the limiting distribution of
functionality reaches a limit as program length increases was also proven for
a variety of other non-Turing complete computers and languages, including:
a) cyclic (increment, decrement and NOP), b) bit ﬂip computer (ﬂip bit
and NOP), c) any non-reversible computer, d) any reversible computer,

108 109 110 111 112 113 114 115 116 117 118