Page 113 - 49A Field Guide to Genetic Programming
P. 113

11.2 Search Spaces                                             99


            developed as fully as the schema theory model.
               Exact mathematical models of GP are probabilistic descriptions of the
            operations of selection, reproduction, crossover and mutation. They explic-
            itly represent how these operations determine which areas of the program
            space will be sampled by GP, and with what probability. These models treat
            the fitness function as a black box, however. That is, there is no represen-
            tation of the fact that in GP, unlike in other evolutionary techniques, the
            fitness function involves the execution of computer programs on a variety of
            inputs. In other words, schema theories and Markov chains do not tell us
            how fitness is distributed in the search space. Yet, without this information,
            we have no way of closing the loop and fully characterising the behaviour of a
            GP systems which is always the result of the interaction between the fitness
            function and the search biases of the representation and genetic operations
            used in the system.


            11.2     Search Spaces

            The characterisation of the space of computer programs explored by GP has
            been another main topic of theoretical research (Langdon and Poli, 2002).
            Of course results describing the space of all possible programs are widely
            applicable, not only to GP and other search-based automatic programming
            techniques, but also to many other areas ranging from software engineering
            to theoretical computer science.
               In this category are theoretical results showing that the distribution of
            functionality of non Turing-complete programs approaches a limit as pro-
            gram length increases. That is, although the number of programs of a
            particular length grows exponentially with length, beyond a certain thresh-
            old the fraction of programs implementing any particular functionality is
            effectively constant. For example, in Figure 11.1 we plot the proportion of
            binary program trees composed of NAND gates which implement each of the
            2 2 3  = 256 Boolean functions of three inputs. Notice how, as the length of
            programs increases, the proportion of programs implementing each function
            approaches a limit.
               This does not happen by accident. There is a very substantial body of
            empirical evidence indicating that this happens in a variety of other systems.
            In fact, there are also mathematical proofs of these convergence results for
            two important forms of programs: Lisp (tree-like) S-expressions (without
            side effects) and machine code programs without loops (Langdon, 2002a,b,
            2003a,b, 2005b; Langdon and Poli, 2002). That the limiting distribution of
            functionality reaches a limit as program length increases was also proven for
            a variety of other non-Turing complete computers and languages, including:
            a) cyclic (increment, decrement and NOP), b) bit flip computer (flip bit
            and NOP), c) any non-reversible computer, d) any reversible computer,
   108   109   110   111   112   113   114   115   116   117   118