Page 112 - 49A Field Guide to Genetic Programming
P. 112

98                                  11 GP Theory and its Applications


            it is normally impossible to exactly visualise the program search space due
            to its high dimensionality and complexity, making it that much harder to
            understand.
               An alternative approach to better understanding the dynamics of GP is
            to study mathematical models of evolutionary search. There are a number
            of cases where this approach has been very successful in illuminating some
            of the fundamental processes and biases in GP systems. In this chapter we
            will review several theoretical approaches to understanding GP, including
            mathematical models of GP (Section 11.1), analyses of the structure of GP
            search spaces (Section 11.2), and the use of theory to understand and combat
            the chronic problem of bloat in a principled fashion (Section 11.3).

            11.1     Mathematical Models
            Schema theories are among the oldest and the best known models of evolu-
            tionary algorithms (Holland, 1992; Whitley, 1994). Schema theories are
            based on the idea of partitioning the search space into subsets, called
            schemata. They are concerned with modelling and explaining the dynamics
            of the distribution of the population over the schemata. Modern genetic
            algorithm schema theory (Stephens and Waelbroeck, 1997, 1999) provides
            exact information about the distribution of the population at the next gen-
            eration in terms of quantities measured at the current generation, without
            having to actually run the algorithm.
               The theory of schemata in GP has had a difficult childhood. Some excel-
            lent early efforts led to different worst-case-scenario schema theorems (Al-
            tenberg, 1994; Koza, 1992; O’Reilly and Oppacher, 1994b; Poli and Langdon,
            1997; Rosca, 1997; Whigham, 1995). Only very recently have the first ex-
            act schema theories become available (Poli, 2000a,b, 2001a) which give exact
            formulations (rather than lower bounds) for the expected number of individ-
            uals sampling a schema at the next generation. Initially (Poli, 2000b, 2001a),
            these exact theories were only applicable to GP with one-point crossover (see
            Section 5.3). However, more recently they have been extended to the class of
            homologous crossovers (Poli, McPhee, and Rowe, 2004) and to virtually all
            types of crossovers that swap subtrees (Poli and McPhee, 2003a,b), including
            standard GP crossover with and without uniform selection of the crossover
            points (Section 2.4), one-point crossover, context-preserving crossover and
            size-fair crossover (which have been described in Section 5.3), as well as
            more constrained forms of crossover such as strongly-typed GP crossover
            (see Section 6.2.2), and many others.
               Other models of evolutionary algorithms include models based on
            Markov chain theory (e.g. (Davis and Principe, 1993; Nix and Vose, 1992))
            and on statistical mechanics (e.g. (Pr¨ugel-Bennett and Shapiro, 1994)).
            Markov models have been applied to GP (Mitavskiy and Rowe, 2006; Poli
            et al., 2004; Poli, Rowe, and McPhee, 2001), but so far they have not been
   107   108   109   110   111   112   113   114   115   116   117