Page 115 - 49A Field Guide to Genetic Programming
P. 115

11.3 Bloat                                                    101

            11.3     Bloat


            Starting in the early 1990s, researchers began to notice that in addition to
            progressively increasing their mean and best fitness, GP populations also
            showed certain other dynamics. In particular, it was noted that very often
            the average size (number of nodes) of the programs in a population, after a
            certain number of generations in which it was largely static, at some point
            would start growing at a rapid pace. Typically the increase in program size
            was not accompanied by any corresponding increase in fitness. The origin
            of this phenomenon, which is known as bloat, has effectively been a mystery
            for over a decade.
               Note that there are situations where one would expect to see program
            growth as part of the process of solving a problem. For example, GP runs
            typically start from populations of small random programs, and it may be
            necessary for the programs to grow in complexity for them to be able to
            comply with all the fitness cases (a situation which often arises in continuous
            symbolic regression problems). So, we should not equate growth with bloat
            and we should define bloat as program growth without (significant) return in
            terms of fitness.
               Bloat is not only surprising, it also has significant practical effects: large
            programs are computationally expensive to evolve and later use, can be hard
            to interpret, and may exhibit poor generalisation. For these reasons bloat
            has been a subject of intense study in GP. Over the years, many theories
            have been proposed to explain various aspects of bloat, and while great
            strides have been made, we still lack a single, universally-accepted unifying
            theory to explain the broad range of empirical observations. We review the
            key theoretical results on bloat in Section 11.3.1.
               While discussions on the causes of bloat were going on, practitioners have
            still had to face the reality of combating bloat in their runs. Consequently,
            a variety of effective practical techniques have been proposed to counteract
            bloat. We review these in Section 11.3.2, where we will particularly focus on
            the parsimony pressure method (Koza, 1992; Zhang and M¨uhlenbein, 1993,
            1995; Zhang et al., 1997), which is perhaps the simplest and most frequently
            used method to control bloat in genetic programming.




            11.3.1   Bloat in Theory


            As mentioned above, there are several theories of bloat. Let us start by
            looking at three of the oldest ones: the replication accuracy theory, the
            removal bias theory and the nature of program search spaces theory.
   110   111   112   113   114   115   116   117   118   119   120