Page 146 - 49A Field Guide to Genetic Programming

132                                           13 Troubleshooting GP

            13.2     Can you Trust your Results?

            Since GP is a stochastic search algorithm, different runs may yield different
            results. Because of this, one needs to be very careful when drawing
            inferences about the degree of success of the system from a small set of
            runs.
               It is possible, for example, to run a GP system 10 times on a particular
            problem, observe that all 10 runs failed to find a solution, and conclude that
            GP cannot solve the problem. However, if the success probability is, say, 5%
            with a particular choice of parameters and representation, the probability of
            doing 10 independent runs and all of them failing is 0.95^10, or almost 60%!
            So, the failure to solve
            the problem in these 10 runs should not come as a surprise, even though
            there’s a reasonable chance that you would find a solution if you did more
            runs.
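
            The arithmetic behind this example can be checked with a few lines of
            Python. This is only a sketch: it assumes the runs are independent and
            share one fixed success probability, and the function names are
            illustrative rather than part of any GP system.

```python
import math


def prob_all_runs_fail(p_success: float, n_runs: int) -> float:
    """Probability that n_runs independent runs all fail."""
    return (1.0 - p_success) ** n_runs


def runs_needed(p_success: float, confidence: float) -> int:
    """Smallest n with P(at least one success in n runs) >= confidence."""
    # Solve 1 - (1 - p)^n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))


# Ten runs at a 5% success rate: all ten fail about 60% of the time.
print(f"{prob_all_runs_fail(0.05, 10):.3f}")  # 0.599

# Runs needed to see at least one success with 95% probability.
print(runs_needed(0.05, 0.95))  # 59
```

            Under these assumptions, roughly 59 runs are needed before a complete
            absence of successes becomes strong evidence that the success rate is
            below 5%.
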
               For precisely this reason, it is very important to do enough runs and
            use appropriate statistical tests to ensure that conclusions are statistically
            significant.
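
            One simple way to quantify how little a small set of runs says is to put
            a confidence interval around the observed success rate. The sketch below
            uses the Wilson score interval, which is our choice for illustration and
            not a method prescribed by the text; the function name and the default
            z = 1.96 (roughly 95% confidence) are assumptions.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a success rate.

    successes: number of runs that found a solution
    runs:      total number of independent runs
    z:         normal quantile (1.96 gives roughly 95% confidence)
    """
    if runs <= 0 or not 0 <= successes <= runs:
        raise ValueError("need runs > 0 and 0 <= successes <= runs")
    p_hat = successes / runs
    denom = 1.0 + z * z / runs
    centre = (p_hat + z * z / (2 * runs)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / runs + z * z / (4 * runs * runs)
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Zero successes in ten runs still leaves a wide range of plausible rates.
low, high = wilson_interval(0, 10)
print(f"{low:.3f} .. {high:.3f}")
```

            With 0 successes in 10 runs the upper end of the interval is well above
            5%, so those runs cannot rule out a system that succeeds one time in
            twenty.
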
               GP runs can often be very time consuming, especially if the fitness func-
            tion is computationally expensive. While parallel and distributed computing
            (see Section 10.4) can significantly speed up the process, tools from the de-
            sign of experiments literature (Bartz-Beielstein, 2006) can also be used to
            reduce the number of different runs that are necessary to explore the space
            in a statistically sound manner.
               A common GP application is classification, e.g., evolving a program or
            function that can classify patient biopsy data into two categories: cancerous
            or benign. There are numerous pitfalls in this type of work, such as using
            all the available data as training data, thereby leaving nothing to use for
            validating your evolved solution on unseen data. There is a broad literature
            on this and related subjects, and numerous tools such as cross-validation
            that one can use when not enough data are available (see, for example,
            Hastie, Tibshirani, and Friedman, 2001). The aim must be to ensure that
            your results can be trusted to work in the real world, rather than just in
            the synthetic environment created by the fitness cases you chose.
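
            As a minimal sketch of the cross-validation idea mentioned above, the
            following Python generator partitions the indices of the available
            fitness cases into k folds, so that each case is held out for validation
            exactly once. The function name, the seed parameter, and the choice of
            interleaved folds are assumptions made for illustration; evolving and
            scoring a program on each split is left to the GP system in use.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(
    n_cases: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_cases:
        raise ValueError("need 2 <= k <= n_cases")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for held_out in range(k):
        validation = folds[held_out]
        training = [
            idx
            for j, fold in enumerate(folds)
            if j != held_out
            for idx in fold
        ]
        yield training, validation


# Each split would train on `training` and be scored only on `validation`.
for training, validation in k_fold_splits(n_cases=10, k=5):
    print(len(training), len(validation))
```

            Averaging the validation scores over the k splits gives an estimate of
            performance on unseen data without permanently setting aside a large
            portion of a small data set.
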


            13.3     There are No Silver Bullets

            When working on real problems there are not likely to be any silver bullets.
            No technique (including GP) is likely to solve all instances of an NP-hard
            problem in an amount of time that grows linearly with the size of the
            problem. GP has proven extremely successful in a wide variety of domains
            (see, e.g., Chapter 12), but that doesn’t mean that it will work immediately
            or easily
            in every domain, or even that it is the best tool for a specific domain.
               While some of the successes in the field have been “easy”, most were the