


               Different EDAs use different models for the probability distribution that
controls the sampling (see (Larrañaga, 2002; Larrañaga and Lozano, 2002) for more information). For example, population-based incremental learning (PBIL) (Baluja and Caruana, 1995) and the univariate marginal distribution algorithm (UMDA) (Mühlenbein and Mahnig, 1999a,b) assume that
            each variable is independent of the other variables. Consequently, these al-
            gorithms need to store and adjust only a linear array of probabilities, one for
            each variable. This works well for problems with weak interactions between
            variables. Since no relationship between the variables is stored or learned,
            however, PBIL and UMDA may have difficulties solving problems where the
            interactions between variables are significant.
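   To make the univariate case concrete, the sketch below shows one generation of a PBIL-style update in Python. It is a minimal illustration only: the function name pbil_step and the parameter values (pop_size, best_k, rate) are our own choices, not Baluja and Caruana's notation.

import random

def pbil_step(probs, fitness, pop_size=50, best_k=5, rate=0.1):
    # Sample a population; each variable is drawn independently, so the
    # whole model is just the linear array of probabilities 'probs'.
    pop = [[1 if random.random() < p else 0 for p in probs]
           for _ in range(pop_size)]
    # Select the fittest samples and shift each probability towards the
    # frequency of 1s among them.
    best = sorted(pop, key=fitness, reverse=True)[:best_k]
    return [p + rate * (sum(ind[i] for ind in best) / best_k - p)
            for i, p in enumerate(probs)]

# Example: five variables with a OneMax-style fitness (count the 1s).
probs = [0.5] * 5
for _ in range(100):
    probs = pbil_step(probs, sum)

Since probs stores one number per variable, no relationship between variables can ever be expressed, which is exactly the limitation just noted.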
               Naturally, higher order models are possible. For example, the MIMIC
            algorithm of de Bonet, Isbell, and Viola (1997) uses second-order statis-
            tics. It is also possible to use flexible models where interactions of dif-
            ferent orders are captured. The Bayesian optimisation algorithm (BOA)
(Pelikan, Goldberg, and Cantú-Paz, 1999) uses Bayesian networks to rep-
            resent generic sampling distributions, while the extended compact genetic
            algorithm (eCGA) (Harik, 1999) clusters genes into groups where the genes
            in each group are assumed to be linked but groups are considered inde-
            pendent. The sampling distribution is then taken to be the product of the
            distributions modelling the groups.
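   As an illustration of this product-of-groups idea, the fragment below hard-wires a partition of five genes into linked groups and samples from the product distribution. Both the partition and the probability tables are invented for the example; eCGA itself learns the grouping from the population rather than being given it.

import random

# Invented partition: genes 0-1 and 3-4 are linked pairs, gene 2 stands
# alone. Each group stores a joint distribution over its configurations.
groups = [
    ((0, 1), {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}),
    ((2,),   {(0,): 0.5, (1,): 0.5}),
    ((3, 4), {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}),
]

def sample_individual(groups, n_vars=5):
    # Each group's configuration is sampled jointly, but the groups are
    # sampled independently of one another, so the overall distribution
    # is the product of the group distributions.
    x = [0] * n_vars
    for indices, dist in groups:
        configs = list(dist)
        weights = list(dist.values())
        config = random.choices(configs, weights=weights)[0]
        for idx, bit in zip(indices, config):
            x[idx] = bit
    return x

Within a group the two genes are strongly correlated (the configurations 00 and 11 dominate), while the groups themselves remain independent.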
               EDAs have been very successful. However, they are often unable to rep-
            resent both the overall structure of the distribution and its local details,
typically being more successful at the former. This is because EDAs represent the sampling distribution using models with an inevitably limited number of degrees of freedom. For example, suppose the optimal sampling
            distribution has multiple peaks, corresponding to different local optima, sep-
arated by large unfit areas. An EDA must then either represent only one peak, or represent all of them together with the unfit areas. If it chooses the wrong local peak, it may get stuck and never find the global optimum; conversely, if it takes the wider view, it wastes many trials sampling irrelevant, poor solutions.
               Consider, for example, a scenario where there are five binary variables,
x_1, x_2, x_3, x_4 and x_5, and two promising regions: one near the string of all zeros, i.e., (x_1, x_2, x_3, x_4, x_5) = (0, 0, 0, 0, 0), and the other near the string of all ones, i.e., (x_1, x_2, x_3, x_4, x_5) = (1, 1, 1, 1, 1). One option for a (simple) EDA is to focus on one of the two regions, e.g., setting the variables x_i
            to 0 with high probability (say, 90%). This, however, fails to explore the
            other region, and risks missing the global optimum. The other option is to
            maintain both regions as possibilities by setting all the probabilities to 50%,
i.e., each of the variables x_i is as likely to be 0 as 1. These probabilities will
            generate samples in both of the promising regions. For example, the strings
(0, 0, 0, 0, 0) and (1, 1, 1, 1, 1) will each be generated with a 3.125% probability.
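   The arithmetic behind these figures is easy to check directly; the probabilities below simply restate the two options just discussed.

# With every probability at 50%, any specific 5-bit string, peak or not,
# is generated with probability 0.5**5.
print(0.5 ** 5)   # 0.03125, i.e. 3.125%, for (0,0,0,0,0) and (1,1,1,1,1)

# Focusing on the all-zeros region instead, with each x_i set to 0
# with probability 90%:
print(0.9 ** 5)   # ~0.59: the all-zeros peak is sampled often...
print(0.1 ** 5)   # 1e-05: ...but the all-ones peak is almost never seen.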