Page 55 - 49A Field Guide to Genetic Programming
P. 55
5.1 Constructing the Initial Population 41
range the size and shape of the initial trees within a few generations. As
discussed in Section 11.3.1 (page 101), the excessive sampling of short pro-
grams appears to be an important cause of bloat (the uncontrolled growth
of programs during GP runs, which will be described in more detail in Sec-
tion 11.3, page 101 onwards). It has been shown (Dignum and Poli, 2007)
that when the initial population is created with the size distribution pre-
ferred by crossover (see Section 11.3.1), bloat is more marked. The distri-
bution has a known mathematical formula (it is a Lagrange distribution of
the second kind), but in practice it can be created by simply performing
multiple rounds of crossover on a population created in the traditional way
before the GP run starts. This is known as Lagrange initialisation. These
findings suggest that initialisation methods which tend to produce many
short programs may in fact induce bloat sooner than methods that produce
distributions more skewed towards larger programs.
5.1.3 Seeding
The most common way of starting a GP run from an informed non-random
point is seeding the initial population with an individual which, albeit not
a solution, is thought to be a good starting point. Such a seed may have
been produced by an earlier GP run or perhaps constructed by the user
(Aler, Borrajo, and Isasi, 2002; Holmes, 1995; Hsu and Gustafson, 2001;
Langdon and Nordin, 2000; Langdon and Treleaven, 1997; Westerberg and
Levine, 2001). However, Marek, Smart, and Martin (2002) reported that
hand written programs may not be robust enough to prosper in an evolving
population.
One point to be careful of is that such a seed individual is liable to be
much better than randomly created trees. Thus, its descendants may take
over the population within a few generations. So, under evolution the seeded
population is initially liable to lose diversity rapidly. Furthermore, depend-
ing upon the details of the selection scheme used, a single seed individual
may have some chance of being removed from the population. Both problems
are normally dealt with by filling the whole population with either identical
or mutated copies of the seed. This method creates a low diversity initial
population in a controlled way, thereby avoiding the initial uncontrolled loss
of diversity associated with single seeds. Furthermore, with many copies of
the seed, few selection methods will have much chance of removing all copies
of the seed before they are able to create children. Diversity preserving tech-
niques, such as multi-objective GP (e.g., (Parrott, Li, and Ciesielski, 2005),
(Setzkorn, 2005) and Chapter 9), demes (Langdon, 1998) (see Section 10.3),
fitness sharing (Goldberg, 1989) and the use of multiple seed trees, might
also be good cures for the problems associated with the use of a single seed.
In any case, the diversity of the population should be monitored to ensure
that there is significant mixing of different initial trees.