that could be compiled for the GPU on the host PC. The compiled programs were transferred one at a time to a GPU for fitness evaluation. Both
groups obtained impressive speedups by running many test cases in parallel.
Langdon and Banzhaf (2008) and Langdon and Harrison (2008) created
a SIMD interpreter (Juille and Pollack, 1996) using RapidMind’s GNU C++
OpenGL framework to simultaneously run up to a quarter of a million GP trees⁵ on an NVIDIA GPU (see Figure 10.1). As discussed in Section 7.1.2,
GP trees can be linearised. This avoids pointers and yields a very compact
data structure; reducing the amount of memory needed in turn facilitates
the use of large populations. To avoid recursive calls in the interpreter,
Langdon used reverse polish notation (RPN), i.e., a post-fix rather than
a pre-fix notation. Only small modifications are needed to crossover and
mutation so that they act directly on the RPN expressions. This means the
same representation is used on both the host and the GPU. Almost a billion GP primitives per second can be interpreted by a single graphics card. In both Cartesian and tree-based GP the genetic operations are done by the host CPU. Wong, Wong, and Fok (2005) showed, for a genetic algorithm, that these too can be done by the GPU.
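The following minimal sketch, written in plain C++ rather than the RapidMind framework actually used, illustrates the idea: a linearised GP tree stored in postfix (RPN) order is evaluated with an explicit stack and a simple switch over opcodes, with no pointers and no recursion. The opcode set, the single input x, the fixed stack depth and the protected-division rule are illustrative assumptions only.

    #include <vector>

    // Illustrative opcode set for a linearised (postfix/RPN) GP tree.
    // Terminals push a value onto the stack; functions pop their
    // arguments and push their result.
    enum Op { PUSH_X, PUSH_CONST, ADD, SUB, MUL, DIV };

    struct Instr {
        Op    op;
        float value;            // used only by PUSH_CONST
    };

    // Iterative RPN evaluation: no pointers and no recursion, so the same
    // flat array of instructions can be shipped to and interpreted on a GPU.
    float eval_rpn(const std::vector<Instr>& prog, float x) {
        float stack[64];        // fixed-depth stack; GP tree depth is bounded
        int   sp = 0;
        for (const Instr& i : prog) {
            switch (i.op) {
            case PUSH_X:     stack[sp++] = x;       break;
            case PUSH_CONST: stack[sp++] = i.value; break;
            case ADD: --sp; stack[sp - 1] += stack[sp]; break;
            case SUB: --sp; stack[sp - 1] -= stack[sp]; break;
            case MUL: --sp; stack[sp - 1] *= stack[sp]; break;
            case DIV: --sp; stack[sp - 1] = (stack[sp] == 0.0f)
                                ? 1.0f              // protected division
                                : stack[sp - 1] / stack[sp];
                      break;
            }
        }
        return stack[0];        // a well-formed program leaves one result
    }

    int main() {
        // The tree (* (+ x 3) x) linearised in postfix order: x 3 + x *
        std::vector<Instr> prog = {
            {PUSH_X, 0}, {PUSH_CONST, 3}, {ADD, 0}, {PUSH_X, 0}, {MUL, 0}
        };
        return eval_rpn(prog, 2.0f) == 10.0f ? 0 : 1;   // (2 + 3) * 2 = 10
    }

On the GPU, many instances of such a loop run side by side, one per program or per test case, which is how a quarter of a million trees can be interpreted at once.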
Although each of the GPU’s processors may be individually quite fast
and the manufacturers claim huge aggregate FLOPS ratings, the GPUs are
optimised for graphics work. In practice, it is hard to keep all the processors
fully loaded. Nevertheless, 30 GFLOPS has been achieved (Langdon and Harrison, 2008). Given the differences in CPU and GPU architectures and clock speeds, the speedup obtained by using a GPU rather than the host CPU is often the most useful statistic. This is obviously determined by many factors, including the relative amounts of computation and data involved. The measured speedups for RPN trees were 7.6-fold (Langdon and Harrison, 2008) and 12.6-fold (Langdon and Banzhaf, 2008).
10.4.3 GP on FPGAs
Field programmable gate arrays (FPGAs) are chips which contain large arrays of simple logic processing units whose functionality and connectivity can be changed via software in microseconds by simply writing a configuration into a static memory. Once an FPGA is configured, it can update all of its thousands of logic elements in parallel at the clock speed of the circuit. Although an FPGA's clock speed is often an order of magnitude slower than that of a modern CPU, its massive parallelism makes it a very powerful computational device. Because of this and their flexibility, there has been significant interest in using FPGAs in GP.
Work has ranged from the use of FPGAs to speed up fitness evaluation
⁵ Bigger populations, e.g. five million programs (Langdon and Harrison, 2008), are possible by loading them onto the GPU in 256k units.