Page 77 - 49A Field Guide to Genetic Programming
P. 77
7.1 Linear Genetic Programming 63
interpreted linear GP, where each instruction is executable by some higher-
level virtual machine (typically written in an efficient language such as C
or C++). When the instructions are actual machine code, then the order
of the elements of the representation shown in Figure 7.2 is determined by
the particular computer architecture used, and the corresponding data must
be packed into bit fields of appropriate sizes. The overhead of packing and
unpacking data can be avoided, however, when one is using virtual machine
instructions since then the designer of a GP system has complete freedom
as to how the virtual machine will interpret its instructions.
If the goal is execution speed, then the evolved code should be machine
code for a real computer rather than some higher level language or virtual-
machine code. This is why Nordin (1994) started by evolving machine code
for SUN computers and Crepeau (1995) targeted the Z80. The linear GP
of Leung, Lee, and Cheang (2002) was designed for novel hardware, but much
of the GP development had to be run in simulation whilst the hardware itself
was under development.
The Sun SPARC has a simple 32-bit RISC architecture which eases
designing genetic operations which manipulate its machine code. Nordin
(1997) wrapped each machine code GP individual (which was a sequence of
machine instructions) inside a C function. Each of the GP program’s inputs
was copied from one of the C function’s arguments into one of the machine
2
registers. As well as the registers used for inputs, a small number (e.g.,
2–4) of other registers are used for scratch memory to store partial results of
intermediate calculations. Finally, the GP simply leaves its answer in one of
the registers. The external framework uses this as the C function’s return
value.
Since Unix was ported onto the x86, Intel’s complex instruction set,
which was already standard with Windows-based PCs, has had almost com-
plete dominance. Seeing this, Nordin ported his Sun RISC linear GP system
onto Intel’s CISC. Various changes were made to the genetic operations
which ensure that the initial random programs are made only of legal In-
tel machine code and that mutation operations, which act inside the x86’s
32-bit word, respect the x86’s complex sub-fields. Since the x86 has instruc-
tions of different lengths, special care has to be taken when altering them.
Typically, several short instructions are packed into each 4-byte word. If
there are any bytes left over, they are filled with no-operation codes. In
this way, best use is made of the available space, without instructions cross-
ing 32-bit boundaries. Nordin’s work led to Discipulus (Foster, 2001),
which has been used in applications ranging from bioinformatics (Vukusic,
Grellscheid, and Wiehe, 2007) to robotics (Langdon and Nordin, 2001) and
2
Anyone using a register-based GP (linear or tree-based) should consider write-
protecting the input registers to prevent the inputs from being overwritten. Otherwise
evolved programs (especially in the early generations) are prone to writing over their
inputs before they’ve had a chance to use them in any constructive way.