Page 77 - 49A Field Guide to Genetic Programming
P. 77

7.1 Linear Genetic Programming                                 63


            interpreted linear GP, where each instruction is executable by some higher-
            level virtual machine (typically written in an efficient language such as C
            or C++). When the instructions are actual machine code, then the order
            of the elements of the representation shown in Figure 7.2 is determined by
            the particular computer architecture used, and the corresponding data must
            be packed into bit fields of appropriate sizes. The overhead of packing and
            unpacking data can be avoided, however, when one is using virtual machine
            instructions since then the designer of a GP system has complete freedom
            as to how the virtual machine will interpret its instructions.
               If the goal is execution speed, then the evolved code should be machine
            code for a real computer rather than some higher level language or virtual-
            machine code. This is why Nordin (1994) started by evolving machine code
            for SUN computers and Crepeau (1995) targeted the Z80. The linear GP
            of Leung, Lee, and Cheang (2002) was designed for novel hardware, but much
            of the GP development had to be run in simulation whilst the hardware itself
            was under development.
               The Sun SPARC has a simple 32-bit RISC architecture which eases
            designing genetic operations which manipulate its machine code. Nordin
            (1997) wrapped each machine code GP individual (which was a sequence of
            machine instructions) inside a C function. Each of the GP program’s inputs
            was copied from one of the C function’s arguments into one of the machine
                                                         2
            registers. As well as the registers used for inputs, a small number (e.g.,
            2–4) of other registers are used for scratch memory to store partial results of
            intermediate calculations. Finally, the GP simply leaves its answer in one of
            the registers. The external framework uses this as the C function’s return
            value.
               Since Unix was ported onto the x86, Intel’s complex instruction set,
            which was already standard with Windows-based PCs, has had almost com-
            plete dominance. Seeing this, Nordin ported his Sun RISC linear GP system
            onto Intel’s CISC. Various changes were made to the genetic operations
            which ensure that the initial random programs are made only of legal In-
            tel machine code and that mutation operations, which act inside the x86’s
            32-bit word, respect the x86’s complex sub-fields. Since the x86 has instruc-
            tions of different lengths, special care has to be taken when altering them.
            Typically, several short instructions are packed into each 4-byte word. If
            there are any bytes left over, they are filled with no-operation codes. In
            this way, best use is made of the available space, without instructions cross-
            ing 32-bit boundaries. Nordin’s work led to Discipulus (Foster, 2001),
            which has been used in applications ranging from bioinformatics (Vukusic,
            Grellscheid, and Wiehe, 2007) to robotics (Langdon and Nordin, 2001) and
               2
               Anyone using a register-based GP (linear or tree-based) should consider write-
            protecting the input registers to prevent the inputs from being overwritten. Otherwise
            evolved programs (especially in the early generations) are prone to writing over their
            inputs before they’ve had a chance to use them in any constructive way.
   72   73   74   75   76   77   78   79   80   81   82