A Field Guide to Genetic Programming

114                                                 12 Applications


            goal is to find a function whose output has some desired property, e.g., the
            function matches some target values (as in the example given in Section 4.1).
            This is generally known as a symbolic regression problem.
               Many people are familiar with the notion of regression. Regression means
            finding the coefficients of a predefined function such that the function best
            fits some data. A problem with regression analysis is that, if the fit is not
            good, the experimenter has to keep trying different functions by hand until
            a good model for the data is found. Not only is this laborious, but also
            the results of the analysis depend very much on the skills and inventiveness
            of the experimenter. Furthermore, even expert users tend to have strong
            mental biases when choosing functions to fit. For example, in many
            application areas there is a considerable tradition of using only linear or quadratic
            models, even when the data might be better fit by a more complex model.
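
To make the limitation concrete, the following is a minimal sketch of classical regression with a predefined model form, here the line y = a*x + b fitted by ordinary least squares. The data are made up for illustration (samples of a quadratic), and the helper names `fit_line` and `sum_squared_error` are hypothetical. However well the two coefficients are chosen, the fixed linear form cannot capture the curvature in the data.

```python
def fit_line(xs, ys):
    """Least-squares coefficients (a, b) for the fixed model y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def sum_squared_error(model, xs, ys):
    """Sum of squared residuals of a one-argument model over the data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Illustrative data only: 21 samples of a quadratic on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x + x + 1 for x in xs]

a, b = fit_line(xs, ys)
linear_error = sum_squared_error(lambda x: a * x + b, xs, ys)
print(f"best line: y = {a:.3f}*x + {b:.3f}, squared error = {linear_error:.3f}")
```

The residual error stays well above zero because the model structure, not the coefficients, is wrong. At this point the experimenter would have to pick another functional form by hand.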
               Symbolic regression attempts to go beyond this. It consists of finding
            a function that fits the given data points without making any assumptions
            about the structure of that function. Since GP makes no such assumption,
            it is well suited to this sort of discovery task. Symbolic regression was one
            of the earliest applications of GP (Koza, 1992), and continues to be widely
            studied (Cai, Pacheco-Vega, Sen, and Yang, 2006; Gustafson, Burke, and
            Krasnogor, 2005; Keijzer, 2004; Lew, Spencer, Scarpa, Worden, Rutherford,
            and Hemez, 2006).
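
The idea of searching over function structures, not just coefficients, can be sketched in a few lines. The code below is an illustration under stated assumptions, not a full GP system: it uses truncation selection and subtree mutation only (no crossover), a small assumed primitive set (`+`, `-`, `*`, the variable `x` and the constant 1.0), and a made-up quadratic target. All function names are hypothetical.

```python
import math
import operator
import random

# Assumed primitive set for this sketch.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0]


def random_tree(rng, depth):
    """Grow a random expression tree as nested tuples (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree at the input value x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant


def error(tree, points):
    """Sum of absolute errors over the data points (lower is better)."""
    total = sum(abs(evaluate(tree, x) - y) for x, y in points)
    return total if math.isfinite(total) else math.inf  # guard against overflow


def mutate(rng, tree, depth=2):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left, depth), right)
    return (op, left, mutate(rng, right, depth))


def search(points, generations=50, pop_size=100, seed=0):
    """Keep the best quarter each generation and refill with mutants."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: error(t, points))
        survivors = population[: pop_size // 4]
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda t: error(t, points))


# Made-up target for illustration: y = x*x + x + 1 sampled on [-1, 1].
points = [(i / 10, (i / 10) ** 2 + i / 10 + 1) for i in range(-10, 11)]
best = search(points)
print(best, error(best, points))
```

Nothing in the search fixes the form of the answer in advance; the structure of the expression is itself what is being evolved. The choice of primitives does still constrain which functions can be expressed.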
               The steps necessary to solve symbolic regression problems include the five
            preparatory steps mentioned in Chapter 2. We practiced them in the
            example in Chapter 4, which was an instance of a symbolic regression problem.
            There is an important difference here, however: the data points provided in
            Chapter 4 were computed using a simple formula, while in most realistic
            situations each point represents the measured values taken by some variables
            at a certain time in some dynamic process, in a repetition of an experiment,
            and so on. So, the collection of an appropriate set of data points for symbolic
            regression is an important and sometimes complex task.
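
As a rough illustration of how the five preparatory steps (terminal set, function set, fitness measure, run parameters, termination criterion) might be written down when the data are measured records rather than values computed from a formula, consider the sketch below. The `RegressionSetup` class, the `make_setup` helper, the primitive names and all parameter values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RegressionSetup:
    terminals: Sequence[str]           # step 1: terminal set
    functions: Sequence[str]           # step 2: function set
    fitness: Callable                  # step 3: fitness measure (lower is better)
    parameters: dict                   # step 4: parameters controlling the run
    should_stop: Callable[[int, float], bool]  # step 5: termination criterion


def make_setup(records, target, max_generations=50, tolerance=0.01):
    """Derive a setup from measured records (dicts of variable name -> value)."""
    inputs = sorted(name for name in records[0] if name != target)

    def fitness(model):
        # Sum of absolute errors between the model and the measured target.
        return sum(abs(model(r) - r[target]) for r in records)

    def should_stop(generation, best_error):
        return generation >= max_generations or best_error <= tolerance

    return RegressionSetup(
        terminals=inputs + ["constant"],
        functions=["+", "-", "*", "protected_div"],  # assumed primitive names
        fitness=fitness,
        parameters={"population_size": 500, "max_generations": max_generations},
        should_stop=should_stop,
    )


# Hypothetical measurements: each record is one observation of the process.
records = [
    {"temperature": 1.0, "pressure": 2.0, "yield": 3.0},
    {"temperature": 2.0, "pressure": 1.0, "yield": 3.0},
]
setup = make_setup(records, target="yield")
print(setup.terminals)
```

Here the terminal set is derived from whichever variables were actually measured, which is one reason the choice and collection of data points matters so much.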
               For instance, consider the case of using GP to evolve a soft sensor
            (Jordaan, Kordon, Chiang, and Smits, 2004). The intent is to evolve a function
            that will provide a reasonable estimate of what a sensor (in an industrial
            production facility) would report, based on data from other actual sensors
            in the system. This is typically done in cases where placing an actual sensor
            in that location would be difficult or expensive. However, it is necessary to
            place at least one instance of such a sensor in a working system in order to
            collect the data needed to train and test the GP system. Once the sensor
            is placed, one would collect the values reported by that sensor and by all
            the other real sensors that are available to the evolved function, at various
            times, covering the various conditions under which the evolved system will
            be expected to act.
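
The data collection described above might be organised as in the following sketch. It assumes each logged reading is a dictionary holding a timestamp, the values of the real sensors, and (when available) the value of the temporarily installed target sensor. The sensor names, the `build_dataset` and `split_train_test` helpers, and the random train/test split are all illustrative assumptions. For process data a split by time period or by operating condition may be more appropriate.

```python
import random


def build_dataset(log, target_sensor):
    """Turn time-stamped readings into (inputs, target) rows.

    Readings that lack a value for the target sensor are skipped.
    """
    rows = []
    for reading in log:
        if target_sensor not in reading:
            continue
        inputs = {name: value for name, value in reading.items()
                  if name not in (target_sensor, "time")}
        rows.append((inputs, reading[target_sensor]))
    return rows


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical log: "flow" and "temp" are existing sensors, "purity" is the
# sensor installed only to gather training and test data.
log = [{"time": t, "flow": 1.0 + 0.1 * t, "temp": 300.0 + t, "purity": 0.9 + 0.01 * t}
       for t in range(7)]
log.append({"time": 7, "flow": 1.7, "temp": 307.0})  # target reading missing

rows = build_dataset(log, target_sensor="purity")
train, test = split_train_test(rows)
print(len(train), "training rows,", len(test), "test rows")
```

The evolved function would then see only the `inputs` part of each row, while the recorded target value is used solely for computing fitness and for testing.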
               Such experimental data typically come in large tables where numerous