bytegp.txt

来自「标准的GP源代码,由Andy Singleton维护」· 文本 代码 · 共 323 行 · 第 1/2 页

TXT
323
字号
	Chrome* CrossTree(Chrome* mate);	void Mutate();                          // mutate self		// load and parse and expression.  Return success	virtual int Load(istream& istr,int issource);		 // write the expression to a stream	void write(int pretty,ostream& ofile = cout);A Function object carries information needed by the Chrome for initializing, displaying and evaluating expressions that contain that function.  Many functions, such as the arithmetic functions, are problem independent, but some functions, such as data terminals, require the data structures and evaluation environment of a particular problem.  Important data members include:	char name[30];          // written name	    // number of arguments.  Used to initialize and traverse arguments.	int argnum;	    //number of variables in variable table of opcode. To init node.idx	int varnum;	    // selection frequency relative to other functions.   Used in initialization, mutation.	int weight;	       // pointer to evaluation code.  Not a virtual for speed reasons	EVALFUNC evalfunc;The important method is:	retval eval(class Chrome* st) {return (evalfunc)(st);};Pop classThe Pop (short for Population) carries an array of Chromes, evaluates them with the fitness function provided in the current Problem, and performs a genetic algorithm to generate new and better Chromes.  Important data members include:	Chrome** pop;           // allocated array holding the actual chromes	Problem* problem;       // The current Problem	UINT popsize;           // Number of Chromes	long gencount;          // number of individuals generated so far	int   BestMember;       // Index of current best ChromeImportant methods include:		// set up a population for a particular problem	Pop(Problem* prob,ChromeParams* par, UINT size );		// generate a new chrome.  Return its fitness	virtual Chrome* generate();		// generate until reaching time max_time, evaluating maxevals individuals, or reaching maxfitness	virtual Chrome* go_until(time_t max_time,long maxevals,float maxfitness);Pop uses a "steady state" genetic algorithm, generating one new Chrome and replacing one old Chrome at a time, as opposed to making a whole new batch as a "generation".  The generate method is the agent which selects one or more parents and applies mutation or crossover.  This virtual method could be replaced by code with different strategies for selection, reproduction, or interaction with parallel populations.Selection strategy is important.  We want to choose parents randomly enough so that we maintain a healthy diversity in our population.  If only the best individual reproduces, the GA will run out of raw material.  On the other hand, progress requires a lot more reproductions of the good individuals than the bad ones.  The default generate uses tournament selection of size six, which means that it looks at a set of six candidates selected at random and chooses the best one in the set.  We could make this "greedier", or more likely to pick the very best individuals, by increasing the size of the set.  The default generate also uses a kill-tournament to select an individual for replacement, replacing the worst of two randomly selected individuals.The default generate does crossover 70% of the time, mutation 20% of the time, and a straight copy 10% of the time.  You can change these ratios in ChromeParams.Problem classIn order to do GP, you must have a fitness function, which defines the problem, and a set of primitive operations sufficient for solving it.  Any implementation of GPQUICK must have a subclass of Problem which includes the appropriate fitness function and primitive functions.  The important data member is:		// array of primitive functionsImportant methods include:	Function** funclist;		// install the primitives and initialize the evaluation environment	Problem();	void AddF(Function* f);         // add a Function to the funclist		// Pure virtual fitness function. 		// It should evaluate the chrome in an appropriate environment,		// and return a fitness value	virtual float fitness(Chrome* chrome);Most programs need to be evaluated in a particular context.  If you are trying to evolve a program that can navigate a maze, you need to define the maze.  Setting up the environment should be done in the Problem constructor and at the beginning of the fitness function.****************************************************************************The included problem...SSRProb - Simple Symbolic RegressionThe sample problem provided with GPQUICK performs symbolic regression, chosen for simplicity and speed.  In symbolic regression we attempt to evolve an arithmetic expression that matches a set of output variables, or an unknown function.  SSRProb creates an expression using the input variables John, Paul, George, Ringo and Elvis.  The correct expression is John*(George - Paul) + 2.5*Ringo.  Elvis is a noise variable.The function set for SSRProb includes the terminals NUMBER (for numeric constants), John, Paul, George, Ringo, and Elvis.  It also includes the arithmetic functions ADD, SUB, MUL, DIV and SINE.The fitness function for SSRProb is the sum squared error of the expression for 20 randomly chosen samples of the input variables.Running SSRProbAs GPQUICK runs, it periodically lists out the number of Chromes generated, the best fitness, and the best expression.  We can learn some important things about artificial evolution by watching this output.  At first, the search for a better expression wanders aimlessly, and total error hovers near 10^11.  Sometimes fitness does not improve, and the run fails.  At some point, QPQUICK may discover a variant of the expression (MUL John (SUB George Paul), and total error drops suddenly to 10^7.  This illustrates the phenomenon of punctuated equilibrium.  Change tends to be sudden, unpredictable, and surrounded by periods of stability.  Now GPQUICK is looking for 2.5 Ringos.  It eventually finds one Ringo, followed very quickly by a second Ringo.  This demonstrates how a genetic algorithm uses crossover to build a solution piece by piece.  The expression resembles (ADD (ADD Ringo Ringo) (MUL John (SUB George Paul))) and the total error is near 300,000.  The expression begins to grow bigger, adding various functions in search of that extra half Ringo.  It may eventually build an sizable expression that simplifies to exactly John*(George - Paul)+2.5*Ringo and quit, before it times out after generating 80,000 chromes.  This result shows  the tendency of GP to generate messy, obscure, but effective solutions.If you program your Pop to output population statistics, you will see that the variance in fitness (error) decreases just when the expressions start to grow in size.  The bigger expressions have a lot of redundancy, and they are likely to have a pretty good fitness after crossover.  I have dubbed this phenomenon "defense against crossover".   Chopping a piece out of one program and sticking it randomly into another program is a pretty radical operation which would almost surely kill a manually written program, but genetic programs learn tolerate and even thrive on it.  This is a key which explains why genetic programming works at all.  It evolves evolvability.You can change the difficulty of the symbolic regression problem by changing the formula.  Longer polynomials are harder match.  Changing the parameters in the ChromeParams constructor will allow you to change the size of the expressions, the mix of mutation and crossover, and "greediness" of the selection tournament.  By removing the comment on the SINE function, you can increase the size of the search in "program space" and noticeably reduce the probability of success.  This dependence on the function set is a weakness of GP.  It is also an opportunity, since by improving your function set you can make a difficult problem easier and get it solved.Using symbolic regression to discover a function that you already know is not useful, but you can modify SSRProb to learn prediction and classification from real world examples.  This is genetic induction, and it has several advantages when compared with competing statistical or neural techniques....*****************************************************************************This problem was included for its simplicity.  More Problems should be available later.

⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?