<html><title>Sample Programs from Analyze Mode</title><body bgcolor="#FFFFFF" text="#000000" link="#0000AA" alink="#0000FF" vlink="#000044"><h2 align=center>Sample Programs from Analyze Mode</h2>
<p>This document gives some example analyze programs and explains how they function.
<h3>Testing a genome sequence</h3>
<p>The following program will load in a genome sequence, run it through a test CPU, and output the information about it in a couple of formats.
<pre>
  VERBOSE
  LOAD_SEQUENCE rmzavcgmciqqptqpqcpctletncogcbeamqdtqcptipqfpgqxutycuastttva
  RECALCULATE
  DETAIL detail_test.dat fitness merit gest_time length viable sequence
  TRACE
  PRINT
</pre>
<p>This program starts off with the "VERBOSE" command so that avida will print to the screen all of the details about what is going on as it runs the analyze script; I recommend you begin all of your programs this way for debugging purposes. The program then uses the LOAD_SEQUENCE command to allow the user to enter a specific genome sequence in its compressed format. This will translate the genome into the proper genotype as long as you are using the correct instruction set file (since that file determines the mapping of letters to instructions).
<p>The RECALCULATE command places the genome sequence into a test CPU and determines its fitness, merit, gestation time, etc., so that the DETAIL command that follows it has access to all of this information as it prints it to the file "detail_test.dat" (its first argument). The TRACE and PRINT commands will then print individual files about this genome, the first tracing its execution line-by-line, and the second summarizing all sorts of statistics about it and displaying the genome. Since no directory was specified for these commands, "<tt>genebank/</tt>" is assumed, and the filenames are "<tt>org-S1.trace</tt>" and "<tt>org-S1.gen</tt>". 
If a genotype has a name when it is loaded, that name will be kept, but if it doesn't, it will be assigned a name starting at org-S1, then org-S2, and so on counting higher. The TRACE and PRINT commands add their own suffixes to the genome's name to determine the filenames they write.
<h3>Using Variables</h3>
<p>Often, you will want to run the same section of analyze code with different "inputs" each time through, or else you might simply want a single value to be easy to change throughout the code. To facilitate such programming practices, variables are available in analyze mode that can be altered for each repetition through the code.
<p>There are actually several types of variables, all of which are a single letter or number. For a command that requires a variable name as an input, you simply put that variable where it is requested. For example, if you were going to set the variable i to be equal to the number 12, you would type:
<pre>
  SET i 12
</pre>
<p>But later on in the code, how does avida know when you type an i if you really want the letter 'i' there, or if you prefer the number 12 to be there? To distinguish these cases, you must put a dollar sign ("$") before a variable wherever you want it to be translated to its value instead of just using the variable name itself.
<p>There are a few different commands that allow you to manipulate a variable's value, and sometimes execute a section of code multiple times, once for each of the possible values. Here is one example:
<pre>
  FORRANGE i 100 199
    SET d /home/charles/dev/avida/runs/evo-neut/evo_neut_$i
    PURGE_BATCH
    LOAD_DETAIL_DUMP $d/detail_pop.100000
    RECALCULATE
    DETAIL $d/detail.dat update length fitness sequence
  END
</pre>
<p>The FORRANGE command runs the contents of the loop once for each possible value in the range, setting the variable i to each of these values in turn. Thus the first time through the loop, 'i' will be equal to the value '100', then '101', '102', all the way up to '199'. 
In this particular case, we have 100 runs (numbered 100 through 199) that we want to work with.
<p>The first thing we do once we're inside the loop is set the value of the variable 'd' to be the name of the directory we're going to be working with. Since this is a long directory name, we don't want to have to type it over every time we need it. If we set it to the variable d, then all we need to do is type '$d' in the future, and it will be translated to the full name. Note that in this case we are setting a variable to a string instead of a number; that's just fine and avida will figure out how to handle it properly. Note also that this directory will change each time through the loop, and that it is no problem to use one variable as part of setting another.
<p>After we know what directory we are using, we run a PURGE_BATCH to get rid of all of the genotypes from the last time through the loop (lest we just keep building up more and more genotypes in the current batch), and then we refill the batch by using LOAD_DETAIL_DUMP to load in all of the genotypes saved in the file "<tt>detail_pop.100000</tt>" within our chosen directory. The RECALCULATE command runs all of the genotypes through a test CPU so we have all the statistics we need, and finally DETAIL will print out the stats we want to the file "<tt>detail.dat</tt>", again placing it in the proper directory. The END command signifies the end of the FORRANGE loop.
<h3>Finding Lineages</h3>
<p>Quite often, the portion of an avida run that we will be most interested in is the lineage from the final dominant genotype back to the original ancestor. 
As such, there are tools in avida to get at this information.
<pre>
  FORRANGE i 100 199
    SET d /home/charles/dev/avida/runs/evo-neut/evo_neut_$i
    PURGE_BATCH
    LOAD_DETAIL_DUMP $d/detail_pop.100000
    LOAD_DETAIL_DUMP $d/historic_dump.100000
    FIND_LINEAGE num_cpus
    RECALCULATE
    DETAIL lineage.$i.html depth parent_dist length fitness html.sequence
  END
</pre>
<p>This program looks very similar to the last one. The first four lines are actually identical, but after loading the detail dump at update 100,000, we also want to load the historic dump from the same time point. A detail file contains all of the genotypes that were currently alive in the population at the time it was printed, while the historic files contain all of the genotypes that are direct ancestors of those that were still alive. The combination of these two files gives us the lineages of the entire population back to the original ancestor. Since we are only interested in a single lineage, the next thing we do is run the FIND_LINEAGE command to pick out a single genotype, and discard everything else except for its lineage. In this case, we pick the genotype with the highest abundance (the most virtual CPUs associated with it) at the time of printing.
<p>As before, the RECALCULATE command gets us any additional information we may need about the genotypes, and then we print that information to a file using the DETAIL command. The filenames that we are using this time have the format "<tt>lineage.$i.html</tt>", so they are all being written to the current directory with filenames that incorporate the run number right in them. Also, because the filename ends in the suffix '.html', Avida knows to print the file in a proper html format. 
Note that the specific values that we choose to print take advantage of the fact that we have a lineage (and hence measured things like the genetic distance to the parent) and are in html mode (and thus can print the sequence using colors to specify where exactly mutations occurred).
<h3>Working with Batches</h3>
<p>In analyze mode, we can load genotypes into multiple batches and we then operate on a single batch at a time. So, for example, if we wanted to only consider the dominant genotypes at time points 100 updates apart, but all we had to work with were the detail files (containing <i>all</i> genotypes at each time point) we might write a program like:
<pre>
  SET d /home/charles/avida/runs/mydir/here-it-is
  SET_BATCH 0
  FORRANGE u 100 100000 100             # Cycle through updates
    PURGE_BATCH                         # Purge current batch (0)
    LOAD_DETAIL_DUMP $d/detail_pop.$u   # Load in the population at this update
    FIND_GENOTYPE num_cpus              # Remove all but most abundant genotype
    DUPLICATE 0 1                       # Duplicate batch 0 into batch 1
  END
  SET_BATCH 1                           # Switch to batch 1
  RECALCULATE                           # Recalculate statistics...
  DETAIL dom.dat fitness sequence       # Print info for all dominants!
</pre>
<p>This program is slightly more complicated than the others, so I added in comments directly inside it. Basically, what we do here is use batch 0 as our staging area where we load the full detail dumps into, strip them down to only the single most abundant genotype, and then copy that genotype over into batch one. By the time we're done, we have all of the dominant genotypes inside of batch one, so we can print anything we need right from there.
<h3>Building your own Commands</h3>
<p>One really useful feature that I have added to the analyze mode is the ability for the user to construct a variety of their own commands without modifying the source code. 
This is done with the FUNCTION command. For example, if you know you will always need a file called "<tt>lineage.html</tt>" with very specific information in it, you might write a helper command for yourself as follows:
<pre>
  FUNCTION MY_HTML_LINEAGE    # arg1=run_directory
    PURGE_BATCH
    LOAD_DETAIL_DUMP $1/detail_pop.100000
    LOAD_DETAIL_DUMP $1/historic_dump.100000
    FIND_LINEAGE num_cpus
    RECALCULATE
    DETAIL $1/lineage.html depth parent_dist length fitness html.sequence
  END
</pre>
<p>This works identically to how we found lineages and printed their data in the section above. Only this time, it has created the new command called "MY_HTML_LINEAGE" that you can use anytime thereafter. Arguments to functions work similarly to variables, but they are numbers instead of letters. Thus $1 translates to the first argument, $2 becomes the second, and so on. You are limited to 9 arguments at this point, but that should be enough for most tasks. $0 is the name of the function you are running, in case you ever need to use that.
<p>You may be interested in also using functions in conjunction with the SYSTEM command. Anything you type as arguments to this command gets run on the command line, so you can make functions to do anything that could otherwise be done were you at the shell prompt. For example, imagine that you were going to use a lot of compressed files in your analysis that you would first need to uncompress. You might write a function like:
<pre>
  FUNCTION UNZIP    # Arg1=filename
    SYSTEM gunzip $1
  END
</pre>
<p>This is a shorter example than you might typically want to write a function for, but it does get the point across. 
This would allow you to just type "UNZIP &lt;filename&gt;" whenever you needed to uncompress something.
<p>Functions are particularly useful in conjunction with the INCLUDE command. You can create a file called something like "<tt>my_functions.cfg</tt>" in your avida work directory, define a bunch of functions there, and then start all of your <tt>analyze.cfg</tt> files with the line:
<pre>
  INCLUDE my_functions.cfg
</pre>
<p>and you will have access to all of your functions thereafter. Ideally, as this language becomes more flexible, so will your ability to create functions within the language, so you will be able to develop flexible and useful libraries for yourself.
<h3>Try it Out...</h3>
<p>Here are a couple of example problems you can try to see how well you can use analyze mode. These should get you used to working with it for future projects.
<p><b>Problem 1</b>. A detail file in avida contains one line associated with each genotype, in order from the most abundant to the least. Currently, the LOAD_DETAIL_DUMP command will load the entire file's worth of genotypes into the current batch, but what if you only wanted the top few? You should write a function called "LOAD_DETAIL_TOP" that takes two arguments. The first ($1) is the name of the file that needs to be loaded in (just as in the original command), and the second ($2) is the number of lines you want to load.
<p>The easiest way to go about doing this is by using the SYSTEM command along with the Unix command "<tt>head</tt>", which will output the very top of a file. If you typed the line:
<pre>
  head -42 detail_pop.1000 > my_temp_file
</pre>
<p>the file "<tt>my_temp_file</tt>" would be created, and its contents would be the first 42 lines of <tt>detail_pop.1000</tt>. So, what you need this function to do is create a temporary file with the proper number of lines from the detail file in it, load that temp file into the current batch, and then delete the file (using the <tt>rm</tt> command). 
<i>Warning</i>: be very careful with the automated deletions; you don't want to accidentally remove something that you really need! I recommend that you use the command "<tt>rm -i</tt>" until you finish debugging. This problem may end up being a little tricky for you, but you should be able to work your way through it.
<p><b>Problem 2</b>. Now that you have a working LOAD_DETAIL_TOP command, you can run "LOAD_DETAIL_TOP &lt;filename&gt; 1" in order to load only the most abundant genotype from the detail file. Rewrite the example program from the section "Working with Batches" above such that you now only need to work within a single batch.
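<p>If you get stuck on Problem 1, here is a minimal sketch of the shape such a function might take. It is not taken from the avida distribution. It assumes that <tt>$1</tt> and <tt>$2</tt> are substituted inside SYSTEM arguments the same way <tt>$1</tt> is in the UNZIP example, and that the shell redirection is passed through to the command line unchanged. The temporary filename is simply the one used in the <tt>head</tt> example above.
<pre>
  FUNCTION LOAD_DETAIL_TOP    # arg1=filename, arg2=number of lines to load
    SYSTEM head -$2 $1 > my_temp_file
    LOAD_DETAIL_DUMP my_temp_file
    SYSTEM rm -i my_temp_file
  END
</pre>
<p>With a definition along these lines, "LOAD_DETAIL_TOP detail_pop.1000 42" would be expected to load only the 42 most abundant genotypes from that file into the current batch.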