📄 fastdnaml_scripts.txt

                        Shell Scripts for use with
                         fastDNAml and DNArates


                                SUMMARY

UNIX shell scripts have proven quite useful in running the fastDNAml and/or
DNArates programs.  They have been used in two different contexts.  First,
many of the program options can be invoked by simple editing of the input.
Second, some scripts help run the program and maintain its results.

bootstrap           add B (bootstrap) option (and optional seed) to input
categories          add C (rate categories) option and values to input
categories_file     add Y (categories file) option to input (DNArates)
clean_checkpoints   remove checkpoint files when there is a finished treefile
clean_jumbles       remove all but one optimal jumble for a given result
fastDNAml_boot      loop over bootstrap seeds, doing 1 or more jumbles each
fastDNAml_loop      do jumbles, stopping when same best tree found n times
frequencies         add F option and user-defined frequencies to input
global              add G (global) option (and optional region size) to input
jumble              add J (jumble) option (and optional seed) to input
min_info            add M (minimum information) option and value to input
n_categories        add C (categories) option (without rate values) to input
out.PID             append process ID to output file name of a program
outgroup            add O (outgroup) option and number to input
printdata           add 1 (print data) option to input
quickadd            add Q (quickadd) option to input
restart             add R (restart) option and checkpoint tree to input
scores              summarize and sort likelihoods from jumble output files
transition          add T (transition/transversion) option and value to input
treefile            add Y (treefile) option to input
trees2NEXUS         combine trees and add a NEXUS wrapper for PAUP and MacClade
trees2prolog        convert Newick format trees to prolog facts
userlengths         add L (userlengths) option to input
usertree            add U (usertree) option, tree count, and tree(s) to input
usertrees           add U (usertree) option, tree count, and tree(s) to input
weights             add W (userweight) option and values to the input
weights_categories  add W and C options and values to the input


                SCRIPTS THAT INVOKE DNAML OPTIONS

GENERAL COMMENTS:

The program fastDNAml takes data from standard input.  Thus, to run the
program with data in the file called "infile", the command would be

  fastDNAml <infile

In this case, the output goes to standard output (generally the user's
terminal).  To put it into a file, one can use output redirection as in

  fastDNAml <infile >outfile

Because of the use of standard input, the input to fastDNAml can be
preprocessed by a function, and then piped to the program.  For example,

  bootstrap <infile | fastDNAml >outfile
       or
  bootstrap 137 <infile | fastDNAml >outfile

can be used to add the bootstrap option and a random number seed to the input,
and then pass it on to fastDNAml for analysis.

Many of the fastDNAml options are amenable to this arrangement.  In
each case, the preprocessing can simply add options (and auxiliary data
lines, as necessary) to the input.  In addition to avoiding the need to play
with UNIX text editors, there are several advantages to this approach:

   1. The files remain relatively compatible with PHYLIP DNAML.
   2. It reduces the chance of introducing errors into the data.
   3. It is easier to try alternative options on the same data.
   4. If the data for each sequence are provided in one long line (so that
      interleaved and non-interleaved formats are the same), then some text
      editors will truncate the lines.

Shell scripts are available for each of the above program options.
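
The preprocessing idea can be sketched as a small filter.  This is only an
illustration, not the code of the distributed scripts.  The function name
add_option is hypothetical, and the layout it produces (option letter appended
to the first input line, auxiliary line of the form "letter value" placed
directly after it) is an assumption based on PHYLIP conventions.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the fastDNAml distribution.
# Usage: add_option LETTER [VALUE] <infile
# Assumed layout: option letters follow the counts on the first input line,
# and an auxiliary line "LETTER VALUE" goes directly after that line.
add_option() {
    letter=$1
    value=${2-}

    # Read the first line (sequence and site counts plus any options).
    IFS= read -r first_line || return 1

    # Re-emit it with the new option letter appended.
    printf '%s %s\n' "$first_line" "$letter"

    # Emit the auxiliary data line only when a value was supplied.
    if [ -n "$value" ]; then
        printf '%s %s\n' "$letter" "$value"
    fi

    # Pass the rest of the input through unchanged.
    cat
}
```

Under these assumptions, a call such as

  add_option J 137 <infile | fastDNAml >outfile

would play the same role as the jumble script with an explicit seed.  The
distributed scripts should be preferred, since they know the exact format each
option requires.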
The formats and effects of the scripts are described below.


THE SCRIPTS:

BOOTSTRAP (B)

  Format:   bootstrap [random_seed]
  Example:  bootstrap <infile.phylip | fastDNAml >outfile
  Example:  bootstrap 137 <infile.phylip | fastDNAml >outfile

Adds a bootstrap option and a random number seed to the input.  If the random
seed is not supplied, then the process ID of the bootstrap shell is used.  Thus,
repeated executions of the first example will tend to generate different random
samples (note that many systems only use about 32000 process IDs, so once you
get above 100 repetitions, reuse of the same number may become a significant
concern).


CATEGORIES (C)

  Format:  categories categories_data_file
  Example: categories archae.rates <archaea.phylip | fastDNAml >archaea.out

Adds the categories option and the corresponding data to the input.  The data
must have the format specified for PHYLIP dnaml 3.3.  The first line must be
the letter C, followed by the number of categories (a number in the range 1
through 35), and then a blank-separated list of the rates for each category.
(The list can take more than one line; the program reads until it finds the
specified number of rate values.)  The next line should be the word Categories
followed by one rate category character per sequence position.  The categories
1 - 35 are represented by the series 1, 2, 3, ..., 8, 9, A, B, C, ..., Y, Z.
These latter data can be on one or more lines.  For example,

  C  12  0.0625  0.125  0.25  0.5  1  2  4  8  16  32  64  128
  Categories  5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
              633792246624457364222574877188898132984963499AA9899975

or, with more categories,

  C 35   0.16529   0.29525   0.34482   0.40272   0.47035   0.54933   0.64157
         0.74930   0.87512   1.02207   1.19369   1.39413   1.62823   1.90164
         2.22096   2.59389   3.02945   3.53815   4.13227   4.82615   5.63654
         6.58301   7.68841   8.97943  10.48723  12.24822  14.30490  16.70694
        19.51232  22.78878  26.61541  31.08459  36.30423  42.40033 256.00000
  Categories  4HHZ282111 21ED48H1HD Z1CD171411 1118F111EI IHI8ELBZZZ ZZZZZZZZZZ
              ZZZZZZZZZZ 1MJZZMJLKL ZKL1ZZZZZZ ZZZZZZZZZZ ZZZZZZZZGH HHIGG43FOZ
              Z2B9111324 1ZZZ171Z11 1184GH11ZZ IB1BBZ111J IB1ILKF4L1 21AEDE8111
              111111ED9K 2219L3HGJ1 1Z1ZZMONMH ZZOMSQLM8Z 11411

(Notice that spaces are permitted in the categories data, and that the values
can extend across multiple lines.  However, this means that extra values are
not permitted.)

In order to generate output compatible with PHYLIP dnaml v3.3, this should be
the first option added (so that the categories data are inserted immediately
before the sequence data).


CATEGORIES_FILE (Y)

  Format:  categories_file

Adds the Y option to the input data for the DNArates program.  Makes the
program write a file of weights and categories that can be directly added to
the input for the fastDNAml program (see weights_categories script).

  Example: categories_file <archaea.phylip | n_categories 17 | \
           usertree archaea.tree | DNArates

This command line will find the site-specific rates for the sequence data in
archaea.phylip and the tree in archaea.tree, categorize them into 17 groups,
and write the resulting categories (and a weighting mask removing sites of
undetermined rates) into a file called weight_rate.PID, where PID is a number
(the ID of the process running DNArates).


FREQUENCIES (F)

  Format:  frequencies
  Example: frequencies <archaea.phylip | fastDNAml

Adds the empirical frequencies option (F) to the input stream.


GLOBAL (G)

  Format:  global [final_tree_rearrangements [partial_tree_rearrangements]]
  Example: global <archaea.phylip | fastDNAml
  Example: global 4 <archaea.phylip | fastDNAml
  Example: global 0 0 <archaea.phylip | fastDNAml

Adds a global option to the input.  If a rearrangement distance is
specified, then this value is added as part of the option auxiliary
information.  In this latter case, it is essential that the input contain
(or the global command be preceded by) a B, J or T option.

  Example: global <archaea.phylip | fastDNAml
  Example: transition 2.0 <archaea.phylip | global 4 | fastDNAml
  Example: transition 2.0 <archaea.phylip | global 0 0 | fastDNAml

The first example invokes the global rearrangement option for the completed
tree, exactly as with DNAML in the PHYLIP package.  The second example performs
"regional" rearrangements of the completed tree, such that subtrees are moved
across no more than the specified number of branch points before being
reconnected.  The final example does not perform any tree rearrangements
whatsoever; it just builds a tree by sequentially adding each of the
sequences.  Notice that in the last two examples, the T 2.0 option is added to
the input BEFORE the global option.


JUMBLE (J)

  Format:  jumble [random_seed]
  Example: jumble <archaea.phylip | fastDNAml
  Example: jumble 137 <archaea.phylip | fastDNAml

Adds a jumble option and a random number seed to the input.  If the random
seed is not supplied, then the process ID of the jumble shell is used.  Thus,
in the first example, the seed used is the process ID of the jumble shell
script.  Repeated executions of the command line will tend to get different
random number seeds, and hence different addition orders.  In the second
example, the seed is 137.


MIN_INFO (M)

  Format:  min_info  minimum_unambiguous_residues
  Example: categories_file <archaea.phylip | n_categories 17 | \
           usertree archaea.tree | min_info 8 | DNArates

Adds the minimum information option and an auxiliary data line to the input
for the DNArates program.  This changes the threshold value of unambiguous
residues that must be present in a column before the program will consider the
rate to be "defined".  The default value is currently 4.


N_CATEGORIES

  Format:  n_categories  number
  Example: categories_file <archaea.phylip | n_categories 17 | \
           usertree archaea.tree | min_info 8 | DNArates

Adds the C option and an auxiliary data line of the form "C number" to the
input.  When the DNArates program categorizes the rates that it infers for
the sites in the alignment, this is the number of categories it will use.


OUTGROUP (O)

  Format:  outgroup  outgroup_number
  Example: outgroup 5 <archaea.phylip | treefile | fastDNAml > archaea.out

Adds the outgroup option and appropriate auxiliary data line to the input.  The
example will infer a tree for the archaea data, root it on sequence 5, and
write a tree to treefile.PID, where PID is a number (the process ID of
fastDNAml).  The textual output from fastDNAml (a description of the analysis)
is written to archaea.out.


PRINTDATA (1)

  Format:  printdata
  Example: printdata <archaea.phylip | fastDNAml > archaea.out

Adds a printdata option to the input.  In the example, the file archaea.out
will include an echoing of the data in addition to the usual output.


QUICKADD (Q)

  Format:  quickadd
  Example: quickadd <archaea.phylip | fastDNAml > archaea.out

Adds a quickadd option to the input.  This greatly decreases the time in