📄 changes_v34.html
字号:
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><title>ChangeLog - FASTA v34</title><style type="text/css">body { margin-left: 6px; }.sidebar { font-size: 12px; font-family: sans-serif; text-decoration:none; background-color: #FFFFCC; }.fasta { font-family: sans-serif; }fasta h3 { font-size: 14px; color: #000000 }.fasta td {background-color: #FFFFCC }.fasta a { text-decoration: none; }.fasta li { font-size: 12px; margin-left:-1em }</style><head><body><div class=fasta><h3>ChangeLog - FASTA v34</h3><pre>$Name: fa35_03_06 $ - $Id: changes_v34.html,v 1.2 2007/08/09 20:34:08 wrp Exp $</pre><h3>May 28, 2007</h3>Small modification for GCG ASCII (libtype=5) header line.<hr><h3>October 6, 2006 CVS fa34t26b3</h3>New Windows programs available using Intel C++ compiler. Firstthreaded programs for Windows; first SSE2 acceleration of SSEARCH forWindows.<h3>July 18, 2006 CVS fa34t26b2</h3>More powerful environment variable substitutions for FASTLIBS files.The library file name parsing programs now provide the option forenvironment variable substitions. For example, SLIB2=/slib2 as anenvironment variable (e.g. export SLIB2=/slib2 for ksh and bash), then<pre>fasta34 -q query.aa '${SLIB2}/swissprot.fa' expands as expected.</pre>While this is not important for command lines, where the Unix shellwould expand things anyway, it is very helpful for variousconfiguration files, such as files of file names, where:<pre><${SLIB2}/blastswissprot.fa</pre>now expands properly, and in FASTLIBS files the line:<pre>NCBI/Blast Swissprot$0S${SLIB2}/blast/swissprot.fa</pre>expands properly. Currently, Environment variable expansion onlytakes place for library file names, and the <directory in a file offile names.<h3>July 2, 2006 fa34t26b0</h3>This release provides an extremely efficient SSE2 implementation ofthe Smith-Waterman algorithm for the SSE2 vector instructions writtenby Michael Farrar (farrar.michael@gmail.com). The SSE code speeds upSmith-Waterman 8 - 10-fold in my tests, making it comparable to EricLindahl's Altivec code for the Apple/IBM G4/G5 architecture.<h3>May 24, 2006 fa34t25d8</h3>In addition, support for ASN.1 PSSM:2 files provided by the NCBIPSI-BLAST WWW site is included. This code will not work withiteration 0 PSSM's (which have no PSSM information). For ASN.1PSSM's, which provide the matrix name (and in some cases the gappenalties), the scoring matrix and gap penalties are set appropriatelyif they were not specified on the command line. ASN.1 PSSM's are type 2:<pre>ssearch34 -P "pssm.asn1 2" .....</pre><h3>May 18, 2006</h3>Support for NCBI Blast formatdb databases has been expanded. TheFASTA programs can now read some NCBI *.pal and *.nal files, which areused to specify subsets of databases. Specifically, theswissprot.00.pal and pdbaa.00.pal files are supported. FASTA supportsfiles that refer to *.msk files (i.e. swissprot.00.pal refers toswissprot.00.msk); it does not currently support .pal files thatsimply list other .pal or database files (e.g. FASTA does not supportnr.pal or swissprot.pal).<hr><h3>Nov 20, 2005</h3>Changes to support asymmetric matrices - a scoring matrix read in froma file can be asymmetric. Default matrices are all symmetric.<h3>Sept 2, 2005</h3>The prss34 program has been modified to use the same display routinesas the other search programs. To be more consistent with the otherprograms, the old "-w shuffle-window-size" is now "-v window-size".<tt><b>prss34/prfx34</b></tt> will also show the optimal alignment for which thesignificance is calculated by using the "-A" option. Since the new program reports results exactly like other<tt><b>fasta/ssearch/fastxy34</b></tt> programs, parsing for statistical significanceis considerably different. The old format program can be make using"make prss34o".<h3>May 5, 2005 CVS fa34t25d1</h3>Modification to the -x option, so that both an "X:X" match score andan "X:not-X" mismatch score can be specified. (This score is also usedgive a positive score to a "*:*" match - the end of a reading frame,while giving a negative score to "*:not-*".<h3>Jan 24, 2005</h3>Include a new program, "print_pssm", which reads a blastpgp binarycheckpoint file and writes out the frequency values as text. Thesevalues can be used with a new option with ssearch34(_t) and prss34,which provides the ability to read a text PSSM file. To specify atext PSSM, use the option -P "query.ckpt 1" where the "1" indicates atext, rather than a binary checkpoint file. "initfa.c" has also beenmodified to work with PSSM files with zero's in the in the frequencytable. Presumably these positions (at the ends) do not provideinformation. (Jan 26, 2005) blastpgp actually uses BLOSUM62 valueswhen zero frequencies are provided, so read_pssm() has been modifiedto use scoring matrix values for zero frequencies as well.<hr><h3>Nov 4-8, 2004</h3>Incorporation of Erik Lindahl "anti-diagonal" Altivec code forSmith-Waterman, only. Altivec SSEARCH is now faster than FASTA for<h3>Aug 25,26, 2004 CVS fa34t24b3</h3>Small change in output format for <tt>p34comp*</tt> programs in">>>query_file#1 string" line before alignments. This line is not presentin the non-parallel versions - it would be better for them to be consistent.<hr><h3>Dec 10, 2003 CVS fa34t23b3</h3>Cause default ktup to drop for short sequences. For protein < 50, <i>ktup=1</i>;for DNA < 20, 50, 100 <i>ktup</i> = 1, 2, 3, respectively.<h3>Dec 7, 2003</h3>A new option, "-U" is available for RNA sequence comparison. "-U"functions like "-n", indicating that the query is an RNA sequence. Inaddition, to account for "G:U" base pairs, "-U" modifies the scoringmatrices so that a "G:A" match has the same score as a "G:G" match,and "T:C" match has the same score as a "T:T" match.<h3>Nov 2, 2003</h3>Support for more sophisticated display options. Previously, one couldhave only on "-m #" option, even though several of the options wereorthogonal (-m 9c is independent of -m 1 and -m2, which is independentof -m 6 (HTML)). In particular -m 9c can be combined with -m 6, whichcan be very helpful for runs that need HTML output but can alsoexploit the encoding provided by -m 9c.The "-m 9" option now also allows "-m 9i", which shows the standardbest score information, plus percent identity and alignment length.<h3>Sept 25, 2003</h3>A new option is available for annotating alignments. -V '@#?!'can be used to annotate sites in a sequence, e.g:<pre>>GTM1_HUMAN ...PMILGYWDIRGLAHAIRLLLEYTDS<b>@</b>S<b>?</b>YEEKKYT@MGDAPDYDRS@QWLNEKFKLGLDFPNLPYLIDGAHKIT</pre>might mark known and expected (S,T) phosphorylation sites. Thesesymbols are then displayed on the query coordinate line:<pre> 10 20 <b>@?</b> 30 @ 40 @ 50 60GTM1_H PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::gtm1_h PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP 10 20 30 40 50 60</pre>This annotation is mostly designed to display post-translationalmodifications detected by MassSpec with FASTS, but is also availablewith FASTA and SSEARCH.<h3>June 16, 2003 version: fasta34t22</h3>ssearch34 now supports PSI-BLAST PSSM/profiles. Currently, it onlysupports the "checkpoint" file produced by blastall, and only oncertain architectures where byte-reordering is unnecessary. It has notbeen tested extensively with the -S option.<pre>ssearch34 -P blast.ckpt -f -11 -g -1 -s BL62 query.aa library</pre>Will use the frequency information in the blast.chkpt file to do aposition specific scoring matrix (PSSM) search using theSmith-Waterman algorithm. Because ssearch34 calculates scores foreach of the sequences in the database, we anticipate that PSSMssearch34 statistics will be more reliable than PSI-Blast statistics.The Blast checkpoint file is mostly double precision frequencynumbers, which are represented in a machine specific way. Thus, youmust generate the checkpoint file on the same machine that you runssearch34 or prss34 -P query.ckpt. To generate a checkpoint file,run:<pre>blastpgp -j 2 -h 1e-6 -i query.fa -d swissprot -C query.ckpt -o /dev/null</pre>(This searches swissprot for 2 iterations ("-j 2" using a E()threshold 1e-6 saving the resulting position specific frequencies inquery.ckpt. Note that the original query.fa and query.ckpt mustmatch.)<h3>Apr 11, 2003 CVS fa34t21b3</h3>Fixes for "-E" and "-F" with ssearch34, which was inadvertantly disabled.<p>A new option, "-t t", is available to specify that all the proteinsequences have implicit termination codons "*" at the end. Thus, allprotein sequences are one residue longer, and full length matches areextended one extra residue and get a higher score. Forfastx34/tfastx34, this helps extend alignments to the very end incases where there may be a mismatch at the C-terminal residues.</p><p><tt>-m 9c</tt> has also been modified to indicate locations of terminationcodons ( *1).</p><h3>Mar 17, 2003 CVS fa34t21b2</h3>A new option on scoring matrices "-MS" (e.g. "BL50-MS") can be used toturn the I/L, K/Q identities on or off. Thus, to make "fastm34" usethe isobaric identities, use "-s M20-MS". To turn them off for "fasts34",use "-s M20".<h3>Jan 25, 2003</h3>Add option "-J start:stop" to pv34comp*/mp34comp*. "-J x" used toallow one to start at query sequence "x"; now both start and stop canbe specified.<hr><h3>Nov 14-22, 2002 CVS fa34t20b6</h3>Include compile-time define (-DPGM_DOC) that causes all the fastaprograms to provide the same command line echo that is provided by thePVM and MPI parallel programs. Thus, if you run the program:<pre>fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12</pre>the first lines of output from FASTA will be:<pre># fasta34_t -q gtt1_drome.aa /slib/swissprot FASTA searches a protein or DNA sequence data bank version 3.4t20 Nov 10, 2002 Please cite: W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448</pre>This has been turned on by default in most FASTA Makefiles. <h3>Aug 27, 2002</h3>Modifications to mshowbest.c and drop*.c (and p2_workcomp.c,compacc.c, doinit.c, etc.) to provide more information about thealignment with the -m 9 option. There is now a "-m 9c" option, whichdisplays an encoded alignment after the -m 9 alignment information.The encoding is a string of the form: "=#mat+#ins=#mat-#del=#mat".Thus, an alignment over 218 amino acids with no gaps (not necessarily100% identical) would be =218. The alignment:<pre> 10 20 30 40 50 60 70 GT8.7 NVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKFKL--GLDFPNLPYL-IDGSHKITQ :.:: . :: :: . .::: : .: ::.: .: : ..:.. ::: :..:XURTG NARGRMECIRWLLAAAGVEFDEK---------FIQSPEDLEKLKKDGNLMFDQVPMVEIDG-MKLAQ 20 30 40 50 60 </pre>would be encoded: "=23+9=13-2=10-1=3+1=5". The alignment encoding iswith respect to the beginning of the alignment, not the beginning ofeither sequence. The beginning of the alignment in either sequence isgiven by the an0/an1 values. This capability is particularly usefulfor [t]fast[xy], where it can be used to indicate frameshift positions"/#\#" compactly. If "-m 9c" is used, the "The best scores" titleline includes "aln_code".<h3>Aug 14, 2002 CVS tag fa34t20</h3>Changes to nmgetlib.c to allow multiple query searches coming fromSTDIN, either through pipes or input redirection. Thus, the command<pre>cat prot_test.lseg | fasta34 -q -S @ /seqlib/swissprot</pre>produces 11 searches. If you use the multiple query functions, thequery subset applies only to the first sequence.Unfortunately, it is not possible to search against a STDIN library,because the FASTA programs do not keep the entire library in memoryand need to be able to re-read high-scoring library sequences. Sinceit is not possible to fseek() against STDIN, searching against a STDINlibrary is not possible.<h3>Aug 5, 2002</h3><tt><b>fasts34(_t)</b></tt> and <tt><b>fastm34(_t)</b></tt> have been modified to allow searches withDNA sequences. This gives a new capability to search for DNA motifs,or to search for ordered or unordered DNA sequences spaced atarbitrary distances.<h3>June 25, 2002</h3>Modify the statistical estimation strategy to sample all the sequencesin the database, not just the first 60,000. The histogram is stillbased only on the first 60,000 scores and lengths, though all scoresan lengths are shown. The fit to the data may be better than thehistogram indicates, but it should not be worse.<h3>June 19, 2002</h3>Added "-C #" option, where 6 <= # <= MAX_UID (20), to specify thelength of the sequence name display on the alignment labels. Untilnow, only 6 characters were ever displayed. Now, up to MAX_UIDcharacters are available.<h3>Mar 16, 2002</h3>Added create_seq_demo.sql, nt_to_sql.pl to show how to build an SQLprotein sequence database that can be used with with the mySQLversions of the fasta34 programs. Once the mySQL seq_demo databasehas been installed, it can be searched using the command:<pre>fasta34 -q mgstm1.aa "seq_demo.sql 16"</pre>mysql_lib.c has been modified to remove the restriction that mySQLprotein sequence unique identifiers be integers. This allows theprogram to be used with the PIRPSD database. The RANLIB() functioncall has been changed to include "libstr", to support SQL text keys.Due to the size of libstr[], unique ID's must be < MAX_UID (20)characters.<p>A "pirpsd.sql" file is available for searching the mySQL distributionof the PIRPSD database. PIRPSD is available fromftp://nbrfa.georgetown.edu/pir_databases/psd/mysql.</div></body></html>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -