readme.v34t0

(This searches swissprot for 2 iterations ("-j 2") using an E()
threshold of 1e-6, saving the resulting position-specific frequencies
in query.ckpt.  Note that the original query.fa and query.ckpt must
match.)

>>June 5, 2003

Fix to mshowbest.c to get -m 9 coordinates correct on the reverse
strand with pv34comp*.  Some additional fixes for prfx34.

>>May 22, 2003

Changes to llgetaa.c, getseq.c, comp_lib.c to provide a different
library residue lookup table (sascii) for queries and libraries.  This
allows one to make a prfx34 (like prss34, but using the fastx
algorithm).  prfx34 is now available.

>>May 13,14 2003

Fixes to most of the drop*.c files, and mshowbest.c, to ensure that
coordinates displayed with -m 9(c) and the final alignment are
consistent.  They were consistent for fasta34/ssearch34/fasts34, but
not for fastx34/fasty34.  The alignment coordinate system has been
revised for consistency in all the drop*.c programs (coordinates
used to be off-by-one for some, but not other functions).

Fixes to -m 9c for fasty34/pv34compfy.  In addition, a problem was
fixed with fastx34/fasty34 that appeared when a protein sequence was
considerably longer than the DNA query, e.g. an EST vs titin (26K
residues).  This problem only appeared on pv34compfx/fy on Xserves
under OS_X; but it should improve fastx34/fasty34 performance with
very long protein sequences on all platforms.

>>May 7,8 2003

Changes to p2_workcomp.c, compacc.c, and p_mw.h to fix persistent
bugs in the -m 9c display.  Previous pv34comp* programs would not
return the correct coded alignment if more than 100 alignments came
from the same node, or if an encoding was longer than 127 chars.

Also, fixes to p2_complib.c, comp_lib.c, to allow long query sequences
to be segmented.  Previously, only the first 20,000 residues were
used.  The segmented queries are not overlapped; segmented library
sequences are.

>>May 5, 2003

Changes to last_tat.c, scaleswt.c to ensure that all fasts alignments
that are likely to have significant scores are displayed.
In previous implementations, if the query had more than 10 fragments,
only the 100 best scores were shown.  Now, we rescore up to 2500
alignments.  The new approach allows large mixtures to be used for
searches, where some of the fragments from the mixture match too many
proteins (e.g. actins).  Some differences between the fasts34 and
pv34compfs implementations have been fixed.  The two programs
typically will not give exactly the same results, because of small
differences in the sampling procedures, but the results are
essentially equivalent.

>>Apr 11, 2003  CVS fa34t21b3

Fixes for "-E" and "-F" with ssearch34, which were inadvertently
disabled.

A new option, "-t t", is available to specify that all the protein
sequences have implicit termination codons "*" at the end.  Thus, all
protein sequences are one residue longer, and full length matches are
extended one extra residue and get a higher score.  For
fastx34/tfastx34, this helps extend alignments to the very end in
cases where there may be a mismatch at the C-terminal residues.

-m 9c has also been modified to indicate locations of termination
codons (*1).

>>Mar 17, 2003  CVS fa34t21b2

A new option on scoring matrices "-MS" (e.g. "BL50-MS") can be used to
turn the I/L, K/Q identities on or off.  Thus, to make "fastm34" use
the isobaric identities, use "-s M20-MS".  To turn them off for
"fasts34", use "-s M20".

More fixes for correct alignment coordinates.  There was a conflict
between -m 9 and -m 9c and subsequent alignment displays.

>>Mar 13, 2003

Various fixes to produce correct fastm34 alignments.  Changes to all
functions to correct a potential problem with -m 9 alignment
coordinates when both -m 9 and actual alignments are shown.

>>Feb 25,27, 2003

Modifications to re-activate showsum.c, which included corrections to
the showbest() call in p2_complib.c.

>>Feb 13, 2003  CVS fa34t21b1

Modifications to dropfx.c to dramatically improve alignment speed for
cases where the DNA sequence is considerably longer than the protein
sequence.
Previously, a 200 aa vs 5000 nt comparison would do a full
200 x 5000 Smith-Waterman alignment; with this modification, no more
than a 200 x 1200 (2x3x200) alignment is done.  This optimization has
not (yet) been applied to dropfz2.c (fasty/tfasty).

>>Feb 11, 2003

Small modifications to comp_lib.c, p2_complib.c, and nmgetlib.c to
pass openlib() a possibly old lmf_str.  This allows openlib() to
re-use memory mapped files.  closelib() no longer releases memory
mapped file buffers.  Under Linux, memory mapped file buffers were not
really released, so when comparing a set of sequences against nr, the
program could not mmap() the database after several searches.  This
will also speed up memory mapped multiple sequence searches.

>>Jan 28-31, 2003  CVS fa34t21b0

Fix another bug (all of v34t20) involved with overlapping long
sequences.  And another bug that occurred when using sampled
statistics, but appeared only on the SGI platform - thanks to Dmitri
Mikhailov.  Several other issues have been addressed based on more
instrumented runtime testing.

Fix an old (all v34) bug that caused problems with -z 11-16 (shuffled
sequence array was not allocated properly).  Fixed another bug with
-z 6/16 when using threaded (_t) searches in fasta34_t.

Restructure statistical analysis functions (scaleswn.c, scaleswt.c) to
return the "final" statistical estimation routine done in
pst.zsflag_f.  This allows the program to cope with searches against a
single sequence correctly.

Corrected an error for DNA sequences needing Altschul-Gish statistics.

>>Jan 25, 2003

Add option "-J start:stop" to pv34comp*/mp34comp*.  "-J x" used to
allow one to start at query sequence "x"; now both start and stop can
be specified.

>>Jan 14, 2003

Changes to apam.c to provide an error message on stderr when a scoring
matrix cannot be found.

Changes to dropfs2.c, initsw.c, initfa.c to provide -m 9c information
for fasts34 searches.
Modify the alignment algorithm to use probabilistic scores properly.

>>Dec 22, 2002

Change to compacc.c (sortbeste()) to do a second sort on zscore when
several sequences have E() == 0.

>>Nov 27, 2002

Change FSEEK_T to fseek_t to keep Borland BCC5 happy.

>>Nov 14-22, 2002  CVS fa34t20b6

Include compile-time define (-DPGM_DOC) that causes all the fasta
programs to provide the same command line echo that is provided by the
PVM and MPI parallel programs.  Thus, if you run the program:

    fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12

the first lines of output from FASTA will be:

    # fasta34_t -q gtt1_drome.aa /slib/swissprot
     FASTA searches a protein or DNA sequence data bank
     version 3.4t20 Nov 10, 2002
    Please cite:
     W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

This has been turned on by default in most FASTA Makefiles.

Fix p2_complib.c so that qstats[] is always allocated before it is
used.

Fix serious bug in non-threaded comp_lib.c that caused some high
scoring sequences to be missed by fasts34.  New tests are included in
test.sh to detect this problem in the future.

The shell sort algorithm in sortbeste(), sortbestz(), and sortbesto()
has been modified to use an improved algorithm that will not go
quadratic in pathological cases.

nmgetlib.c and mmgetaa.c have been modified to remove "^A" in libstr
when used with p2_complib.c.

Fix problem with MAXSEG in tatstats.h with IBM/AIX.

Changes to most Makefiles to use -DSAMP_STATS; fixes to p2_complib.c
for SAMP_STATS.

>>Oct 22, Nov 3, Nov 9, 2002   CVS tag fa34t20b5

Fix problem in comp_lib.c that caused the query sequence length to be
counted twice.

Fixed problem with prss34 (updated find_zp in showrss.c).

Correct shuffling function in several places.

Add jitter back to addhistz() - improves appearance with prss34.

Changes to fix problems with aln_code using -m 9c.

Fix to serious bug in scaleswt.c (fasts34, etc) that caused sorts on
the high scores to take much too long.
The program is now 10X faster, and scales well on PVM/MPI.

Fix to llgetaa.c to work with new getseq() API with automatic alphabet
recognition.

>>Oct 12, 2002 CVS tag fa34t20b4

Several very obscure (and sometimes old) bugs that appeared in certain
MPI environments have been fixed.  This occurred because the pst.sq[]
array did not always have a '\0' at the end.  In addition,
mshowalign.c/p2_workcomp.c sometimes failed to put the '\0' at the end
of seqc0/seqc1.  Correct bug introduced in fa34t20b3 for fasts34(_t).

>>Oct 9, 2002 CVS tag fa34t20b3

Fix to apam.c build_xascii() to not zero-out qascii[0].  Fix
Makefile.pvm4.  Fix problem with -m 9c with compacc.c.

>>Sept 28, 2002

Additional fixes to -m 9c in p2_complib.c/compacc.c/mshowbest.c.

Remove restriction in fasts34(_t) to less than 30 peptides (though no
more than 30 peptides can be aligned currently).

>>Sept 24, 2002

Fix p2_workcomp.c so that e_scores are delivered correctly when
last_calc flag is set, and -m 9c provides alignments when only one
best hit is present.

Fix comp_lib.c to use different maxn and overlap for each different
query sequence.  fasta34 and fasta34_t now have identical results when
a long sequence is searched.

Add '@C:101' support to memory mapped FASTA format files.

Fix mshowalign.c so that coordinates returned by cal_coord() use
loffset+l_off.

>>Sept 14, 2002  CVS tag fa34t20b2

Changes to p2_complib.c, compacc.c to fix statistics problems with
pv34compfs on query sequences with more than 10 fragments.

>>Aug 27, 2002

Modifications to mshowbest.c and drop*.c (and p2_workcomp.c,
compacc.c, doinit.c, etc.) to provide more information about the
alignment with the -m 9 option.  There is now a "-m 9c" option, which
displays an encoded alignment after the -m 9 alignment information.
The encoding is a string of the form: "=#mat+#ins=#mat-#del=#mat".
Thus, an alignment over 218 amino acids with no gaps (not necessarily
100% identical) would be =218.
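
The "=#mat+#ins=#mat-#del=#mat" form just described can be sketched in
a few lines of Python.  This is only an illustration of the string
format, not the code in the drop*.c files.  The function names
(column_op, encode_alignment) are hypothetical, and the sign convention
is an assumption inferred from the worked example in this entry: "+"
counts query residues opposite gaps in the library sequence, and "-"
counts gaps in the query.  Frameshift ("/", "\") and termination codon
("*") symbols are not handled.

```python
from itertools import groupby

GAP = "-"


def column_op(query_char, library_char):
    """Classify one alignment column as '=', '+', or '-'.

    Assumed convention (inferred from the readme example, not taken
    from the FASTA sources): '+' is a query residue opposite a library
    gap, '-' is a query gap opposite a library residue.
    """
    if query_char == GAP and library_char == GAP:
        raise ValueError("column has a gap in both sequences")
    if library_char == GAP:
        return "+"
    if query_char == GAP:
        return "-"
    return "="  # aligned pair, identical or not


def encode_alignment(query_row, library_row):
    """Run-length encode two gapped alignment rows of equal length."""
    if len(query_row) != len(library_row):
        raise ValueError("alignment rows must have the same length")
    ops = (column_op(q, l) for q, l in zip(query_row, library_row))
    # Collapse consecutive identical operations into "<op><count>".
    return "".join(f"{op}{sum(1 for _ in run)}" for op, run in groupby(ops))


if __name__ == "__main__":
    # Two aligned pairs, two query-only residues, one pair,
    # one library-only residue, two pairs.
    print(encode_alignment("ACDEF-GH", "AC--FKGH"))  # =2+2=1-1=2
```

Because the encoding is relative to the start of the alignment, the
sketch takes the two gapped rows exactly as they are displayed and
needs no sequence coordinates.
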
The alignment:

       10        20        30        40        50          60         70
GT8.7  NVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKFKL--GLDFPNLPYL-IDGSHKITQ
       :.::  . :: ::  .   .:::         : .:    ::.:   .: : ..:.. :::  :..:
XURTG  NARGRMECIRWLLAAAGVEFDEK---------FIQSPEDLEKLKKDGNLMFDQVPMVEIDG-MKLAQ
               20        30                 40        50        60

would be encoded: "=23+9=13-2=10-1=3+1=5".  The alignment encoding is
with respect to the beginning of the alignment, not the beginning of
either sequence.  The beginning of the alignment in either sequence is
given by the an0/an1 values.  This capability is particularly useful
for [t]fast[xy], where it can be used to indicate frameshift positions
"/#\#" compactly.  If "-m 9c" is used, the "The best scores" title
line includes "aln_code".

>>Aug 14, 2002  CVS tag fa34t20

Changes to nmgetlib.c to allow multiple query searches coming from
STDIN, either through pipes or input redirection.  Thus, the command

    cat prot_test.lseg | fasta34 -q -S @ /seqlib/swissprot

produces 11 searches.  If you use the multiple query functions, the
query subset applies only to the first sequence.

Unfortunately, it is not possible to search against a STDIN library,
because the FASTA programs do not keep the entire library in memory
and need to be able to re-read high-scoring library sequences.  Since
it is not possible to fseek() against STDIN, searching against a STDIN
library is not possible.

>>Aug 5, 2002

fasts34(_t) and fastm34(_t) have been modified to allow searches with
DNA sequences.  This gives a new capability to search for DNA motifs,
or to search for ordered or unordered DNA sequences spaced at
arbitrary distances.

>>Aug 4, 2002

comp_lib.c has been modified to provide comp_mlib.c function.
comp_mlib.c is no longer used.
comp_lib.c with the "mlib" function can now recognize protein or DNA
sequences automatically, and reads from stdin can now detect
DNA/protein sequence types automatically.

Changes to compacc.c, getseq.c, doinit.c, initfa.c, initsw.c, and
nmgetlib.c to support automatic sequence type detection.

>>July 28-31, 2002

(1) The various Makefiles have been "normalized".  The fast*34[_t]
    (Makefile.34m.common[_sql]), Makefile.pvm4[_sql], and
    Makefile.mpi4[_sql] make files all use a common set of filenames,
    described in Makefile.fcom.  This greatly simplifies adding
    programs, but requires that all *.o files be deleted when moving
    from fast*34* to pv34comp* to mp34comp*.

(2) showalign.c/p_showalign.c have been merged into mshowalign.c;
    showbest.c/manshowbest.c have been merged into mshowbest.c.  Some
    of the related files (showun.c, manshowun.c) have not been merged
    or tested.

(3) Code for ranking scores with valid e_values incorporated.

(4) Bug fixes in p2_complib.c, so that fasts34/fasts34_t/pvcompfs
    provide identical statistics.

>>July 26, 2002

Makefile.pvm4_sql and Makefile.pvm4 have been substantially simplified
by providing the worker program name from the h_init() function in the
initfa.c/initsw.c files.

>>July 24, 2002

Substantial modifications to param.h, structs.h to ensure that no
sequence specific information is kept in struct pstruct.  This
structure now holds the pam[] matrix, and other scoring parameters,