readme.v34t0

Changes to comp_lib.c (and Makefile.pcom) to support prss34_t.

>>Feb 12, 2005

Modify dropfs.c to dynamically allocate space for alignments, so that
queries with a large number of fragments can still place all the
fragments on the alignment.  Also fix a problem produced by removing
-DBIGMEM from most of the Makefiles without also changing defs.h to
use BIGMEM sizes by default.

>>Jan 24, 2005

Include a new program, "print_pssm", which reads a blastpgp binary
checkpoint file and writes out the frequency values as text.  These
values can be used with a new option to ssearch34(_t) and prss34,
which can now read a text PSSM file.  To specify a text PSSM, use the
option -P "query.ckpt 1", where the "1" indicates a text, rather than
a binary, checkpoint file.  "initfa.c" has also been modified to work
with PSSM files with zeros in the frequency table.  Presumably these
positions (at the ends) do not provide information.  (Jan 26, 2005)
blastpgp actually uses BLOSUM62 values when zero frequencies are
provided, so read_pssm() has been modified to use scoring matrix
values for zero frequencies as well.

>>Jan 13, 2005

Change initfa.c so that fasts34 does a protein comparison by default,
rather than using an unknown sequence type.  Automatic sequence-type
checking does not work reliably for fasts34, because queries can be
very short.  Likewise for fastm34.  [Jan 26, 2005] Undo this change,
which broke DNA comparison when "-n" was specified.

>>Jan 7, 2005

Changes to tatstats.h and dropfs2.c to allow larger numbers of
peptides to match when fasts is used to show coverage in a proteomics
experiment.  Previously fasts could match no more than 30 peptides;
that limit has been increased to 50.  In addition, ktup=2 can be used
to increase the likelihood that short exact matches trump longer
mismatched regions.

>>Nov 11, 2004   CVS fa34t25

Finished merge of earlier fa34t24 branch with HEAD.  Correct
labeling of TFASTM.

>>Nov 4-8, 2004

Incorporation of Erik Lindahl's "anti-diagonal" Altivec code for
Smith-Waterman, only.
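The idea behind an anti-diagonal ordering is that every Smith-Waterman
cell with the same i + j depends only on cells from the two preceding
anti-diagonals, so all cells on one anti-diagonal can be computed at
once (for example, in Altivec vector lanes).  The scalar C sketch below
shows only that traversal order; it is not Lindahl's vectorized code.
The match/mismatch scores, the linear gap penalty (the FASTA programs
themselves use affine gaps and scoring matrices), MAXLEN, and the name
sw_antidiag() are all assumptions made for illustration.

```c
#include <string.h>

#define MAXLEN 256  /* assumed maximum sequence length for this sketch */

/* Hypothetical scores: +2 for a match, -1 for a mismatch. */
static int subst(char a, char b) { return a == b ? 2 : -1; }

static int max4(int a, int b, int c, int d) {
  int m = a;
  if (b > m) m = b;
  if (c > m) m = c;
  if (d > m) m = d;
  return m;
}

/* Local (Smith-Waterman) alignment score with a linear gap penalty,
   filling H one anti-diagonal (i + j == d) at a time.  Each cell on
   diagonal d reads only diagonals d-1 and d-2, so the iterations of
   the inner loop are independent of one another.
   Returns -1 if either sequence is longer than MAXLEN. */
int sw_antidiag(const char *q, const char *l, int gap) {
  static int H[MAXLEN + 1][MAXLEN + 1];  /* shared buffer: not thread-safe */
  int n = (int)strlen(q), m = (int)strlen(l), best = 0;
  if (n > MAXLEN || m > MAXLEN) return -1;
  for (int i = 0; i <= n; i++) H[i][0] = 0;
  for (int j = 0; j <= m; j++) H[0][j] = 0;
  for (int d = 2; d <= n + m; d++) {
    int ilo = d - m > 1 ? d - m : 1;   /* keeps j = d - i <= m */
    int ihi = d - 1 < n ? d - 1 : n;   /* keeps j >= 1 and i <= n */
    for (int i = ilo; i <= ihi; i++) {
      int j = d - i;
      int h = max4(0,
                   H[i - 1][j - 1] + subst(q[i - 1], l[j - 1]),
                   H[i - 1][j] - gap,
                   H[i][j - 1] - gap);
      H[i][j] = h;
      if (h > best) best = h;
    }
  }
  return best;
}
```

For example, sw_antidiag("ACGT", "AGT", 1) returns 5 (A, a one-residue
gap, then GT).  Because the loop over i carries no dependency between
iterations, it is the loop a SIMD version would vectorize.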
Altivec SSEARCH is now faster than FASTA for query sequences < 250
amino acids.  Small modifications to the output score display ensure
that the correct scores are shown and that they are correctly labeled.

>>Aug 25-26, 2004   CVS fa34t24b3

Small change in output format for p34comp* programs in the
">>>query_file#1 string" line before alignments.  This line is not
present in the non-parallel versions; it would be better for them to
be consistent.  Change in last_stats.c to properly label fasts
statistics with -z != 1.  Change in dropfs2.c to ensure that tatprobs
are not precalculated with -z 4.  Modify the -m 9i output option to
show in HTML output.  Add "#ifdef NOOVERHANG" to dropfs2.c, which
causes overlapping alignments to score 0, rather than the partial
overlap score.  Useful for SAGE alignments, because "fasts" requires
global alignments (except for overhangs, unless NOOVERHANG is
defined).

>>Aug 23, 2004

Fix problem with very long definition lines with formatdb version 4
ASN databases.  Fix mshowalign.c to re-enable the "-L" option.

>>July 28, 2004

Fix to re-enable -w window shuffle for PRSS.  Modify comp_lib.c for
PRSS to ensure that the unshuffled score and probability are shown,
even for very high probability alignments.

>>July 21, 2004

Modifications to support PostgreSQL databases with the same commands
as MySQL databases.  MySQL database libraries are type 16; PostgreSQL
libraries are type 17.  Makefile.linux_sql and Makefile.pvm4_sql
support both database types simultaneously.

>>June 23, 2004   CVS fa34t24b2

Additional fixes to enable -n or -p with fasts34 and fastm34.
Makefile.pcom was fixed for fastm34_t.  A new file, mgstm1.nts, of DNA
fragments from mgstm1.seq, is included for testing fasts34 and
fastm34.

>>May 4, 2004

Fixes to initfa.c to allow DNA:DNA for FASTS, FASTM.
This change introduced a bug that broke FASTS completely, but it was
fixed June 18, 2004 (and retagged fa34t24b2).

>>April 23, 2004   CVS fa34t24b1

Fix bug in initfa.c that caused tfasts/tfastf not to examine all six
frames.

>>May 4, 2004

Fixes to initfa.c to allow DNA:DNA for FASTS, FASTM.

>>March 19, 2004   CVS fa34t24b0

Modify all the drop*.c files, plus mshowbest.c and mshowalign.c, to
display percent similarity, rather than percent ungapped.  An aligned
position is counted as similar if its score is greater than or equal
to zero (the same criterion used for placing ".").  To disable this
change, remove -DSHOWSIM from the appropriate Makefile.*.

>>March 18, 2004   CVS fa34t23b8

Fix bug in initfa.c tables that caused prss to generally compare
proteins.

>>March 15, 2004

Fix bug in calls to revcomp(); make revcomp() guarantee NULL
termination.

>>March 2, 2004   CVS fa34t23b7

Fix a very embarrassing and surprising bug that caused insertions in
fasta alignments to appear in the wrong sequence.

>>Feb 7, 2004   CVS fa34t23b6

Change initfa.c to allow "-i" (reverse complement) and "-i -3" with
"fastx34" and "prfx34".  In addition, "prfx34" now examines both query
DNA strands when calculating the shuffled statistical significance.

>>Feb 5, 2004

Reverse assignments for G:U base pairing in initfa.c.  Fix memory
allocation error caused by doubling DNA alignment width.

>>Jan 7, 2004   CVS fa34t23b5

Change in do_walign() in dropnfa.c to make final DNA alignments use a
band that is 2X as large as the search band width.

>>Dec 22, 2003   CVS fa34t23b4

Fix typo in p2_complib.c that prevented compilation.  Fix problem with
karlin.c for asymmetrical matrices, such as those used with -U.

>>Dec 10, 2003   CVS fa34t23b3

Fix problem in resetp()/initfa.c that disabled banded Smith-Waterman
DNA alignments.  Allow spam() to do extended alignments for DNA if one
of the sequences is < 50 nt.  Cause the default ktup to drop for short
sequences.
For protein < 50, ktup=1; for DNA < 20, 50, 100, ktup = 1, 2, 3,
respectively.

>>Dec 7, 2003

A new option, "-U", is available for RNA sequence comparison.  "-U"
functions like "-n", indicating that the query is an RNA sequence.  In
addition, to account for "G:U" base pairs, "-U" modifies the scoring
matrices so that a "G:A" match has the same score as a "G:G" match,
and a "T:C" match has the same score as a "T:T" match.  The asymmetric
matrix required changes in dropnfa.c that were similar to the changes
in dropgsw.c required for profiles.  In addition, m_msg.qdnaseq and
pst.dnaseq can now be SEQT_DNA, SEQT_RNA, SEQT_PROT, SEQT_UNK, or
SEQT_OTHER.  m_msg.ldnaseq does not use SEQT_RNA, only SEQT_DNA.  A
new member of struct pstruct, int nt_align, is used to indicate
nucleotide alignments.

>>Nov 19, 2003

Changes to Makefiles to distinguish between tatstats_fs.o and
tatstats_ff.o.

>>Nov 2, 2003

Substantial changes to comp_lib.c, p2_complib.c, mshowbest.c, and
mshowalign.c to support more sophisticated display options.
Previously, one could have only one "-m #" option, even though several
of the options were orthogonal (-m 9c is independent of -m 1 and
-m 2, which are independent of -m 6 (HTML)).  The programs now use a
bitmask that allows independent options to be combined.  In
particular, -m 9c can be combined with -m 6, which can be very helpful
for runs that need HTML output but can also exploit the encoding
provided by -m 9c.  The "-m 9" option now also allows "-m 9i", which
shows the standard best score information, plus percent identity and
alignment length.

>>Oct 26, 2003   CVS fa34t23b1

Additional fixes to Makefiles to enable tfastf34(_t).  Changes to
support ossearch34 (a non-Phil Green optimized Smith-Waterman).

>>Oct 8, 2003   CVS fa34t23b0

Fixes to get DNA queries working in both directions, and to fix
PCOMPLIB programs for the "-V" option.  Currently, the parallel
programs cannot use the "-V" option.

>>Sept 25, 2003

A new option is available for annotating alignments.
-V '@#?!' can be used to annotate sites in a sequence, e.g.:

	>GTM1_HUMAN ...
	PMILGYWDIRGLAHAIRLLLEYTDS@S?YEEKKYT@MG
	DAPDYDRS@QWLNEKFKLGLDFPNLPYLIDGAHKIT

might mark known and expected (S,T) phosphorylation sites.  These
symbols are then displayed on the query coordinate line:

               10        20    @?  30  @     40  @     50        60
GTM1_H PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
gtm1_h PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP
               10        20        30        40        50        60

This annotation is mostly designed to display post-translational
modifications detected by MassSpec with FASTS, but is also available
with FASTA and SSEARCH.

>>Sept 22, 2003   CVS fa34t22b5

The Altivec Smith-Waterman code has been removed.

>>Sept 17, 2003   CVS fa34t22b4

A variety of different bugs have been fixed.  (1) All the functions in
the old initsw.c are now in initfa.c; initsw.c will be removed.
Specifically, the Profile/PSSM code is now in initfa.c.  initfa.c is
now fully table driven.  (2) Various problems with prss34 and prfx34
have been fixed in initfa.c.  (3) An additional ncbl2_mlib.c buffer
overrun has been fixed.  (4) fastf34 is now available in this package.
Its performance is very similar, but not identical, to fastf33.  I am
tracking down the differences.  In general, the raw scores calculated
by both programs are the same, but the statistical analysis seems to
be slightly different.

>>July 30, 2003   CVS fa34t22b3

Fix bug in ncbl2_mlib.c that caused a buffer overrun with
blast/formatdb v3 description lines.

>>July 28, 2003

The initfa.c file has been substantially restructured to use a
table-driven approach to parameter setting, rather than the previous
confusing combinations of #ifdef's.
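As a rough illustration of what a table-driven setup looks like, the C
sketch below keeps per-program defaults in one static array and selects
a row at run time, instead of compiling different values under separate
#ifdef's.  The struct pgm_def, its fields, lookup_pgm(), and the numeric
values are hypothetical placeholders; they are not the actual layout of
the initfa.c tables or the package's real defaults.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-program defaults (illustrative field names only). */
struct pgm_def {
  const char *name;    /* program name */
  const char *matrix;  /* default scoring matrix */
  int gap_open;        /* default gap-open penalty */
  int gap_ext;         /* default gap-extension penalty */
  int ktup;            /* default word size; 0 = not used */
};

/* Placeholder rows: the numbers are NOT the package's real defaults. */
static const struct pgm_def pgm_defs[] = {
  { "fasta",   "BL50", -12, -2, 2 },
  { "ssearch", "BL50", -12, -2, 0 },
  { "fastx",   "BL50", -15, -2, 2 },
};

/* Select a program's row by name; the row index doubles as a program
   id.  Returns -1 (and sets *out to NULL) for an unknown program. */
int lookup_pgm(const char *name, const struct pgm_def **out) {
  for (size_t i = 0; i < sizeof pgm_defs / sizeof pgm_defs[0]; i++) {
    if (strcmp(pgm_defs[i].name, name) == 0) {
      *out = &pgm_defs[i];
      return (int)i;
    }
  }
  *out = NULL;
  return -1;
}
```

With this layout, supporting another program means adding a row rather
than another set of #ifdef's, and code that needs to know which program
is running can compare the returned id.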
Two tables of parameters are used, pgm_def_arr[] and msg_def_arr[],
which specify values like the program name, reference, scoring matrix,
default gap penalties, etc.  msg_def_arr[] has the sequence types for
the query, library, and algorithm, as well as other parameters
(qframe, nframe, nrelv, etc.), which greatly simplifies the sequence
recognition logic.  ppst->pgm_id can be used to identify the program
that is running.  Eventually, almost all of the program-specific
#ifdef's will be removed from initfa.c.  initfa.c now provides
initsw.c functionality, so that initsw.c is no longer needed.

>>July 25, 2003

A new file is included, fasta.defaults, that lists the scoring matrix,
gap penalty, and other defaults for all of the fasta34 programs.  This
file will be used soon to simplify parameter setting for the FASTA
programs, and should also be used by JavaScript WWW interfaces to the
FASTA programs.

>>July 22, 2003   CVS fa34t22b2

Fixes to dropfs2.c and tatprobs.c to ensure that negative
probabilities cannot occur.  Negative probabilities were never seen
with standard matrices, but did occur with BL50.  Another optimization
in dropfs.c considerably improves fasts34 performance in some cases.
Fix a problem with formatdb v4 ASN.1 format files.

>>July 12, 2003

Fix a bug that prevented "-L" (long sequence descriptions) from
working.

>>July 9, 2003

Fix reverse complement (M:K) error.  Fix off-by-one error for FASTA
DNA alignments that caused the first aligned residue pair to be
missed.

>>July 4-8, 2003

Incorporate blast-def-line ASN.1 parsing so that NCBI formatdb version
4 files can be read.

>>June 26, 2003

The strategy for displaying the match/mismatch line (" .:" for -m 0)
has been changed dramatically to accommodate more sophisticated
strategies for indicating conservative replacements, e.g. because of
PSSMs.  In addition to seqc0 and seqc1, which hold the aligned
sequences for display, there is also seqca, which holds the alignment
symbol.
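A minimal C sketch of how an alignment-symbol string like seqca can be
derived from the two display strings.  It assumes seqc0 and seqc1 have
equal length with '-' for gaps, and follows the " .:" convention of
-m 0: ':' for identities, '.' for other pairs scoring >= 0 (the same
criterion the March 19, 2004 entry gives for percent similarity), and a
space otherwise.  pair_score() is a hypothetical stand-in for a real
scoring matrix or PSSM, and make_seqca() is not the package's calcons().

```c
#include <string.h>

/* Hypothetical stand-in for a scoring matrix: identities score 5,
   pairs within a few similarity groups score 1, all else -1. */
static int same_group(char a, char b, const char *grp) {
  return strchr(grp, a) != NULL && strchr(grp, b) != NULL;
}

static int pair_score(char a, char b) {
  static const char *groups[] = { "ILVM", "KR", "DE", "ST", "FYW" };
  if (a == b) return 5;
  for (size_t g = 0; g < sizeof groups / sizeof groups[0]; g++)
    if (same_group(a, b, groups[g])) return 1;
  return -1;
}

/* Build the symbol string for two equal-length aligned strings (gaps
   written as '-'): ':' identity, '.' non-identical pair with
   score >= 0, ' ' otherwise.  seqca needs strlen(seqc0) + 1 bytes. */
void make_seqca(const char *seqc0, const char *seqc1, char *seqca) {
  size_t n = strlen(seqc0);
  for (size_t i = 0; i < n; i++) {
    char a = seqc0[i], b = seqc1[i];
    if (a == '-' || b == '-')
      seqca[i] = ' ';
    else if (a == b)
      seqca[i] = ':';
    else if (pair_score(a, b) >= 0)
      seqca[i] = '.';
    else
      seqca[i] = ' ';
  }
  seqca[n] = '\0';
}
```

For example, aligning "MKIL-DE" with "MRVLADQ" yields ":..: : ".
Keeping the symbols in their own string means the display code only has
to print three parallel lines, and a PSSM-aware scorer can change which
pairs get '.' without touching the printing logic.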
calcons(), do_show(), and discons() have all changed to include
seqca.  calcons() is somewhat more complex; discons() is much simpler.
(June 29, 2003 - dropgsw.c calcons() now displays profile similarity
accurately; it is very, very illuminating.)

>>June 16, 2003   version: fasta34t22

ssearch34 now supports PSI-BLAST PSSM/profiles.  Currently, it only
supports the "checkpoint" file produced by blastall, and only on
certain architectures where byte-reordering is unnecessary.  It has
not been tested extensively with the -S option.

	ssearch34 -P blast.ckpt -f -11 -g -1 -s BL62 query.aa library

will use the frequency information in the blast.ckpt file to do a
position-specific scoring matrix (PSSM) search using the
Smith-Waterman algorithm.  Because ssearch34 calculates scores for
each of the sequences in the database, we anticipate that PSSM
ssearch34 statistics will be more reliable than PSI-BLAST statistics.

The BLAST checkpoint file is mostly double-precision frequency
numbers, which are represented in a machine-specific way.  Thus, you
must generate the checkpoint file on the same machine on which you run
ssearch34 or prss34 -P query.ckpt.  To generate a checkpoint file,
run:

	blastpgp -j 2 -h 1e-6 -i query.fa -d swissprot -C query.ckpt -o /dev/null
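The essential change a PSSM makes to Smith-Waterman is in the
substitution lookup: the score for query position i against library
residue a comes from row i of the profile rather than from
matrix[query[i]][a].  The C sketch below shows that under simplifying
assumptions: a linear gap penalty instead of the -f/-g gap penalties
used by ssearch34, an integer profile that has already been derived
from the checkpoint frequencies (reading the checkpoint is not shown),
a hypothetical 20-letter column order, and a fixed -1 score for
residues outside that alphabet.  sw_pssm() is an illustrative name, not
a function from this package.

```c
#include <stdlib.h>
#include <string.h>

#define NAA 20
/* Hypothetical column order for the integer profile. */
static const char AA[] = "ARNDCQEGHILKMFPSTWYV";

/* Column for residue c, or -1 if c is not one of the 20 letters. */
static int aa_index(char c) {
  const char *p = c ? strchr(AA, c) : NULL;
  return p ? (int)(p - AA) : -1;
}

/* Smith-Waterman local score of lib against a query profile with qlen
   positions: pssm[i][a] scores query position i against residue
   column a.  Linear gap penalty 'gap'; unknown residues score -1.
   Returns -1 on allocation failure. */
int sw_pssm(int pssm[][NAA], int qlen, const char *lib, int gap) {
  int llen = (int)strlen(lib), best = 0;
  int *prev = calloc((size_t)llen + 1, sizeof *prev);  /* row i-1 */
  int *cur = calloc((size_t)llen + 1, sizeof *cur);    /* row i */
  if (prev == NULL || cur == NULL) { free(prev); free(cur); return -1; }
  for (int i = 1; i <= qlen; i++) {
    cur[0] = 0;
    for (int j = 1; j <= llen; j++) {
      int a = aa_index(lib[j - 1]);
      int s = a >= 0 ? pssm[i - 1][a] : -1;  /* position-specific score */
      int h = prev[j - 1] + s;
      if (prev[j] - gap > h) h = prev[j] - gap;
      if (cur[j - 1] - gap > h) h = cur[j - 1] - gap;
      if (h < 0) h = 0;
      cur[j] = h;
      if (h > best) best = h;
    }
    int *tmp = prev; prev = cur; cur = tmp;  /* row i becomes row i-1 */
  }
  free(prev);
  free(cur);
  return best;
}
```

Because the score depends on the query position, the same library
residue can score well at one position and poorly at another, which is
how the profile's family information enters the search.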