readme.v33t0

changed to allow both Blast1.4 and Blast2.0 databases to be read.  In
addition, Makefile.common now includes an option to link both
ncbl_lib.o and ncbl2_lib.o, which provides support for both libraries.
However, Blast1.4 support has not been tested.

The Makefile structure has been improved.  Each architecture specific
Makefile (Makefile.alpha, Makefile.linux, etc) now includes
Makefile.common.  Thus, changes to the program structure should be
correct for all platforms.  "map_db" and "list_db" are not made with
"make all".

The database reading functions in nxgetaa.c can now return a database
length of 0, which indicates that no residues were read.  Previously,
0-length sequences returned a length of 1, which were ignored.
Complib.c and comp_thr.c have changed to accommodate this
modification.  This change was made to ensure that each residue,
including the last, of each sequence is read.

Corrected bug in nxgetaa.c with FASTA format files with very long
(>512 char) definition lines.

(2) (September 20, 1999) BLAST2 format databases supported

This release supports NCBI Blast2.0 format databases, using either
conventional file reading or memory mapped files.  The Blast2.0 format
can be read very efficiently, so there is only a modest improvement in
performance with memory mapping.  The decision to use mmap()'ed files
is made at compile time, by defining USE_MMAP.  My thanks to Eamonn
O'Toole of DEC/Compaq, and Daryl Madura of Sun Microsystems, for
providing mmap()'ed modifications to fasta3.  On my machines, Blast2.0
format reduces search time by about 30%.  At the moment, ambiguous DNA
sequences are not decoded properly.

(3) (September 30, 1999) A new statistical estimation option is
available.  -z 2 has been changed from ln()-scaling, which never
should have been used, to scaling using Maximum Likelihood Estimates
(MLEs) of Lambda and K.  The MLE estimation routines were written by
Aaron Mackey, based on a discussion of MLE estimates of Lambda and K
written by Sean Eddy.
The MLE estimation examines the middle 95% of
scores, if there are fewer than 10000 sequences in the database;
otherwise it excludes (censors) the top 250 scores and the bottom 250
scores.  This approach seems to effectively prevent related sequences
from contaminating the estimation process.  As with -z 1, -z 12 causes
the program to generate a shuffled sequence score for each of the
library sequences; in this case, no censoring is done.  If the
estimation process is reliable, Lambda and K should not vary much with
different queries or query lengths.  Lambda appears not to vary much
with the comparison algorithm, although K does.

(4) Minor changes include fixes to some of the alignment display routines,
individual copies of the pstruct structure for each thread, and some
changes to ensure that every last residue in a library is available
for matching (sometimes the last residue could be ignored).  This
version has undergone extensive testing with high-throughput sequences
to confirm that long sequences are read properly.  Problems with
fastf3/fasts3 alignment display have also been addressed.

>>August 26, 1999 (no version change - not released)

Corrected problem in "apam.c" that prevented scoring matrices from
being imported for [t]fasts3/[t]fastf3.

>>August 17, 1999 --> v32t07

Corrected problem with opt_cut initialization that only appeared
with pvcomp* programs.

Improved calculation of FASTA optcut threshold for DNA sequence
comparison for match scores much less than +5 (e.g. +3).  The previous
optcut threshold was too high when the match penalty was < 4 and
ktup=6; it is now scaled more appropriately.

Optcut thresholds have also been raised slightly for
fastx/y3/tfastx/y3.
This should improve performance with minimal
effects on sensitivity.

>>July 29, 1999
(no version change - date change)

Corrected various uninitialized variables and buffer overruns
detected.

>>July 26, 1999 - new distribution
(no version change - v32t06, previous version not released)

Changed the location of "(reverse complement)" label in tfasta/x/y/s/f
programs.

Statistical calculations for tfasta/x/y in unthreaded version
corrected.  Statistical estimates for threaded and unthreaded versions
of the tfasta/x/y/s/f programs should be much more consistent.

Substantial modifications in alignment coordinate
calculation/presentation.  Minor error in fastx/y/tfastx/y end of
alignment corrected.  Major problems with tfasta alignment coordinates
corrected.  tfasta and tfastx/y coordinates should now be consistent.

Corrected problem with -N 5000 in tfasta/x/y3(_t) searches encountered
with long query sequences.

Updated pthr_subs.c/Makefile.linux to increase the pthreads stacksize
to try to avoid "cannot allocate diagonal arrays" error message.
Pthreads stacksize can be changed with RedHat 6.0, but not RedHat 5.2,
so Makefile.linux uses -DLINUX5 for RedHat5.* (no pthreads stack size).
I am still getting this message, so it has not been completely
successful.  Makefile.linux now uses -DALLOCN0 to avoid this problem,
at some cost in speed.

The pvcomp* programs have been updated to work properly with
forward/reverse DNA searches.  See readme.pvm_3.2.

>>July 7, 1999 - not released --> v32t06

Corrected bug in complib.c (fasta3, fastx3, etc) that caused core
dumps with "-o" option.

Corrected a subtle bug in fastx/y/tfastx/y alignment display.

>>June 30, 1999 - new distribution
(no version change)

Corrected doinit.c to allow DNA substitution matrices with -s matrix
option.

Changed ".gbl" files to ".h" files.

>>June 2 - 9, 1999 - new distribution
(no version change)

Added additional DNA lambda/K/H to alt_parms.h.  Corrected some
other problems with those tables
for the case where (inf,inf) gap penalties were not included.

Fixed complib.c/comp_thr.c error message to properly report filename
when library file is not found.

Included approximate Lambda/K/H for BL80 in alt_parms.h.
BL80 scoring matrix changed from 1/3 bit to 1/2 bit units.

Included some additional perl files for searchfa.cgi, searchnn.cgi
in the distribution (my-cgi.pl, cgi-lib.pl).

>>May 30, 1999, June 2, 1999 - new distribution
(no version number change)

Added Makefile.NetBSD, if !defined(__NetBSD__) for values.h.  Changed
zs_to_E() and z_to_E() in scaleswn.c to correctly calculate E() value
when only one sequence is compared and -z 3 is used.

>>May 27, 1999
(no version number change)

Corrected bug in alignment numbering on the % identity line
	27.4% identity in 234 aa (101-234:110-243)
for reverse complements with offset coordinates (test.aa:101-250)

>>May 23, 1999
(no version number change)

Correction to Makefile.linux (tgetaa.o : failed to -DTFAST).

>>May 19, 1999
(no version number change)

Minor changes to pvm_showalign.c to allow #define FIRSTNODE 1.
Changes to showsum.c to change off-end reporting.  (Neither of these
changes is likely to affect anyone outside my research group.)

>>May 12, 1999 --> v32t05

Fixed a serious bug in the fastx3/tfastx3 alignment display which
caused t/fastx3 to produce incorrect alignments (and incorrectly low
percent identities).  The scores were correct, but the alignment
percent identities were too low and the alignments were wrong.
Numbering errors were also corrected in fastx3/tfastx3 and
fasty3/tfasty3 and when partial query sequences were used.

>>May 7, 1999

Fixed a subtle bug in dropgsw.c that caused do_work() to calculate
incorrect Smith-Waterman scores after do_walign() had been called.
This affected only pvcompsw searches with the "-m 9" option.

>>May 5, 1999

Modified showalign.c to provide improved alignment information that
includes explicitly the boundaries of the alignment.
Default alignments now say:

Smith-Waterman score: 175;  24.645% identity in 211 aa overlap (5:207-7:207)

>>May 3, 1999

Modified nxgetaa.c, showsum.c, showbest.c, manshowun.c to allow a
"not" superfamily annotation for the query sequence only.  The
goal is to be able to specify that certain superfamily numbers be
ignored in some of the search summaries.  Thus, a description line
of the form:

>GT8.7 | 40001 ! 90043 | transl. of pa875.con, 19 to 675

says that GT8.7 belongs to superfamily 40001, but any library
sequences with superfamily number 90043 should be ignored in any
listing or summary of best scores.

In addition, it is now possible to make a fasta3r/prcompfa, which is
the converse of fasta3u/pucompfa.  fasta3u reports the highest scoring
unrelated sequences in a search using the superfamily annotation.
fasta3r shows only the scores of related sequences.  This might be
used in combination with the -F e_val option to show the scores
obtained by the most distantly related members of a family.

>>April 25, 1999 --> v32t04 (not distributed)

Modified nxgetaa.c to remove the dependence of tgetaa.o on TFASTA
(necessary for a more rational Makefile structure).  No code changes.

>>April 19, 1999

Fixed a bug in showalign.c that displayed incorrect alignment coordinates.
(no version number change).

>>April 17, 1999 --> v32t03

A serious bug in DNA alignments when the sequence has been broken into
multiple segments that was introduced in version fasta32 has been
fixed.  In addition, several minor problems with -z 3 statistics on
DNA sequences were fixed.

Added -m 9 option, which unfortunately does different things in
pvcompfa/sw and fasta3/ssearch3.  In both programs, -m 9 provides the
id's of the two sequences, length, E(), %_ident, and start and end of
the alignment in both sequences.  pvcompfa/sw provides this
information with the list of high scoring sequences.
fasta3/ssearch3
provides the information in lieu of an alignment.

>>March 18, 1999 --> v32t02

Added information on the algorithm/parameter description line to
report the range of the pam matrices.  Useful for matrices like
MD_10, _20, and _40 which require much higher gap penalties.

>>March 13, 1999 (not distributed) --> v32t01

  -r results.file  has been changed to -R results.file to accommodate
DNA match/mismatch penalties of the form: -r "+1/-3".

>>February 10, 1999

Modify functions in scalesw*.c to prevent underflow after exp() on
Alpha Linux machines.  The Alpha/LINUX gcc compiler is buggy and
doesn't behave properly with "denormalized" numbers, so
"gcc -g -mieee" is recommended.

Add "Display alignments also (y/n)[n] "

pvcomplib.c again provides alignments!!  In addition, there is a
new "-m 9" option, which reports alignments as:

>>>/home/wrp/slib/hlibs/hum0.aa#5>HS5 gi:1280326 T-cell receptor beta chain 30 aa, 30 aa vs /home/wrp/slib/hlibs/hum0.seg library
HS5         	  30	HS5         	  30	1.873e-11	1.000	  30	   1	  30	   1	  30
HS5         	  30	HS2249      	  40	1.061e-07	0.774	  31	   1	  30	   7	  37
HS5         	  30	HS2221      	  38	1.207e-07	0.833	  30	   1	  30	   7	  35
HS5         	  30	HS2283      	  40	1.455e-07	0.774	  31	   1	  30	   7	  37
HS5         	  30	HS2239      	  38	1.939e-07	0.800	  30	   1	  30	   7	  35

where the columns are:

query-name      q-len   lib-name      lib-len   E()             %id    align-len  q-start q-end   l-start l-end

>>February 9, 1999

Corrected bug in showalign.c that offset reverse complement alignments
by one.

>>February 2, 1999

Changed the formatting slightly in showbest.c to have columns line up better.

>>January 11, 1999

Corrected some bugs introduced into fastf3(_t) in the previous version.

>>December 28, 1998

Corrected various problems in dropfz.c affecting alignment scores
and coordinates.

Introduced a new program, fasts3(_t), for searching with peptide
sequences.

>>November 11, 1998  --> v32t0

Added code to correct problems with
coordinate number in long library
sequences with tfastx/tfasty.  With this release, sequences should be
numbered properly, and sequence numbers count down with reverse
complement library sequences.

In addition, with this release, fastx/y and tfastx/y translated
protein alignments are numbered as nucleotides (increasing by 3,
labels every 30 nucleotides) rather than codons.
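
The nucleotide numbering described in the November 11, 1998 entry can be
illustrated with a short sketch.  It is not taken from the FASTA source:
the function name codon_to_nt and its parameters are hypothetical, and it
assumes 1-based coordinates with each codon labeled by its first
nucleotide.

```python
def codon_to_nt(codon_index, frame_start=1, lib_length=None, reverse=False):
    """Map a 1-based codon index to the 1-based coordinate of its first base.

    Hypothetical helper for illustration only; not part of the FASTA code.
    frame_start: position, on the translated strand, of the first base of codon 1.
    lib_length:  full length of the DNA sequence; required when reverse=True.
    """
    if codon_index < 1:
        raise ValueError("codon_index is 1-based")
    # Position along the translated strand: advances by 3 for each codon.
    pos = frame_start + 3 * (codon_index - 1)
    if not reverse:
        return pos
    if lib_length is None:
        raise ValueError("lib_length is required for reverse-complement numbering")
    # Position p on the reverse complement corresponds to forward-strand
    # coordinate lib_length - p + 1, so the labels count down.
    return lib_length - pos + 1


forward = [codon_to_nt(i) for i in range(1, 5)]
reverse = [codon_to_nt(i, lib_length=300, reverse=True) for i in range(1, 5)]
print(forward)  # [1, 4, 7, 10]
print(reverse)  # [300, 297, 294, 291]
```

Under these assumptions, ten codons span 30 nucleotides, which matches the
label spacing mentioned above: codon_to_nt(11) - codon_to_nt(1) is 30.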