readme.v32t0

FASTX/Y and FASTA (DNA) are now half as fast, because the programs now
search both the forward and reverse strands by default.

The documentation in fasta3x.me/fasta3x.doc has been substantially
revised.

>>October 20, 1999
(no version change)

Modified nxgetaa.c/nmgetaa.c to recognize 'N' as a possible DNA
character.

>>October 9, 1999 --> v32t08 (no version number change)

Added "-M low-high" option, where low and high are inclusion limits
for library sequences.  If a library sequence is shorter than "low" or
longer than "high", it will not be considered in the search.  Thus,
"-M 200-250" limits the database search to proteins between 200 and
250 residues in length.  This should be particularly useful for fasts3
and fastf3.  -M -500 searches library sequences < 500; -M 200 -
searches sequences > 200.  This limit applies only to protein
sequences.

Modified scaleswn.c to fall back to maximum likelihood estimates of
lambda, K rather than mean/variance estimates.  (This allows MLE
estimation to be used instead of proc_hist_n when a limited range of
scores is examined.)

>>October 2, 1999 --> v32t08

Many changes:
(1) memory mapped (mmap()ed) database reading - other database reading fixes
(2) BLAST2 databases supported
(3) true maximum likelihood estimates for Lambda, K
(4) Misc. minor fixes

(1) (Sept. 26 - Oct. 2, 1999) Memory mapped database access.

It is now possible to use mmap()ed access to FASTA format databases,
if the "map_db" program has been used to produce an ".xin" file.  If
USE_MMAP is defined at compile time and a ".xin" file is present, the
".xin" will be used to access sequences directly after the file is
mmap()ed.  On my 4-processor Alpha, this can reduce elapsed time by
50%.  It is not quite as efficient as BLAST2 format, but it is close.

Currently, memory mapping is supported for type 0 (FASTA), 5
(PIR/GCG ascii), and 6 (GCG binary).  Memory mapping is used if a
".xin" file is present.  ".xin" files are created by the new program
"map_db".  The syntax for "map_db" is:

	map_db [-n] "/dir/database.fa"

which creates the file /dir/database.fa.xin.  Library types can be
included in the filename; thus:

	map_db -n "/gcggenbank/gb_om.seq 6"

would be used for a type 6 GCG binary file.

The ".xin" file must be updated each time the database file changes.
map_db writes the size of the database file into the ".xin" file, so
that if the database file changes, making the ".xin" offset
information invalid, the ".xin" file is not used.  "list_db" is
provided to print out the offset information in the ".xin" file.

(Oct 2, 1999) The memory mapping routines have been changed to
allow several files to be memory mapped simultaneously.  Indeed, once a
database has been memory mapped, it will not be munmap()ed until the
program finishes.  This fixes a problem under Digital Unix, and should
make re-access to mmap()ed files (as when displaying high scores and
alignments) much more efficient.  If no more memory is available for
mmap()ing, the file will be read using conventional fread/fgets.

(Oct 2, 1999) The names of the database reading functions have been
changed to allow both Blast1.4 and Blast2.0 databases to be read.
In addition, Makefile.common now includes an option to link both
ncbl_lib.o and ncbl2_lib.o, which provides support for both libraries.
However, Blast1.4 support has not been tested.

The Makefile structure has been improved.  Each architecture specific
Makefile (Makefile.alpha, Makefile.linux, etc) now includes
Makefile.common.  Thus, changes to the program structure should be
correct for all platforms.  "map_db" and "list_db" are not made with
"make all".

The database reading functions in nxgetaa.c can now return a database
length of 0, which indicates that no residues were read.  Previously,
0-length sequences returned a length of 1 and were ignored.
Complib.c and comp_thr.c have changed to accommodate this
modification.  This change was made to ensure that each residue,
including the last, of each sequence is read.

Corrected bug in nxgetaa.c with FASTA format files with very long
(>512 char) definition lines.

(2) (September 20, 1999) BLAST2 format databases supported

This release supports NCBI Blast2.0 format databases, using either
conventional file reading or memory mapped files.  The Blast2.0 format
can be read very efficiently, so there is only a modest improvement in
performance with memory mapping.  The decision to use mmap()'ed files
is made at compile time, by defining USE_MMAP.  My thanks to Eamonn
O'Toole of DEC/Compaq, and Daryl Madura of Sun Microsystems, for
providing mmap()'ed modifications to fasta3.  On my machines, Blast2.0
format reduces search time by about 30%.  At the moment, ambiguous DNA
sequences are not decoded properly.

(3) (September 30, 1999) A new statistical estimation option is
available.  -z 2 has been changed from ln()-scaling, which never
should have been used, to scaling using Maximum Likelihood Estimates
(MLEs) of Lambda and K.  The MLE estimation routines were written by
Aaron Mackey, based on a discussion of MLE estimates of Lambda and K
written by Sean Eddy.
The MLE estimation examines the middle 95% of
scores, if there are fewer than 10000 sequences in the database;
otherwise it excludes (censors) the top 250 scores and the bottom 250
scores.  This approach seems to effectively prevent related sequences
from contaminating the estimation process.  As with -z 1, -z 12 causes
the program to generate a shuffled sequence score for each of the
library sequences; in this case, no censoring is done.  If the
estimation process is reliable, Lambda and K should not vary much with
different queries or query lengths.  Lambda appears not to vary much
with the comparison algorithm, although K does.

(4) Minor changes include fixes to some of the alignment display routines,
individual copies of the pstruct structure for each thread, and some
changes to ensure that every last residue in a library is available
for matching (sometimes the last residue could be ignored).  This
version has undergone extensive testing with high-throughput sequences
to confirm that long sequences are read properly.  Problems with
fastf3/fasts3 alignment display have also been addressed.

>>August 26, 1999 (no version change - not released)

Corrected problem in "apam.c" that prevented scoring matrices from
being imported for [t]fasts3/[t]fastf3.

>>August 17, 1999 --> v32t07

Corrected problem with opt_cut initialization that only appeared
with pvcomp* programs.

Improved calculation of FASTA optcut threshold for DNA sequence
comparison for match scores much less than +5 (e.g. +3).  The previous
optcut threshold was too high when the match penalty was < 4 and
ktup=6; it is now scaled more appropriately.

Optcut thresholds have also been raised slightly for
fastx/y3/tfastx/y3.
This should improve performance with minimal
effects on sensitivity.

>>July 29, 1999
(no version change - date change)

Corrected various uninitialized variables and buffer overruns
detected.

>>July 26, 1999 - new distribution
(no version change - v32t06, previous version not released)

Changed the location of "(reverse complement)" label in tfasta/x/y/s/f
programs.

Statistical calculations for tfasta/x/y in unthreaded version
corrected.  Statistical estimates for threaded and unthreaded versions
of the tfasta/x/y/s/f programs should be much more consistent.

Substantial modifications in alignment coordinate calculation/
presentation.  Minor error in fastx/y/tfastx/y end of alignment
corrected.  Major problems with tfasta alignment coordinates
corrected.  tfasta and tfastx/y coordinates should now be consistent.

Corrected problem with -N 5000 in tfasta/x/y3(_t) searches encountered
with long query sequences.

Updated pthr_subs.c/Makefile.linux to increase the pthreads stacksize
to try to avoid "cannot allocate diagonal arrays" error message.
Pthreads stacksize can be changed with RedHat 6.0, but not RedHat 5.2,
so Makefile.linux uses -DLINUX5 for RedHat5.* (no pthreads stack size).
I am still getting this message, so it has not been completely
successful.  Makefile.linux now uses -DALLOCN0 to avoid this problem,
at some cost in speed.

The pvcomp* programs have been updated to work properly with
forward/reverse DNA searches.  See readme.pvm_3.2.

>>July 7, 1999 - not released
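
Illustration for the October 9, 1999 entry above: the "-M low-high"
length limits can be sketched in a few lines of Python.  This is only a
sketch of the behaviour described in that entry, not the code used in
the fasta3 programs.  The function names (parse_length_range, in_range,
filter_by_length) are hypothetical, and the sketch assumes that both
limits are inclusive, including for the open-ended forms "-500" and
"200 -".

```python
from typing import List, Optional, Tuple

Limits = Tuple[Optional[int], Optional[int]]


def parse_length_range(spec: str) -> Limits:
    """Parse a "low-high" string; a missing side means "no limit" (None)."""
    # Tolerate spaces so that both "200-" and "200 -" are accepted (assumption).
    low_text, sep, high_text = spec.replace(" ", "").partition("-")
    if not sep:
        raise ValueError(f"expected 'low-high', got {spec!r}")
    low = int(low_text) if low_text else None
    high = int(high_text) if high_text else None
    if low is not None and high is not None and low > high:
        raise ValueError(f"low limit exceeds high limit in {spec!r}")
    return low, high


def in_range(length: int, low: Optional[int], high: Optional[int]) -> bool:
    """True if a sequence of this length would be kept (inclusive limits)."""
    if low is not None and length < low:
        return False
    if high is not None and length > high:
        return False
    return True


def filter_by_length(library: List[Tuple[str, str]], spec: str) -> List[str]:
    """Return the names of (name, sequence) entries that pass the limits."""
    low, high = parse_length_range(spec)
    return [name for name, seq in library if in_range(len(seq), low, high)]


if __name__ == "__main__":
    # Made-up library entries, used only to exercise the filter.
    demo = [("short", "A" * 150), ("mid", "A" * 220), ("long", "A" * 600)]
    print(filter_by_length(demo, "200-250"))
```

Keeping the parsing separate from the length test lets the same
predicate serve the closed form ("200-250") and both open-ended forms.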