readme.v35
'*' is back in the aascii[] matrix, so that it is present by default
(like fasta34).

>>July 23, 2007

Changes to support sub-sequence ranges for "library" sequences -
necessary for fully functional prss (ssearch35) and lalign35.  For all
programs, it is now possible to specify a subset of both the query and
the library, e.g.

  lalign35 -q mchu.aa:1-74 mchu.aa:75-148

Note, however, that the subset range applied to the library will be
applied to every sequence in the library - not just the first - and
that the same subset range is applied to each sequence.  This probably
makes sense only if the library contains a single sequence (this is
also true for the query file).

Correct bugs in the functions that produce lav output from lalign35
-m 11 to properly report the begin and end coordinates of both
sequences.  Previously, coordinates always began with "1".  Correct
associated bug in ps_lav.c that assumed coordinates started with "1".

>>June 29, 2007 CVS fa35_02_01

Merge of HEAD with fasta35 branch.

>>June 29, 2007 CVS fa35_01_06

Add exit(0); to ps_lav.c for 0 return code.

>>June 26, 2007

Add amino-acid 'J' for 'I' or 'L'.

Add Mueller and Vingron (2000) J. Comp. Biol. 7:761-776 VT160 matrix,
"-s VT160", and OPTIMA_5 (Kann et al. (2000) Proteins 41:498-503).

Changes to dropnnw.c documentation functions to remove #ifdef's from
strncpy() - which apparently is a macro in some versions of gcc.

>>June 7, 2007

Modify initfa.c to allow ggsearch35(_t), glsearch35(_t) to use PSSMs.

>>June 5, 2007 CVS fa35_01_05

Modifications to p2_complib.c, p2_workcomp.c to support Intel C
compiler.  Fixed bug in p2_workcomp.c - gstring[2][MAX_STR] required -
[MAX_SSTR] too short.
mp35comp* programs now tested and working (as
are pv35comp*, c35.work* programs).

Fix problem with fasts/fastm/fastf last_tat.c with limited memory.

Correct problem with lalign35.exe Makefile.nm_[fp]com.

Add $(CFLAGS) to map_db to enable large file support.

Address problem with PSSM's when '*' not defined
(initfa.c:extend_pssm()).

>>May 30, 2007 CVS fa35_01_04

Complete work on ps_lav, which converts an lalign35 lav (-m 11) file
into a postscript plot, which looks identical to the plots produced by
plalign from fasta2.

>>May 25,29, 2007

Changes to defs.h, doinit.c mshowalign.c for -m 11, which produces lav
output only for lalign35.

Changes to comp_lib2.c to add m_msg.std_output, which provides all the
standard print lines.  This is turned off for -m 11 (lav) output.

lalign35 -m 11 provides standard lav output, with the addition of
#lalign35 -q ... .

>>May 18, 2007

Add m_msg.zsflag to preserve pst.zsflag when reset by global/global
exclusion of many library sequences.

>>May 9, 2007 CVS fa35_01_03

Tested local database size determination with p2_complib2/p2_workcomp2.

>>May 2, 2007 renamed fasta35, pv35comp, etc

Separate thread buffer structures from param.h.

Problems with incorrect alignments have been fixed by re-initializing
the best_seqs and lib_buf2_list.buf2 structures after each query
sequence.

The labels on the alignment scores are much more informative (and more
diverse).  In the past, alignment scores looked like:

>>gi|121716|sp|P10649|GSTM1_MOUSE Glutathione S-transfer (218 aa)
 s-w opt: 1497 Z-score: 1857.5 bits: 350.8 E(): 8.3e-97
Smith-Waterman score: 1497; 100.0% identity (100.0% similar) in 218 aa overlap (1-218:1-218)
^^^^^^^^^^^^^^

where the highlighted text was either: "Smith-Waterman" or "banded
Smith-Waterman".  In fact, scores were calculated in other ways,
including global/local for fasts and fastf.
With the addition of
ggsearch35, glsearch35, and lalign35, there are many more ways to
calculate alignments: "Smith-Waterman" (ssearch and protein fasta),
"banded Smith-Waterman" (DNA fasta), "Waterman-Eggert",
"trans. Smith-Waterman", "global/local", "trans. global/local",
"global/global (N-W)".  The last option is a global/global alignment,
but with the affine gap penalties used in the Smith-Waterman
algorithm.

>>April 24, 2007

The new program structure has been migrated to the PVM and MPI
versions.  In addition, the new global algorithms (pv35compgg,
pv35compgl) have been moved, though the PVM/MPI versions do not
(yet) do the appropriate size filtering.

>>April 19, 2007

Two new programs, ggsearch35(_t) and glsearch35_t are now available.
ggsearch35(_t) calculates an alignment score that is global in the
query and global in the library; glsearch35_t calculates an alignment
that is global in the query and local in the library sequence.  The
latter program is designed for global alignments to domains.

Both programs assume that scores are normally distributed.  This
appears to be an excellent approximation for ggsearch35 scores, but
the distribution is somewhat skewed for global/local (glsearch)
scores.  ggsearch35(_t) only compares the query to library sequences
that are between 80% and 125% of the length of the query; glsearch
limits comparisons to library sequences that are longer than 80% of
the query.  Initial results suggest that there is relatively little
length dependence of scores over this range (scores go down
dramatically outside these ranges).

A bug was found and fixed in showalign() and showbest() where the
aa1save buffer was not preserved when some sequences needed to be
re-read, while others were stored in the beststr.

>>April 9, 2007

Some of the drop*.c functions have been reconfigured to reduce the
amount of duplicate code.
For example, dropgsw.c, dropnsw.c, and
dropnfa.c all used exactly the same code to produce global alignments
(NW_ALIGN() and nw_align()); this code is now in wm_align.c.
Likewise, those same files, as well as dropgw2.c, use identical code
to produce consensus alignments (calcons(), calcons_a(), calc_id(),
calc_code()).  Rather than working with three or four copies of
identical code, there is now one version.

>>March 29, 2007

At last, the lalign (SIM) algorithm has been moved from FASTA21 to
FASTA35.  Currently, only lalign35 is available.  A plotting version
will be available shortly (or perhaps a more general solution that
produces lav output).

The statistical estimates for lalign35 should be much more accurate
than those from the earlier lalign, because lambda and K are estimated
from shuffles.

Many functions have been modified to reduce the number of times
structures are passed as arguments, rather than pointers.

>>February 23, 2007

The threading strategy has been modified slightly to separate the end
of the search phase (and a complete reading of all results buffers)
from the termination phase.  This will allow future threading of
subsequent phases, including the Smith-Waterman alignments in
showbest() and showalign() (though care will be required to ensure
that the results are presented in the correct order).

>>February 20, 2007 fasta-34_27_0 (released as fasta-35_1)

The FASTA programs have been restructured to reduce the differences
between the threaded and unthreaded versions (and ultimately the
parallel versions) and to make more efficient use of modern large
memory systems.  This is the beginning of a move towards a more robust
shuffling strategy when searching databases with modest numbers of
related sequences.

The major changes:

   comp_lib.c -> comp_lib2.c - comp_lib.c will be removed
   work_thr.c -> work_thr2.c - work_thr.c will be removed

   mshowbest.c, mshowalign.c have been modified to remove aa1 as an
   argument.  They must allocate that space if they need it.
   The system is set up to allocate a substantial amount of library
   sequence memory, either to a single buffer (unthreaded) or to the
   threaded buffer pool.  For smaller databases, the library sequences
   are read once, and then subsequently read from memory (this could
   be extended for RANLIB(bline) as well).

Soon, these changes will allow the program to re-read the beststr[]
sequences and shuffle them to produce accurate lambda/K estimates.

================================================================

See readme.v34t0 for earlier changes.

================================================================
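
Illustrative note on the April 19, 2007 entry: the length filter
described there for ggsearch35 and glsearch35 can be sketched in C as
shown here.  This is only a sketch under stated assumptions, not code
from the FASTA distribution: the function names gg_len_ok() and
gl_len_ok() are hypothetical, the 80% and 125% bounds for ggsearch35
are assumed to be inclusive, and integer arithmetic is used to avoid
floating-point rounding.

```c
#include <assert.h>

/* Hypothetical helper (not from the FASTA source): returns 1 if a
   library sequence of length l_len is between 80% and 125% of the
   query length q_len, as described for ggsearch35.  The bounds are
   assumed to be inclusive. */
int gg_len_ok(long q_len, long l_len)
{
    assert(q_len > 0 && l_len > 0);
    /* 0.80 * q_len <= l_len  is the same as  4 * q_len <= 5 * l_len */
    /* l_len <= 1.25 * q_len  is the same as  4 * l_len <= 5 * q_len */
    return (4 * q_len <= 5 * l_len) && (4 * l_len <= 5 * q_len);
}

/* Hypothetical helper: returns 1 if the library sequence is longer
   than 80% of the query, as described for glsearch35. */
int gl_len_ok(long q_len, long l_len)
{
    assert(q_len > 0 && l_len > 0);
    /* l_len > 0.80 * q_len  is the same as  5 * l_len > 4 * q_len */
    return 5 * l_len > 4 * q_len;
}
```

Under these assumptions, for a 100-residue query gg_len_ok() accepts
library lengths 80 through 125, while gl_len_ok() accepts any library
length of 81 or more.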