readme.v34t0
Fix serious bug in dropnfa.c/dropfx.c/dropfz2.c that caused -S to work improperly on sequences with effective length of 3 or less.

Change to scaleswn.c to make mle_cen(), mle_cen2() more robust to cases where the top and bottom scores are the same.

Change p2_complib.c to avoid compiler complaints with (void *)wstage2p=NULL on some platforms.

>>Aug. 30, 2001 CVS tag fa34t05d3

Fixed problem with uthr_subs.c for Suns, but changed Makefile.sun to use pthreads rather than Sun Unix threads. Removed SQL stuff from Makefile.mpi4/pvm4 and added Makefile.mpi4_sql/pvm4_sql.

fa34t05d2 - fix to map_db.c to provide *sascii.
fa34t05d1 - fixes to ibm_pthr_subs.c and Makefile.ibm from IBM.

>>Aug. 20, 2001 CVS tag fa34t05d0

The pvm/mpi complib programs have been substantially updated with release 3.4. See readme.v34t0 for more information. With version 3.4, the MPI programs are mp34comp*, mu34comp*, etc.

A major effect of this change is to disable automatic sequence type (protein/DNA) recognition with pv34compfa/mp34compfa. By default, protein libraries are assumed. Thus, pv34compfa/mp34compfa require the "-n" command line option when run on DNA sequence libraries. This issue does not occur with the other programs, which recognize the appropriate sequence type because it is determined by the program (e.g. pv34compfx requires DNA:protein).

Fixed substantial problem with 64-bit file offsets for Linux in complib.c/comp_thr.c, p2_complib.c. This problem, solved by Doug Blair, was preventing the threaded versions from working properly in memory mapped mode.

In all earlier versions of fasta, when very long sequences were searched, the sequence length reported was that of the "chunk" that was actually searched (typically 80,000-query_length) rather than the actual library sequence length. This peculiar behavior has now been changed, and the full length of the library sequence, not the sequence chunk, is reported as the library sequence length.
Note that chunks are still used, however, which can cause the same alignment to be shown twice. In addition, the "-m 9" output format has changed to report the coordinates of the query and library sequence (see below), which may be different from 1-sequence_length because the query and library sequences may have been extracted from larger sequences. Four additional fields have been added, "pn0", "px0", "pn1", "px1", that give the positions of the beginning (pn0/1) and end (px0/1) of the query/library sequence. pn0/1 would typically be changed with the "@C:#" directive, described below.

Changes to doinit.c/initfa.c/initsw.c provide a new function, f_lastenv(), that allows function-specific adjustments to parameters after the command line options have been read but before the first sequence is read. This change solved problems with "mp/pv34compfx -S".

fasts34/tfasts34 now recognize that 'I/L' are the same, as are 'Q/K' (which are apparently indistinguishable by Mass-Spec). The latter identity is on by default, but can be turned off with "-h 0".

The MPI/PVM versions of the programs have been tested extensively with compfa, compfx, and comptfx. Makefile.mpi4 now works properly.

Changes to p2_complib.c to support the PVM option "-T 1-4", which allows one to run on nodes 1-4 of a (presumably larger) PVM virtual machine. This option has no effect on the mp34comp* programs. The old "-T 4", to run on 4 nodes, is also available. If each node has 2 CPUs, as indicated in the "pvmd hostfile", both CPUs will be used, for a total, in this example, of 8 processes. This allows one to specify a large PVM machine and use separate parts of it independently.

Changes to nmgetlib.c to fix problems with longer dates in GCG files (Y2K). Fixes to faatran.c for extended alphabets and 'X's.
Various code clean-ups to make "gcc -Wall" a little bit (not much) happier.

This is the first distributed fasta34 version.

================
>>Aug 9, 2001 CVS tag fa34t05

Corrections to initfa.c to allow -S to work with tfastx/y.

Fix to manshowbest.c for query position with -m 9.

>>July 18, 2001 CVS tag fa34t04

Various changes to complib.c, comp_thr.c, p2_complib.c, showbest.c, showalign.c to deal with overlapping alignments in long sequences that have been segmented. When long sequences are segmented (lcont>0), the eventual total length (n1tot_v) is saved at beststr->n1tot_p. If there was no lcont, then beststr->n1tot_p = NULL, and beststr->n1 should be used as the sequence length. This has the advantage of requiring space only when long sequences are encountered, and requiring only one integer for several segments.

m_msg.noshow has been removed.

The -m 9 format has been changed: 5 fields have been added. Four of them (pmn0/pmx0/pmn1/pmx1) provide the beginning and end coordinates of the query and library sequence; the last (fs) reports the number of frameshifts.
The names of the alignment boundaries have been changed from min0/max0/min1/max1 to amn0/amx0/amn1/amx1 (Alignment miN/maX).

The SQL format has been extended to provide for statements that do things but do not generate results, such as creating and selecting into a temporary table, e.g.:

================
do create temporary table seq_pos (
  id int unsigned not null auto_increment primary key,
  prot_id int unsigned not null default 0,
  start int unsigned not null default 0,
  length int unsigned not null default 0
) ;
do insert into seq_pos (prot_id, start, length)
  select id, 11, len-10 from protein, annot
  where len > 100 and annot.protein_id = protein.id and annot.pref=1 ;
select seq_pos.id, substring(protein.seq, start, length),
  concat("@C:", start, " ", descr)
  from protein, seq_pos, annot
  where protein.id = annot.protein_id and protein.id = seq_pos.prot_id
  and annot.pref = 1 ;
select prot_id, concat("@C:", start, " ", descr)
  from seq_pos, annot
  where annot.protein_id = seq_pos.prot_id and seq_pos.id = #
  and annot.pref = 1 ;
================

In the current implementation, these statements must start with "DO" as the first two characters on the line, and come immediately after a line ending with ';'. The text from "DO" to the next ";", excluding the "DO", is executed when the database connection is made.

=====
>>July 12, 2001

The allocation of the work_info data structure used to send information to the worker threads has been changed. The old method worked, possibly by accident.

A bug in p2_complib.c that caused E()-values to be calculated improperly for the first query sequence has been fixed.

>>July 11, 2001 --> fa34t02

It is now possible to specify output coordinates in library sequences by including the string "@C:number" on the description line, e.g.:

  >gtm1_human gi|12345 human glutathione transferase M1 @C:21

would label the first residue in the library sequence "21" rather than "1".
This capability has been included to provide accurate coordinates for searches done against subsequences generated by an SQL query. For example, one could use a query of the form:

  SELECT protein.id, substring(protein.seq,11,length(protein.seq)-20),
    concat(protein.name," @C:11 ",protein.descr) FROM protein;

to generate a sequence set with each sequence starting with residue 11. Without the "@C:11" option on the description line, the program would number the alignment positions starting at 1, even though the first residue of the sequence really started at 11. "@C:11" allows one to correct the coordinate system.

Currently, "@C:offset" is available only with library type 1 (fasta format) and 16 (mySQL).

The SQL-generated database with "@C:offset" can be used with both the fast*34(_t) programs and with pv34comp*. However, the SQL syntax is used differently in the fasta34 and pv34compfa programs. fast*34(_t) requires three SQL statements during a search: (1) a statement to generate a large set of library sequences; (2) a statement to generate a description of a single sequence, given a unique identifier provided by (1); and (3) a statement to generate a single sequence given a unique identifier provided by (1). For fast*34 searches, the third (3) SQL statement must provide the "@C:offset" information in the third results field for the offset to be used. It is optional in (1) and (2).

The pv34comp* programs only require one SQL statement, statement (1) above, which must provide three fields: a unique identifier, the sequence, and a complete description that must include "@C:offset" if substrings are used. If SQL queries (2) and (3) are provided, they are ignored. Thus, the same files can be used by both programs, but the "@C:offset" is required in different SQL queries by the fast*34 and pv34comp* programs.

Other changes:

Re-incorporation of GAP_OPEN option; fix to Altschul-Gish stats when GAP_OPEN is used.

Re-incorporation of A. Mackey's spam() improvement in dropnfa.

Fixes to include file ordering to allow the fast*34(_t) and pv34comp* programs to compile.

Fix to lascii[] for SQL database queries.

Fix to an old bug in comp_thr.c to send individual worker_info structures to threads (does not fix LINUX threads problems, however).

=====
>>July 9, 2001

Considerable changes to support no-global library functions.

(1) Separate ascii/sequence mapping arrays are used by the query-reading (qascii), library-reading (lascii), and sequence comparison function (pascii) routines. As a result, there is no longer a need for tgetlib.o/lgetlib.o; lgetlib.o can serve both functions.

(2) This also allows us to remove all #ifdef TFAST/FASTX conditionals from complib.c/comp_thr.c/p2_complib.c. We no longer need tcomp_thr.o, comp_thrx.o, etc. We still have a variety of p2_complib.o variations to support the different c34.work* files.

(3) Because non-global openlib/getlib functions are available, exactly the same open/get functions are available for reading both the query and reference libraries in pv34comp* programs. The host-specific openlib/getlib functions in hxgetaa.c are now provided by nmgetlib.c, etc. This has two effects: (a) it is now possible to compare a query database generated by an SQL query to a library database generated by a different SQL query; (b) pv34comp* has lost (at least in this version) the ability to automatically detect the query sequence type. To search with a DNA query, you MUST use "-n".

(4) The resetp() function is now responsible for almost all of the function-specific (TFAST/FASTX/etc) initializations. All of the function-specific code has been removed from complib.c/comp_thr.c and most of it has been moved to initfa.c/resetp().

(5) manageacc.c has been merged into compacc.c (mostly prhist()).

=====
>>June 1, 2001

Many changes to accommodate a new "no global variable" strategy for reading sequence databases.
Every time a file is opened, a struct lmf_str is allocated, which can be used for memory mapped files, ncbl2 files, and mysql files.

In addition, an opened file has a default sequence type, DNA or protein, or one can open a file in a mode that will allow the sequence type to be changed.

=====
>>May 18, 2001 CVS: fa33t09d0

A new compile time parameter, -DGAP_OPEN, is available to change the definition of the "-f gap-open" parameter from the penalty for the first residue in a gap to a true gap-open penalty, as is used in BLAST and many other comparison algorithms. This will probably become the default for fasta in version 3.4.

Fixes to conflicts between "-S" and "-s matrix". When a scoring matrix file was specified, lower-case alignments were not displayed with -S (although the scores were calculated properly).

More extensive testing of mysql_lib.c (mySQL query-libraries) with the pv4comp* and mp4comp* programs.

=====
>>April 5, 2001 CVS: fa33t08d4b3

Changes in nmgetlib.c and ncbl2_mlib.c to return long sequence descriptions for PCOMPLIB (pv4/mp3comp*). Also fix p2_complib.c to request DNA library for translated comparisons.

Fix for prss33(_t) to read both sequences from stdin.

=====
>>March 27, 2001 CVS: fa33t08d4

Modifications to allow 64-bit fseek/ftell on machines like Sun and Linux/Intel that support -D_FILE_OFFSET_BITS=64, -D_LARGE_FILE_SOURCE, off_t, and fseeko(), ftello() with the option -DUSE_FSEEKO. Machines with 64-bit longs do not need this option. Machines with 32-bit longs that allow files >2 Gb can use them through 64-bit file access functions, including fseeko() and ftello(), which work with off_t file offsets instead of longs.

=====
>>March 3, 2001 CVS: fa33t08d2

Corrected problems in nmgetaa.c and mysql_lib.c with parallel programs, and one serious problem with alternate DNA scoring matrices (initfa.c, initsw.c) not being set properly.
A subtle problem with the merge of scaleswn.c and scaleswg.c is fixed.

>>February 17, 2001

Modified mysql_lib.c to use "#", rather than "%ld", to indicate the position of the GID. This change was made because sprintf() cannot be used reliably to generate an SQL string, as '"' and '%' are used in such strings.

=====
>>January 17, 2001 (no version change, date change)

Minor fixes to initfa.c, initsw.c to deal with DNA scoring matrices properly. "-n -s dna.mat" is required for the sequence/matrix to be recognized as DNA.

>>January 16, 2001 --> v34t00

Merge of the main CVS trunk (fa33t06) with the latest release branch (fa33t08).

In addition, PCOMPLIB mods have been made to mysql_lib.c. Because p2_complib.c gets sequence description information during the first read of the database, the mysql_query must be changed to return: result[0]=GID, result[1]=description, result[2]=sequence. In the PCOMPLIB