$Name: fa35_03_06 $ - $Id: readme.v33t0,v 1.47 2007/06/29 20:23:58 wrp Exp $

================ readme.v33t0 ================

This release includes an MPI implementation of the parallel
library-vs-library comparison code.  See readme.mpi_3.3 and
readme.pvm_3.3 for more information.

=====
>>July 9, 2001

Considerable changes to support no-global library functions.

(1) Separate ascii/sequence mapping arrays are used by the
    query-reading (qascii), library-reading (lascii), and sequence
    comparison function (pascii) routines.  As a result, there is no
    longer a need for tgetlib.o/lgetlib.o - lgetlib.o can serve both
    functions.

(2) This also allows us to remove all #ifdef TFAST/FASTX conditionals
    from complib.c/comp_thr.c/p2_complib.c.  We no longer need
    tcomp_thr.o, comp_thrx.o, etc.  We still have a variety of
    p2_complib.o variations to support the different c34.work* files.

(3) Because non-global openlib/getlib functions are available, exactly
    the same open/get functions are available for reading both the
    query and reference libraries in pv34comp* programs.  The
    host-specific openlib/getlib functions in hxgetaa.c are now
    provided by nmgetlib.c, etc.  This has two effects:

    (a) It is now possible to compare a query database generated by
        an SQL query to a library database generated by a different
        SQL query.

    (b) pv34comp* has lost (at least in this version) the ability to
        automatically detect the query sequence type.  To search with
        a DNA query, you MUST use "-n".

(4) The resetp() function is now responsible for almost all of the
    function-specific (TFAST/FASTX/etc) initializations.  All of the
    function-specific code has been removed from complib.c/comp_thr.c
    and most of it has been moved to initfa.c/resetp().

(5) manageacc.c has been merged into compacc.c (mostly prhist()).

(6) Although it may reflect a subtle bug in my code, it is not
    possible to reliably run threaded/memory mapped versions of the
    fasta34_t code.
    I have spent considerable time tracking down the problem, and
    have determined that, in threaded code, something happens during
    thread initialization that corrupts the description offset
    information used when files are memory mapped.  This never occurs
    when the unthreaded versions of the code are used, and it does
    not occur under MacOSX, Compaq Tru64Unix, Sun Solaris/Sparc, or
    SGI IRIX.  Thus, I cannot recommend using the threaded code
    versions (_t) under Linux (RH6.2 or 7.1).

=====
>>June 1, 2001

Many changes to accommodate a new - no global variable - strategy for
reading sequence databases.  Every time a file is opened, a struct
lmf_str is allocated which can be used for memory mapped files, ncbl2
files, and mysql files.

In addition, an opened file has a default sequence type: DNA or
protein, or one can open a file in a mode that will allow the sequence
type to be changed.

=====
>>May 18, 2001    CVS: fa33t09d0

A new compile time parameter, -DGAP_OPEN, is available to change the
definition of the "-f gap-open" parameter from the penalty for the
first residue in a gap to a true gap-open penalty, as is used in BLAST
and many other comparison algorithms.  This will probably become the
default for fasta in version 3.4.

Fixes to conflicts between "-S" and "-s matrix".  When a scoring
matrix file was specified, lower-case alignments were not displayed
with -S (although the scores were calculated properly).

More extensive testing of mysql_lib.c (mySQL query-libraries) with
the pv4comp* and mp4comp* programs.

=====
>>April 5, 2001    CVS: fa33t08d4b3

Changes in nmgetlib.c and ncbl2_mlib.c to return long sequence
descriptions for PCOMPLIB (pv4/mp3comp*).  Also fix p2_complib.c to
request DNA library for translated comparisons.

Fix for prss33(_t) to read both sequences from stdin.

=====
>>March 27, 2001    CVS: fa33t08d4 --> fa33t08d4

Problems in ncbl2_mlib.c found searching NCBI non-redundant nucleotide
database "nt" were fixed.
Testing revealed a minor memory leak, which was fixed by modifying
showbest.c, showalign.c, comp_thr.c, complib.c, and p2_complib.c to
remember the last opened database file more effectively.

Modifications to allow 64-bit fseek/ftell on machines like Sun,
Linux/Intel, that support -D_FILE_OFFSET_BITS=64, -D_LARGE_FILE_SOURCE,
off_t, and fseeko(), ftello() with the option -DUSE_FSEEKO.  Machines
with 64-bit longs do not need this option.  Machines with 32-bit
longs that allow files >2 Gb can do so with 64-bit file access
functions, including fseeko() and ftello(), which work with off_t file
offsets instead of longs.

=====
>>March 3, 2001    CVS: fa33t08d2

Corrected problems in nmgetaa.c and mysql_lib.c with parallel
programs, and one serious problem with alternate DNA scoring matrices
(initfa.c, initsw.c) not being set properly.  A subtle problem with
the merge of scaleswn.c and scaleswg.c is fixed.

>>February 17, 2001

Modified mysql_lib.c to use "#", rather than "%ld", to indicate the
position of the GID.  This change was made because sprintf() cannot be
used reliably to generate an SQL string, as '"' and '%' are used in
such strings.

=====
>>January 17, 2001
(no version change, date change)

Minor fixes to initfa.c, initsw.c to deal with DNA scoring matrices
properly.  "-n -s dna.mat" is required for the sequence/matrix to be
recognized as DNA.

>>January 16, 2001    --> v34t00

Merge of the main CVS trunk, fa33t06, with the latest release branch,
fa33t08.

In addition, PCOMPLIB mods have been made to mysql_lib.c.  Because
p2_complib.c gets sequence description information during the first
read of the database, the mysql_query must be changed to return:
result[0]=GID, result[1]=description, result[2]=sequence.
In the PCOMPLIB case, the other SQL queries (for GID description,
sequence) are not necessary but must still be provided.

=====
>>January 16, 2001
(no version change, previous version not released)

Changes to p2_complib.c to correct openlib() incompatibility.
Changes to nmgetaa.c, ncbl2_lib.c to incorporate PCOMPLIB.  nxgetaa.c
removed.

=====
>>January 12, 2001
(no version change, previous version not released)

Change to initfa.c to move ktup check from query_parm() to last_init().

=====
>>January 10, 2001    --> v33t08

Fixes to complib.c, comp_thr.c to deal properly with long query
protein sequences when a short library chunk (e.g. -N 5000) was given.
In the case where the chunk size is too short, it will be reset to a
length which allows the search to proceed, by including an amount of
new sequence that is equal to the amount of overlap sequence.

scaleswn.c and scaleswg.c have been merged.

v33t08 includes the initial implementation for mySQL described below
for v33t07x.

=====
>>Dec. 20, 2000    --> v33t07x

Initial implementation of a syntax for mySQL database queries.  A new
file, mysql_lib.c, has been added, and changes have been made to
nmgetaa.c (which should now replace nxgetaa.c) and altlib.h.  A mySQL
database search needs a file with 4 parts:

(1) description of the database, user, password
(2) a select statement that generates the set of protein sequences
    as: UID, sequence
(3) a select statement that generates a UID, description given a UID
(4) a select statement that generates a single UID, sequence
    given a UID

Each of the four parts should be separated by ';'.
For example, in the database that we are using for testing, a file
"demo.sql" that contains:

================
localhost taxonomy username secret;
SELECT proteins.gid, proteins.sequence FROM proteins,swissprot
 WHERE proteins.gid=swissprot.gid AND swissprot.spid IS NOT NULL;
select proteins.gid, concat(swissprot.spid," ",proteins.description)
 from proteins,swissprot
 where proteins.gid=%ld AND swissprot.gid=proteins.gid;
select gid, sequence from proteins where gid=%ld;
================

will find all the proteins in the BLAST "nr" database that also have
SwissProt ID's when given the command line:

    fasta33 -q query.aa "demo.sql 16"

At least for simple queries, there is surprisingly little overhead for
the search.  For more complex queries involving several tables, the
overhead can be significant.

At the moment, libraries that need the functions in mysql_lib.c will
use library type 16.  We may also use file type 17 for SQL queries
that return binary sequences.

This implementation of mysql_lib.c was written to require a minimal
amount of change to the other programs.  Only nmgetaa.c and altlib.h
needed to be changed to incorporate this new capability.  One result
of this limitation is that one cannot mix mySQL database queries with
other databases in the same search.  Eventually, I would like to make
a mySQL database like any other, so that several mysql database
queries could be searched in the same run, and mysql databases could
be mixed with other (flat file) databases, but this will require some
changes in the function calls throughout the code.  (Right now, the
various programs do not distinguish between an openlib() that is made
before searching a large database, and one before retrieving a single
sequence.  This must be changed for a database query like mySQL to
behave like other databases.)

Several mySQL demo files have been provided: mysql_demo*.sql.

(10 January 2001) The mySQL code has been tested on Intel Linux and
Compaq/Alpha/Tru64 Unix.

>>Dec. 9, 2000

Changes to apam.c to tie different default gap penalties to
alternate scoring matrices.  In addition, changes to apam.c to deal
with user-specified matrices with or without '*'.

>>Nov. 5, 2000 (date updated)

pst.dnaseq can now have 3 values: -1 or 0 -> protein, 1 -> DNA, and
2 -> other.  This becomes important for things like init_karlin_a,
which needs a background frequency of residues.

>>Nov. 1, 2000

Significant bug fixes for the -z 6/-z 16 option.  An uninitialized
variable was fixed in karlin.c, and comp_thr.c did not pass the
correct composition argument type in find_zp().  The -z 6/16 option
has now been tested and works correctly on Alphas, Linux x86, SGI, Sun
and Mac OSX.  Another problem was fixed in scaleswn.c (simplex()) that
prevented the code from being reused by the pv4/mp4 complib programs.

>>Oct. 9, 2000

Several changes made to accommodate Mac OSX.  Longer lists of
superfamily numbers now supported in p[su]4comp/m[su]4comp programs.

>>Sept 25, 2000

All global variables have been removed from scaleswn.c.  The last to
go, db_struct db, required many edits, because until now, the fasta
programs have kept two versions of the db_struct data (entries,
length).  One version was kept by the main program, which updated entry
number and db length as sequences were read; a second copy of this
information was kept by the statistical estimation routines.  Now
there is only one copy, which means that the E() values will be a
function of the complete database, not the database with some high
scoring sequences removed.

>>Sept 23, 2000

Continued removal of global variables from scaleswn.c.  Only one
global is left, db_struct db, which contains the number of entries in
the database and the number of residues.  It will be the next to go
(changing all the zs_to_*() functions) and scaleswn.c will be free
of globals.  scaleswg.c is gone - scaleswn.c compiles to scaleswg.c
with -DNORMAL_DIST.

>>Sept 20, 2000

Removal of histogram globals required changes in p2_complib.c as well.
p_complib.c has not been updated.
scaleswg.c has been modified to reflect the new histogram strategy.

>>Sept 19, 2000

Substantial changes to remove globals for printing histogram.  m_msg
now contains a hist_str, which keeps histogram information.

>>Sept. 19, 2000
(no version change, previous version not released)

Correct bug introduced into scaleswn.c (inithist()) by changing
score2_sums[], score_sums[] from int to double.

Reporting of version numbers is more consistent between fasta33,
fasta33_t, and pv4compfa/mp4compfa.  The programs now report the same
numbers/dates in similar places.

>>Sept. 15, 2000    --> v33t07

Changes to fix problems with statistical estimates when a large
fraction (but not all) of the database is related.  Several users
reported problems when searching with rRNA genes with version 33t06.
In some cases, a 100% identical match over 1500 nt would not be
statistically significant against a search of the bacterial division
of Genbank.  This problem was not seen with some releases of v33t05.
The cause of the problem was a change between v33t05 and v33t06 to