📄 notes
字号:
HMMER 2.2 release noteshttp://hmmer.wustl.edu/SRE, Fri May 4 13:00:33 2001---------------------------------------------------------------As it has been more than 2 years since the last HMMER release, this isunlikely to be a comprehensive list of changes.HMMER is now maintained under CVS. Anonymous read-only access to thedevelopment code is permitted. To download the current snapshot: > setenv CVSROOT :pserver:anonymous@skynet.wustl.edu:/repository/sre > cvs login [password is "anonymous"] > cvs checkout hmmer > cd hmmer > cvs checkout squid > cvs logoutThe following programs were added to the distribution: - The program "afetch" can fetch an alignment from a Stockholm format multiple alignment database (e.g. Pfam). "afetch --index" creates the index files for such a database. - The program "shuffle" makes "randomized" sequences. It supports a variety of sequence randomization methods, including an implementation of Altschul/Erickson's shuffling-while-preserving-digram-composition algorithm. - The program "sindex" creates SSI indices from sequence files, that "sfetch" can use to rapidly retrieve sequences from databases. Previously, index files were constructed with Perl scripts that were not supported as part of the HMMER distribution.The following features were added: - hmmsearch and hmmpfam can now use Pfam GA, TC, NC cutoffs, if these have been picked up in the HMM file (by hmmbuild). See the --cut_ga, --cut_tc, and --cut_nc options. - "Stockholm format" alignments are supported, and have replaced SELEX format as the default alignment format. Stockholm format is the alignment format agreed upon by the Pfam Consortium, providing extensible markup and annotation capabilities. HMMER writes Stockholm format alignments by default. The program sreformat can reformat alignments to other formats, including Clustal and GCG MSF formats. - To improve robustness, particularly in high-throughput annotation pipelines, all programs now accept an option --informat <s>, where <s> is the name of a sequence file format (FASTA, for example). The format autodetection code that is used by default is almost always right, and is very helpful in interactive use (HMMER reads almost anything without you worrying much about format issues). --informat bypasses the autodetector, asserts a particular format, and decreases the likelihood that HMMER misparses a sequence file. - new options: hmmpfam --acc reports HMM accession numbers instead of HMM names in output files. [Pfam infrastructure] sreformat --nogap, when reformatting an alignment, removes all columns containing any gap symbols; useful as a prefilter for phylogenetic analysis. - The real software version of HMMER is logged into the HMMER2.0 line of ASCII save files, for better version control (e.g. bug tracking, but there are no bugs in HMMER). - GCG MSF format reading/writing is now much more robust, thanks to assistance from Steve Smith at GCG. - The PVM implementation of hmmcalibrate is now parallelized in a finer grained fashion; single models can be accelerated. (The previous version parallelized by assigning models to processors, so could not accelerate a single model calibration.) - hmmemit can now take HMM libraries as input, not just a single HMM at a time - useful for instance for producing "consensus sequences" for every model in Pfam with one command. The following changes may affect HMMER-compatible software: - The name of the sequence retrieval program "getseq" was changed to "sfetch" in this release. The name "getseq" clashes with a Genetics Computer Group package program of similar functionality. - The output format for the headers of hmmsearch and hmmpfam were changed. The accessions and descriptions of query HMMs or sequences, respectively, are reported on separate lines. An option ("--compat") is provided for reverting to the previous format, if you don't want to rewrite your parser(s) right away. - hmmpfam now calculates E-values based on the actual number of HMMs in the database that is searched, unless overridden with the -Z option from the command line. It used to use Z=59021 semi-arbitrarily to make results jibe with a typical hmmsearch, but this just confused people more than it helped. hmmpfam E-values will therefore become more significant in this release by about 37x, for a typical Pfam search (59021/1600 = 37).The following major bugs were fixed: [none]The following minor bugs were fixed: - more argument casting to silence compiler warnings [M. Regelson, Paracel ] - a potential reentrancy problem with setting the alphabet type in the threads version was fixed, but this problem is unlikely to have ever affected anyone. [M. Sievers, Paracel]. - fixed a bug where hmmbuild on Solaris machines would crash when presented with an alignment with an #=ID line. Same bug caused a crash when building a model from a single sequence FASTA file [A. Bateman, Sanger] - The configure script was modified to deal better with different vendor's implementations of pthreads, in response to a DEC Digital UNIX compilation problem [W. Pearson, U. Virginia] - Automatic sequence file format detection was slightly improved, fixing a bug in detecting GCG-reformatted Swissprot files [reported by J. Holzwarth] - hmmpfam-pvm and hmmindex had a bad interaction if an HMM file had accession numbers as well as names (e.g., Pfam). The phenotype was that hmmpfam-pvm would search each model twice: once for its name, and once for its accession. hmmindex now uses a new indexing scheme (SSI, replacing GSI). [multiple reports; often manifested as a failure of the StL Pfam server to install, because of an hmmindex --one2one option in the Makefile; this was a local hack, never distributed in HMMER]. - a rare floating exception bug in ExtremeValueP() was fixed; range-checking protections in the function were in error, and a range error in a log() calculation appeared on Digital Unix platforms for a *very* tiny set of scores for any given mu, lambda. - The default null2 score correction was applied in a way that was justifiable, but differed between per-seq and per-domain scores; thus per-domain scores did not necessarily add up to per-seq scores. In certain cases this produced counterintuitive results. null2 is now applied in a way that is still justifiable, and also consistent; per-domain scores add up to the per-seq score. [first reported by David Kerk] - --domE and --domT did not work correctly in hmmpfam, because the code assumed that E-values are monotonic with score. In some cases, this could cause HMMER to fail to report some significant domains. [Christiane VanSchlun, GCG]The following obscure bugs were fixed (i.e., there were no reports ofanyone but me detecting these bugs): - sreformat no longer core dumps when reformatting a single sequence to an alignment format. - Banner() was printing a line to stdout instead of its file handle... but Banner is always called w/ stdout as its filehandle in the current implementation. [M. Regelson, Paracel] - .gz file reading is only supported on POSIX OS's. A compile time define, SRE_STRICT_ANSI, may be defined to allow compiling on ANSI compliant but non-POSIX operating systems. - Several problems with robustness w.r.t. unexpected combinations of command line options were detected by GCG quality control testing. [Christiane VanSchlun](At least) the following projects remain incomplete: - Ian Holmes' posterior probability routines (POSTAL) are partially assimilated; see postprob.c, display.c - CPU times can now be reported for serial, threaded, and PVM executions; this is only supported by hmmcalibrate right now. - Mixture Dirichlet priors now include some ongoing work in collaboration with Michael Asman and Erik Sonnhammer in Stockholm; also #=GC X-PRM, X-PRT, X-PRI support in hmmbuild/Stockholm annotation.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -