.TH "hmmbuild" 1 "Oct 2003" "HMMER 2.3.2" "HMMER Manual"
.SH NAME
.TP
hmmbuild - build a profile HMM from an alignment
.SH SYNOPSIS
.B hmmbuild
.I [options]
.I hmmfile
.I alignfile
.SH DESCRIPTION
.B hmmbuild
reads a multiple sequence alignment file
.I alignfile,
builds a new profile HMM, and saves the HMM in
.I hmmfile.
.PP
.I alignfile
may be in ClustalW, GCG MSF, SELEX, Stockholm, or aligned FASTA
alignment format. The format is automatically detected.
.PP
By default, the model is configured to find one or more
nonoverlapping alignments to the complete model: multiple
global alignments with respect to the model, and local with
respect to the sequence.
This is analogous to the behavior of the
.B hmmls
program of HMMER 1.
To configure the model for multiple
.I local
alignments with respect to the model and local with respect to
the sequence, a la the old program
.B hmmfs,
use the
.B -f
(fragment) option.
More rarely, you may want to configure the model for a single
global alignment (global with respect to both model and sequence),
using the
.B -g
option, or for a single local/local alignment
(a la standard Smith/Waterman, or the old
.B hmmsw
program), using the
.B -s
option.
.SH OPTIONS
.TP
.B -f
Configure the model for finding multiple domains per sequence,
where each domain can be a local (fragmentary) alignment. This
is analogous to the old
.B hmmfs
program of HMMER 1.
.TP
.B -g
Configure the model for finding a single global alignment to
a target sequence, analogous to
the old
.B hmms
program of HMMER 1.
.TP
.B -h
Print brief help; includes version number and summary of
all options, including expert options.
.TP
.BI -n " <s>"
Name this HMM
.I <s>.
.I <s>
can be any string of non-whitespace characters (e.g.
one "word").
There is no length limit (at least not one imposed by HMMER;
your shell will complain about command line lengths first).
.TP
.BI -o " <f>"
Re-save the starting alignment to
.I <f>,
in Stockholm format.
The columns which were assigned to match states will be
marked with x's in an #=RF annotation line. If either the
.B --hand
or
.B --fast
construction options were chosen, the alignment may have
been slightly altered to be compatible with Plan 7 transitions,
so saving the final alignment and comparing to the
starting alignment can let you view these alterations.
See the User's Guide for more information on this arcane
side effect.
.TP
.B -s
Configure the model for finding a single local alignment per
target sequence. This is analogous to the standard Smith/Waterman
algorithm or the
.B hmmsw
program of HMMER 1.
.TP
.B -A
Append this model to an existing
.I hmmfile
rather than creating
.I hmmfile.
Useful for building HMM libraries (like Pfam).
.TP
.B -F
Force overwriting of an existing
.I hmmfile.
Otherwise HMMER will refuse to clobber your existing HMM files,
for safety's sake.
.SH EXPERT OPTIONS
.TP
.B --amino
Force the sequence alignment to be interpreted as amino acid
sequences. Normally HMMER autodetects whether the alignment is
protein or DNA, but sometimes alignments are so small that
autodetection is ambiguous. See
.B --nucleic.
.TP
.BI --archpri " <x>"
Set the "architecture prior" used by MAP architecture construction to
.I <x>,
where
.I <x>
is a probability between 0 and 1. This parameter governs a geometric
prior distribution over model lengths.
As
.I <x>
increases, longer models are favored a priori.
As
.I <x>
decreases, it takes more residue conservation in a column to
make a column a "consensus" match column in the model architecture.
The 0.85 default has been chosen empirically as a reasonable setting.
.TP
.B --binary
Write the HMM to
.I hmmfile
in HMMER binary format instead of readable ASCII text.
.TP
.BI --cfile " <f>"
Save the observed emission and transition counts to
.I <f>
after the architecture has been determined (i.e. after residues/gaps
have been assigned to match, delete, and insert states).
This option is used in HMMER development for generating data files
useful for training new Dirichlet priors. The format of
count files is documented in the User's Guide.
.TP
.B --fast
Quickly and heuristically determine the architecture of the model by
assigning all columns with more than a certain fraction of gap
characters to insert states. By default this fraction is 0.5, and it
can be changed using the
.B --gapmax
option.
The default construction algorithm is a maximum a posteriori (MAP)
algorithm, which is slower.
.TP
.BI --gapmax " <x>"
Controls the
.B --fast
model construction algorithm; if
.B --fast
is not being used, this option has no effect.
If a column has more than a fraction
.I <x>
of gap symbols in it, it gets assigned to an insert column.
.I <x>
is a frequency from 0 to 1, and by default is set
to 0.5. Higher values of
.I <x>
mean more columns get assigned to consensus, and models get
longer; smaller values of
.I <x>
mean fewer columns get assigned to consensus, and models get
shorter.
.TP
.B --hand
Specify the architecture of the model by hand: the alignment file must
be in SELEX or Stockholm format, and the reference annotation
line (#=RF in SELEX, #=GC RF in Stockholm) is used to specify
the architecture.
Any column marked with a non-gap symbol (such
as an 'x', for instance) is assigned as a consensus (match) column in
the model.
.TP
.BI --idlevel " <x>"
Controls both the determination of effective sequence number and
the behavior of the
.B --wblosum
weighting option. The sequence alignment is clustered by percent
identity, and the number of clusters at a cutoff threshold of
.I <x>
is used to determine the effective sequence number.
Higher values of
.I <x>
give more clusters and higher effective sequence
numbers; lower values of
.I <x>
give fewer clusters and lower effective sequence numbers.
.I <x>
is a fraction from 0 to 1, and by default is set to 0.62
(corresponding to the clustering level used
in constructing the BLOSUM62 substitution matrix).
.TP
.BI --informat " <s>"
Assert that the input
.I alignfile
is in format
.I <s>;
do not run Babelfish format autodetection. This increases
the reliability of the program somewhat, because Babelfish
can make mistakes; particularly
recommended for unattended, high-throughput runs
of HMMER. Valid format strings include FASTA,
GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF,
CLUSTAL, and PHYLIP. See the User's Guide for a complete
list.
.TP
.B --noeff
Turn off the effective sequence number calculation, and use the
true number of sequences instead. This will usually reduce the
sensitivity of the final model (so don't do it without good reason!)
.TP
.B --nucleic
Force the alignment to be interpreted as nucleic acid sequence,
either RNA or DNA. Normally HMMER autodetects whether the alignment is
protein or DNA, but sometimes alignments are so small that
autodetection is ambiguous. See
.B --amino.
.TP
.BI --null " <f>"
Read a null model from
.I <f>.
The default for protein is to use average amino acid frequencies from
Swissprot 34 and p1 = 350/351; for nucleic acid, the default is
to use 0.25 for each base and p1 = 1000/1001.
For documentation
of the format of the null model file and further explanation
of how the null model is used, see the User's Guide.
.TP
.BI --pam " <f>"
Apply a heuristic PAM- (substitution matrix-) based prior on match
emission probabilities instead of
the default mixture Dirichlet. The substitution matrix is read
from
.I <f>.
See
.B --pamwgt.
The default Dirichlet state transition prior and insert emission prior
are unaffected. Therefore in principle you could combine
.B --prior
with
.B --pam
but this isn't recommended, as it hasn't been tested. (The
.B --pam
option itself hasn't been tested much!)
.TP
.BI --pamwgt " <x>"
Controls the weight on a PAM-based prior. Only has an effect if the
.B --pam
option is also in use.
.I <x>
is a positive real number, 20.0 by default.
.I <x>
is the number of "pseudocounts" contributed by the heuristic
prior. Very high values of
.I <x>
can force a scoring system that is entirely driven by the
substitution matrix, making
HMMER somewhat approximate Gribskov profiles.
.TP
.BI --pbswitch " <n>"
For alignments with a very large number of sequences,
the GSC, BLOSUM, and Voronoi weighting schemes are slow;
they're O(N^2) for N sequences. Henikoff position-based
weights (PB weights) are more efficient. At or above a certain
threshold sequence number
.I <n>
.B hmmbuild
will switch from GSC, BLOSUM, or Voronoi weights to
PB weights.
To disable this switching behavior (at the cost
of compute time), set
.I <n>
to be something larger than the number of sequences in
your alignment.
.I <n>
is a positive integer; the default is 1000.
.TP
.BI --prior " <f>"
Read a Dirichlet prior from
.I <f>,
replacing the default mixture Dirichlet.
The format of prior files is documented in the User's Guide,
and an example is given in the Demos directory of the HMMER
distribution.
.TP
.BI --swentry " <x>"
Controls the total probability that is distributed to local entries
into the model, versus starting at the beginning of the model
as in a global alignment.
.I <x>
is a probability from 0 to 1, and by default is set to 0.5.
Higher values of
.I <x>
mean that hits that are fragments on their left (N or 5'-terminal) side will be
penalized less, but complete global alignments will be penalized more.
Lower values of
.I <x>
mean that fragments on the left will be penalized more, and
global alignments on this side will be favored.
This option only affects the configurations that allow local
alignments,
e.g.
.B -s
and
.B -f;
unless one of these options is also activated, this option has no effect.
You have independent control over local/global alignment behavior for
the N/C (5'/3') termini of your target sequences using
.B --swentry
and
.B --swexit.
.TP
.BI --swexit " <x>"
Controls the total probability that is distributed to local exits
from the model, versus ending an alignment at the end of the model
as in a global alignment.
.I <x>
is a probability from 0 to 1, and by default is set to 0.5.
Higher values of
.I <x>
mean that hits that are fragments on their right (C or 3'-terminal) side will be
penalized less, but complete global alignments will be penalized more.
Lower values of
.I <x>
mean that fragments on the right will be penalized more, and
global alignments on this side will be favored.
This option only affects the configurations that allow local
alignments,
e.g.
.B -s
and
.B -f;
unless one of these options is also activated, this option has no effect.
You have independent control over local/global alignment behavior for
the N/C (5'/3') termini of your target sequences using
.B --swentry
and
.B --swexit.
.TP
.B --verbose
Print additional, possibly useful information, such as the individual
scores for each sequence in the alignment.
.TP
.B --wblosum
Use the BLOSUM filtering algorithm to weight the sequences,
instead of the default.
Cluster the sequences at a given percentage identity
(see
.BR --idlevel );
assign each cluster a total weight of 1.0, distributed equally
amongst the members of that cluster.
.TP
.B --wgsc
Use the Gerstein/Sonnhammer/Chothia ad hoc sequence weighting
algorithm. This is already the default, so this option has no effect
(unless it follows another option in the --w family, in which case it
overrides it).
.TP
.B --wme
Use the Krogh/Mitchison maximum entropy algorithm to "weight"
the sequences. This supersedes the Eddy/Mitchison/Durbin
maximum discrimination algorithm, which gives almost
identical weights but is less robust. ME weighting seems to give a
marginal increase in sensitivity
over the default GSC weights, but takes a fair amount of time.
.TP
.B --wnone
Turn off all sequence weighting.
.TP
.B --wpb
Use the Henikoff position-based weighting scheme.
.TP
.B --wvoronoi
Use the Sibbald/Argos Voronoi sequence weighting algorithm
in place of the default GSC weighting.
.SH SEE ALSO
Master man page, with full list of and guide to the individual man
pages: see
.B hmmer(1).
.PP
For complete documentation, see the user guide that came with the
distribution (Userguide.pdf); or see the HMMER web page,
http://hmmer.wustl.edu/.
.SH COPYRIGHT
.nf
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
Freely distributed under the GNU General Public License (GPL).
.fi
See the file COPYING in your distribution for details on redistribution
conditions.
.SH AUTHOR
.nf
Sean Eddy
HHMI/Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
http://www.genetics.wustl.edu/eddy/
.fi
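.SH NOTES
The
.B --fast
construction rule, with its
.B --gapmax
threshold, can be illustrated by a short Python sketch.
It is not taken from HMMER and simplifies several details:
it assumes that '-', '.', '_' and '~' are the gap characters, that every
sequence counts equally (no sequence weighting), and it uses a
hypothetical function name,
fast_architecture.
A column whose gap fraction is strictly greater than the threshold is
labeled I (insert); every other column is labeled M (match).
.nf

```python
# Illustrative sketch of a gap-fraction column-assignment rule; not HMMER code.
GAP_CHARS = set("-._~")  # assumed set of gap symbols


def fast_architecture(alignment, gapmax=0.5):
    """Label each alignment column 'M' (match) or 'I' (insert).

    A column is an insert column when its fraction of gap characters
    is strictly greater than gapmax; otherwise it is a match column.
    All sequences count equally (no sequence weighting).
    """
    nseq = len(alignment)
    labels = []
    for col in range(len(alignment[0])):
        ngap = sum(1 for row in alignment if row[col] in GAP_CHARS)
        labels.append("I" if ngap / nseq > gapmax else "M")
    return "".join(labels)


aln = [
    "AC-DE",
    "AC--E",
    "A--DE",
    "ACGDE",
]
print(fast_architecture(aln))               # prints MMIMM
print(fast_architecture(aln, gapmax=0.8))   # prints MMMMM
```

.fi
With the default threshold of 0.5, the third column of the toy
alignment (75% gaps) becomes an insert column (MMIMM); raising the
threshold to 0.8 turns it into a match column (MMMMM). This matches the
description of
.B --gapmax
above: higher values assign more columns to consensus and give longer models.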