📄 fasta35.1

.TH FASTA/SSEARCH/FASTX/TFASTXv3 1 local
.SH NAME
fasta35, fasta35_t \- scan a protein or DNA sequence library for similar
sequences

fastx35, fastx35_t \- compare a DNA sequence to a protein sequence
database, comparing the translated DNA sequence in forward and
reverse frames.

tfastx35, tfastx35_t \- compare a protein sequence to a DNA sequence
database, calculating similarities with frameshifts to the forward and
reverse orientations.

fasty35, fasty35_t \- compare a DNA sequence to a protein sequence
database, comparing the translated DNA sequence in forward and reverse
frames.

tfasty35, tfasty35_t \- compare a protein sequence to a DNA sequence
database, calculating similarities with frameshifts to the forward and
reverse orientations.

fasts35, fasts35_t \- compare unordered peptides to a protein sequence database

fastm35, fastm35_t \- compare ordered peptides (or short DNA sequences)
to a protein (DNA) sequence database

tfasts35, tfasts35_t \- compare unordered peptides to a translated DNA
sequence database

fastf35, fastf35_t \- compare mixed peptides to a protein sequence database

tfastf35, tfastf35_t \- compare mixed peptides to a translated DNA
sequence database

ssearch35, ssearch35_t \- compare a protein or DNA sequence to a
sequence database using the Smith-Waterman algorithm.

ggsearch35, ggsearch35_t \- compare a protein or DNA sequence to a
sequence database using a global alignment (Needleman-Wunsch)

glsearch35, glsearch35_t \- compare a protein or DNA sequence to a
sequence database with alignments that are global in the query and
local in the database sequence (global-local).

lalign35 \- produce multiple non-overlapping alignments for protein
and DNA sequences using the Huang and Miller sim algorithm for the
Waterman-Eggert algorithm.

prss35, prfx35 \- (discontinued, replaced by ssearch35 and fastx35)
estimate statistical significance of an alignment by comparing the
score to the distribution of similarity scores generated by shuffling
the second sequence.  prss35 uses Smith-Waterman.  prfx35 uses the
fastx algorithm.
.SH DESCRIPTION
Release 3.4 of the FASTA package provides a modular set of sequence
comparison programs that can run on conventional single processor
computers or in parallel on multiprocessor computers. More than a
dozen programs \- fasta35, fastx35/tfastx35, fasty35/tfasty35,
fasts35/tfasts35, fastm35, fastf35/tfastf35, ssearch35, ggsearch35,
and glsearch35 \- are currently available.

All of the comparison programs share a set of basic command line
options; additional options are available for individual comparison
functions.

Threaded versions of the FASTA programs (fasta35_t, ssearch35_t, etc.)
will run in parallel on modern Linux and Unix multi-core or
multi-processor computers.  Accelerated versions of the Smith-Waterman
algorithm are available for the Intel SSE2 and Altivec PowerPC
architectures; they can speed up Smith-Waterman calculations
10- to 20-fold.
.SH Options for comparison functions
.LP
These versions of the FASTA programs have been modified to accept a
query sequence from the unix "stdin" data stream.  This makes it much
easier to use fasta35 and its relatives as part of a WWW page. To
indicate that stdin is to be used, use "@" as the query
sequence file name.  "@" can also be used to specify a
subset of the query sequence to be used, e.g.:
.sp
.ti 0.5i
cat query.aa | fasta35 -q @:50-150 s
.sp
would search the 's' database with residues 50-150 of query.aa.  FASTA
cannot automatically detect the sequence type (protein vs DNA) when
"stdin" is used, so the '-n' option is required for DNA.
.TP
\-1
Sort by "init1" score.
.TP
\-3
(TFASTA3, TFASTX/Y35 only) use only forward frame translations
.TP
\-a #
"SHOWALL" option attempts to align all of both sequences in FASTA and SSEARCH.
.TP
\-A
force Smith-Waterman alignment for output.  Smith-Waterman is the
default for protein sequences and FASTX35, but not for TFASTA35 or DNA
comparisons with FASTA35.
.TP
\-b #
number of best scores to show (must be < -E cutoff if -E is given)
.TP
\-B
show z-scores rather than bit scores
.TP
\-c #
threshold for band optimization (FASTA, FASTX)
.TP
\-C #
(fasta35t11d4) length of name abbreviation in alignments, default = 6.
.TP
\-d #
number of best alignments to show (must be < -e cutoff)
.TP
\-D
turn on debugging mode.  Enables checks on sequence alphabet that
cause problems with tfastx35, tfasty35, tfasta35.
.TP
\-E #
expectation value upper limit for score and alignment display.
Defaults are 10.0 for FASTA35 and SSEARCH35 protein searches, 5.0 for
translated DNA/protein comparisons, and 2.0 for DNA/DNA searches.
.TP
\-f #
penalty for opening a gap (or first residue for older versions)
.TP
\-F #
expectation value lower limit for score and alignment display.
-F 1e-6 prevents library sequences with E()-values lower than 1e-6
from being displayed. This allows the user to focus on more distant
relationships.
.TP
\-g #
penalty for additional residues in a gap
.TP
\-h #
(FASTX35, TFASTX35, FASTY35, TFASTY35 only) penalty for a frameshift between
two codons.
.TP
\-j #
(FASTY35, TFASTY35 only) penalty for a frameshift within a codon.
.TP
\-H
turn off histogram display
.TP
\-i
(DNA only) reverse complement the query sequence. (TFASTX) compare against
only the reverse complement of the library sequence.
.TP
\-k
specify number of shuffles for statistical parameter estimation (default=500).
.TP
\-l str
specify FASTLIBS file
.TP
\-L
report long sequence description in alignments
.TP
\-m 0,1,2,3,4,5,6,9,10,11
alignment display options.  \fC-m 0, 1, 2, 3\fP
display different types of alignments.  \fC-m 4\fP provides an
alignment "map" on the query. \fC-m 5\fP combines the alignment map
and a \fC-m 0\fP alignment.  \fC-m 6\fP provides an HTML output.
\fC-m 9\fP does not change the alignment output, but provides
alignment coordinate and percent identity information with the best
scores report.  \fC-m 9c\fP adds encoded alignment information to the
\fC-m 9\fP output; \fC-m 9i\fP provides only percent identity and alignment
length information with the best scores.  With current versions of the
FASTA programs, independent \fC-m\fP options can be combined;
e.g. \fC-m 1 -m 9c -m 6\fP.
.TP
\-m 11
provides \fClav\fP format output from lalign35.  It does not
currently affect other alignment algorithms.  The \fCps_lav\fP
program can be used to convert \fClav\fP format output to postscript
alignment "dot-plots".
.TP
\-M #-#
molecular weight (residue) cutoffs.  -M "101-200" examines only
sequences that are 101-200 residues long.
.TP
\-n
force query to nucleotide sequence
.TP
\-N #
break long library sequences into blocks of # residues.  Useful for
bacterial genomes, which have only one sequence entry.  -N 2000 works
well for bacterial genomes.
.TP
\-o
(FASTA) turn fasta band optimization off during initial phase.  This was
the behavior of fasta1.x versions.
.TP
\-O file
send output to file.
.TP
\-q/-Q
quiet option; do not prompt for input
.TP
\-r "+n/-m"
values for match/mismatch for DNA comparisons. \fC+n\fP is
used for the maximum positive value and \fC-m\fP is used for the
maximum negative value. Values between max and min are rescaled, but
residue pairs having the value -1 continue to be -1.
.TP
\-R file
save all scores to statistics file (previously -r file)
.TP
\-s name
specify substitution matrix.  BLOSUM50 is used by default;
PAM250, PAM120, and BLOSUM62 can be specified by setting -s P120,
P250, or BL62.  With this version, many more scoring matrices are
available, including BLOSUM80 (BL80), and MDM10, MDM20, MDM40 (Jones,
Taylor, and Thornton, 1992 CABIOS 8:275-282; specified as -s M10, -s
M20, -s M40). Alternatively, BLASTP1.4 format scoring matrix files can
be specified.  BL80, BL62, and P120 are scaled in 1/2 bit units; all
the other matrices use 1/3 bit units.  DNA scoring matrices can also
be specified with the "-r" option.
.TP
\-S
treat lower case letters in the query or database as low complexity
regions that are equivalent to 'X' during the initial database scan,
but are treated as normal residues for the final alignment display.
Statistical estimates are based on the 'X'ed out sequence used during
the initial search. Protein databases (and query sequences) can be
generated in the appropriate format using John Wootton's "pseg"
program, available from ftp://ncbi.nlm.nih.gov/pub/seg/pseg.  Once you
have compiled the "pseg" program, use the command:
.IP
\fCpseg database.fasta -z 1 -q  > database.lc_seg\fP
.TP
\-t #
Translation table. tfasta35, fastx35, tfastx35, fasty35, and
tfasty35 support the BLAST translation tables.  See
\fChttp://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c/\fP.
.TP
\-T #
(threaded, parallel only) number of threads or workers to use (set by
default to 4 at compile time).
.TP
\-U
Do RNA sequence comparisons: treat 'T' as 'U', allow G:U base pairs (by
scoring "G-A" and "T-C" as "G-G" -1).  Search only one strand.
.TP
\-V "?$%*"
Allow special annotation characters in query sequence.  These characters
will be displayed in the alignments on the coordinate number line.
.TP
\-w #
line width for similarity score, sequence alignment, output.
.TP
\-W #
context length (default is 1/2 of line width -w) for alignment programs,
like fasta and ssearch, that provide additional sequence context.
.TP
\-x #match,#mismatch
scores used for 'X:X', 'N:N', and '*:*' matches, and for the corresponding
mismatches ('X:not-X', etc.), overriding the values
specified in the scoring matrix.  If only one value is given, it is
used for both values.
.TP
\-X "#,#"
offsets query, library sequence for numbering alignments
.TP
\-y #
Width for band optimization; by default 16 for DNA and protein ktup=2;
32 for protein ktup=1.
.TP
\-z #
Specify statistical calculation. Default is -z 1 for local
similarity searches, which uses regression against the length of the
library sequence. -z -1 disables statistics.  -z 0 estimates
significance without normalizing for sequence length. -z 2 provides
maximum likelihood estimates for lambda and K, censoring the 250
lowest and 250 highest scores. -z 3 uses Altschul and Gish's
statistical estimates for specific protein BLOSUM scoring matrices and
gap penalties. -z 4,5: an alternate regression method.  -z 6 uses a
composition based maximum likelihood estimate based on the method of
Mott (1992) Bull. Math. Biol. 54:59-75.  -z 11,12,14,15,16: compute
the regression against scores of randomly shuffled copies of the
library sequences.  Twice as many comparisons are performed, but
accurate estimates can be generated from databases of related
sequences. -z 11 uses the -z 1 regression strategy, etc.
.TP
\-Z db_size
Set the apparent database size used for expectation value calculations
(used for protein/protein FASTA and SSEARCH, and for FASTX, FASTY, TFASTX,
and TFASTY).
.SH Environment variables:
.TP
FASTLIBS
location of library choice file (-l FASTLIBS)
.TP
SMATRIX
default scoring matrix (-s SMATRIX)
.TP
SRCH_URL
the format string used to define the option to re-search the
database.
.TP
REF_URL
the format string used to define the option to lookup the library
sequence in entrez, or some other database.
.SH AUTHOR
Bill Pearson
.br
wrp@virginia.EDU
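
The stdin convention described under "Options for comparison functions" (query name "@", optionally followed by a residue range such as ":50-150") lends itself to scripted use. The following is a minimal Python sketch of how a wrapper might assemble such a command line. The function names `build_fasta_command` and `run_search` are hypothetical and not part of the FASTA package; only options documented on this page (-q, -n, -E, -b, -s, -T) are used, and placing the options before the query and library names simply follows the `cat query.aa | fasta35 -q @:50-150 s` example.

```python
import shlex
import subprocess


def build_fasta_command(library, program="fasta35", query="@", subrange=None,
                        dna=False, evalue=None, best_scores=None,
                        matrix=None, threads=None):
    """Assemble an argv list from options described in the man page.

    Hypothetical helper: it only builds the argument list and does not
    check that the program exists or that the option values are sensible.
    """
    argv = [program, "-q"]                # -q: quiet, do not prompt for input
    if dna:
        argv.append("-n")                 # -n: required for DNA queries on stdin
    if evalue is not None:
        argv += ["-E", str(evalue)]       # -E: expectation value upper limit
    if best_scores is not None:
        argv += ["-b", str(best_scores)]  # -b: number of best scores to show
    if matrix is not None:
        argv += ["-s", matrix]            # -s: substitution matrix, e.g. BL62
    if threads is not None:
        argv += ["-T", str(threads)]      # -T: threaded (_t) programs only
    if subrange is not None:
        start, end = subrange
        if not 1 <= start <= end:
            raise ValueError("subrange must satisfy 1 <= start <= end")
        query = f"{query}:{start}-{end}"  # e.g. "@:50-150"
    argv += [query, library]
    return argv


def run_search(argv, query_text):
    """Run the search, feeding the query on stdin (assumes the program is installed)."""
    return subprocess.run(argv, input=query_text, capture_output=True,
                          text=True, check=True).stdout


if __name__ == "__main__":
    # Reproduces the command from the example above: residues 50-150 against 's'.
    print(shlex.join(build_fasta_command("s", subrange=(50, 150))))
```

Building the argument list separately from running it keeps the sketch testable on machines where the FASTA programs are not installed; `run_search` is only meaningful where `fasta35` (or a relative) is on the PATH.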
