                                 edialign

Function

   Local multiple alignment of sequences

Description

   edialign is an EMBOSS version of the program DIALIGN2 by B.
   Morgenstern. It takes as input nucleic acid or protein sequences and
   produces as output a multiple sequence alignment. The sequences need
   not be similar over their complete length, since the program
   constructs alignments from gapfree pairs of similar segments of the
   sequences. Such segment pairs are referred to as "diagonals". If
   (possibly) coding nucleic acid sequences are to be aligned, edialign
   can optionally translate the compared "nucleic acid segments" to
   "peptide segments", or even perform comparisons at both nucleic acid
   and protein levels, so as to increase the sensitivity of the
   comparison.

Algorithm

   For a complete explanation of the algorithm, see the references. In
   short:

   As described in our papers, the program DIALIGN constructs alignments
   from gapfree pairs of similar segments of the sequences. Such segment
   pairs are referred to as "diagonals". Every possible diagonal is given
   a so-called weight reflecting the degree of similarity among the two
   segments involved. The overall score of an alignment is then defined
   as the sum of weights of the diagonals it consists of and the program
   tries to find an alignment with maximum score -- in other words: the
   program tries to find a consistent collection of diagonals with
   maximum sum of weights. This novel scoring scheme for alignments is
   the basic difference between DIALIGN and other global or local
   alignment methods. Note that DIALIGN does not employ any kind of gap
   penalty.

   It is possible to use a threshold T for the quality of the diagonals.
   In this case, a diagonal is considered for alignment only if its
   "weight" exceeds this threshold. Regions of lower similarity are
   ignored.
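
   The idea of choosing a consistent collection of diagonals with
   maximum sum of weights, subject to a threshold T, can be illustrated
   for the simplest case of two sequences. The Python sketch below is
   not taken from DIALIGN or edialign. The Diagonal record, the function
   name best_consistent_chain and the quadratic chaining procedure are
   assumptions made for illustration only. Weights are taken as given,
   and two diagonals are treated as consistent when one ends before the
   other begins in both sequences.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Diagonal(NamedTuple):
    """A gap-free segment pair (hypothetical record, for illustration)."""
    i: int         # start position in the first sequence
    j: int         # start position in the second sequence
    length: int    # number of aligned positions, assumed >= 1
    weight: float  # similarity weight, computed elsewhere


def best_consistent_chain(
    diagonals: Sequence[Diagonal], threshold: float = 0.0
) -> Tuple[float, List[Diagonal]]:
    """Return (score, chain) for the heaviest consistent set of diagonals.

    Only diagonals whose weight exceeds the threshold are considered.
    No gap penalty is applied: the score is the plain sum of weights.
    """
    # Sorting by start position guarantees that any diagonal that can
    # precede another one also comes earlier in the list.
    kept = sorted(
        (d for d in diagonals if d.weight > threshold),
        key=lambda d: (d.i, d.j),
    )
    n = len(kept)
    if n == 0:
        return 0.0, []

    best = [d.weight for d in kept]  # best chain score ending at each diagonal
    prev = [-1] * n                  # back-pointers for chain recovery

    for b in range(n):
        for a in range(b):
            ends_before = (
                kept[a].i + kept[a].length <= kept[b].i
                and kept[a].j + kept[a].length <= kept[b].j
            )
            if ends_before and best[a] + kept[b].weight > best[b]:
                best[b] = best[a] + kept[b].weight
                prev[b] = a

    # Walk the back-pointers from the best-scoring end point.
    k = max(range(n), key=best.__getitem__)
    score = best[k]
    chain: List[Diagonal] = []
    while k != -1:
        chain.append(kept[k])
        k = prev[k]
    chain.reverse()
    return score, chain
```

   Calling best_consistent_chain(diagonals, threshold=1.0) on the same
   input drops every diagonal whose weight is 1.0 or less before
   chaining, which corresponds to ignoring regions of lower similarity.
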
   In the first version of the program (DIALIGN 1), this threshold was
   in many situations absolutely necessary to obtain meaningful
   alignments. By contrast, DIALIGN 2 should produce reasonable
   alignments without a threshold, i.e. with T = 0. This is the most
   important difference between DIALIGN 2 and the first version of the
   program. Nevertheless, it is still possible to use a positive
   threshold T to filter out regions of lower significance and to include
   only high-scoring diagonals in the alignment.

   The use of overlap weights improves the sensitivity of the program if
   multiple sequences are aligned but it also increases the running time,
   especially if large numbers of sequences are aligned. By default,
   "overlap weights" are used if up to 35 sequences are aligned but
   switched off for larger data sets.

   If (possibly) coding nucleic acid sequences are to be aligned, DIALIGN
   optionally translates the compared "nucleic acid segments" to "peptide
   segments" according to the genetic code -- without presupposing any of
   the three possible reading frames, so all combinations of reading
   frames get checked for significant similarity. If this option is used,
   the similarity among segments will be assessed on the "peptide level"
   rather than on the "nucleic acid level".

   For the levels of sequence similarity, release 2.2 of DIALIGN has two
   additional options:
     * It can measure the similarity among segment pairs at both levels
       of similarity (nucleotide-level and peptide-level similarity). The
       score of a fragment is based on whatever similarity is stronger.
       As a result, the program can now produce mixed alignments that
       contain both types of fragments. Fragments with stronger
       similarity at the "nucleotide level" are referred to as
       N-fragments whereas fragments with stronger similarity at the
       peptide level are called P-fragments.
     * If the translation or mixed alignment option is used, it is
       possible to consider the reverse complements of segments, too. In
       this case, both the original segments and their reverse
       complements are translated and both pairs of implied "peptide
       segments" are compared. This option is useful if DNA sequences
       contain coding regions not only on the "Watson strand" but also on
       the "Crick strand".

   The score that DIALIGN assigns to a fragment is based on the
   probability of finding a fragment of the same respective length and
   number of matches (or BLOSUM values, if the translation option is
   used) in random sequences of the same length as the input sequences.

   If long genomic sequences are aligned, an iterative procedure can be
   applied where the program first looks for fragments with strong
   similarity. In subsequent steps, regions between these fragments are
   realigned. Here, the score of a fragment is based on random occurrence
   in these regions between the previously aligned segment pairs.

Usage

   Here is a sample session with edialign

% edialign
Local multiple alignment of sequences
Input sequence set: vtest.seq
Output file [vtest.edialign]:
(gapped) output sequence(s) [vtest.fasta]:

   Go to the input files for this example
   Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequences]         seqset     Sequence set filename and optional format,
                                  or reference (input USA)
  [-outfile]           outfile    [*.edialign] Output file name
  [-outseq]            seqoutall  [.] (Aligned) sequence
                                  set(s) filename and optional format (output
                                  USA)

   Additional (Optional) qualifiers (* if not always prompted):
*  -nucmode            menu       [n] Nucleic acid sequence alignment mode
                                  (simple, translated or mixed) (Values: n
                                  (simple); nt (translation); ma (mixed
                                  alignments))
*  -revcomp            boolean    [N] Also consider the reverse complement
   -overlapw           selection  [default (when Nseq =< 35)] By default
                                  overlap weights are used when Nseq =<35 but
                                  you can set this to 'yes' or 'no'
   -linkage            menu       [UPGMA] Clustering method to construct
                                  sequence tree (UPGMA, minimum linkage or
                                  maximum linkage) (Values: UPGMA (UPGMA); max
                                  (maximum linkage); min (minimum linkage))
   -maxfragl           integer    [40] Maximum fragment length (Integer 0 or
                                  more)
*  -fragmat            boolean    [N] Consider only N-fragment pairs that
                                  start with two matches
*  -fragsim            integer    [4] Consider only P-fragment pairs if first
                                  amino acid or codon pair has similarity
                                  score of at least n (Integer 0 or more)
   -itscore            boolean    [N] Use iterative score
   -threshold          float      [0.0] Threshold for considering diagonal for
                                  alignment (Number 0.000 or more)

   Advanced (Unprompted) qualifiers:
   -mask               boolean    [N] Replace unaligned characters by stars
                                  '*' rather than putting them in lowercase
   -dostars            boolean    [N] Activate writing of stars instead of
                                  numbers
   -starnum            integer    [4] Put up to n stars '*' instead of digits
                                  0-9 to indicate level of conservation
                                  (Integer 0 or more)

   Associated qualifiers:

   "-sequences" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   "-outseq" associated qualifiers
   -osformat3          string     Output seq format
   -osextension3       string     File name extension
   -osname3            string     Base file name
   -osdirectory3       string     Output directory
   -osdbname3          string     Database name to add
   -ossingle3          boolean    Separate file for each entry
   -oufo3              string     UFO features
   -offormat3          string     Features format
   -ofname3            string     Features file name
   -ofdirectory3       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   edialign reads any normal sequence USA. You must give at least two
   sequences as input. You can use proteins as well as nucleic acids,
   but you cannot mix them.

  Input files for usage example