                                 infoalign

Function

   Information on a multiple sequence alignment

Description

   infoalign is a small utility that lists some simple properties of the
   sequences in an alignment.

   It writes a table containing one line per sequence. The information
   is written out in columns separated by space or TAB characters. The
   columns of data are: the sequence's USA, name, two measures of length,
   counts of gaps, and numbers of identical, similar and different
   residues or bases in this sequence when compared to a reference
   sequence, together with a simple statistic of the % change between
   the reference sequence and this sequence.

   The reference sequence can be either the calculated consensus sequence
   (the default) or one of the set of aligned sequences, specified by
   either the ordinal number of that sequence in the input file, or by
   its name.

   Any combination of these types of information can be easily selected
   or unselected.

   By default, the output file starts each line with the USA of the
   sequence being described, so the output file is a list file that can
   be manually edited and read in by any other EMBOSS program that can
   read in one or more sequences to be analysed.

Algorithm

   The set of aligned sequences is read in.

   If the reference sequence is the consensus sequence (this is the
   default) then this is calculated. If the reference sequence is
   specified as an ordinal number, then the sequences are counted (from
   1) until the reference sequence is identified. If the reference
   sequence is specified by its name then the names of the sequences are
   compared to the specified name until the reference sequence is
   identified.

Foreach sequence:
  Find the position of the first residue or base which is not a gap character.
  Find the position of the last residue or base which is not a gap character.

  Foreach position from the first non-gap character to the last non-gap
  character:
    if the position is a gap character, then
      increment the 'GapLen' count
      if this character is the start of a new gap, increment the 'Gaps' count
    else
      the character at this position of the sequence and in the
      reference sequence are now compared.
      if the sequence character and the reference character are identical
      (apart from case) then
        increment the 'Ident' count
      else if the similarity matrix score for the two characters is > 0
      (i.e. if they are similar) then
        increment the 'Similar' count
      else
        increment the 'Different' count

  The 'SeqLen' length of the sequence is the number of non-gap characters
  in the sequence (i.e. 'Ident' + 'Similar' + 'Different')

  The 'AlignLen' length of the sequence is the length from the first
  non-gap character to the last non-gap character (i.e. the number of
  bases or residues of the sequence plus the number of gap characters
  internal to the sequence).
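
  The per-sequence counting described in this Algorithm section can be
  sketched in Python as follows. This is only an illustration, not the
  EMBOSS source: the function names align_stats and toy_score, the set
  of gap characters, and the toy similarity rule (I, L and V score
  positive with each other) are all assumptions made for the example.
  The real program takes its scores from the matrix file given by
  -matrix. The final line applies the '%Change' formula stated in this
  section.

```python
# Illustrative sketch only; names and gap characters are assumptions.
GAP_CHARS = frozenset("-.~")


def toy_score(a, b):
    """Hypothetical stand-in for a similarity matrix lookup."""
    similar = {"I", "L", "V"}
    return 1 if a in similar and b in similar else -1


def align_stats(seq, ref, score):
    """Count gaps and compare one aligned sequence against a reference."""
    if len(seq) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    counts = {"Gaps": 0, "GapLen": 0, "Ident": 0, "Similar": 0, "Different": 0}
    non_gap = [i for i, c in enumerate(seq) if c not in GAP_CHARS]
    if not non_gap:  # sequence made only of gap characters
        return {**counts, "SeqLen": 0, "AlignLen": 0, "Change": 0.0}
    first, last = non_gap[0], non_gap[-1]
    in_gap = False
    for i in range(first, last + 1):
        c = seq[i]
        if c in GAP_CHARS:
            counts["GapLen"] += 1
            if not in_gap:  # start of a new internal gap
                counts["Gaps"] += 1
            in_gap = True
            continue
        in_gap = False
        a, b = c.upper(), ref[i].upper()  # comparison ignores case
        if a == b:
            counts["Ident"] += 1
        elif score(a, b) > 0:
            counts["Similar"] += 1
        else:
            counts["Different"] += 1
    seq_len = counts["Ident"] + counts["Similar"] + counts["Different"]
    align_len = last - first + 1
    change = (align_len - counts["Ident"]) * 100 / align_len
    return {**counts, "SeqLen": seq_len, "AlignLen": align_len, "Change": change}


if __name__ == "__main__":
    print(align_stats("--acD-EV-K--", "MKACDWEILGHH", toy_score))
```

  For the toy input above, the leading and trailing gap characters are
  ignored, the two internal gaps give Gaps = 2 and GapLen = 2, and the
  six residues split into 4 identical, 1 similar and 1 different, so
  SeqLen = 6, AlignLen = 8 and the change is (8 - 4) * 100 / 8 = 50.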

  The '%Change' value for the sequence is calculated as:
  ('AlignLen' - 'Ident') * 100 / 'AlignLen'

Usage

   Here is a sample session with infoalign

% infoalign globins.msf
Information on a multiple sequence alignment
Output file [globins.infoalign]:

   Go to the input files for this example
   Go to the output files for this example

   Example 2

   This example doesn't display the USA of the sequence:

% infoalign globins.msf -nousa
Information on a multiple sequence alignment
Output file [globins.infoalign]:

   Go to the output files for this example

   Example 3

   Display only the name and sequence length of a sequence:

% infoalign globins.msf -only -name -seqlength
Information on a multiple sequence alignment
Output file [globins.infoalign]:

   Go to the output files for this example

   Example 4

   Display only the name, number of gap characters and differences to the
   consensus sequence:

% infoalign globins.msf -only -name -gapcount -diffcount
Information on a multiple sequence alignment
Output file [globins.infoalign]:

   Go to the output files for this example

   Example 5

   Display the name and number of gaps within a sequence:

% infoalign globins.msf -only -name -gaps
Information on a multiple sequence alignment
Output file [globins.infoalign]:

   Go to the output files for this example

   Example 6

   Display information formatted with HTML:

% infoalign globins.msf -html
Information on a multiple sequence alignment
Output file [globins.infoalign]:

   Go to the output files for this example

   Example 7

   Use the first sequence as the reference sequence to compare to:

% infoalign globins.msf -refseq 1
Information on a multiple sequence alignment
Output file [globins.infoalign]:

   Go to the output files for this example

   Example 8

% infoalign -auto tembl:eclac* -out test.out

   Go to the input files for this example
   Go to the output files for this example

   Example 9

% infoalign -auto tembl:eclacz -out test.out

   Go to the input files for this example
   Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqset     The sequence alignment to be displayed.
  [-outfile]           outfile    [*.infoalign] If you enter the name of a
                                  file here then this program will write the
                                  sequence details into that file.

   Additional (Optional) qualifiers:
   -matrix             matrix     [EBLOSUM62 for protein, EDNAFULL for DNA]
                                  This is the scoring matrix file used when
                                  comparing sequences. By default it is the
                                  file 'EBLOSUM62' (for proteins) or the file
                                  'EDNAFULL' (for nucleic sequences). These
                                  files are found in the 'data' directory of
                                  the EMBOSS installation.
   -refseq             string     [0] If you give the number in the alignment
                                  or the name of a sequence, it will be taken
                                  to be the reference sequence. The reference
                                  sequence is the one against which all the
                                  other sequences are compared. If this is set
                                  to 0 then the consensus sequence will be
                                  used as the reference sequence. By default
                                  the consensus sequence is used as the
                                  reference sequence. (Any string is accepted)
   -html               boolean    [N] Format output as an HTML table

   Advanced (Unprompted) qualifiers:
   -plurality          float      [50.0] Set a cut-off for the % of positive
                                  scoring matches below which there is no
                                  consensus.
                                  The default plurality is taken as
                                  50% of the total weight of all the
                                  sequences in the alignment. (Number from
                                  0.000 to 100.000)
   -identity           float      [0.0] Provides the facility of setting the
                                  required number of identities at a position
                                  for it to give a consensus. Therefore, if
                                  this is set to 100% only columns of
                                  identities contribute to the consensus.
                                  (Number from 0.000 to 100.000)
   -only               boolean    [N] This is a way of shortening the command
                                  line if you only want a few things to be
                                  displayed. Instead of specifying:
                                  '-nohead -nousa -noname -noalign -nogaps
                                  -nogapcount -nosimcount -noidcount
                                  -nodiffcount -noweight'
                                  to get only the sequence length output, you
                                  can specify
                                  '-only -seqlength'
   -heading            boolean    [@(!$(only))] Display column headings
   -usa                boolean    [@(!$(only))] Display the USA of the
                                  sequence
   -name               boolean    [@(!$(only))] Display 'name' column
   -seqlength          boolean    [@(!$(only))] Display 'seqlength' column
   -alignlength        boolean    [@(!$(only))] Display 'alignlength' column
   -gaps               boolean    [@(!$(only))] Display number of gaps
   -gapcount           boolean    [@(!$(only))] Display number of gap
                                  positions
   -idcount            boolean    [@(!$(only))] Display number of identical
                                  positions
   -simcount           boolean    [@(!$(only))] Display number of similar
                                  positions
   -diffcount          boolean    [@(!$(only))] Display number of different
                                  positions
   -change             boolean    [@(!$(only))] Display % number of changed
                                  positions
   -weight             boolean    [@(!$(only))] Display 'weight' column
   -description        boolean    [@(!$(only))] Display 'description' column

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
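
   The default shown as [@(!$(only))] for the display qualifiers means
   that each column is on unless -only is given, in which case only the
   columns named explicitly are shown. The Python sketch below models
   that rule. It is an illustration of the documented defaults, not
   EMBOSS code, and the names DISPLAY_QUALIFIERS and resolve_display
   are hypothetical.

```python
# Display qualifiers whose documented default is @(!$(only)).
DISPLAY_QUALIFIERS = (
    "heading", "usa", "name", "seqlength", "alignlength", "gaps",
    "gapcount", "idcount", "simcount", "diffcount", "change",
    "weight", "description",
)


def resolve_display(only=False, **explicit):
    """Return {qualifier: bool} after applying the 'not only' default.

    Explicit keyword values model '-name' (True) or '-noname' (False).
    """
    unknown = set(explicit) - set(DISPLAY_QUALIFIERS)
    if unknown:
        raise ValueError("unknown qualifiers: %s" % sorted(unknown))
    default = not only  # the value of @(!$(only))
    return {q: explicit.get(q, default) for q in DISPLAY_QUALIFIERS}


if __name__ == "__main__":
    # Models '-only -name -seqlength' from Example 3.
    shown = resolve_display(only=True, name=True, seqlength=True)
    print([q for q, on in shown.items() if on])
```

   With only=True and two columns requested, the sketch selects just
   'name' and 'seqlength', which matches the intent of
   '-only -name -seqlength'. Without -only, setting usa=False models
   '-nousa' and leaves every other column on.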