infoalign

Function

Information on a multiple sequence alignment

Description

infoalign is a small utility that lists some simple properties of the sequences in an alignment. It writes a table containing one line per sequence, with the columns separated by space or TAB characters.

The columns of data are: the sequence's USA, name, two measures of length, counts of gaps, and the numbers of identical, similar and different residues or bases in this sequence when compared to a reference sequence, together with a simple statistic of the % change between the reference sequence and this sequence.

The reference sequence can be either the calculated consensus sequence (the default) or one of the aligned sequences, specified either by its ordinal number in the input file or by its name.

Any combination of these columns can be selected or unselected.

By default, each line of the output file starts with the USA of the sequence being described, so the output file is a list file that can be manually edited and read in by any other EMBOSS program that accepts one or more sequences to be analysed.

Algorithm

The set of aligned sequences is read in.

If the reference sequence is the consensus sequence (this is the default), it is calculated. If the reference sequence is specified as an ordinal number, the sequences are counted (from 1) until the reference sequence is identified. If the reference sequence is specified by its name, the names of the sequences are compared to the specified name until the reference sequence is identified.

For each sequence:
    Find the position of the first residue or base which is not a gap character.
    Find the position of the last residue or base which is not a gap character.
    For each position from the first non-gap character to the last non-gap character:
        if the position is a gap character, then
            increment the 'GapLen' count
            if this character is the start of a new gap, increment the 'Gaps' count
        else compare the character at this position of the sequence with the character at the same position of the reference sequence:
            if the sequence character and the reference character are identical (apart from case) then
                increment the 'Ident' count
            else if the similarity matrix score for the two characters is > 0 (i.e. if they are similar) then
                increment the 'Similar' count
            else
                increment the 'Different' count

    The 'SeqLen' length of the sequence is the number of non-gap characters in the sequence (i.e. 'Ident' + 'Similar' + 'Different').

    The 'AlignLen' length of the sequence is the length from the first non-gap character to the last non-gap character (i.e. the number of bases or residues of the sequence plus the number of gap characters internal to the sequence).

    The '%Change' value for the sequence is calculated as:

        ('AlignLen' - 'Ident') * 100 / 'AlignLen'

Usage

Here is a sample session with infoalign:

% infoalign globins.msf
Information on a multiple sequence alignment
Output file [globins.infoalign]:

Example 2

This example doesn't display the USA of the sequence:

% infoalign globins.msf -nousa
Information on a multiple sequence alignment
Output file [globins.infoalign]:

Example 3

Display only the name and sequence length of a sequence:

% infoalign globins.msf -only -name -seqlength
Information on a multiple sequence alignment
Output file [globins.infoalign]:

Example 4

Display only the name, number of gap characters and differences to the consensus sequence:

% infoalign globins.msf -only -name -gapcount -diffcount
Information on a multiple sequence alignment
Output file [globins.infoalign]:

Example 5

Display the name and number of gaps within a sequence:

% infoalign globins.msf -only -name -gaps
Information on a multiple sequence alignment
Output file [globins.infoalign]:

Example 6

Display information formatted with HTML:

% infoalign globins.msf -html
Information on a multiple sequence alignment
Output file [globins.infoalign]:

Example 7

Use the first sequence as the reference sequence to compare to:

% infoalign globins.msf -refseq 1
Information on a multiple sequence alignment
Output file [globins.infoalign]:

Example 8

% infoalign -auto tembl:eclac* -out test.out

Example 9

% infoalign -auto tembl:eclacz -out test.out

Command line arguments

Standard (Mandatory) qualifiers:
  [-sequence]    seqset   The sequence alignment to be displayed.
  [-outfile]     outfile  [*.infoalign] If you enter the name of a file
                          here then this program will write the sequence
                          details into that file.

Additional (Optional) qualifiers:
  -matrix        matrix   [EBLOSUM62 for protein, EDNAFULL for DNA]
                          This is the scoring matrix file used when
                          comparing sequences. By default it is the file
                          'EBLOSUM62' (for proteins) or the file
                          'EDNAFULL' (for nucleic sequences). These files
                          are found in the 'data' directory of the EMBOSS
                          installation.
  -refseq        string   [0] If you give the number in the alignment or
                          the name of a sequence, it will be taken to be
                          the reference sequence. The reference sequence
                          is the one against which all the other
                          sequences are compared. If this is set to 0
                          then the consensus sequence will be used as the
                          reference sequence. By default the consensus
                          sequence is used as the reference sequence.
                          (Any string is accepted)
  -html          boolean  [N] Format output as an HTML table

Advanced (Unprompted) qualifiers:
  -plurality     float    [50.0] Set a cut-off for the % of positive
                          scoring matches below which there is no
                          consensus. The default plurality is taken as
                          50% of the total weight of all the sequences in
                          the alignment. (Number from 0.000 to 100.000)
  -identity      float    [0.0] Provides the facility of setting the
                          required number of identities at a position for
                          it to give a consensus. Therefore, if this is
                          set to 100% only columns of identities
                          contribute to the consensus. (Number from 0.000
                          to 100.000)
  -only          boolean  [N] This is a way of shortening the command
                          line if you only want a few things to be
                          displayed. Instead of specifying:
                          '-nohead -nousa -noname -noalign -nogaps
                          -nogapcount -nosimcount -noidcount -nodiffcount
                          -noweight'
                          to get only the sequence length output, you can
                          specify
                          '-only -seqlength'
  -heading       boolean  [@(!$(only))] Display column headings
  -usa           boolean  [@(!$(only))] Display the USA of the sequence
  -name          boolean  [@(!$(only))] Display 'name' column
  -seqlength     boolean  [@(!$(only))] Display 'seqlength' column
  -alignlength   boolean  [@(!$(only))] Display 'alignlength' column
  -gaps          boolean  [@(!$(only))] Display number of gaps
  -gapcount      boolean  [@(!$(only))] Display number of gap positions
  -idcount       boolean  [@(!$(only))] Display number of identical
                          positions
  -simcount      boolean  [@(!$(only))] Display number of similar
                          positions
  -diffcount     boolean  [@(!$(only))] Display number of different
                          positions
  -change        boolean  [@(!$(only))] Display % number of changed
                          positions
  -weight        boolean  [@(!$(only))] Display 'weight' column
  -description   boolean  [@(!$(only))] Display 'description' column

Associated qualifiers:

"-sequence" associated qualifiers
  -sbegin1       integer  Start of each sequence to be used
  -send1         integer  End of each sequence to be used
  -sreverse1     boolean  Reverse (if DNA)
  -sask1         boolean  Ask for begin/end/reverse
  -snucleotide1  boolean  Sequence is nucleotide
  -sprotein1     boolean  Sequence is protein
  -slower1       boolean  Make lower case
  -supper1       boolean  Make upper case
  -sformat1      string   Input sequence format
  -sdbname1      string   Database name
  -sid1          string   Entryname
  -ufo1          string   UFO features
  -fformat1      string   Features format
  -fopenfile1    string   Features file name

"-outfile" associated qualifiers
  -odirectory2   string   Output directory

General qualifiers:
  -auto          boolean  Turn off prompts
  -stdout        boolean  Write standard output
  -filter        boolean  Read standard input, write standard output
  -options       boolean  Prompt for standard and additional values
  -debug         boolean  Write debug output to program.dbg
  -verbose       boolean  Report some/full command line options
  -help          boolean  Report command line options. More information
                          on associated and general qualifiers can be
                          found with -help -verbose
  -warning       boolean  Report warnings
  -error         boolean  Report errors
  -fatal         boolean  Report fatal errors
  -die           boolean  Report dying program messages
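
Illustrative sketch

The per-sequence counting described in the Algorithm section can be sketched in Python as follows. This is a minimal illustration, not the EMBOSS source code. It rests on several assumptions that are not stated in this document: the gap characters are taken to be '-' and '.', the reference is passed in as a string of the same length as the aligned sequence, and the similarity matrix is replaced by a hypothetical scoring callable (toy_score below) rather than a real EBLOSUM62 or EDNAFULL lookup. The names align_stats, AlignStats and toy_score are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: the symbols treated as gaps; the document does not list them.
GAP_CHARS = frozenset("-.")


@dataclass
class AlignStats:
    seq_len: int = 0      # 'SeqLen'
    align_len: int = 0    # 'AlignLen'
    gaps: int = 0         # 'Gaps': number of distinct internal gaps
    gap_len: int = 0      # 'GapLen': number of internal gap characters
    ident: int = 0        # 'Ident'
    similar: int = 0      # 'Similar'
    different: int = 0    # 'Different'
    change: float = 0.0   # '%Change'


def align_stats(seq: str, ref: str,
                score: Callable[[str, str], float]) -> AlignStats:
    """Count gaps and matches for one aligned sequence against a reference."""
    stats = AlignStats()
    non_gap = [i for i, ch in enumerate(seq) if ch not in GAP_CHARS]
    if not non_gap:
        return stats  # all-gap sequence: every count stays at zero
    first, last = non_gap[0], non_gap[-1]

    in_gap = False
    for pos in range(first, last + 1):
        ch = seq[pos]
        if ch in GAP_CHARS:
            stats.gap_len += 1
            if not in_gap:          # first character of a new gap
                stats.gaps += 1
            in_gap = True
            continue
        in_gap = False
        ref_ch = ref[pos]
        if ch.upper() == ref_ch.upper():   # identical apart from case
            stats.ident += 1
        elif score(ch, ref_ch) > 0:        # positive matrix score: similar
            stats.similar += 1
        else:
            stats.different += 1

    stats.seq_len = stats.ident + stats.similar + stats.different
    stats.align_len = last - first + 1
    stats.change = (stats.align_len - stats.ident) * 100 / stats.align_len
    return stats


def toy_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a matrix lookup: I, L and V count as similar."""
    group = set("ILV")
    return 1.0 if a.upper() in group and b.upper() in group else -1.0


if __name__ == "__main__":
    print(align_stats("--ACD--EFI-", "MKACDGHEYLK", toy_score))
```

In this toy input the leading and trailing gaps are ignored, the two internal gap characters form a single gap, and four of the eight aligned positions are identical to the reference, so the '%Change' formula gives (8 - 4) * 100 / 8 = 50.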