📄 emma.txt

                                  Dayhoff one (above) but are much more up to
                                  date and are based on a far larger data set.
                                  They appear to be more sensitive than the
                                  Dayhoff series. We use the GONNET 40, 80,
                                  120, 160, 250 and 350 matrices.
                                  We also supply an identity matrix which
                                  gives a score of 1.0 to two identical amino
                                  acids and a score of zero otherwise. This
                                  matrix is not very useful. (Values: b
                                  (blosum); p (pam); g (gonnet); i (id); o
                                  (own))
*  -pwdnamatrix        menu       [i] The scoring table which describes the
                                  scores assigned to matches and mismatches
                                  (including IUB ambiguity codes). (Values: i
                                  (iub); c (clustalw); o (own))
*  -pairwisedatafile   infile     Comparison matrix file (optional)
*  -matrix             menu       [b] This gives a menu where you are offered
                                  a choice of weight matrices. The default for
                                  proteins is the PAM series derived by
                                  Gonnet and colleagues. Note, a series is
                                  used! The actual matrix that is used depends
                                  on how similar the sequences to be aligned
                                  at this alignment step are. Different
                                  matrices work differently at each
                                  evolutionary distance.
                                  There are three 'in-built' series of weight
                                  matrices offered.
                                  Each consists of several
                                  matrices which work differently at different
                                  evolutionary distances. To see the exact
                                  details, read the documentation. Crudely, we
                                  store several matrices in memory, spanning
                                  the full range of amino acid distance (from
                                  almost identical sequences to highly
                                  divergent ones). For very similar sequences,
                                  it is best to use a strict weight matrix
                                  which only gives a high score to identities
                                  and the most favoured conservative
                                  substitutions. For more divergent sequences,
                                  it is appropriate to use 'softer' matrices
                                  which give a high score to many other
                                  frequent substitutions.
                                  1) BLOSUM (Henikoff). These matrices appear
                                  to be the best available for carrying out
                                  database similarity (homology) searches.
                                  The matrices used are: Blosum80, 62, 45 and
                                  30.
                                  2) PAM (Dayhoff). These have been extremely
                                  widely used since the late '70s. We use the
                                  PAM 120, 160, 250 and 350 matrices.
                                  3) GONNET. These matrices were derived
                                  using almost the same procedure as the
                                  Dayhoff one (above) but are much more up to
                                  date and are based on a far larger data set.
                                  They appear to be more sensitive than the
                                  Dayhoff series. We use the GONNET 40, 80,
                                  120, 160, 250 and 350 matrices.
                                  We also supply an identity matrix which
                                  gives a score of 1.0 to two identical amino
                                  acids and a score of zero otherwise. This
                                  matrix is not very useful. Alternatively,
                                  you can read in your own (just one matrix,
                                  not a series). (Values: b (blosum); p (pam);
                                  g (gonnet); i (id); o (own))
*  -dnamatrix          menu       [i] This gives a menu where a single matrix
                                  (not a series) can be selected. (Values: i
                                  (iub); c (clustalw); o (own))
*  -mamatrixfile       infile     Comparison matrix file (optional)
   -[no]slow           toggle     [Y] A distance is calculated between every
                                  pair of sequences and these are used to
                                  construct the dendrogram which guides the
                                  final multiple alignment. The scores are
                                  calculated from separate pairwise
                                  alignments. These can be calculated using 2
                                  methods: dynamic programming (slow but
                                  accurate) or by the method of Wilbur and
                                  Lipman (extremely fast but approximate).
                                  The slow-accurate method is fine for short
                                  sequences but will be VERY SLOW for many
                                  (e.g. >100) long (e.g. >1000 residue)
                                  sequences.
*  -pwgapopen          float      [10.0] The penalty for opening a gap in the
                                  pairwise alignments. (Number 0.000 or more)
*  -pwgapextend        float      [0.1] The penalty for extending a gap by 1
                                  residue in the pairwise alignments. (Number
                                  0.000 or more)
*  -ktup               integer    [1 for protein, 2 for nucleic] This is the
                                  size of the exactly matching fragment that
                                  is used. INCREASE for speed (max = 2 for
                                  proteins; 4 for DNA), DECREASE for
                                  sensitivity. For longer sequences (e.g.
                                  >1000 residues) you may need to increase the
                                  default. (Integer from 0 to 4)
*  -gapw               integer    [3 for protein, 5 for nucleic] This is a
                                  penalty for each gap in the fast alignments.
                                  It has little effect on the speed or
                                  sensitivity except for extreme values.
                                  (Positive integer)
*  -topdiags           integer    [5 for protein, 4 for nucleic] The number of
                                  k-tuple matches on each diagonal (in an
                                  imaginary dot-matrix plot) is calculated.
                                  Only the best ones (with most matches) are
                                  used in the alignment. This parameter
                                  specifies how many. Decrease for speed;
                                  increase for sensitivity.
                                  (Positive integer)
*  -window             integer    [5 for protein, 4 for nucleic] This is the
                                  number of diagonals around each of the
                                  'best' diagonals that will be used. Decrease
                                  for speed; increase for sensitivity.
                                  (Positive integer)
*  -nopercent          boolean    [N] Fast pairwise alignment: similarity
                                  scores: suppresses percentage score
   -gapopen            float      [10.0] The penalty for opening a gap in the
                                  alignment. Increasing the gap opening
                                  penalty will make gaps less frequent.
                                  (Positive floating point number)
   -gapextend          float      [5.0] The penalty for extending a gap by 1
                                  residue. Increasing the gap extension
                                  penalty will make gaps shorter. Terminal
                                  gaps are not penalised. (Positive floating
                                  point number)
   -[no]endgaps        boolean    [Y] End gap separation: treats end gaps just
                                  like internal gaps for the purposes of
                                  avoiding gaps that are too close (set by
                                  'gap separation distance'). If you turn this
                                  off, end gaps will be ignored for this
                                  purpose. This is useful when you wish to
                                  align fragments where the end gaps are not
                                  biologically meaningful.
   -gapdist            integer    [8] Gap separation distance: tries to
                                  decrease the chances of gaps being too close
                                  to each other.
                                  Gaps that are less than this
                                  distance apart are penalised more than
                                  other gaps. This does not prevent close
                                  gaps; it makes them less frequent, promoting
                                  a block-like appearance of the alignment.
                                  (Positive integer)
*  -norgap             boolean    [N] Residue specific penalties: amino acid
                                  specific gap penalties that reduce or
                                  increase the gap opening penalties at each
                                  position in the alignment or sequence. As an
                                  example, positions that are rich in glycine
                                  are more likely to have an adjacent gap
                                  than positions that are rich in valine.
*  -hgapres            string     [GPSNDQEKR] This is the set of residues
                                  'considered' to be hydrophilic. It is used
                                  when introducing hydrophilic gap penalties.
                                  (Any string is accepted)
*  -nohgap             boolean    [N] Hydrophilic gap penalties: used to
                                  increase the chances of a gap within a run
                                  (5 or more residues) of hydrophilic amino
                                  acids; these are likely to be loop or random
                                  coil regions where gaps are more common.
                                  The residues that are 'considered' to be
                                  hydrophilic are set by '-hgapres'.
   -maxdiv             integer    [30] This switch delays the alignment of
                                  the most distantly related sequences until
                                  after the most closely related sequences
                                  have been aligned.
                                  The setting shows the
                                  percent identity level required to delay the
                                  addition of a sequence; sequences that are
                                  less identical than this level to any other
                                  sequences will be aligned later. (Integer
                                  from 0 to 100)

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outseq" associated qualifiers
   -osformat2          string     Output seq format
   -osextension2       string     File name extension
   -osname2            string     Base file name
   -osdirectory2       string     Output directory
   -osdbname2          string     Database name to add
   -ossingle2          boolean    Separate file for each entry
   -oufo2              string     UFO features
   -offormat2          string     Features format
   -ofname2            string     Features file name
   -ofdirectory2       string     Output directory

   "-dendoutfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   The input is two or more sequences.

  Input files for usage example

  File: globins.fasta

>HBB_HUMAN Sw:Hbb_Human => HBB_HUMAN
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>HBB_HORSE Sw:Hbb_Horse => HBB_HORSE
VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLDNLKGTFAALSELHCDKLHVDPENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH
>HBA_HUMAN Sw:Hba_Human => HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>HBA_HORSE Sw:Hba_Horse => HBA_HORSE
VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR
>MYG_PHYCA Sw:Myg_Phyca => MYG_PHYCA
VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
>GLB5_PETMA Sw:Glb5_Petma => GLB5_PETMA
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRDLSGKHAKSFQVDPQYFKVLAAVIADTVAAGDAGFEKLMSMICILLRSAY
>LGB2_LUPLU Sw:Lgb2_Luplu => LGB2_LUPLU
GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSEVPQNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA

   EMBOSS programs do not allow you to simply type the names of two or
   more files or database entries - they try to interpret this as all one
   file-name and complain that a file of that name does not exist.

   In order to enter the sequences that you wish to align, you must group
   them in one of three ways: either make a 'list file' or place several
   sequences in a single sequence file or specify the sequences using
   wildcards.

  Making a List file

   A list file is a text file that holds the names of database entries
   and/or sequence files.

   You should use a text editor such as pico or nedit to edit a file to
   contain the names of the sequence files or database entries. There
   must be one sequence per line.

   An example is the file 'fred' which contains:
  __________________________________________________________________________

opsd_abyko.fasta
sw:opsd_xenla
sw:opsd_c*
@another_list
  __________________________________________________________________________

   This list file contains:
     * opsd_abyko.fasta - this is the name of a sequence file. The file
       is read in from the current directory.
     * sw:opsd_xenla - this is a reference to a specific sequence in the
       SwissProt database
     * sw:opsd_c* - this represents all the sequences in SwissProt whose
       identifiers start with ``opsd_c''
     * another_list - this is the name of a second list file. List files
       can be nested!

   Notice the @ in front of the last entry. This is the way you tell
   EMBOSS that this file is a List file, not a regular sequence file.
   That last line was put there both as an indication of the way you tell
   EMBOSS that a file is a List file and to emphasise that List files can
   contain other List files.

   When emma asks for the sequences to align, you should type '@fred'.
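
   The nesting rule for list files can be illustrated with a short Python
   sketch. This is not EMBOSS code: the function name expand_list_file is
   hypothetical, and resolving a nested list relative to the directory of
   the list that names it is an assumption made for the example. The
   sketch only flattens the names; it does not open sequence files, query
   databases or expand wildcards such as sw:opsd_c*.

```python
from pathlib import Path


def expand_list_file(path, _active=None):
    """Return the entries named in a list file, in order.

    A line starting with '@' is treated as a nested list file and is
    expanded in place. Every other non-blank line is returned unchanged
    (a file name, a database entry, or a wildcard pattern).
    """
    # _active holds the list files currently being expanded, so that a
    # list which (directly or indirectly) includes itself is reported
    # instead of recursing forever.
    active = set() if _active is None else _active
    path = Path(path)
    key = path.resolve()
    if key in active:
        raise ValueError(f"list file includes itself: {path}")
    active.add(key)

    entries = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            # Assumption: nested lists live next to the parent list file.
            nested = path.parent / line[1:]
            entries.extend(expand_list_file(nested, active))
        else:
            entries.append(line)

    active.discard(key)
    return entries
```

   Calling expand_list_file("fred") on the example above would return the
   three direct entries followed by whatever 'another_list' contains.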
   The '@' character tells EMBOSS that this is the name of a List file.

   An alternative to editing a file and laboriously typing in all of the
   names you require is to make a list of a directory containing the
   sequence files and then to edit the list file to remove the names of
   the sequence files that you do not require.

   To make a list of all the files in the current directory that end in