📄 emma.txt

   '.pep', type:

   ls *.pep > listfile

  Several sequences in one file

   EMBOSS can read in a single file which contains many sequences.
   Each of the sequences in the file must be in the same format - if the
   first sequence is in EMBL format, then all the others must be in EMBL
   format.

   There are some sequence formats that cannot be used when placing many
   sequences in the same file. These are sequence formats that have no
   clear indication of where the sequence ends and the annotation of the
   next sequence starts. These formats include: plain or text format (no
   real format, just the sequence), staden, gcg.

   If your sequences are not already in a single file, you can place them
   in one using seqret. The following example takes all the files ending
   in '.pep' and places them in the file 'mystuff' in Fasta format.

   seqret "*.pep" mystuff

   When emma asks for the sequences to align, you should type 'mystuff'.

  Using wildcards

   'Wildcard' characters are characters that are expanded to match all
   possible matching files or entries in a database.

   By far the most commonly used wildcard character is '*' which matches
   any number (or zero) of possible characters at that position in the
   name.

   A less commonly used wildcard character is '?' which matches any one
   character at that position.

   For example, when emma asks for sequences to align, you could answer:

   abc*.pep

   This would select any files whose name starts with 'abc' and
   then ends in '.pep'; the centre of the name where there is a '*' can
   be anything.

   Both file names and database entry names can be wildcarded.

   There is a slightly irritating problem that occurs when wildcards are
   used on the Unix command line. (This is the line that you type against
   the 'Unix' prompt together with the program name.)

   In this case the Unix shell gets the command line first, expands
   the wildcards, and only then runs the program, passing the expanded
   parameters to it.
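
   The effect of that expansion can be seen from a script. The Python
   sketch below is only an illustration: it assumes a POSIX /bin/sh, uses
   the shell's printf as a stand-in for emma, and the helper name
   params_seen_by_program and the file names are made up for the example.
   It returns the parameters a program would receive for an unquoted and
   for a quoted wildcard.

```python
import subprocess
import tempfile
from pathlib import Path


def params_seen_by_program(arguments, cwd):
    """Return the parameters a program would receive for this command-line text.

    The shell's printf stands in for emma (an assumption for illustration):
    it prints each parameter on its own line after /bin/sh has processed
    the command line, including any wildcard expansion.
    """
    result = subprocess.run(
        "printf '%s\\n' " + arguments,
        shell=True,            # let /bin/sh see the wildcard first
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical input files, created only for this demonstration.
        for name in ("abc1.pep", "abc2.pep", "xyz.pep"):
            Path(workdir, name).touch()

        # Unquoted: the shell replaces the pattern with the matching files.
        print(params_seen_by_program("abc*.pep", workdir))
        # Quoted: the pattern reaches the program as one parameter.
        print(params_seen_by_program('"abc*.pep"', workdir))
```

   With the unquoted pattern the stand-in program receives two separate
   file names; with the quoted pattern it receives the single parameter
   abc*.pep, which the program can then expand itself.
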
   When Unix expands the wildcards, one of two things can go wrong. You
   may have specified wildcarded database entries - the Unix system tries
   to find files that match that specification, it fails and refuses to
   run the program. Alternatively, you may have specified wildcarded
   files - Unix finds them and gives the name of each of them to the
   program as a separate parameter - emma gets the wrong number of
   parameters and refuses to run.

   You get round this by quoting the wildcard. You can either put the
   whole wildcarded name in quotes:

   "abc*.pep"

   or you can quote just the '*' using a '\' as:

   abc\*.pep

   This problem does not occur when you reply to the prompt from the
   program for the input sequences, or when you are typing the wildcarded
   file name in a web browser or GUI (such as Jemboss or SPIN) field.

Output file format

  Output files for usage example

  File: hbb_human.aln

>HBB_HUMAN
--------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP----ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH------
>HBB_HORSE
--------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLDNLKGTFAALSELHCDKLHVDP----ENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH------
>HBA_HUMAN
---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF------DLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDP----VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------
>HBA_HORSE
---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF------DLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDP----VNFKLLSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR------
>MYG_PHYCA
---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPI----KYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
>GLB5_PETMA
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRDLSGKHAKSFQ----VDPQYFKVLAAVIADTVAAGDAGFEKLMSMICILLRSAY-------------
>LGB2_LUPLU
--------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKG--TSEVPQNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---

  File: hbb_human.dnd

((((HBB_HUMAN:0.08080,HBB_HORSE:0.08359):0.21952,(HBA_HUMAN:0.05452,HBA_HORSE:0.06605):0.21070):0.06034,MYG_PHYCA:0.39882):0.01490,GLB5_PETMA:0.38267,LGB2_LUPLU:0.50324);

  Sequences

   emma writes the aligned sequences and a dendrogram file showing how
   the sequences were clustered during the progressive alignments.

   The clustalw output sequences are reformatted into the default EMBOSS
   output format instead of being left as Clustal-format '.aln' files.

  Trees

   Believe it or not, we now use the New Hampshire (nested parentheses)
   format as default for our trees. This format is compatible with e.g.
   the PHYLIP package. If you want to view a tree, you can use the RETREE
   or DRAWGRAM/DRAWTREE programs of PHYLIP. This format is used for all
   our trees, even the initial guide trees for deciding the order of
   multiple alignment. The output trees from the phylogenetic tree menu
   can also be requested in our old verbose/cryptic format. This may be
   more useful if, for example, you wish to see the bootstrap figures.
   The bootstrap trees in the default New Hampshire format give the
   bootstrap figures as extra labels which can be viewed very easily
   using TREETOOL which is available as part of the GDE package. TREETOOL
   is available from the RDP project by ftp from rdp.life.uiuc.edu.

   The New Hampshire format is only useful if you have software to
   display or manipulate the trees. The PHYLIP package is highly
   recommended if you intend to do much work with trees and includes
   programs for doing this. WE DO NOT PROVIDE ANY DIRECT MEANS FOR
   VIEWING TREES GRAPHICALLY.

Data files

   The comparison matrices available for clustalw are not EMBOSS matrix
   files, as they are defined in the clustalw code.
   The matrices available for carrying out a protein sequence alignment
   are:
     * blosum
     * pam
     * gonnet
     * id
     * user defined

   The comparison matrices available in clustalw for carrying out a
   nucleotide sequence alignment are:
     * iub
     * clustalw
     * user defined

Notes

   None

References

   The main reference for ClustalW is Thompson et al. below.

    1. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) "CLUSTAL W:
       improving the sensitivity of progressive multiple sequence
       alignment through sequence weighting, position-specific gap
       penalties and weight matrix choice." Nucleic Acids Research,
       22:4673-4680.
    2. Feng, D.-F. and Doolittle, R.F. (1987). J. Mol. Evol. 25, 351-360.
    3. Needleman, S.B. and Wunsch, C.D. (1970). J. Mol. Biol. 48,
       443-453.
    4. Dayhoff, M.O., Schwartz, R.M. and Orcutt, B.C. (1978) in Atlas of
       Protein Sequence and Structure, vol. 5, suppl. 3 (Dayhoff, M.O.,
       ed.), pp 345-352, NBRF, Washington.
    5. Henikoff, S. and Henikoff, J.G. (1992). Proc. Natl. Acad. Sci. USA
       89, 10915-10919.
    6. Lipman, D.J., Altschul, S.F. and Kececioglu, J.D. (1989). Proc.
       Natl. Acad. Sci. USA 86, 4412-4415.
    7. Barton, G.J. and Sternberg, M.J.E. (1987). J. Mol. Biol. 198,
       327-337.
    8. Gotoh, O. (1993). CABIOS 9, 361-370.
    9. Altschul, S.F. (1989). J. Theor. Biol. 138, 297-309.
   10. Lukashin, A.V., Engelbrecht, J. and Brunak, S. (1992). Nucl. Acids
       Res. 20, 2511-2516.
   11. Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald,
       A.F. and Wootton, J.C. (1993). Science, 262, 208-214.
   12. Vingron, M. and Waterman, M.S. (1993). J. Mol. Biol. 234, 1-12.
   13. Pascarella, S. and Argos, P. (1992). J. Mol. Biol. 224, 461-471.
   14. Collins, J.F. and Coulson, A.F.W. (1987). In Nucleic acid and
       protein sequence analysis a practical approach, Bishop, M.J. and
       Rawlings, C.J. ed., chapter 13, pp. 323-358.
   15. Vingron, M. and Sibbald, P.R. (1993). Proc. Natl. Acad. Sci. USA,
       90, 8777-8781.
   16. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994). CABIOS, 10,
       19-29.
   17. Lüthy, R., Xenarios, I. and Bucher, P. (1994). Protein Science, 3,
       139-146.
   18. Higgins, D.G. and Sharp, P.M. (1988). Gene, 73, 237-244.
   19. Higgins, D.G. and Sharp, P.M. (1989). CABIOS, 5, 151-153.
   20. Higgins, D.G., Bleasby, A.J. and Fuchs, R. (1992). CABIOS, 8,
       189-191.
   21. Sneath, P.H.A. and Sokal, R.R. (1973). Numerical Taxonomy, W.H.
       Freeman, San Francisco.
   22. Saitou, N. and Nei, M. (1987). Mol. Biol. Evol. 4, 406-425.
   23. Wilbur, W.J. and Lipman, D.J. (1983). Proc. Natl. Acad. Sci. USA,
       80, 726-730.
   24. Musacchio, A., Gibson, T., Lehto, V.-P. and Saraste, M. (1992).
       FEBS Lett. 307, 55-61.
   25. Musacchio, A., Noble, M., Pauptit, R., Wierenga, R. and Saraste,
       M. (1992). Nature, 359, 851-855.
   26. Bashford, D., Chothia, C. and Lesk, A.M. (1987). J. Mol. Biol.
       196, 199-216.
   27. Myers, E.W. and Miller, W. (1988). CABIOS, 4, 11-17.
   28. Thompson, J.D. (1994). CABIOS, (Submitted).
   29. Smith, T.F., Waterman, M.S. and Fitch, W.M. (1981). J. Mol. Evol.
       18, 38-46.
   30. Pearson, W.R. and Lipman, D.J. (1988). Proc. Natl. Acad. Sci. USA.
       85, 2444-2448.
   31. Devereux, J., Haeberli, P. and Smithies, O. (1984). Nucleic Acids
       Res. 12, 387-395.
   32. Felsenstein, J. (1989). Cladistics 5, 164-166.
   33. Kimura, M. (1980). J. Mol. Evol. 16, 111-120.
   34. Kimura, M. (1983). The Neutral Theory of Molecular Evolution.
       Cambridge University Press, Cambridge.
   35. Felsenstein, J. (1985). Evolution 39, 783-791.
   36. Smith, R.F. and Smith, T.F. (1992) Protein Engineering 5, 35-41.
   37. Krogh, A., Brown, M., Mian, S., Sjölander, K. and Haussler, D.
       (1994) J. Mol. Biol. 235, 1501-1531.
   38. Jones, D.T., Taylor, W.R. and Thornton, J.M. (1994). FEBS Lett.
       339, 269-275.
   39. Bairoch, A. and Boeckmann, B. (1992) Nucleic Acids Res., 20,
       2019-2022.
   40. Noble, M.E.M., Musacchio, A., Saraste, M., Courtneidge, S.A. and
       Wierenga, R.K. (1993) EMBO J. 12, 2617-2624.
   41. Kabsch, W. and Sander, C. (1983) Biopolymers, 22, 2577-2637.

Warnings

   None.

Diagnostic Error Messages

   "cannot find program 'clustalw'" - means that the ClustalW program has
   not been set up on your site or is not in your environment (i.e. is
   not on your path). The solutions are to (1) install clustalw in the
   path so that emma can find it with the command "clustalw", or (2)
   define a variable (an environment variable, or an entry in
   emboss.defaults or your .embossrc file) called EMBOSS_CLUSTALW
   containing the command (program name or full path) to run clustalw if
   you have it elsewhere on your system.

Exit status

   It exits with status 0 unless an error is reported.

Known bugs

   None.

See also

   Program name  Description
   edialign      Local multiple alignment of sequences
   infoalign     Information on a multiple sequence alignment
   plotcon       Plot quality of conservation of a sequence alignment
   prettyplot    Displays aligned sequences, with colouring and boxing
   showalign     Displays a multiple sequence alignment
   tranalign     Align nucleic coding regions given the aligned proteins

Author(s)

   Mark Faller (current e-mail address unknown)
   while he was with:
   HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK

History

   Completed 18 February 1999

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.

Comments

   None
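
   For embedded scripts, the clustalw lookup described under Diagnostic
   Error Messages can be checked before emma is run. The Python sketch
   below is a rough approximation, not emma's own logic: the function name
   find_clustalw is hypothetical, and it consults only the EMBOSS_CLUSTALW
   environment variable and PATH, ignoring emboss.defaults and .embossrc.

```python
import os
import shutil


def find_clustalw(environ=None):
    """Return the clustalw command that would be found, or None.

    Assumption: only the EMBOSS_CLUSTALW environment variable and PATH
    are consulted; settings in emboss.defaults or .embossrc are not read.
    """
    environ = os.environ if environ is None else environ
    # EMBOSS_CLUSTALW may hold a program name or a full path;
    # without it, fall back to the plain command name "clustalw".
    command = environ.get("EMBOSS_CLUSTALW", "clustalw")
    # shutil.which searches PATH for a bare name and checks a full
    # path directly; it returns None when nothing executable is found.
    return shutil.which(command, path=environ.get("PATH"))


if __name__ == "__main__":
    location = find_clustalw()
    if location is None:
        print("clustalw not found: put it on PATH or set EMBOSS_CLUSTALW")
    else:
        print("clustalw command:", location)
```

   A script can call find_clustalw() first and skip the emma run, or
   report a clearer message, when it returns None.
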