📄 needle.txt
FT   TURN        114    116
FT   HELIX       119    136
FT   TURN        137    139
SQ   SEQUENCE   141 AA;  15126 MW;  5EC7DB1E CRC32;
     VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGK
     KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL LSHCLLVTLA AHLPAEFTPA
     VHASLDKFLA SVSTVLTSKY R
//

  Database entry: tsw:hbb_human

ID   HBB_HUMAN      STANDARD;      PRT;   146 AA.
AC   P02023;
DT   21-JUL-1986 (Rel. 01, Created)
DT   21-JUL-1986 (Rel. 01, Last sequence update)
DT   15-JUL-1999 (Rel. 38, Last annotation update)
DE   HEMOGLOBIN BETA CHAIN.
GN   HBB.
OS   Homo sapiens (Human), Pan troglodytes (Chimpanzee), and
OS   Pan paniscus (Pygmy chimpanzee) (Bonobo).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
RN   [1]
RP   SEQUENCE.
RC   SPECIES=HUMAN;
RA   BRAUNITZER G., GEHRING-MULLER R., HILSCHMANN N., HILSE K., HOBOM G.,
RA   RUDLOFF V., WITTMANN-LIEBOLD B.;
RT   "The constitution of normal adult human haemoglobin.";
RL   Hoppe-Seyler's Z. Physiol. Chem. 325:283-286(1961).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   SPECIES=HUMAN;
RX   MEDLINE; 81064667.
RA   LAWN R.M., EFSTRATIADIS A., O'CONNELL C., MANIATIS T.;
RT   "The nucleotide sequence of the human beta-globin gene.";
RL   Cell 21:647-651(1980).
RN   [3]
RP   SEQUENCE OF 121-146 FROM N.A.
RC   SPECIES=HUMAN;
RX   MEDLINE; 85205333.
RA   LANG K.M., SPRITZ R.A.;
RT   "Cloning specific complete polyadenylylated 3'-terminal cDNA
RT   segments.";
RL   Gene 33:191-196(1985).
RN   [4]
RP   X-RAY CRYSTALLOGRAPHY (2.5 ANGSTROMS) OF DEOXYHEMOGLOBIN.
RC   SPECIES=HUMAN;
RX   MEDLINE; 76027820.
RA   FERMI G.;
RT   "Three-dimensional fourier synthesis of human deoxyhaemoglobin at
RT   2.5-A resolution: refinement of the atomic model.";
RL   J. Mol. Biol. 97:237-256(1975).
RN   [5]
RP   SEQUENCE.
RC   SPECIES=P.TROGLODYTES;
RX   MEDLINE; 66071496.
RA   RIFKIN D.B., KONIGSBERG W.;
RT   "The characterization of the tryptic peptides from the hemoglobin of
RT   the chimpanzee (Pan troglodytes).";
RL   Biochim. Biophys. Acta 104:457-461(1965).
RN   [6]

  [Part of this file has been deleted for brevity]

FT   VARIANT     140    140       A -> T (IN ST JACQUES: O2 AFFINITY UP).
FT                                /FTId=VAR_003081.
FT   VARIANT     140    140       A -> V (IN PUTTELANGE; POLYCYTHEMIA;
FT                                O2 AFFINITY UP).
FT                                /FTId=VAR_003082.
FT   VARIANT     141    141       L -> R (IN OLMSTED; UNSTABLE).
FT                                /FTId=VAR_003083.
FT   VARIANT     142    142       A -> D (IN OHIO; O2 AFFINITY UP).
FT                                /FTId=VAR_003084.
FT   VARIANT     143    143       H -> D (IN RANCHO MIRAGE).
FT                                /FTId=VAR_003085.
FT   VARIANT     143    143       H -> Q (IN LITTLE ROCK; O2 AFFINITY UP).
FT                                /FTId=VAR_003086.
FT   VARIANT     143    143       H -> P (IN SYRACUSE; O2 AFFINITY UP).
FT                                /FTId=VAR_003087.
FT   VARIANT     143    143       H -> R (IN ABRUZZO; O2 AFFINITY UP).
FT                                /FTId=VAR_003088.
FT   VARIANT     144    144       K -> E (IN MITO; O2 AFFINITY UP).
FT                                /FTId=VAR_003089.
FT   VARIANT     145    145       Y -> C (IN RAINIER; O2 AFFINITY UP).
FT                                /FTId=VAR_003090.
FT   VARIANT     145    145       Y -> H (IN BETHESDA; O2 AFFINITY UP).
FT                                /FTId=VAR_003091.
FT   VARIANT     146    146       H -> D (IN HIROSHIMA; O2 AFFINITY UP).
FT                                /FTId=VAR_003092.
FT   VARIANT     146    146       H -> L (IN COWTOWN; O2 AFFINITY UP).
FT                                /FTId=VAR_003093.
FT   VARIANT     146    146       H -> P (IN YORK; O2 AFFINITY UP).
FT                                /FTId=VAR_003094.
FT   VARIANT     146    146       H -> Q (IN KODAIRA; O2 AFFINITY UP).
FT                                /FTId=VAR_003095.
FT   HELIX         5     15
FT   TURN         16     17
FT   HELIX        20     34
FT   HELIX        36     41
FT   HELIX        43     45
FT   HELIX        51     55
FT   TURN         56     56
FT   HELIX        58     75
FT   TURN         76     77
FT   HELIX        78     94
FT   TURN         95     96
FT   TURN        100    100
FT   HELIX       101    121
FT   HELIX       124    142
FT   TURN        143    144
SQ   SEQUENCE   146 AA;  15867 MW;  EC9744C9 CRC32;
     VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST PDAVMGNPKV
     KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP ENFRLLGNVL VCVLAHHFGK
     EFTPPVQAAY QKVVAGVANA LAHKYH
//

Output file format

   The output is a standard EMBOSS alignment file.

   The results can be output in one of several styles by using the
   command-line qualifier -aformat xxx, where 'xxx' is replaced by the
   name of the required format. Some of the alignment formats can cope
   with an unlimited number of sequences, while others are only for
   pairs of sequences.

   The available multiple alignment format names are: unknown, multiple,
   simple, fasta, msf, trace, srs

   The available pairwise alignment format names are: pair, markx0,
   markx1, markx2, markx3, markx10, srspair, score

   See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
   information on alignment formats.

  Output files for usage example

  File: hba_human.needle

########################################
# Program: needle
# Rundate: Sat 15 Jul 2006 12:00:00
# Commandline: needle
#    [-asequence] tsw:hba_human
#    [-bsequence] tsw:hbb_human
# Align_format: srspair
# Report_file: hba_human.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: HBA_HUMAN
# 2: HBB_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 148
# Identity:      63/148 (42.6%)
# Similarity:    88/148 (59.5%)
# Gaps:           9/148 ( 6.1%)
# Score: 290.5
#
#
#=======================================

HBA_HUMAN          1 -VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DL     48
                      .|:|.:|:.|.|.||||  :..|.|.|||.|:.:.:|.|:.:|..| ||
HBB_HUMAN          1 VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDL     48

HBA_HUMAN         49 S-----HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRV     93
                     |     .|:.:||.|||||..|.::.:||:|::....:.||:||..||.|
HBB_HUMAN         49 STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV     98

HBA_HUMAN         94 DPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR    141
                     ||.||:||.:.|:..||.|...||||.|.|:..|.:|.|:..|..||.
HBB_HUMAN         99 DPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH    146


#---------------------------------------
#---------------------------------------

   The Identity: is the percentage of identical matches between the two
   sequences over the reported aligned region (including any gaps in the
   length).

   The Similarity: is the percentage of matches between the two
   sequences over the reported aligned region (including any gaps in the
   length).

Data files

   For protein sequences EBLOSUM62 is used for the substitution matrix.
   For nucleotide sequence, EDNAFULL is used. Others can be specified.
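
   As a rough illustration of how the Length, Identity and Gaps figures
   in the report header above relate to the alignment itself, the
   following Python sketch recounts them from the two aligned rows. The
   names alignment_stats and percent are hypothetical helpers, not part
   of EMBOSS, and the sketch assumes that '-' marks a gap. Similarity is
   left out because it depends on the substitution matrix.

```python
# Aligned rows copied from the hba_human.needle report ('-' marks a gap).
HBA = (
    "-VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DL"
    "S-----HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRV"
    "DPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"
)
HBB = (
    "VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDL"
    "STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV"
    "DPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
)


def alignment_stats(row_a: str, row_b: str) -> tuple[int, int, int]:
    """Return (length, identical columns, gap columns) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(row_a, row_b))
    # A column is identical when both rows hold the same residue (not a gap).
    identical = sum(1 for a, b in columns if a == b and a != "-")
    # A column is a gap column when either row holds '-'.
    gaps = sum(1 for a, b in columns if a == "-" or b == "-")
    return len(columns), identical, gaps


def percent(count: int, length: int) -> float:
    """Express count as a percentage of the aligned length (gaps included)."""
    return round(100.0 * count / length, 1)


length, identical, gaps = alignment_stats(HBA, HBB)
print(f"Length:   {length}")
print(f"Identity: {identical}/{length} ({percent(identical, length)}%)")
print(f"Gaps:     {gaps}/{length} ({percent(gaps, length)}%)")
```

   Both percentages use the full aligned length of 148 columns as the
   denominator, which is why the gap columns lower the Identity figure.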

   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.

   To see the available EMBOSS data files, run:

% embossdata -showall

   To fetch one of the data files (for example 'Exxx.dat') into your
   current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

   Users can provide their own data files in their own directories.
   Project-specific files can be put in the current directory, or for
   tidier directory listings in a subdirectory called ".embossdata".
   Files for all EMBOSS runs can be put in the user's home directory, or
   again in a subdirectory called ".embossdata".

   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata

Notes

   needle is a true implementation of the Needleman-Wunsch algorithm and
   so produces a full path matrix. It therefore cannot be used with
   genome-sized sequences unless you have a lot of memory and a lot of
   time.

References

    1. Needleman, S. B. and Wunsch, C. D. (1970) J. Mol. Biol. 48,
       443-453.
    2. Kruskal, J. B. (1983) An overview of sequence comparison. In D.
       Sankoff and J. B. Kruskal (eds.), Time warps, string edits and
       macromolecules: the theory and practice of sequence comparison,
       pp. 1-44. Addison Wesley.

Warnings

   needle is for aligning two sequences over their entire length. This
   works best with closely related sequences. If you use needle to align
   very distantly related sequences, it will produce a result, but much
   of the alignment may have little or no biological significance.

   A true Needleman-Wunsch implementation like needle needs memory
   proportional to the product of the sequence lengths. For two
   sequences of length 10,000,000 and 1,000 it therefore needs memory
   proportional to 10,000,000,000 characters.
   Two arrays of this size are produced, one of ints and one of floats,
   so multiply that figure by 8 to get the memory usage in bytes (about
   80 GB in this example). That doesn't include other overheads.
   Therefore only use water and needle for accurate alignment of
   reasonably short sequences.

   If you run out of memory, try using stretcher instead.

Diagnostic Error Messages

Uncaught exception
 Assertion failed
 raised at ajmem.c:xxx

   Probably means you have run out of memory. Try using stretcher if
   this happens.

Exit status

   0 upon successful completion.

Known bugs

   None.

See also

   Program name  Description
   est2genome    Align EST and genomic DNA sequences
   stretcher     Finds the best global alignment between two sequences

   When you want an alignment that covers the whole length of both
   sequences, use needle.

   When you are trying to find the best region of similarity between two
   sequences, use water.

   stretcher is a more suitable program to use to find global alignments
   of very long sequences.

Author(s)

   Alan Bleasby (ajb
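
   The memory rule of thumb given under Warnings can be sketched in
   Python as follows. The function name path_matrix_bytes is
   hypothetical, and the default of 8 bytes per cell assumes a 4-byte int
   plus a 4-byte float for each matrix cell, as the "multiply by 8" rule
   implies. Other overheads are ignored, so treat the result as a lower
   bound rather than a measurement of needle itself.

```python
def path_matrix_bytes(len_a: int, len_b: int, bytes_per_cell: int = 8) -> int:
    """Estimate the bytes needed for two len_a x len_b arrays.

    bytes_per_cell = 8 is an assumption: one 4-byte int plus one 4-byte
    float per cell. Program overheads are not included.
    """
    if len_a < 0 or len_b < 0:
        raise ValueError("sequence lengths must be non-negative")
    return len_a * len_b * bytes_per_cell


# The example from the Warnings section: 10,000,000 x 1,000 residues.
big = path_matrix_bytes(10_000_000, 1_000)
print(f"{big:,} bytes (about {big / 1e9:.0f} GB)")

# The usage example: hba_human (141 AA) against hbb_human (146 AA).
small = path_matrix_bytes(141, 146)
print(f"{small:,} bytes")
```

   The estimate grows with the product of the two lengths, which is why
   stretcher is suggested for very long sequences.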