File: vtest.seq

>HTL2
LDTAPCLFSDGSPQKAAYVLWDQTILQQDITPLPSHETHSAQKGELLALICGLRAAKPWPSLNIFLDSKY
>MMLV
GKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL
>HEPB
RPGLCQVFADATPTGWGLVMGHQRMRGTFSAPLPIHTAELLAACFARSRSGANIIGTDNSGRTSLYADSPSVPSHLPDRVH

Output file format

edialign produces two output files with a multiple sequence alignment.
The first is a file in DIALIGN format and the second is a sequence file
in any format you choose (FASTA by default).

Capital letters denote aligned residues, i.e. residues involved in at
least one of the "diagonals" in the alignment. Lower-case letters denote
residues not belonging to any of these selected "diagonals". They are not
considered to be aligned by DIALIGN. Thus, if a lower-case letter stands
in the same column as other letters, this is pure chance; these residues
are not considered to be homologous.

Numbers below the alignment reflect the degree of local similarity among
sequences. More precisely, they represent the sum of weights of fragments
connecting residues at the respective position. These numbers are
normalized such that regions of maximum similarity always get a score of
9, no matter how strong this maximum similarity is. In previous versions
of the program, '*' characters were used instead of numbers; with the
-stars=n option, '*' characters can be used as previously.

At the bottom of the file you can find the "guide tree" used to make the
alignment, written in "nested parentheses" format.
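The upper-case/lower-case convention described above can be applied programmatically to the sequence output file. The following is a minimal sketch, not part of edialign itself: it assumes the alignment was written in FASTA format, and the function names (parse_fasta, mask_unaligned, drop_empty_columns) and the toy input are hypothetical. It replaces lower-case (unaligned) residues with gaps and then removes columns that contain only gaps.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def mask_unaligned(records, gap="-"):
    """Replace lower-case (unaligned) residues with the gap character."""
    return [
        (name, "".join(gap if c.islower() else c for c in seq))
        for name, seq in records
    ]


def drop_empty_columns(records, gap="-"):
    """Remove alignment columns that contain only gap characters."""
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("aligned sequences must all have the same length")
    keep = [
        i for i in range(length)
        if any(seq[i] != gap for _, seq in records)
    ]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


if __name__ == "__main__":
    # Toy alignment (hypothetical data, not taken from vtest.fasta).
    toy = ">a\nacGT-Ta\n>b\n--GTcT-\n"
    for name, seq in drop_empty_columns(mask_unaligned(parse_fasta(toy))):
        print(f">{name}\n{seq}")
```

Masking before dropping columns keeps the remaining upper-case residues in register across sequences, which simply deleting the lower-case letters from each sequence independently would not.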
Output files for usage example

File: vtest.fasta

>HTL2
ldtapC-LFSDGSPQKAAYVLWDQTILQQDITPLPSHethsaqkgELLAliCglraAKPWPSLNIFLDSKY------------------------------------------------------------------------------------------
>MMLV
gkk-----------------------------------------------------------LNVYTDSRYafatahihgeiyrrrglltsegkeiknkdeilallkalflpkrlsiihcpghqkghsaeargnrmADQAARKAAITETPDTSTLL-----
>HEPB
rpgl-CqVFADATPTGWGLVMGHQRMRGTFSAPLPIHta------ELLAa-Cf---ARSRSGANIIg---------------------------------------------------------------------TDNSGRTSLYADSPSVPSHLpdrvh

File: vtest.edialign

                           DIALIGN 2.2.1
                           *************

      Program code written by Burkhard Morgenstern and Said Abdeddaim
         e-mail contact: dialign (at) gobics (dot) de

      Published research assisted by DIALIGN 2 should cite:

         Burkhard Morgenstern (1999). DIALIGN 2: improvement of the
         segment-to-segment approach to multiple sequence alignment.
         Bioinformatics 15, 211 - 218.

      For more information, please visit the DIALIGN home page at

         http://bibiserv.techfak.uni-bielefeld.de/dialign/

     ************************************************************

   program call:  edialign

   Aligned sequences:          length:
   ==================          =======

   1) HTL2                        70
   2) MMLV                        97
   3) HEPB                        81

   Average seq. length:           82.7

   Please note that only upper-case letters are considered to be aligned.

   Alignment (DIALIGN format):
   ===========================

HTL2         1   ldtapC-LFS DGSPQKAAYV LWDQTILQQD ITPLPSHeth saqkgELLAl
MMLV         1   gkk------- ---------- ---------- ---------- ----------
HEPB         1   rpgl-CqVFA DATPTGWGLV MGHQRMRGTF SAPLPIHta- -----ELLAa

                 0000000999 9999999999 9999999999 9999999000 0000000000

HTL2        50   iCglraAKPW PSLNIFLDSK Y--------- ---------- ----------
MMLV         4   ---------- --LNVYTDSR Yafatahihg eiyrrrgllt segkeiknkd
HEPB        44   -Cf---ARSR SGANIIg--- ---------- ---------- ----------

                 0000000000 0077777777 7000000000 0000000000 0000000000

HTL2        71   ---------- ---------- ---------- ---------- ----------
MMLV        42   eilallkalf lpkrlsiihc pghqkghsae argnrmADQA ARKAAITETP
HEPB        57   ---------- ---------- ---------- ------TDNS GRTSLYADSP

                 0000000000 0000000000 0000000000 0000001111 1111111111

HTL2        71   ---------- -
MMLV        92   DTSTLL---- -
HEPB        71   SVPSHLpdrv h

                 1111110000 0

   Sequence tree:
   ==============

Tree constructed using UPGMA based on DIALIGN fragment weight scores

((HTL2 :0.145587
HEPB :0.145587):0.108531
MMLV :0.254117);

Data files

The scoring schemes are hard coded in the program and cannot be changed.
For proteins edialign always uses the BLOSUM62 table.

Notes

We strongly recommend using the "translation" option if nucleic acid
sequences are expected to contain protein coding regions, as it will
significantly increase the sensitivity of the alignment procedure in such
cases.

If you want to compare long genomic sequences, it is recommended to speed
up the algorithm by:

  * setting "Nucleic acid sequence alignment mode" to "mixed alignment"
    (-nucmode=ma)
  * setting "Maximum fragment length" to 30 (-lmax=30)
  * setting "Consider only N-fragment pairs that start with two matches"
    to yes (-fragmat) and setting the similarity score threshold for
    considering P-fragment pairs to 8 (-fragsim=8) (which actually
    implies that you consider only fragments that start with a match).
  * setting the "Threshold" T to 2.0 (-threshold=2.0)

It is also recommended to increase the chance of finding coding exons by
setting "Nucleic acid sequence alignment mode" to "mixed alignment"
(-nucmode=ma) and setting "Also consider the reverse complement"
(-revcomp).

References

 1. B. Morgenstern, A. Dress, T. Werner. Multiple DNA and protein
    sequence alignment based on segment-to-segment comparison. Proc.
    Natl. Acad. Sci. USA 93, 12098 - 12103 (1996).
 2. B. Morgenstern. DIALIGN 2: improvement of the segment-to-segment
    approach to multiple sequence alignment. Bioinformatics 15,
    211 - 218 (1999).
 3. B. Morgenstern, O. Rinner, S. Abdeddaim, D. Haase, K. F. X. Mayer,
    A. W. M. Dress, H.-W. Mewes. Exon discovery by genomic sequence
    alignment. Bioinformatics 18, 777 - 787 (2002).

Warnings

Remember that lowercase characters represent parts of the sequence that
are not aligned. You should not use the dialign output as such for
sequence family or phylogeny studies, but take only part of the alignment
and/or remove the lowercase characters using a multiple sequence editor.
The current version of the program has no provision for doing this
automatically.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name   Description
emma           Multiple alignment program - interface to ClustalW program
infoalign      Information on a multiple sequence alignment
plotcon        Plot quality of conservation of a sequence alignment
prettyplot     Displays aligned sequences, with colouring and boxing
showalign      Displays a multiple sequence alignment
tranalign      Align nucleic coding regions given the aligned proteins

Author(s)

The EMBOSS direct port was done by Alan Bleasby (ajb