DNA
Latest Version: 1.1.0 (May 6, 2000)
Thomas Nelson (tjnelson@las1.ninds.nih.gov, tjnelson@helix.nih.gov)

DNA is a miscellaneous collection of utilities for handling sequence data
under Linux. The programs should compile without difficulty on any Unix
system (type 'make'). If desired, they can then be copied manually to a
suitable location, e.g., /usr/local/bin.

These programs were written to remedy deficiencies in other, more complete
packages. No attempt was made to replicate the complete functionality of
programs like SeqLab. However, if you find a bug, please notify me and I
will post an updated version. If you would like to contribute a useful
utility of your own, let me know and I will include it in the package.

These programs can be freely distributed and altered, provided that any
altered version is clearly indicated as such.

So far the package consists of the following:

dna - Motif-based sequence editor. It has the following features:
  1. Edits up to 256 peptide or DNA sequences simultaneously.
  2. Translates DNA->protein; click 'next' to display the next frame.
  3. Dot matrix plot of any 2 sequences.
  4. Rudimentary amino acid statistics (MW and amino acid percentages).
  5. Saves the matrix plot in PBM image format.
  6. Sequence reversal.
  7. Creates an alignment file for 'highlight' (below).
  8. The Tab key switches editing to the next sequence.

  Various display parameters can be changed by editing the configuration
  file ~/.dna/dna.ini. Motif and X parameters can be set in ~/.Xdefaults;
  see the X documentation for details on .Xdefaults.

  Printing requires pnmtops (part of the netpbm package) to be installed on
  your system. It can be downloaded from
  http://download.sourceforge.net/netpbm (netpbm in turn requires libpng).

dnatrans - Converts DNA or RNA to protein. All 6 reading frames are
  translated, written to separate files with the suffixes '.1' to '.6',
  and also printed to the screen.

seqrev - Converts sense<->antisense. The output strand is written 5'->3'.
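What dnatrans and seqrev do can be sketched in a few lines of Python. This is a minimal illustration, not the package's own code: it assumes the standard genetic code (NCBI translation table 1), applies the raw-input rules described later in this file ('#' lines skipped; spaces, punctuation and digits ignored), and assumes that frames '.1' to '.3' are read on the input strand and '.4' to '.6' on its reverse complement. The function names clean_raw, reverse_complement, translate and six_frames are hypothetical.

```python
import string

# Standard genetic code (NCBI translation table 1); codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide complements, including ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYKMBVDHSWN", "TGCAYRMKVBHDSWN")


def clean_raw(text: str) -> str:
    """Skip lines starting with '#'; drop spaces, punctuation and digits."""
    kept = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        kept.extend(ch for ch in line if ch in string.ascii_letters)
    return "".join(kept).upper()


def reverse_complement(dna: str) -> str:
    """Sense <-> antisense conversion, written 5'->3' (as seqrev is described)."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate one frame; unknown codons become 'X', a partial last codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(raw: str) -> list[str]:
    """Frames 1-3 on the input strand, 4-6 on its reverse complement (assumed order)."""
    dna = clean_raw(raw).replace("U", "T")  # accept RNA as well as DNA
    rc = reverse_complement(dna)
    return [translate(strand[offset:]) for strand in (dna, rc) for offset in range(3)]


for n, protein in enumerate(six_frames("# example\nATG GCC TAA 12\n"), start=1):
    print(f".{n}: {protein}")
```

The loop only prints each frame next to the suffix it would get; writing each frame to its own file, as dnatrans does, is left out of the sketch.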
highlight - Displays multiple sequences and creates a PostScript file in
  which identical or related bases/amino acids are highlighted. Highlight
  does not create an alignment; it merely creates an output file suitable
  for inclusion in a LaTeX document or for printing. The file can be viewed
  with 'gv' or 'gs' or printed on a PostScript printer.

  The questions asked by 'highlight' are:

  Controlling sequence - The sequence to which similarities are compared.
    Can be:
    1. 1st - only matches to the top sequence are highlighted.
    2. 1st+2nd - matches to the first or second sequence are highlighted.
    3. Any - all identities at each position are highlighted. For example,
       if two sequences contain a 'D' and two others contain an 'R' at that
       position, both the D's and the R's are highlighted.
    4. Highest - only the matches present in the greatest number of
       sequences (i.e., the consensus) are highlighted. If two residues tie,
       only the first one is highlighted. For example, if the first 3
       sequences contain 'M' and the next 3 contain 'A', only the 'M's will
       be highlighted. Feel free to change this if you don't like it.
    5. Next - each sequence is compared with the one that comes after it.

  Enter 1=identity 2=homology - If 'identity' is selected, the residues at
    a position must be identical. If 'homology' is selected, the following
    homologies are assumed:
      D is homologous to E
      Q is homologous to N
      F is homologous to W or Y
      I is homologous to L
      K is homologous to R
      S is homologous to T
    Note that the homologies are all symmetric. They are also conservative:
    for example, aspartate (D) and asparagine (N) are considered
    non-homologous, as are glutamate (E) and glutamine (Q). This is
    obviously not desirable if the data come from Edman degradation, and it
    is easy to change if you don't like it. Well, maybe not easy, but
    possible.

  Font - Can be any valid PostScript font name, for example:
      Helvetica
      Helvetica-Oblique
      Helvetica-Bold
      Times-Roman
      Courier
    Do a 'strings /usr/bin/gs' for a complete list.
    Any font, including proportional fonts, will work, but Courier and
    Helvetica are the traditional choices for sequences.

  BW/Color - Selects whether color or grayscale is used.

  Normal character RGBcolor, etc. - If 'color' is selected, change these
    values to define the colors to use. There must be 3 numbers between 0
    and 1, where 0 is no color and 1 is the maximal color. For example,
    0.000 0.500 1.000 will create a color with no red, 50% green, and 100%
    of the maximum possible blue.

  Normal character gray value, etc. - If 'B/W' is selected, change these
    values to define the gray level to use. This must be one number between
    0 and 1. For example, 0.500 will create a medium gray.

  Hitting <Enter> accepts the default value. If you change a color, be sure
  to enter the parameter in the correct format; otherwise the PostScript
  file will be invalid and you will not get any printout. Needless to say,
  make sure the character and background colors are different.

  Spacing between chunks - The sequences are wrapped if they are wider than
    the page. This parameter sets the spacing between wrapped groups of
    sequences.

  Chunks per page - Number of wrapped groups of sequences per page.

All the programs except 'highlight' take raw sequences as input. Lines
containing documentation or comments must start with a hash mark '#'.
Thus, if your data are in EMBL format, the sequences must be extracted, or
all non-sequence lines prefaced with a '#'. Spaces, punctuation marks, and
digits are ignored.

Highlight requires data in the following format:
  o 1 sequence per line.
  o The sequence name is at the beginning of each line.
  o A vertical bar '|' separates the name from the sequence data.
  o Maximum of 1000 sequences total, of 10000 characters each. (This can
    easily be changed.)
  o Everything between the lines "SEQUENCES" and "END" is considered to be
    a sequence. Everything outside this is a label.
  o Labels can be BOX, TITLE, COMMENT, LABEL, or TEXT.
    There can be a total of 1000 labels of any type.

    TITLE - Followed by the title.
    BOX - Followed by the starting x position, ending x position, starting
      sequence number, and ending sequence number. The box is drawn to
      include all the indicated characters.
    COMMENT - Ignored.
    LABEL - Followed by the starting x position and the starting sequence
      position. These do not have to fall within the sequences; for
      example, LABEL 2 0 will draw a label above the sequences, starting at
      the 2nd character in the sequence. The label is drawn in the same font
      as the sequence names.
    TEXT - Followed by a starting y position in sequence units. Similar to
      LABEL, except that the x position of each label is taken from its
      position in the text string. This makes it easier to align a label
      with a specific sequence element, since the label can be placed
      directly above the position being labeled. The label is drawn in the
      same font as the sequence names and is printed left-justified,
      starting at the position at which it appears in the text string.
      Successive labels must be separated by two or more spaces. Labels are
      not truncated at the end of a line, so they can extend past the end
      of the sequence.

For example:

TITLE Alignment of calcium-binding proteins
BOX 100 120 1 4
BOX 21 31 1 16
COMMENT The above will draw a box around DFQ..RV- down to -YRG..IV-
BOX 204 215 1 16
BOX 322 324 1 1
COMMENT This is a comment
LABEL 10 0 This is a label above the M in the first sequence
# The next line will put labels above the first sequence (1=cp20b)
TEXT 0  The LS motif  The FNTFY motif
# The next line will put numbers below the last (16th) sequence
# Note, each number must be separated by 2 spaces otherwise they
# will be spaced incorrectly.
TEXT 17  10  20  30  40
SEQUENCES
calexcitin | MA-AHQ LS- DFQRNKILRV- FNTFY DC ...
scp2.pep | KKKTNTIMS IS- DFRKKKLLFL- FNVFF DV
f56d1.6 | MVVAKPTAAVS IE- DLIKKHSDVDP FLVKK --
kchip1 | ..YAQFFP HG- DASTYAH-YL- FNAF- DT
kchip2 | ..YSQFFP QG- DSSTYAT-FL- FNAF- DT
kchip3 | ..YAQFFP QG- DATTYAH-FL- FNAF- DA
kip2 | MGNKQTIFT EE QLDNYQ DCTF FN KK DI
frequenin | MG KK SSK LKQ-DTIDRLTTDTY F-TEK EI
neurocalcin | MG K QNSK LAP EVMEDLVKSTE FN-EH EL
c18b11.04- | MG KSQ SK LSQ DQLQDLVRSTR FD-KK EL
redb | KRET WQ TS- EHAGRD----- TSRHS MA
caltractin | MARRGQQPPPQQAPPAQKNQTGK --- FNPA- EF
calcyphosine|MDAVDATVEKLRAQC LSR GALGIQGLAR- FFRRL D-
calretinin | ..KYDK NS DGKIEMAELAQI LPTEENFL
oncomodulin | -MSITDI LS AEDIAAA L QE CQ
gpxa_neime |GNAVD---------- LSG -YRGKVLLIV- -NTAT --
END

Any other characters, such as '.' and '-', are printed as they appear but
do not influence whether a position is highlighted.
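To make the input format and the matching rules concrete, here is a small Python sketch of the 'Any' controlling-sequence mode with the identity/homology option. It is a hypothetical reimplementation for illustration, not highlight's actual code, and it does not produce PostScript. It assumes that everything after the first '|' on a line is positional sequence data, that only letters can match (so '.', '-' and spaces never cause highlighting), and that the homologous pairs are used exactly as listed in this README. The names parse_highlight, matches and highlight_any are made up for the example.

```python
# Hypothetical sketch of highlight's 'Any' mode; not the program's own code.

# Homologous pairs as listed in this README, made symmetric below. As listed,
# the relation is not transitive: W and Y are each homologous to F only.
HOMOLOGOUS = {("D", "E"), ("Q", "N"), ("F", "W"), ("F", "Y"),
              ("I", "L"), ("K", "R"), ("S", "T")}
HOMOLOGOUS |= {(b, a) for a, b in HOMOLOGOUS}


def parse_highlight(text: str) -> list[tuple[str, str]]:
    """Return (name, data) pairs from the lines between SEQUENCES and END.

    Assumes everything after the first '|' is positional sequence data.
    """
    seqs, inside = [], False
    for line in text.splitlines():
        word = line.strip()
        if word == "SEQUENCES":
            inside = True
        elif word == "END":
            inside = False
        elif inside and "|" in line:
            name, data = line.split("|", 1)
            seqs.append((name.strip(), data))
    return seqs


def matches(a: str, b: str, homology: bool) -> bool:
    """True if two characters count as a match under identity or homology."""
    if not (a.isalpha() and b.isalpha()):
        return False  # '.', '-', spaces etc. never cause highlighting
    a, b = a.upper(), b.upper()
    return a == b or (homology and (a, b) in HOMOLOGOUS)


def highlight_any(seqs: list[tuple[str, str]], homology: bool = False) -> list[list[bool]]:
    """'Any' mode: mark a character if some other sequence matches it in that column."""
    rows = [data for _, data in seqs]
    return [
        [
            any(matches(ch, other[col], homology)
                for j, other in enumerate(rows)
                if j != i and col < len(other))
            for col, ch in enumerate(row)
        ]
        for i, row in enumerate(rows)
    ]


example = """TITLE Demo
SEQUENCES
s1|MDKS
s2|MEKT
s3|-DRS
END
"""
seqs = parse_highlight(example)
print(highlight_any(seqs))                 # identity
print(highlight_any(seqs, homology=True))  # homology
```

With identity only, the E in s2 stays unmarked because no other sequence has an E in that column; with homology enabled it is marked because s1 and s3 have a D there. The other modes (1st, 1st+2nd, Highest, Next) differ only in which rows each character is compared against.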