📄 getorf.txt

     caaatgctga atgagggcat cgttcccact gcgatgctgg ttgccaacga tcagatggcg       780
     ctgggcgcaa tgcgcgccat taccgagtcc gggctgcgcg ttggtgcgga tatctcggta       840
     gtgggatacg acgataccga agacagctca tgttatatcc cgccgtcaac caccatcaaa       900
     caggattttc gcctgctggg gcaaaccagc gtggaccgct tgctgcaact ctctcagggc       960
     caggcggtga agggcaatca gctgttgccc gtctcactgg tgaaaagaaa aaccaccctg      1020
     gcgcccaata cgcaaaccgc ctctccccgc gcgttggccg attcattaat gcagctggca      1080
     cgacaggttt cccgactgga aagcgggcag tga                                   1113
//

Output file format

   The output is a sequence file containing predicted open reading frames
   longer than the minimum size, which defaults to 30 bases (i.e. 10
   amino acids).

  Output files for usage example

  File: eclaci.orf

>ECLACI_1 [735 - 1112] E. coli laci gene (codes for the lac repressor).
GHRSHCDAGCQRSDGAGRNARHYRVRAARWCGYLGSGIRRYRRQLMLYPAVNHHQTGFSP
AGANQRGPLAATLSGPGGEGQSAVARLTGEKKNHPGAQYANRLSPRVGRFINAAGTTGFP
TGKRAV
>ECLACI_2 [1 - 1110] E. coli laci gene (codes for the lac repressor).
PEESQFRVVNVKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPN
RVAQQLAGKQSLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGVEACKAA
VHNLLAQRVSGLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFSHEDGTRLG
VEHLVALGHQQIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWSAMSGFQQTM
QMLNEGIVPTAMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSSCYIPPSTTIK
QDFRLLGQTSVDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNTQTASPRALADSLMQLA
RQVSRLESGQ*
>ECLACI_3 [465 - 49] (REVERSE SENSE) E. coli laci gene (codes for the lac repressor).
RRNISAGSFHSNGILVIQRIVNDQPTDALREKIVHRRFTGFDAASFYHRHHHAGTQLIGA
RFNRRDNLRRRVQGQTGGGNANQQRLFARQLLCHAVGNVIQLRHRRFHFFPRFRRNVAGL
VHHAGNGLIRDTGILCDIV

   The name of the ORF sequences is constructed from the name of the
   input sequence with an underscore character ('_') and a unique ordinal
   number of the ORF found appended. The description of the output ORF
   sequence is constructed from the description of the input sequence
   with the start and end positions of the ORF prepended.

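   As an illustration of the naming scheme, the header lines in the
   example output can be split back into their parts. The following is a
   minimal Python sketch, not part of EMBOSS. The regular expression is
   inferred only from the three example headers shown here, and the names
   OrfHeader, parse_orf_header and orf_length are hypothetical; other
   getorf versions or options may produce headers that it does not match.

```python
import re
from dataclasses import dataclass


@dataclass
class OrfHeader:
    name: str          # name of the input sequence, e.g. "ECLACI"
    ordinal: int       # unique number appended after the underscore
    start: int         # start position as printed in the header
    end: int           # end position as printed in the header
    reverse: bool      # True if the description contains '(REVERSE SENSE)'
    description: str   # remaining description text


# Assumed layout: ">NAME_N [START - END] optional-text"
_HEADER_RE = re.compile(
    r"^>(?P<name>\S+)_(?P<ordinal>\d+) "
    r"\[(?P<start>\d+) - (?P<end>\d+)\]\s*(?P<rest>.*)$"
)

_REVERSE_TAG = "(REVERSE SENSE)"


def parse_orf_header(line: str) -> OrfHeader:
    """Split one header line of the example output into its fields."""
    match = _HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised ORF header: {line!r}")
    rest = match.group("rest")
    return OrfHeader(
        name=match.group("name"),
        ordinal=int(match.group("ordinal")),
        start=int(match.group("start")),
        end=int(match.group("end")),
        reverse=_REVERSE_TAG in rest,
        description=rest.replace(_REVERSE_TAG, "").strip(),
    )


def orf_length(header: OrfHeader) -> int:
    """Number of bases spanned, whichever way the positions are ordered."""
    return abs(header.end - header.start) + 1


if __name__ == "__main__":
    example = (">ECLACI_3 [465 - 49] (REVERSE SENSE) "
               "E. coli laci gene (codes for the lac repressor).")
    header = parse_orf_header(example)
    print(header.name, header.ordinal, header.reverse, orf_length(header))
```

   The sketch takes the sense of the ORF from the '(REVERSE SENSE)' text
   in the description, not from the ordinal number in the name.
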
   The unique number appended to the name is used only to create unique
   sequence names; it does not imply any order, position or sense-strand
   of the ORFs.

   If the ORF has been found in the reverse sense, then the start
   position will be larger than the end position. The numbering uses the
   forward-sense positions, but read in the reverse sense. For example,
   >ECLACI_3 [465 - 49] in the output above is a reverse-sense ORF
   running from position 465 to 49. The description will also contain
   '(REVERSE SENSE)'.

   If the sequence has been specified as a circular genome (using the
   command-line switch '-circular'), then ORFs can potentially continue
   past the 'end' of the input sequence (the breakpoint of the circular
   genome) and into the 'start' of the sequence again. This is dealt with
   by appending the sequence to itself three times and reporting long
   ORFs that are found in this extended sequence. Any ORF that is longer
   than three times the sequence length (i.e. one that continues without
   hitting a STOP at any point in the genome) will be reported as being a
   maximum of three times the length of the input sequence. Note that the
   end position of an ORF in circular genomes can appear to be larger
   than the length of the input sequence if the ORF crosses the
   breakpoint. If the ORF crosses the breakpoint, then the text
   '(ORF crosses the breakpoint)' will be added to the description of the
   output sequence.

Data files

   The START and STOP codons used by getorf are defined in the Genetic
   Code data files. By default, Genetic Code file EGC.0 is used.

   The default file EGC.0 is the 'Standard Code' with the rarely used
   alternate START codons omitted; it has only the normal 'AUG' START
   codon. The 'Standard Code' with the rarely used alternate START codons
   included is Genetic Code file EGC.1.

   It is expected that users will sometimes wish to customise a Genetic
   Code file. To do this, use the program embossdata.

   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.

   To see the available EMBOSS data files, run:

% embossdata -showall

   To fetch one of the data files (for example 'Exxx.dat') into your
   current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

   Users can provide their own data files in their own directories.
   Project-specific files can be put in the current directory or, for
   tidier directory listings, in a subdirectory called ".embossdata".
   Files for all EMBOSS runs can be put in the user's home directory or,
   again, in a subdirectory called ".embossdata".

   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata

   The Genetic Code data files are based on the NCBI genetic code tables.
   Their names and descriptions are:

   EGC.0
          Standard (Differs from GC.1 in that it only has initiation site
          'AUG')

   EGC.1
          Standard

   EGC.2
          Vertebrate Mitochondrial

   EGC.3
          Yeast Mitochondrial

   EGC.4
          Mold, Protozoan, Coelenterate Mitochondrial and
          Mycoplasma/Spiroplasma

   EGC.5
          Invertebrate Mitochondrial

   EGC.6
          Ciliate Macronuclear and Dasycladacean

   EGC.9
          Echinoderm Mitochondrial

   EGC.10
          Euplotid Nuclear

   EGC.11
          Bacterial

   EGC.12
          Alternative Yeast Nuclear

   EGC.13
          Ascidian Mitochondrial

   EGC.14
          Flatworm Mitochondrial

   EGC.15
          Blepharisma Macronuclear

   EGC.16
          Chlorophycean Mitochondrial

   EGC.21
          Trematode Mitochondrial

   EGC.22
          Scenedesmus obliquus

   EGC.23
          Thraustochytrium Mitochondrial

   The format of these files is very simple.

   It consists of several lines of optional comments, each starting with
   a '#' character.

   These are followed by the line: 'Genetic Code [n]', where 'n' is the
   number of the genetic code file.

   This is followed by the description of the code and then by five lines
   giving the IUPAC one-letter code of the translated amino acid, the
   start codons (indicated by an 'M') and the three bases of the codon,
   lined up one on top of the other.

   For example:

------------------------------------------------------------------------------
# Genetic Code Table
#
# Obtained from: http://www.ncbi.nlm.nih.gov/collab/FT/genetic_codes.html
# and: http://www3.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c
#
# Differs from Genetic Code [1] only in that the initiation sites have been
# changed to only 'AUG'

Genetic Code [0]
Standard

AAs  =   FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = -----------------------------------M----------------------------
Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
------------------------------------------------------------------------------

Notes

   If you have selected one of the options to report a region around a
   START or STOP codon, then note that any such region that crosses the
   beginning or end of the sequence will not be reported.

References

   None.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It always exits with status 0.

Known bugs

   '-sbegin' and '-send' do not work with this program.

See also

   Program name Description
   marscan      Finds MAR/SAR sites in nucleic sequences
   plotorf      Plot potential open reading frames
   showorf      Pretty output of DNA translations
   sixpack      Display a DNA sequence with 6-frame translation and ORFs
   syco         Synonymous codon usage Gribskov statistic plot
   tcode        Fickett TESTCODE statistic to identify protein-coding DNA
   wobble       Wobble base plot

     * checktrans - Reports STOP codons and ORF statistics of a protein
       sequence

Author(s)

   Gary Williams (gwilliam