                                  getorf

Function

   Finds and extracts open reading frames (ORFs)

Description

   This program finds and outputs the sequences of open reading frames
   (ORFs).

   The ORFs can be defined as regions of a specified minimum size between
   STOP codons or between START and STOP codons.

   The ORFs can be output as the nucleotide sequence or as the
   translation.

   The program can also output the region around the START or the initial
   STOP codon or the ending STOP codons of an ORF for those doing
   analysis of the properties of these regions.

   The START and STOP codons are defined in the Genetic Code tables. A
   suitable Genetic Code table can be selected for the organism you are
   investigating.

Usage

   Here is a sample session with getorf

% getorf -minsize 300
Finds and extracts open reading frames (ORFs)
Input nucleotide sequence(s): tembl:eclaci
protein output sequence(s) [eclaci.orf]:

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outseq]            seqoutall  [.] Protein sequence
                                  set(s) filename and optional format (output
                                  USA)

   Additional (Optional) qualifiers:
   -table              menu       [0] Code to use (Values: 0 (Standard); 1
                                  (Standard (with alternative initiation
                                  codons)); 2 (Vertebrate Mitochondrial); 3
                                  (Yeast Mitochondrial); 4 (Mold, Protozoan,
                                  Coelenterate Mitochondrial and
                                  Mycoplasma/Spiroplasma); 5 (Invertebrate
                                  Mitochondrial); 6 (Ciliate Macronuclear and
                                  Dasycladacean); 9 (Echinoderm
                                  Mitochondrial); 10 (Euplotid Nuclear); 11
                                  (Bacterial); 12 (Alternative Yeast Nuclear);
                                  13 (Ascidian Mitochondrial); 14 (Flatworm
                                  Mitochondrial); 15 (Blepharisma
                                  Macronuclear); 16 (Chlorophycean
                                  Mitochondrial); 21 (Trematode
                                  Mitochondrial); 22 (Scenedesmus obliquus);
                                  23 (Thraustochytrium Mitochondrial))
   -minsize            integer    [30] Minimum nucleotide size of ORF to
                                  report (Any integer value)
   -maxsize            integer    [1000000] Maximum nucleotide size of ORF to
                                  report (Any integer value)
   -find               menu       [0] This is a small menu of possible output
                                  options. The first four options are to
                                  select either the protein translation or the
                                  original nucleic acid sequence of the open
                                  reading frame. There are two possible
                                  definitions of an open reading frame: it can
                                  either be a region that is free of STOP
                                  codons or a region that begins with a START
                                  codon and ends with a STOP codon. The last
                                  three options are probably only of interest
                                  to people who wish to investigate the
                                  statistical properties of the regions around
                                  potential START or STOP codons. The last
                                  option assumes that ORF lengths are
                                  calculated between two STOP codons. (Values:
                                  0 (Translation of regions between STOP
                                  codons); 1 (Translation of regions between
                                  START and STOP codons); 2 (Nucleic sequences
                                  between STOP codons); 3 (Nucleic sequences
                                  between START and STOP codons); 4
                                  (Nucleotides flanking START codons); 5
                                  (Nucleotides flanking initial STOP codons);
                                  6 (Nucleotides flanking ending STOP codons))

   Advanced (Unprompted) qualifiers:
   -[no]methionine     boolean    [Y] START codons at the beginning of protein
                                  products will usually code for Methionine,
                                  despite what the codon will code for when it
                                  is internal to a protein. This qualifier
                                  sets all such START codons to code for
                                  Methionine by default.
   -circular           boolean    [N] Is the sequence circular
   -[no]reverse        boolean    [Y] Set this to be false if you do not wish
                                  to find ORFs in the reverse complement of
                                  the sequence.
   -flanking           integer    [100] If you have chosen one of the options
                                  of the type of sequence to find that gives
                                  the flanking sequence around a STOP or START
                                  codon, this allows you to set the number of
                                  nucleotides either side of that codon to
                                  output. If the region of flanking
                                  nucleotides crosses the start or end of the
                                  sequence, no output is given for this codon.
                                  (Any integer value)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outseq" associated qualifiers
   -osformat2          string     Output seq format
   -osextension2       string     File name extension
   -osname2            string     Base file name
   -osdirectory2       string     Output directory
   -osdbname2          string     Database name to add
   -ossingle2          boolean    Separate file for each entry
   -oufo2              string     UFO features
   -offormat2          string     Features format
   -ofname2            string     Features file name
   -ofdirectory2       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   getorf reads any nucleic acid sequence USA.

  Input files for usage example

   'tembl:eclaci' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:eclaci

ID   ECLACI     standard; DNA; PRO; 1113 BP.
XX
AC   V00294;
XX
SV   V00294.1
XX
DT   09-JUN-1982 (Rel. 01, Created)
DT   10-FEB-1999 (Rel. 58, Last updated, Version 2)
XX
DE   E. coli laci gene (codes for the lac repressor).
XX
KW   DNA binding protein; repressor.
XX
OS   Escherichia coli
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
XX
RN   [1]
RP   1-1113
RX   MEDLINE; 78246991.
RA   Farabaugh P.J.;
RT   "Sequence of the lacI gene";
RL   Nature 274:765-769(1978).
XX
DR   SWISS-PROT; P03023; LACI_ECOLI.
XX
CC   KST ECO.LACI
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1113
FT                   /db_xref="taxon:562"
FT                   /organism="Escherichia coli"
FT   CDS             31..1113
FT                   /db_xref="SWISS-PROT:P03023"
FT                   /note="reading frame"
FT                   /transl_table=11
FT                   /protein_id="CAA23569.1"
FT                   /translation="MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAEL
FT                   NYIPNRVAQQLAGKQSLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGV
FT                   EACKAAVHNLLAQRVSGLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFSH
FT                   EDGTRLGVEHLVALGHQQIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWSA
FT                   MSGFQQTMQMLNEGIVPTAMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSSC
FT                   YIPPSTTIKQDFRLLGQTSVDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNTQTASPR
FT                   ALADSLMQLARQVSRLESGQ"
XX
SQ   Sequence 1113 BP; 249 A; 304 C; 322 G; 238 T; 0 other;
     ccggaagaga gtcaattcag ggtggtgaat gtgaaaccag taacgttata cgatgtcgca        60
     gagtatgccg gtgtctctta tcagaccgtt tcccgcgtgg tgaaccaggc cagccacgtt       120
     tctgcgaaaa cgcgggaaaa agtggaagcg gcgatggcgg agctgaatta cattcccaac       180
     cgcgtggcac aacaactggc gggcaaacag tcgttgctga ttggcgttgc cacctccagt       240
     ctggccctgc acgcgccgtc gcaaattgtc gcggcgatta aatctcgcgc cgatcaactg       300
     ggtgccagcg tggtggtgtc gatggtagaa cgaagcggcg tcgaagcctg taaagcggcg       360
     gtgcacaatc ttctcgcgca acgcgtcagt gggctgatca ttaactatcc gctggatgac       420
     caggatgcca ttgctgtgga agctgcctgc actaatgttc cggcgttatt tcttgatgtc       480
     tctgaccaga cacccatcaa cagtattatt ttctcccatg aagacggtac gcgactgggc       540
     gtggagcatc tggtcgcatt gggtcaccag caaatcgcgc tgttagcggg cccattaagt       600
     tctgtctcgg cgcgtctgcg tctggctggc tggcataaat atctcactcg caatcaaatt       660
     cagccgatag cggaacggga aggcgactgg agtgccatgt ccggttttca acaaaccatg       720
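
Illustration (not part of getorf)

   The two ORF definitions given in the Description (a region between
   STOP codons, or a region between a START and a STOP codon) can be
   sketched in a few lines of Python. This is only a rough illustration
   and not getorf's own implementation. The function name find_orfs and
   its parameters are hypothetical. The sketch scans the forward strand
   only, uses the STOP codons of the standard genetic code, treats ATG as
   the only START codon, and assumes that an unterminated region at the
   end of the sequence still counts as a region "between STOP codons".

```python
STOPS = {"TAA", "TAG", "TGA"}  # STOP codons of the standard genetic code
START = "ATG"                  # assumption: ATG is the only START codon


def find_orfs(seq, minsize=30, require_start=False):
    """Return (frame, begin, end) spans, 0-based, end exclusive, STOP excluded."""
    seq = seq.upper()
    orfs = []

    def report(frame, begin, end):
        if require_start:
            # Move begin to the first in-frame START codon, if there is one.
            starts = [i for i in range(begin, end, 3) if seq[i:i + 3] == START]
            if not starts:
                return
            begin = starts[0]
        # Keep the region only if it reaches the minimum nucleotide size.
        if end - begin >= minsize:
            orfs.append((frame, begin, end))

    for frame in range(3):
        begin = frame  # first codon position after the previous STOP
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOPS:
                report(frame, begin, pos)
                begin = pos + 3
            pos += 3
        if not require_start:
            # Assumption: an unterminated region at the sequence end still counts.
            report(frame, begin, pos)
    return orfs


if __name__ == "__main__":
    demo = "TAAATGAAACCCGGGTTTTAGCC"
    print(find_orfs(demo, minsize=9))
    print(find_orfs(demo, minsize=9, require_start=True))
```

   Calling find_orfs with require_start=False loosely corresponds to the
   "between STOP codons" definition, and require_start=True to the
   "between START and STOP codons" definition. Reverse-strand search,
   circular sequences, a maximum size and alternative genetic code tables
   are left out to keep the sketch short.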