tcode.txt

                                   tcode

Function

   Fickett TESTCODE statistic to identify protein-coding DNA

Description

   tcode tests DNA sequences for protein coding regions using an
   algorithm which looks for simple and universal differences between
   protein-coding and noncoding DNA.

   The original paper reports that the test had been thoroughly proven on
   400,000 bases of sequence data: it misclassifies 5% of the regions
   tested and gives an answer of "No Opinion" one fifth of the time.

   The program slides a window of user-selectable size over the DNA
   sequence. For each window the TESTCODE statistic is applied. The
   results can be output as a text report or displayed graphically. The
   text output reports each window as "Coding", "Noncoding" or "No
   opinion". Entries marked "No opinion" have a TESTCODE value that falls
   between the maximum and minimum values required to report a region as
   noncoding or coding. For the graphical plot, all points above a green
   horizontal line are determined to be coding regions. Those below a red
   line are determined to be noncoding. Points between the red and green
   lines are "no opinion" ones.

Biological Relevance

   The statistic reflects the fact that codons are used with unequal
   frequency and that oligonucleotides and nucleotides tend to be
   repeated with a periodicity of three.

   This application can assist in determining the probability of a region
   of nucleic sequence encoding a functional protein.

Algorithm

   The Fickett (1982) algorithm is used (1).

   A window of at least 200 bases is moved over the sequence in steps of
   3 bases.

   Let:
  A1 = Number of A's in positions 1,4,7 ...
  A2 = Number of A's in positions 2,5,8 ...
  A3 = Number of A's in positions 3,6,9 ...

   A position value is determined that reflects the degree to which each
   base is favoured in one codon position over another, i.e.
  Apos = MAX(A1,A2,A3) / (MIN(A1,A2,A3) + 1)

   This is done for all 4 bases.
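   The position value described above can be sketched in a few lines of
   Python. This is only an illustration, not the tcode source: the
   function name position_values is hypothetical, and the "+1" is assumed
   to belong to the denominator, which also avoids division by zero when
   a base never occurs in one of the three positions.

```python
def position_values(window):
    """Return the position value of A, C, G and T for one window.

    Illustrative sketch only: assumes Xpos = MAX(X1,X2,X3) / (MIN(X1,X2,X3) + 1).
    """
    seq = window.upper()
    values = {}
    for base in "ACGT":
        # counts[k] is the number of `base` at positions k+1, k+4, k+7, ...
        counts = [seq[offset::3].count(base) for offset in range(3)]
        values[base] = max(counts) / (min(counts) + 1)
    return values
```

   Only the four common bases are counted, so any other character in the
   window contributes to none of the counts while still occupying its
   position.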
   The percentage composition of each base is also determined. Eight
   values are therefore determined, four position values and four
   composition values. These are then converted to probabilities (p) of
   coding using a look-up table provided as the data file for the
   program. The values in this look-up table have been determined
   experimentally using known coding and noncoding sequences.

   Each of the probabilities is multiplied by a weight (w) value (again
   from the look-up table) for the respective base. The weight value
   reflects the percentage of the time that each parameter alone
   successfully predicted coding or noncoding function for the sequences
   of known function.

   The TESTCODE statistic is then:
  p1w1 + p2w2 + p3w3 + p4w4 + p5w5 + p6w6 + p7w7 + p8w8

   A result of less than 0.74 is probably a non-coding region.
   A result equal to or greater than 0.95 is probably a coding region.
   Anything in between these two values is uncertain.

Usage

   Here is a sample session with tcode

% tcode
Fickett TESTCODE statistic to identify protein-coding DNA
Input nucleotide sequence(s): tembl:hsfau1
Length of sliding window [200]:
Output report [hsfau1.tcode]:

   Example 2

   Produce a graphical plot

% tcode -plot -graph cps
Fickett TESTCODE statistic to identify protein-coding DNA
Input nucleotide sequence(s): tembl:hsfau1
Length of sliding window [200]:

Created tcode.ps

Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -window             integer    [200] This is the number of nucleotide bases
                                  over which the TESTCODE statistic will be
                                  performed each time. The window will then
                                  slide along the sequence, covering the same
                                  number of bases each time. (Integer 200 or
                                  more)
*  -outfile            report     [*.tcode] Output report file name
*  -graph              xygraph    [$EMBOSS_GRAPHICS value, or x11] Graph type
                                  (ps, hpgl, hp7470, hp7580, meta, cps, x11,
                                  tekt, tek, none, data, xterm, png)

   Additional (Optional) qualifiers: (none)

   Advanced (Unprompted) qualifiers:
   -datafile           datafile   [Etcode.dat] The default data file is
                                  Etcode.dat and contains coding probabilities
                                  for each base. The probabilities are for
                                  both positional and compositional
                                  information.
   -step               integer    [3] The selected window will, by default,
                                  slide along the nucleotide sequence by three
                                  bases at a time, retaining the frame
                                  (although the algorithm is not frame
                                  sensitive). This may be altered to increase
                                  or decrease the increment of the slide.
                                  (Integer 1 or more)
   -plot               toggle     [N] On selection a graph of the sequence (X
                                  axis) plotted against the coding score (Y
                                  axis) will be displayed. Sequence above the
                                  green line is coding, that below the red
                                  line is non-coding.
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -rformat            string     Report format
   -rname              string     Base file name
   -rextension         string     File name extension
   -rdirectory         string     Output directory
   -raccshow           boolean    Show accession number in the report
   -rdesshow           boolean    Show description in the report
   -rscoreshow         boolean    Show the score in the report
   -rusashow           boolean    Show the full USA in the report
   -rmaxall            integer    Maximum total hits to report
   -rmaxseq            integer    Maximum hits to report for one sequence

   "-graph" associated qualifiers
   -gprompt            boolean    Graph prompting
   -gdesc              string     Graph description
   -gtitle             string     Graph title
   -gsubtitle          string     Graph subtitle
   -gxtitle            string     Graph x axis title
   -gytitle            string     Graph y axis title
   -goutfile           string     Output file for non interactive displays
   -gdirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   tcode reads any normal sequence USA.

   The program will ignore ambiguity codes in the nucleic acid sequence
   and just accept the four common bases. This is a function of the
   algorithm and the data tables.

  Input files for usage example

   'tembl:hsfau1' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:hsfau1

ID   HSFAU1     standard; DNA; HUM; 2016 BP.
XX
AC   X65921; S45242;
XX
SV   X65921.1
XX
DT   13-MAY-1992 (Rel. 31, Created)
DT   21-JUL-1993 (Rel. 36, Last updated, Version 5)
XX
DE   H.sapiens fau 1 gene
XX
KW   fau 1 gene.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-2016
RA   Kas K.;
RT   ;
RL   Submitted (29-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   K. Kas, University of Antwerp, Dept of Biochemistry T3.22,
RL   Universiteitsplein 1, 2610 Wilrijk, BELGIUM
XX
RN   [2]
RP   1-2016
RX   MEDLINE; 92412144.
RA   Kas K., Michiels L., Merregaert J.;
RT   "Genomic structure and expression of the human fau gene: encoding the
RT   ribosomal protein S30 fused to a ubiquitin-like protein.";
RL   Biochem. Biophys. Res. Commun.
RL   187:927-933(1992).
XX
DR   SWISS-PROT; P35544; UBIM_HUMAN.
DR   SWISS-PROT; Q05472; RS30_HUMAN.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..2016
FT                   /db_xref="taxon:9606"
FT                   /organism="Homo sapiens"
FT                   /clone_lib="CML cosmid"
FT                   /clone="15.1"
FT   mRNA            join(408..504,774..856,951..1095,1557..1612,1787..>1912)
FT                   /gene="fau 1"
FT   exon            408..504
FT                   /number=1
FT   intron          505..773
FT                   /number=1
FT   exon            774..856

  [Part of this file has been deleted for brevity]

FT                   RAKRRMQYNRRFVNVVPTFGKKKGPNANS"
FT   intron          857..950
FT                   /number=2
FT   exon            951..1095
FT                   /number=3
FT   intron          1096..1556
FT                   /number=3
FT   exon            1557..1612
FT                   /number=4
FT   intron          1613..1786
FT                   /number=4
FT   exon            1787..>1912
FT                   /number=5
FT   polyA_signal    1938..1943
XX
SQ   Sequence 2016 BP; 421 A; 562 C; 538 G; 495 T; 0 other;
     ctaccatttt ccctctcgat tctatatgta cactcgggac aagttctcct gatcgaaaac        60
     ggcaaaacta aggccccaag taggaatgcc ttagttttcg gggttaacaa tgattaacac       120
     tgagcctcac acccacgcga tgccctcagc tcctcgctca gcgctctcac caacagccgt       180
     agcccgcagc cccgctggac accggttctc catccccgca gcgtagcccg gaacatggta       240
     gctgccatct ttacctgcta cgccagcctt ctgtgcgcgc aactgtctgg tcccgccccg       300
     tcctgcgcga gctgctgccc aggcaggttc gccggtgcga gcgtaaaggg gcggagctag       360
     gactgccttg ggcggtacaa atagcaggga accgcgcggt cgctcagcag tgacgtgaca       420
     cgcagcccac ggtctgtact gacgcgccct cgcttcttcc tctttctcga ctccatcttc       480
     gcggtagctg ggaccgccgt tcaggtaaga atggggcctt ggctggatcc gaagggcttg       540
     tagcaggttg gctgcggggt cagaaggcgc ggggggaacc gaagaacggg gcctgctccg       600
     tggccctgct ccagtcccta tccgaactcc ttgggaggca ctggccttcc gcacgtgagc       660
     cgccgcgacc accatcccgt cgcgatcgtt tctggaccgc tttccactcc caaatctcct       720
     ttatcccaga gcatttcttg gcttctctta caagccgtct tttctttact cagtcgccaa       780
     tatgcagctc tttgtccgcg cccaggagct acacaccttc gaggtgaccg gccaggaaac       840
     ggtcgcccag atcaaggtaa ggctgcttgg tgcgccctgg gttccatttt cttgtgctct       900
     tcactctcgc ggcccgaggg aacgcttacg agccttatct ttccctgtag gctcatgtag       960
     cctcactgga gggcattgcc ccggaagatc aagtcgtgct cctggcaggc gcgcccctgg      1020
     aggatgaggc cactctgggc cagtgcgggg tggaggccct gactaccctg gaagtagcag      1080
     gccgcatgct tggaggtgag tgagagagga atgttctttg aagtaccggt aagcgtctag      1140
     tgagtgtggg gtgcatagtc ctgacagctg agtgtcacac ctatggtaat agagtacttc      1200
     tcactgtctt cagttcagag tgattcttcc tgtttacatc cctcatgttg aacacagacg      1260
     tccatgggag actgagccag agtgtagttg tatttcagtc acatcacgag atcctagtct      1320
     ggttatcagc ttccacacta aaaattaggt cagaccaggc cccaaagtgc tctataaatt      1380
     agaagctgga agatcctgaa atgaaactta agatttcaag gtcaaatatc tgcaactttg      1440
     ttctcattac ctattgggcg cagcttctct ttaaaggctt gaattgagaa aagaggggtt      1500