                                 cpgreport

Function

   Reports all CpG rich regions

Description

   cpgreport scans a nucleotide sequence for regions with higher than
   expected frequencies of the dinucleotide CG.

   CpG refers to a C nucleotide immediately followed by a G. The 'p' in
   'CpG' refers to the phosphate group linking the two bases.

   Detection of regions of genomic sequences that are rich in the CpG
   pattern is important because such regions are resistant to methylation
   and tend to be associated with genes which are frequently switched on.
   Regions rich in the CpG pattern are known as CpG islands.

   This program does not find CpG islands as normally defined: "a region
   of greater than 200 bp with a %GC of greater than 50% and
   observed/expected CpG > 0.6". cpgreport instead builds its score from
   a running sum rather than a window: if position i does not start a
   CpG, the running-sum counter is decremented; if it does, the counter
   is incremented by CPGSCORE (the value of -score). Spans greater than
   the threshold are searched for recursively.

   This method overpredicts islands but finds the smaller ones around
   primary exons.

Usage

   Here is a sample session with cpgreport

% cpgreport tembl:rnu68037
Reports all CpG rich regions
CpG score [17]:
Output file [rnu68037.cpgreport]:
Features output [rnu68037.gff]:

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -score              integer    [17] This sets the score for each CG
                                  sequence found. A value of 17 is more
                                  sensitive, but 28 has also been used with
                                  some success.
                                  (Integer from 1 to 200)
  [-outfile]           outfile    [*.cpgreport] Output file name
  [-outfeat]           featout    [unknown.gff] File for output features

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   "-outfeat" associated qualifiers
   -offormat3          string     Output feature format
   -ofopenfile3        string     Features file name
   -ofextension3       string     File name extension
   -ofdirectory3       string     Output directory
   -ofname3            string     Base file name
   -ofsingle3          boolean    Separate file for each entry

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options.
                                  More information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   Any DNA sequence USA.

  Input files for usage example

   'tembl:rnu68037' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:rnu68037

ID   RNU68037   standard; RNA; ROD; 1218 BP.
XX
AC   U68037;
XX
SV   U68037.1
XX
DT   23-SEP-1996 (Rel. 49, Created)
DT   04-MAR-2000 (Rel. 63, Last updated, Version 2)
XX
DE   Rattus norvegicus EP1 prostanoid receptor mRNA, complete cds.
XX
KW   .
XX
OS   Rattus norvegicus (Norway rat)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
XX
RN   [1]
RP   1-1218
RA   Abramovitz M., Boie Y.;
RT   "Cloning of the rat EP1 prostanoid receptor";
RL   Unpublished.
XX
RN   [2]
RP   1-1218
RA   Abramovitz M., Boie Y.;
RT   ;
RL   Submitted (26-AUG-1996) to the EMBL/GenBank/DDBJ databases.
RL   Biochemistry & Molecular Biology, Merck Frosst Center for Therapeutic
RL   Research, P. O. Box 1005, Pointe Claire - Dorval, Quebec H9R 4P8, Canada
XX
DR   SWISS-PROT; P70597; PE21_RAT.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1218
FT                   /db_xref="taxon:10116"
FT                   /organism="Rattus norvegicus"
FT                   /strain="Sprague-Dawley"
FT   CDS             1..1218
FT                   /codon_start=1
FT                   /db_xref="SWISS-PROT:P70597"
FT                   /note="family 1 G-protein coupled receptor"
FT                   /product="EP1 prostanoid receptor"
FT                   /protein_id="AAB07735.1"
FT                   /translation="MSPYGLNLSLVDEATTCVTPRVPNTSVVLPTGGNGTSPALPIFSM
FT                   TLGAVSNVLALALLAQVAGRLRRRRSTATFLLFVASLLAIDLAGHVIPGALVLRLYTAG
FT                   RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTQPLIHAARVSVARARLALALL
FT                   AAMALAVALLPLVHVGHYELQYPGTWCFISLGPPGGWRQALLAGLFAGLGLAALLAALV
FT                   CNTLSGLALLRARWRRRRSRRFRENAGPDDRRRWGSRGLRLASASSASSITSTTAALRS
FT                   SRGGGSARRVHAHDVEMVGQLVGIMVVSCICWSPLLVLVVLAIGGWNSNSLQRPLFLAV
FT                   RLASWNQILDPWVYILLRQAMLRQLLRLLPLRVSAKGGPTELSLTKSAWEASSLRSSRH
FT                   SGFSHL"
XX
SQ   Sequence 1218 BP; 162 A; 397 C; 387 G; 272 T; 0 other;
     atgagcccct acgggcttaa cctgagccta gtggatgagg caacaacgtg tgtaacaccc        60
     agggtcccca atacatctgt ggtgctgcca acaggcggta acggcacatc accagcgctg       120
     cctatcttct ccatgacgct gggtgctgtg tccaacgtgc tggcgctggc gctgctggcc       180
     caggttgcag gcagactgcg gcgccgccgc tcgactgcca ccttcctgtt gttcgtcgcc       240
     agcctgcttg ccatcgacct agcaggccat gtgatcccgg gcgccttggt gcttcgcctg       300
     tatactgcag gacgtgcgcc cgctggcggg gcctgtcatt tcctgggcgg ctgtatggtc       360
     ttctttggcc tgtgcccact tttgcttggc tgtggcatgg ccgtggagcg ctgcgtgggt       420
     gtcacgcagc cgctgatcca cgcggcgcgc gtgtccgtag cccgcgcacg cctggcacta       480
     gccctgctgg ccgccatggc tttggcagtg gcgctgctgc cactagtgca cgtgggtcac       540
     tacgagctac agtaccctgg cacttggtgt ttcattagcc ttgggcctcc tggaggttgg       600
     cgccaggcgt tgcttgcggg cctcttcgcc ggccttggcc tggctgcgct ccttgccgca       660
     ctagtgtgta atacgctcag cggcctggcg ctccttcgtg cccgctggag gcggcgtcgc       720
     tctcgacgtt tccgagagaa cgcaggtccc gatgatcgcc ggcgctgggg gtcccgtgga       780
     ctccgcttgg cctccgcctc gtctgcgtca tccatcactt caaccacagc tgccctccgc       840
     agctctcggg gaggcggctc cgcgcgcagg gttcacgcac acgacgtgga aatggtgggc       900
     cagctcgtgg gcatcatggt ggtgtcgtgc atctgctgga gccccctgct ggtattggtg       960
     gtgttggcca tcgggggctg gaactctaac tccctgcagc ggccgctctt tctggctgta      1020
     cgcctcgcgt cgtggaacca gatcctggac ccatgggtgt acatcctgct gcgccaggct      1080
     atgctgcgcc aacttcttcg cctcctaccc ctgagggtta gtgccaaggg tggtccaacg      1140
     gagctgagcc taaccaagag tgcctgggag gccagttcac tgcgtagctc ccggcacagt      1200
     ggcttcagcc acttgtga                                                    1218
//

Output file format

  Output files for usage example

  File: rnu68037.cpgreport

CPGREPORT of RNU68037 from 1 to 1218

Sequence              Begin    End Score        CpG   %CG  CG/GC
RNU68037                 12     13    17          1 100.0    -
RNU68037                 47     48    17          1 100.0    -
RNU68037                 96   1032   630         87  66.1   0.65
RNU68037               1072   1100    26          3  62.1   0.00
RNU68037               1139   1140    17          1 100.0    -
RNU68037               1183   1193    26          2  72.7   2.00

  File: rnu68037.gff

##gff-version 2.0
##date 2006-07-15
##Type DNA RNU68037
RNU68037        cpgreport       misc_feature    12      13      17.000  +       .       Sequence "RNU68037.1"
RNU68037        cpgreport       misc_feature    47      48      17.000  +       .       Sequence "RNU68037.2"
RNU68037        cpgreport       misc_feature    96      1032    630.000 +       .       Sequence "RNU68037.3"
RNU68037        cpgreport       misc_feature    1072    1100    26.000  +       .       Sequence "RNU68037.4"
RNU68037        cpgreport       misc_feature    1139    1140    17.000  +       .       Sequence "RNU68037.5"
RNU68037        cpgreport       misc_feature    1183    1193    26.000  +       .       Sequence "RNU68037.6"

   The first non-blank line of the output file 'rnu68037.cpgreport' is
   the title line giving the program name, the name of the sequence being
   analysed and the start and end positions of the sequence.

   The second non-blank line contains the headings of the columns.

   Subsequent lines contain columns with the following information:
     * The name of the sequence.
     * The begin position and the end position of the CpG-rich region.
     * The score of the CpG-rich region.
     * The number of CpGs in the CpG-rich region.
     * The %(G+C) in the CpG-rich region.
     * The ratio of CpG to GpC in the CpG-rich region.

   If the count of GpC in the region is zero, then the ratio of CG/GC is
   reported as '-'.

Data files

   None.

Notes

   This program does not find CpG islands as normally defined (see
   cpgplot).

References

   None.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   0 if successful.

Known bugs

   None.

See also

   Program name  Description
   cpgplot       Plot CpG rich areas
   geecee        Calculates fractional GC content of nucleic acid sequences
   newcpgreport  Report CpG rich areas
   newcpgseek    Reports CpG rich regions

   As there is no official definition of what a CpG island is, and,
   worse, of where one begins and ends, we have to live with two
   definitions and thus two methods. These are:

   1. newcpgseek and cpgreport - both declare a putative island if the
   score is higher than a threshold (17 at the moment). They now also
   display the actual CpG count, the % CG and the observed/expected
   ratio in the region where the score is above the threshold. This
   scoring method based on sum/frequencies overpredicts islands but finds
   the smaller ones around primary exons.
   newcpgseek uses the same method as cpgreport but the output is
   different and more readable.

   2. newcpgreport and cpgplot use a sliding window within which the
   Obs/Exp ratio of CpG is calculated. The important thing to note in
   this method is that an island, in order to be reported, is defined as
   a region that satisfies the following constraints:

   Obs/Exp ratio > 0.6
   % C + % G > 50%
   Length > 200.

   For all practical purposes you should probably use newcpgreport. It is
   actually used to produce the human cpgisland database you can find on
   the EBI's ftp server as well as on the EBI's SRS server.

   geecee measures CG content in the entire input sequence and is not to
   be used to detect CpG islands. It can be useful for detecting
   sequences that MIGHT contain an island.

Author(s)

   This program was originally written by Gos Micklem (gos
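
   The running-sum scoring outlined in the Description can be sketched in
   Python as below. This is an illustrative reading of that description,
   not the EMBOSS source: score_region is a hypothetical helper, and the
   decrement of 1 per non-CpG position is an assumption (it is consistent
   with the scores in the sample report). The sketch only scores a region
   that is already known; it does not perform the recursive search for
   spans above the threshold, and it leaves out the CG/GC column.

```python
def score_region(region, cpg_score=17):
    """Score one candidate region with a running sum.

    Hypothetical helper, not part of EMBOSS. Assumes the counter drops
    by 1 at every position that does not start a CpG.
    Returns (score, cpg_count, percent_gc).
    """
    seq = region.upper()
    if not seq:
        raise ValueError("region must not be empty")
    score = 0
    cpg_count = 0
    # Walk over every dinucleotide start position in the region.
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            score += cpg_score  # a CpG adds the -score value
            cpg_count += 1
        else:
            score -= 1  # assumed unit penalty for any other position
    gc_bases = sum(1 for base in seq if base in "GC")
    percent_gc = round(100.0 * gc_bases / len(seq), 1)
    return score, cpg_count, percent_gc


# Bases 1183..1193 of the sample entry RNU68037.
print(score_region("cgtagctcccg"))  # prints (26, 2, 72.7)
```

   For that region the sketch gives a score of 26, 2 CpGs and 72.7 %CG,
   the same values as the 1183..1193 row of the sample report.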