                                  cpgplot

Function

   Plot CpG rich areas

Description

   CpG refers to a C nucleotide immediately followed by a G. The 'p' in
   'CpG' refers to the phosphate group linking the two bases.

   Detection of regions of genomic sequences that are rich in the CpG
   pattern is important because such regions are resistant to methylation
   and tend to be associated with genes which are frequently switched on.
   Regions rich in the CpG pattern are known as CpG islands.

   It has been estimated that about half of all mammalian genes have a
   CpG-rich region around their 5' end. It is said that all mammalian
   house-keeping genes have a CpG island.

   Non-mammalian vertebrates have some CpG islands that are associated
   with genes, but the association becomes equivocal in more distant
   taxonomic groups.

   Finding a CpG island upstream of predicted exons or genes is good
   contributory evidence for the prediction.

   By default, this program defines a CpG island as a region where, over
   an average of 10 windows, the calculated % composition is over 50% and
   the calculated Obs/Exp (i.e. Observed/Expected) ratio is over 0.6 and
   the conditions hold for a minimum of 200 bases. These conditions can
   be modified by setting the values of the appropriate parameters.

   The Observed number of CpG patterns in a window is simply the count of
   the number of times a 'C' is found followed immediately by a 'G'.

   The Expected number of CpG patterns is calculated for each window as
   the number of CpG dinucleotides you would expect to see in that window
   based on the frequency of C's and G's in that window. Thus, the
   Expected number of CpG's in a window is calculated as the number of
   'C's in the window multiplied by the number of 'G's in the window,
   divided by the window length.

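   The Observed and Expected counts described above can be sketched in
   Python as follows. This is only an illustration of the per-window
   arithmetic, not the cpgplot source: the function names are hypothetical,
   the one-base step between windows is an assumption, and the averaging
   over 10 windows used for island calling is not reproduced.

```python
def cpg_window_stats(window_seq):
    """Return (observed, expected, obs_exp_ratio, percent_cg) for one window."""
    s = window_seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("window must not be empty")
    c = s.count("C")
    g = s.count("G")
    # Observed: number of times a 'C' is immediately followed by a 'G'.
    # "CG" cannot overlap itself, so str.count gives the exact number.
    observed = s.count("CG")
    # Expected = (number of C's * number of G's) / window length
    expected = c * g / n
    # Guard against windows with no C or no G (assumed convention: ratio 0).
    ratio = observed / expected if expected > 0 else 0.0
    percent_cg = 100.0 * (c + g) / n
    return observed, expected, ratio, percent_cg


def sliding_stats(seq, window=100):
    """Yield (start, stats) for every full window, stepping one base (assumed)."""
    for start in range(len(seq) - window + 1):
        yield start, cpg_window_stats(seq[start:start + window])
```

   For example, cpg_window_stats("ACGT") counts one C, one G and one CpG
   in a window of length 4, so Expected is 1 * 1 / 4 = 0.25 and the
   Obs/Exp ratio is 4.0.
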
        Expected = (number of C's * number of G's) / window length

   This program reads in one or more sequences and calculates the Obs/Exp
   ratio and the percentage C + G over a window which is moved along the
   sequence. These calculated values can be plotted, together with the
   regions which match this program's definition of a CpG island.

Usage

   Here is a sample session with cpgplot

% cpgplot tembl:rnu68037 -graph cps
Plot CpG rich areas
Window size [100]:
Minimum length of an island [200]:
Minimum observed/expected [0.6]:
Minimum percentage [50.]:
Output file [rnu68037.cpgplot]:
Features output [rnu68037.gff]:

Created cpgplot.ps

Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -window             integer    [100] The percentage CG content and the
                                  Observed frequency of CG are calculated
                                  within a window whose size is set by this
                                  parameter. The window is moved down the
                                  sequence and these statistics are calculated
                                  at each position that the window is moved
                                  to. (Integer 1 or more)
   -minlen             integer    [200] This sets the minimum length that a
                                  CpG island has to be before it is reported.
                                  (Integer 1 or more)
   -minoe              float      [0.6] This sets the minimum average observed
                                  to expected ratio of C plus G to CpG in a
                                  set of 10 windows that are required before a
                                  CpG island is reported. (Number from 0.000
                                  to 10.000)
   -minpc              float      [50.] This sets the minimum average
                                  percentage of G plus C in a set of 10
                                  windows that are required before a CpG
                                  island is reported. (Number from 0.000 to
                                  100.000)
  [-outfile]           outfile    [*.cpgplot] This sets the name of the file
                                  holding the report of the input sequence
                                  name, CpG island parameters and the output
                                  details of any CpG islands that are found.
*  -graph              xygraph    [$EMBOSS_GRAPHICS value, or x11] Graph type
                                  (ps, hpgl, hp7470, hp7580, meta, cps, x11,
                                  tekt, tek, none, data, xterm, png)
  [-outfeat]           featout    [unknown.gff] File for output features

   Additional (Optional) qualifiers: (none)

   Advanced (Unprompted) qualifiers:
   -[no]plot           toggle     [Y] Plot CpG island score
   -[no]obsexp         boolean    [Y] If this is set to true then the graph of
                                  the observed to expected ratio of C plus G
                                  to CpG within a window is displayed.
   -[no]cg             boolean    [Y] If this is set to true then the graph of
                                  the regions which have been determined to
                                  be CpG islands is displayed.
   -[no]pc             boolean    [Y] If this is set to true then the graph of
                                  the percentage C plus G within a window is
                                  displayed.

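   The qualifiers listed above can also be given on the command line so
   that cpgplot runs without prompting. The following Python sketch only
   assembles such an argument list; the helper name is hypothetical, the
   qualifier names and default values are those listed in this document,
   and -auto is the general qualifier that turns off prompts. It does not
   run the program.

```python
def build_cpgplot_command(sequence, window=100, minlen=200, minoe=0.6,
                          minpc=50.0, outfile=None, outfeat=None,
                          graph="cps"):
    """Build an argument list for a non-interactive cpgplot run (sketch)."""
    cmd = [
        "cpgplot",
        "-sequence", sequence,
        "-window", str(window),
        "-minlen", str(minlen),
        "-minoe", str(minoe),
        "-minpc", str(minpc),
        "-graph", graph,
        "-auto",  # general qualifier: turn off prompts
    ]
    # Output names are optional; when omitted, cpgplot's defaults apply.
    if outfile is not None:
        cmd += ["-outfile", outfile]
    if outfeat is not None:
        cmd += ["-outfeat", outfeat]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_cpgplot_command(
        "tembl:rnu68037",
        outfile="rnu68037.cpgplot",
        outfeat="rnu68037.gff",
    )))
```

   The resulting list can be passed to subprocess.run on a system where
   EMBOSS is installed and the example database 'tembl' is configured.
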
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   "-graph" associated qualifiers
   -gprompt            boolean    Graph prompting
   -gdesc              string     Graph description
   -gtitle             string     Graph title
   -gsubtitle          string     Graph subtitle
   -gxtitle            string     Graph x axis title
   -gytitle            string     Graph y axis title
   -goutfile           string     Output file for non interactive displays
   -gdirectory         string     Output directory

   "-outfeat" associated qualifiers
   -offormat3          string     Output feature format
   -ofopenfile3        string     Features file name
   -ofextension3       string     File name extension
   -ofdirectory3       string     Output directory
   -ofname3            string     Base file name
   -ofsingle3          boolean    Separate file for each entry

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   Any DNA sequence USA.

  Input files for usage example

   'tembl:rnu68037' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:rnu68037

ID   RNU68037   standard; RNA; ROD; 1218 BP.
XX
AC   U68037;
XX
SV   U68037.1
XX
DT   23-SEP-1996 (Rel. 49, Created)
DT   04-MAR-2000 (Rel. 63, Last updated, Version 2)
XX
DE   Rattus norvegicus EP1 prostanoid receptor mRNA, complete cds.
XX
KW   .
XX
OS   Rattus norvegicus (Norway rat)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
XX
RN   [1]
RP   1-1218
RA   Abramovitz M., Boie Y.;
RT   "Cloning of the rat EP1 prostanoid receptor";
RL   Unpublished.
XX
RN   [2]
RP   1-1218
RA   Abramovitz M., Boie Y.;
RT   ;
RL   Submitted (26-AUG-1996) to the EMBL/GenBank/DDBJ databases.
RL   Biochemistry & Molecular Biology, Merck Frosst Center for Therapeutic
RL   Research, P. O. Box 1005, Pointe Claire - Dorval, Quebec H9R 4P8, Canada
XX
DR   SWISS-PROT; P70597; PE21_RAT.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1218
FT                   /db_xref="taxon:10116"
FT                   /organism="Rattus norvegicus"
FT                   /strain="Sprague-Dawley"
FT   CDS             1..1218
FT                   /codon_start=1
FT                   /db_xref="SWISS-PROT:P70597"
FT                   /note="family 1 G-protein coupled receptor"
FT                   /product="EP1 prostanoid receptor"
FT                   /protein_id="AAB07735.1"
FT                   /translation="MSPYGLNLSLVDEATTCVTPRVPNTSVVLPTGGNGTSPALPIFSM
FT                   TLGAVSNVLALALLAQVAGRLRRRRSTATFLLFVASLLAIDLAGHVIPGALVLRLYTAG
FT                   RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTQPLIHAARVSVARARLALALL
FT                   AAMALAVALLPLVHVGHYELQYPGTWCFISLGPPGGWRQALLAGLFAGLGLAALLAALV
FT                   CNTLSGLALLRARWRRRRSRRFRENAGPDDRRRWGSRGLRLASASSASSITSTTAALRS
FT                   SRGGGSARRVHAHDVEMVGQLVGIMVVSCICWSPLLVLVVLAIGGWNSNSLQRPLFLAV
FT                   RLASWNQILDPWVYILLRQAMLRQLLRLLPLRVSAKGGPTELSLTKSAWEASSLRSSRH
FT                   SGFSHL"
XX
SQ   Sequence 1218 BP; 162 A; 397 C; 387 G; 272 T; 0 other;
     atgagcccct acgggcttaa cctgagccta gtggatgagg caacaacgtg tgtaacaccc        60
     agggtcccca atacatctgt ggtgctgcca acaggcggta acggcacatc accagcgctg       120
     cctatcttct ccatgacgct gggtgctgtg tccaacgtgc tggcgctggc gctgctggcc       180
     caggttgcag gcagactgcg gcgccgccgc tcgactgcca ccttcctgtt gttcgtcgcc       240
     agcctgcttg ccatcgacct agcaggccat gtgatcccgg gcgccttggt gcttcgcctg       300
     tatactgcag gacgtgcgcc cgctggcggg gcctgtcatt tcctgggcgg ctgtatggtc       360
     ttctttggcc tgtgcccact tttgcttggc tgtggcatgg ccgtggagcg ctgcgtgggt       420
     gtcacgcagc cgctgatcca cgcggcgcgc gtgtccgtag cccgcgcacg cctggcacta       480
     gccctgctgg ccgccatggc tttggcagtg gcgctgctgc cactagtgca cgtgggtcac       540
     tacgagctac agtaccctgg cacttggtgt ttcattagcc ttgggcctcc tggaggttgg       600
     cgccaggcgt tgcttgcggg cctcttcgcc ggccttggcc tggctgcgct ccttgccgca       660
     ctagtgtgta atacgctcag cggcctggcg ctccttcgtg cccgctggag gcggcgtcgc       720
     tctcgacgtt tccgagagaa cgcaggtccc gatgatcgcc ggcgctgggg gtcccgtgga       780
     ctccgcttgg cctccgcctc gtctgcgtca tccatcactt caaccacagc tgccctccgc       840
     agctctcggg gaggcggctc cgcgcgcagg gttcacgcac acgacgtgga aatggtgggc       900
     cagctcgtgg gcatcatggt ggtgtcgtgc atctgctgga gccccctgct ggtattggtg       960
     gtgttggcca tcgggggctg gaactctaac tccctgcagc ggccgctctt tctggctgta      1020
     cgcctcgcgt cgtggaacca gatcctggac ccatgggtgt acatcctgct gcgccaggct      1080
     atgctgcgcc aacttcttcg cctcctaccc ctgagggtta gtgccaaggg tggtccaacg      1140
     gagctgagcc taaccaagag tgcctgggag gccagttcac tgcgtagctc ccggcacagt      1200
     ggcttcagcc acttgtga                                                    1218
//

Output file format

  Output files for usage example

  File: rnu68037.cpgplot

CPGPLOT islands of unusual CG composition
RNU68037 from 1 to 1218

     Observed/Expected ratio > 0.60
     Percent C + Percent G > 50.00
     Length > 200

 Length 406 (104..509)
 Length 329 (596..924)

  File: rnu68037.gff

##gff-version 2.0
##date 2006-07-15
##Type DNA RNU68037
RNU68037        cpgplot misc_feature    104     509     0.000   +       .       Sequence "RNU68037.1"
RNU68037        cpgplot misc_feature    596     924     0.000   +       .       Sequence "RNU68037.2"

  Graphics File: cpgplot.ps

   [cpgplot results]

Notes

   None.

References

   The original program was described in:

    1. Larsen F, Gundersen G, Lopez R, Prydz H "CpG islands as gene
       markers in the human genome." Genomics 1992 Aug;13(4):1095-107

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   0 if successful.

Known bugs

   None.

See also

   Program name   Description
   cpgreport      Reports all CpG rich regions
   geecee         Calculates fractional GC content of nucleic acid sequences
   newcpgreport   Report CpG rich areas
   newcpgseek     Reports CpG rich regions

   As there is no official definition of what a CpG island is, and,
   worse, of where islands begin and end, we have to live with two
   definitions and thus two methods. These are:

   1. newcpgseek and cpgreport - both declare a putative island if the
   score is higher than a threshold (17 at the moment). They now also
   display the actual CpG count, the % CG and the observed/expected
   ratio in the region where the score is above the threshold. This
   scoring method based on sum/frequencies overpredicts islands but finds
   the smaller ones around primary exons. newcpgseek uses the same method
   as cpgreport but the output is different and more readable.

   2. newcpgreport and cpgplot use a sliding window within which the
   Obs/Exp ratio of CpG is calculated. The important thing to note in
   this method is that an island, in order to be reported, is defined as
   a region that satisfies the following constraints:

   Obs/Exp ratio > 0.6
   % C + % G > 50%
   Length > 200.

   For all practical purposes you should probably use newcpgreport. It is
   actually used to produce the human cpgisland database you can find on
   the EBI's ftp server as well as on the EBI's SRS server.

   geecee measures CG content in the entire input sequence and is not to
   be used to detect CpG islands. It can be useful for detecting
   sequences that MIGHT contain an island.

Author(s)

   Alan Bleasby (ajb