                                  compseq

Function

   Count composition of dimer/trimer/etc words in a sequence

Description

   This takes a specified word length and counts how many times each
   distinct subsequence (word) of that length occurs in the input
   sequence(s).

   It can read in the result of a previous compseq analysis and use this
   to set the expected frequencies of the subsequences.

   Unless you tell 'compseq' otherwise, it expects each word to be
   equally likely. The 'Expected' frequency of any nucleotide dimer is
   therefore 1/16 - this is simply the inverse of the number of possible
   dimers (AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG,
   TT). Similarly, the 'Expected' frequency of any trimer is 1/64, etc.

   Obviously this is not the case in real sequences - there will be bias
   in favour of some words.

   Compseq cannot otherwise guess what the 'Expected' frequency is. You
   can, however, tell it what the Expected frequencies are by giving
   compseq the output of the analysis of another set of sequences,
   produced by a previous compseq run.

   So you take a set of sequences that are representative of the type of
   sequence you expect and you run compseq on it to get your expected
   sequence frequencies.

   You then take the sequences you wish to investigate and run compseq
   on them, giving compseq the expected frequencies that you have
   established above.
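
   The counting and the default 'Expected' frequency described above
   can be illustrated with a minimal Python sketch. This is not
   compseq's own source code: the names count_words and
   composition_table are hypothetical, the alphabet is assumed to be
   A, C, G and T only, and pooling words with any other character
   under "Other" is a simplification made for this example. The
   observed/expected ratio is included only as a convenient measure of
   bias.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: plain nucleotide alphabet for this sketch


def count_words(sequence, word_size):
    """Count every overlapping word of length word_size in sequence."""
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of length word_size along the sequence, one base at a time.
    for start in range(len(seq) - word_size + 1):
        word = seq[start:start + word_size]
        if set(word) <= set(ALPHABET):
            counts[word] += 1
        else:
            # Simplification: pool words with non-ACGT characters.
            counts["Other"] += 1
    return counts


def composition_table(sequence, word_size):
    """Return {word: (count, observed, expected, observed / expected)}."""
    counts = count_words(sequence, word_size)
    total = sum(counts.values())
    # Every word is assumed equally likely: 1/16 for dimers, 1/64 for trimers.
    expected = 1.0 / len(ALPHABET) ** word_size
    table = {}
    for letters in product(ALPHABET, repeat=word_size):
        word = "".join(letters)
        observed = counts[word] / total if total else 0.0
        table[word] = (counts[word], observed, expected, observed / expected)
    return table


if __name__ == "__main__":
    for word, row in composition_table("ACGTACGT", 2).items():
        print(word, *row)
```

   Replacing the uniform value of expected with frequencies read from
   an earlier run would correspond to the workflow described above,
   in which a representative set of sequences supplies the expected
   frequencies.
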
   You tell compseq what the file of expected frequencies is by
   specifying it with '-infile filename' on the command-line.

Usage

   Here is a sample session with compseq

   To count the frequencies of dinucleotides in a file:

% compseq tembl:hsfau -word 2 result3.comp
Count composition of dimer/trimer/etc words in a sequence

   Example 2

   To count the frequencies of hexanucleotides, without outputting the
   results of hexanucleotides that do not occur in the sequence:

% compseq tembl:hsfau -word 6 result6.comp -nozero
Count composition of dimer/trimer/etc words in a sequence

   Example 3

   To count the frequencies of trinucleotides in frame 2 of a sequence
   and use a previously prepared compseq output to show the expected
   frequencies:

% compseq tembl:hsfau -word 3 result3.comp -frame 2 -in prev.comp
Count composition of dimer/trimer/etc words in a sequence

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence(s) filename and optional format, or
                                  reference (input USA)
   -word               integer    [2] This is the size of word (n-mer) to
                                  count.
                                  Thus if you want to count codon frequencies,
                                  you should enter 3 here. (Integer from 1 to
                                  20)
  [-outfile]           outfile    [*.compseq] This is the results file.

   Additional (Optional) qualifiers (* if not always prompted):
   -infile             infile     This is a file previously produced by
                                  'compseq' that can be used to set the
                                  expected frequencies of words in this
                                  analysis.
                                  The word size in the current run must be the
                                  same as the one in this results file.
                                  Obviously, you should use a file produced
                                  from protein sequences if you are counting
                                  protein sequence word frequencies, and you
                                  must use one made from nucleotide
                                  frequencies if you are analysing a
                                  nucleotide sequence.
   -frame              integer    [0] The normal behaviour of 'compseq' is to
                                  count the frequencies of all words that
                                  occur by moving a window of length 'word' up
                                  by one each time.
                                  This option allows you to move the window up
                                  by the length of the word each time,
                                  skipping over the intervening words.
                                  You can count only those words that occur in
                                  a single frame of the word by setting this
                                  value to a number other than zero.
                                  If you set it to 1 it will only count the
                                  words in frame 1, 2 will only count the
                                  words in frame 2 and so on. (Integer 0 or
                                  more)
*  -[no]ignorebz       boolean    [Y] The amino acid code B represents
                                  Asparagine or Aspartic acid and the code Z
                                  represents Glutamine or Glutamic acid.
                                  These are not commonly used codes and you
                                  may wish not to count words containing them,
                                  just noting them in the count of 'Other'
                                  words.
*  -reverse            boolean    [N] Set this to be true if you also wish to
                                  count words in the reverse complement of a
                                  nucleic acid sequence.
   -calcfreq           boolean    [N] If this is set true then the expected
                                  frequencies of words are calculated from the
                                  observed frequency of single bases or
                                  residues in the sequences.
                                  If you are reporting a word size of 1
                                  (single bases or residues) then there is no
                                  point in using this option because the
                                  calculated expected frequency will be equal
                                  to the observed frequency.
                                  Calculating the expected frequencies like
                                  this will give an approximation of the
                                  expected frequencies that you might get by
                                  using an input file of frequencies produced
                                  by a previous run of this program. If an
                                  input file of expected word frequencies has
                                  been specified then the values from that
                                  file will be used instead of this
                                  calculation of expected frequency from the
                                  sequence, even if 'calcfreq' is set to be
                                  true.
   -[no]zerocount      boolean    [Y] You can make the output results file
                                  much smaller if you do not display the words
                                  with a zero count.

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options.
                                  More information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   Normal sequence(s) USA.

  Input files for usage example

   'tembl:hsfau' is a sequence entry in the example nucleic acid database
   'tembl'

  Database entry: tembl:hsfau

ID   HSFAU      standard; RNA; HUM; 518 BP.
XX
AC   X65923;
XX
SV   X65923.1
XX
DT   13-MAY-1992 (Rel. 31, Created)
DT   23-SEP-1993 (Rel. 37, Last updated, Version 10)
XX
DE   H.sapiens fau mRNA
XX
KW   fau gene.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-518
RA   Michiels L.M.R.;
RT   ;
RL   Submitted (29-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   L.M.R. Michiels, University of Antwerp, Dept of Biochemistry,
RL   Universiteisplein 1, 2610 Wilrijk, BELGIUM
XX
RN   [2]
RP   1-518
RX   MEDLINE; 93368957.
RA   Michiels L., Van der Rauwelaert E., Van Hasselt F., Kas K., Merregaert J.;
RT   " fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as
RT   an antisense sequences in the Finkel-Biskis-Reilly murine sarcoma virus";
RL   Oncogene 8:2537-2546(1993).
XX
DR   SWISS-PROT; P35544; UBIM_HUMAN.
DR   SWISS-PROT; Q05472; RS30_HUMAN.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..518
FT                   /chromosome="11q"