compseq.txt
AATATG  1   0.0019493   0.0002441   7.9844055
AATGCC  1   0.0019493   0.0002441   7.9844055
AATTCT  1   0.0019493   0.0002441   7.9844055
ACAACC  1   0.0019493   0.0002441   7.9844055
ACACAC  1   0.0019493   0.0002441   7.9844055

  [Part of this file has been deleted for brevity]

TGAGGC  1   0.0019493   0.0002441   7.9844055
TGCAGC  1   0.0019493   0.0002441   7.9844055
TGCAGT  1   0.0019493   0.0002441   7.9844055
TGCCAA  1   0.0019493   0.0002441   7.9844055
TGCCCA  1   0.0019493   0.0002441   7.9844055
TGCCCC  1   0.0019493   0.0002441   7.9844055
TGCGGG  1   0.0019493   0.0002441   7.9844055
TGCTCC  1   0.0019493   0.0002441   7.9844055
TGCTGG  1   0.0019493   0.0002441   7.9844055
TGCTTG  1   0.0019493   0.0002441   7.9844055
TGGAAA  1   0.0019493   0.0002441   7.9844055
TGGAAG  1   0.0019493   0.0002441   7.9844055
TGGAGG  4   0.0077973   0.0002441   31.9376218
TGGCAA  1   0.0019493   0.0002441   7.9844055
TGGCAG  1   0.0019493   0.0002441   7.9844055
TGGCCA  1   0.0019493   0.0002441   7.9844055
TGGCCC  1   0.0019493   0.0002441   7.9844055
TGGCTT  1   0.0019493   0.0002441   7.9844055
TGGGAC  1   0.0019493   0.0002441   7.9844055
TGGGCC  1   0.0019493   0.0002441   7.9844055
TGGTTC  1   0.0019493   0.0002441   7.9844055
TGTAAT  1   0.0019493   0.0002441   7.9844055
TGTAGC  1   0.0019493   0.0002441   7.9844055
TGTCAA  1   0.0019493   0.0002441   7.9844055
TGTCCG  1   0.0019493   0.0002441   7.9844055
TGTGCC  1   0.0019493   0.0002441   7.9844055
TTAAGT  1   0.0019493   0.0002441   7.9844055
TTAGTT  1   0.0019493   0.0002441   7.9844055
TTCAGT  2   0.0038986   0.0002441   15.9688109
TTCATG  1   0.0019493   0.0002441   7.9844055
TTCCCT  1   0.0019493   0.0002441   7.9844055
TTCCTC  1   0.0019493   0.0002441   7.9844055
TTCGAG  1   0.0019493   0.0002441   7.9844055
TTCGCG  1   0.0019493   0.0002441   7.9844055
TTCTCG  1   0.0019493   0.0002441   7.9844055
TTCTCT  1   0.0019493   0.0002441   7.9844055
TTCTGG  1   0.0019493   0.0002441   7.9844055
TTGCCC  1   0.0019493   0.0002441   7.9844055
TTGGAG  1   0.0019493   0.0002441   7.9844055
TTGGCA  1   0.0019493   0.0002441   7.9844055
TTGTAA  1   0.0019493   0.0002441   7.9844055
TTGTCA  1   0.0019493   0.0002441   7.9844055
TTGTCC  1   0.0019493   0.0002441   7.9844055
TTGTGC  1   0.0019493   0.0002441   7.9844055
TTTCTC  2   0.0038986   0.0002441   15.9688109
TTTGGC  1   0.0019493   0.0002441   7.9844055
TTTGTA  1   0.0019493   0.0002441   7.9844055
TTTGTC  2   0.0038986   0.0002441   15.9688109
TTTTGT  1   0.0019493   0.0002441   7.9844055

Other   0   0.0000000   0.0000000   10000000000.0000000


Output files for usage example 3

File: result3.comp

#
# Output from 'compseq'
#
# Only words in frame 2 will be counted.
# The Expected frequencies are taken from the file: ../../data/prev.comp
#
# The input sequences are:
#   HSFAU

Word size     3
Total count   172

#
# Word  Obs Count  Obs Frequency  Exp Frequency  Obs/Exp Frequency
#
AAA     7          0.0406977      0.0329457      1.2352955
AAC     3          0.0174419      0.0096899      1.8000042
AAG     11         0.0639535      0.0348837      1.8333344
AAT     3          0.0174419      0.0077519      2.2500110
ACA     1          0.0058140      0.0096899      0.6000014
ACC     4          0.0232558      0.0116279      2.0000012
ACG     1          0.0058140      0.0038760      1.4999880
ACT     3          0.0174419      0.0135659      1.2857135
AGA     1          0.0058140      0.0232558      0.2500002
AGC     2          0.0116279      0.0135659      0.8571423
AGG     0          0.0000000      0.0310078      0.0000000
AGT     0          0.0000000      0.0193798      0.0000000
ATA     0          0.0000000      0.0038760      0.0000000
ATC     1          0.0058140      0.0058140      0.9999920
ATG     3          0.0174419      0.0135659      1.2857135
ATT     1          0.0058140      0.0038760      1.4999880
CAA     1          0.0058140      0.0193798      0.3000007
CAC     2          0.0116279      0.0116279      1.0000006
CAG     9          0.0523256      0.0251938      2.0769229
CAT     3          0.0174419      0.0096899      1.8000042
CCA     0          0.0000000      0.0232558      0.0000000
CCC     3          0.0174419      0.0251938      0.6923076
CCG     1          0.0058140      0.0155039      0.3749994
CCT     2          0.0116279      0.0193798      0.6000014
CGA     1          0.0058140      0.0038760      1.4999880
CGC     5          0.0290698      0.0193798      1.5000035
CGG     4          0.0232558      0.0174419      1.3333303
CGT     2          0.0116279      0.0077519      1.5000074
CTA     1          0.0058140      0.0096899      0.6000014
CTC     4          0.0232558      0.0213178      1.0909106
CTG     7          0.0406977      0.0193798      2.1000049
CTT     3          0.0174419      0.0213178      0.8181829
GAA     3          0.0174419      0.0213178      0.8181829
GAC     1          0.0058140      0.0116279      0.5000003
GAG     7          0.0406977      0.0193798      2.1000049
GAT     2          0.0116279      0.0077519      1.5000074
GCA     2          0.0116279      0.0135659      0.8571423
GCC     10         0.0581395      0.0348837      1.6666677
GCG     1          0.0058140      0.0155039      0.3749994
GCT     3          0.0174419      0.0193798      0.9000021
GGA     2          0.0116279      0.0251938      0.4615384
GGC     8          0.0465116      0.0329457      1.4117663
GGG     1          0.0058140      0.0135659      0.4285712
GGT     5          0.0290698      0.0174419      1.6666629
GTA     2          0.0116279      0.0116279      1.0000006
GTC     6          0.0348837      0.0174419      1.9999955
GTG     6          0.0348837      0.0155039      2.2499965
GTT     3          0.0174419      0.0096899      1.8000042
TAA     3          0.0174419      0.0135659      1.2857135
TAC     1          0.0058140      0.0058140      0.9999920
TAG     0          0.0000000      0.0077519      0.0000000
TAT     0          0.0000000      0.0019380      0.0000000
TCA     3          0.0174419      0.0193798      0.9000021
TCC     1          0.0058140      0.0116279      0.5000003
TCG     0          0.0000000      0.0135659      0.0000000
TCT     3          0.0174419      0.0193798      0.9000021
TGA     0          0.0000000      0.0077519      0.0000000
TGC     1          0.0058140      0.0174419      0.3333326
TGG     1          0.0058140      0.0271318      0.2142856
TGT     1          0.0058140      0.0096899      0.6000014
TTA     1          0.0058140      0.0038760      1.4999880
TTC     1          0.0058140      0.0193798      0.3000007
TTG     0          0.0000000      0.0135659      0.0000000
TTT     5          0.0290698      0.0135659      2.1428558

Other   0          0.0000000      0.0000000      10000000000.0000000


Data files

The input data file is optional. Its format is exactly the same as the output file format: compseq expects to read in an output file from a previous run of the program. An error is produced if the word size of the current compseq job differs from the word size in the file being read in.


Notes

The results are held in an array in memory before being written to a file. For large values of wordsize, you may run out of memory. Large values of wordsize can also produce very large output files.


References

None.


Warnings

If you use large word sizes (over about 7 for nucleic, 5 for protein) you will use huge amounts of memory.


Diagnostic Error Messages

"The word size is too large for the data structure available."
    You chose a word size that cannot be stored by the program.

"Insufficient memory - aborting."
    You do not have enough memory. Use a machine with more memory.

"The word size you are counting (n) is different to the word size in the file of expected frequencies (n)."
    The word size used in the compseq run that produced your file of expected frequencies differs from the word size used in this run of compseq.
"The 'Word size' line was not found, instead found:"
    You appear to be trying to read a corrupted compseq results file.


Exit status

It always exits with status 0 unless one of the above error conditions is found.


Known bugs

This program can use a large amount of memory if you specify a large word size (7 or above). This may impact the behaviour of other programs on your machine.

If you run out of memory, you may see the program crash with a generic error message that will be specific to your machine's operating system, but will probably be a warning about writing to memory that the program does not own (e.g. "Segmentation fault" on a Solaris machine).

This is not a bug, it is a feature of the way this program grabs large amounts of memory.


See also

Program name   Description
backtranambig  Back translate a protein sequence to ambiguous codons
backtranseq    Back translate a protein sequence
banana         Bending and curvature plot in B-DNA
btwisted       Calculates the twisting in a B-DNA sequence
chaos          Create a chaos game representation plot for a sequence
charge         Protein charge plot
checktrans     Reports STOP codons and ORF statistics of a protein
dan            Calculates DNA RNA/DNA melting temperature
emowse         Protein identification by mass spectrometry
freak          Residue/base frequency table or plot
iep            Calculates the isoelectric point of a protein
isochore       Plots isochores in large DNA sequences
mwcontam       Shows molwts that match across a set of files
mwfilter       Filter noisy molwts from mass spec output
octanol        Displays protein hydropathy
pepinfo        Plots simple amino acid properties in parallel
pepstats       Protein statistics
pepwindow      Displays protein hydropathy
pepwindowall   Displays protein hydropathy of a set of sequences
sirna          Finds siRNA duplexes in mRNA
wordcount      Counts words of a specified size in a DNA sequence


Author(s)

Gary Williams (gwilliam
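

Illustrative sketch: reading a compseq results file

The columns in the example output are related by simple arithmetic: Obs Frequency is the count divided by Total count (7 / 172 = 0.0406977 for AAA), and Obs/Exp Frequency is that value divided by Exp Frequency. The value 0.0002441 in the six-letter listing is likewise consistent with 1 / 4^6. The Python sketch below reads a results file in this layout and recomputes those columns. It is not part of compseq: the function names parse_compseq, recompute and check_word_size are hypothetical, and the parsing rules are inferred only from the example output shown in this document.

```python
# A minimal sketch, NOT compseq's own code. All function names here are
# hypothetical; the file layout is inferred from the example output above.

SAMPLE = """\
#
# Output from 'compseq'
#
Word size     3
Total count   172

#
# Word  Obs Count  Obs Frequency  Exp Frequency  Obs/Exp Frequency
#
AAA     7          0.0406977      0.0329457      1.2352955
AGG     0          0.0000000      0.0310078      0.0000000
GCC     10         0.0581395      0.0348837      1.6666677

Other   0          0.0000000      0.0000000      10000000000.0000000
"""


def parse_compseq(text):
    """Return (word_size, total_count, rows) from compseq-style text.

    rows maps each word to (count, obs_freq, exp_freq, obs_over_exp).
    Comment lines, blank lines and the 'Other' row are skipped.
    """
    word_size = None
    total = None
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[:2] == ["Word", "size"]:
            word_size = int(fields[2])
        elif fields[:2] == ["Total", "count"]:
            total = int(fields[2])
        elif fields[0] == "Other":
            continue
        elif len(fields) == 5:
            word, count, obs, exp, ratio = fields
            rows[word] = (int(count), float(obs), float(exp), float(ratio))
    if word_size is None:
        # Mirrors the situation described under Diagnostic Error Messages.
        raise ValueError("The 'Word size' line was not found")
    return word_size, total, rows


def recompute(count, total, exp_freq):
    """Recompute Obs Frequency and Obs/Exp Frequency for one word."""
    obs = count / total
    if exp_freq == 0:
        return obs, None  # ratio undefined; no sentinel value is invented here
    return obs, obs / exp_freq


def check_word_size(current, from_file):
    """Raise if the two word sizes differ, as the Data files section requires."""
    if current != from_file:
        raise ValueError(
            f"word size {current} differs from word size {from_file} "
            "in the file of expected frequencies"
        )


if __name__ == "__main__":
    size, total, rows = parse_compseq(SAMPLE)
    check_word_size(3, size)
    for word, (count, _obs, exp, _ratio) in rows.items():
        print(word, recompute(count, total, exp))
```

Because the Exp Frequency column in the file is rounded to seven decimal places, ratios recomputed this way agree with the printed Obs/Exp Frequency values only approximately.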