compseq.txt
FT                   /db_xref="taxon:9606"
FT                   /organism="Homo sapiens"
FT                   /tissue_type="placenta"
FT                   /clone_lib="cDNA"
FT                   /clone="pUIA 631"
FT                   /map="13"
FT   misc_feature    57..278
FT                   /note="ubiquitin like part"
FT   CDS             57..458
FT                   /db_xref="SWISS-PROT:P35544"
FT                   /db_xref="SWISS-PROT:Q05472"
FT                   /gene="fau"
FT                   /protein_id="CAA46716.1"
FT                   /translation="MQLFVRAQELHTFEVTGQETVAQIKAHVASLEGIAPEDQVVLLAG
FT                   APLEDEATLGQCGVEALTTLEVAGRMLGGKVHGSLARAGKVRGQTPKVAKQEKKKKKTG
FT                   RAKRRMQYNRRFVNVVPTFGKKKGPNANS"
FT   misc_feature    98..102
FT                   /note="nucleolar localization signal"
FT   misc_feature    279..458
FT                   /note="S30 part"
FT   polyA_signal    484..489
FT   polyA_site      509
XX
SQ   Sequence 518 BP; 125 A; 139 C; 148 G; 106 T; 0 other;
     ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc gccaatatgc        60
     agctctttgt ccgcgcccag gagctacaca ccttcgaggt gaccggccag gaaacggtcg       120
     cccagatcaa ggctcatgta gcctcactgg agggcattgc cccggaagat caagtcgtgc       180
     tcctggcagg cgcgcccctg gaggatgagg ccactctggg ccagtgcggg gtggaggccc       240
     tgactaccct ggaagtagca ggccgcatgc ttggaggtaa agttcatggt tccctggccc       300
     gtgctggaaa agtgagaggt cagactccta aggtggccaa acaggagaag aagaagaaga       360
     agacaggtcg ggctaagcgg cggatgcagt acaaccggcg ctttgtcaac gttgtgccca       420
     cctttggcaa gaagaagggc cccaatgcca actcttaagt cttttgtaat tctggctttc       480
     tctaataaaa aagccactta gttcagtcaa aaaaaaaa                               518
//

Input files for usage example 3

File: prev.comp

#
# Output from 'compseq'
#
# The Expected frequencies are calculated on the (false) assumption that every
# word has equal frequency.
#
# The input sequences are:
#       HSFAU

Word size       3
Total count     516

#
# Word    Obs Count    Obs Frequency    Exp Frequency    Obs/Exp Frequency
#
AAA       17           0.0329457        0.0156250        2.1085271
AAC       5            0.0096899        0.0156250        0.6201550
AAG       18           0.0348837        0.0156250        2.2325581
AAT       4            0.0077519        0.0156250        0.4961240
ACA       5            0.0096899        0.0156250        0.6201550
ACC       6            0.0116279        0.0156250        0.7441860
ACG       2            0.0038760        0.0156250        0.2480620
ACT       7            0.0135659        0.0156250        0.8682171
AGA       12           0.0232558        0.0156250        1.4883721
AGC       7            0.0135659        0.0156250        0.8682171
AGG       16           0.0310078        0.0156250        1.9844961
AGT       10           0.0193798        0.0156250        1.2403101
ATA       2            0.0038760        0.0156250        0.2480620
ATC       3            0.0058140        0.0156250        0.3720930
ATG       7            0.0135659        0.0156250        0.8682171
ATT       2            0.0038760        0.0156250        0.2480620
CAA       10           0.0193798        0.0156250        1.2403101
CAC       6            0.0116279        0.0156250        0.7441860
CAG       13           0.0251938        0.0156250        1.6124031
CAT       5            0.0096899        0.0156250        0.6201550
CCA       12           0.0232558        0.0156250        1.4883721
CCC       13           0.0251938        0.0156250        1.6124031
CCG       8            0.0155039        0.0156250        0.9922481
CCT       10           0.0193798        0.0156250        1.2403101
CGA       2            0.0038760        0.0156250        0.2480620
CGC       10           0.0193798        0.0156250        1.2403101
CGG       9            0.0174419        0.0156250        1.1162791
CGT       4            0.0077519        0.0156250        0.4961240
CTA       5            0.0096899        0.0156250        0.6201550
CTC       11           0.0213178        0.0156250        1.3643411
CTG       10           0.0193798        0.0156250        1.2403101
CTT       11           0.0213178        0.0156250        1.3643411
GAA       11           0.0213178        0.0156250        1.3643411
GAC       6            0.0116279        0.0156250        0.7441860
GAG       10           0.0193798        0.0156250        1.2403101
GAT       4            0.0077519        0.0156250        0.4961240
GCA       7            0.0135659        0.0156250        0.8682171
GCC       18           0.0348837        0.0156250        2.2325581
GCG       8            0.0155039        0.0156250        0.9922481
GCT       10           0.0193798        0.0156250        1.2403101
GGA       13           0.0251938        0.0156250        1.6124031
GGC       17           0.0329457        0.0156250        2.1085271
GGG       7            0.0135659        0.0156250        0.8682171
GGT       9            0.0174419        0.0156250        1.1162791
GTA       6            0.0116279        0.0156250        0.7441860
GTC       9            0.0174419        0.0156250        1.1162791
GTG       8            0.0155039        0.0156250        0.9922481
GTT       5            0.0096899        0.0156250        0.6201550
TAA       7            0.0135659        0.0156250        0.8682171
TAC       3            0.0058140        0.0156250        0.3720930
TAG       4            0.0077519        0.0156250        0.4961240
TAT       1            0.0019380        0.0156250        0.1240310
TCA       10           0.0193798        0.0156250        1.2403101
TCC       6            0.0116279        0.0156250        0.7441860
TCG       7            0.0135659        0.0156250        0.8682171
TCT       10           0.0193798        0.0156250        1.2403101
TGA       4            0.0077519        0.0156250        0.4961240
TGC       9            0.0174419        0.0156250        1.1162791
TGG       14           0.0271318        0.0156250        1.7364341
TGT       5            0.0096899        0.0156250        0.6201550
TTA       2            0.0038760        0.0156250        0.2480620
TTC       10           0.0193798        0.0156250        1.2403101
TTG       7            0.0135659        0.0156250        0.8682171
TTT       7            0.0135659        0.0156250        0.8682171

Other     0            0.0000000        0.0000000        10000000000.0000000

Output file format

The output format consists of:

Header information and comments, preceded by a '#' character at the start of the line.

The Word size and the Total count, given on separate lines.

The headers of the results columns, preceded by a '#'.

The results columns: the sub-sequence word, the observed count, the observed frequency, the expected frequency (which is read from the input file if one is given, and is otherwise the simple inverse of the number of distinct words of the specified size that can be constructed), and the ratio of the observed to the expected frequency.

After a blank line at the end, the result for 'Other' words: this is the number of words whose sequence contains IUPAC ambiguity codes or other unusual characters.

Output files for usage example

File: result3.comp

#
# Output from 'compseq'
#
# The Expected frequencies are calculated on the (false) assumption that every
# word has equal frequency.
#
# The input sequences are:
#       HSFAU

Word size       2
Total count     517

#
# Word    Obs Count    Obs Frequency    Exp Frequency    Obs/Exp Frequency
#
AA        45           0.0870406        0.0625000        1.3926499
AC        20           0.0386847        0.0625000        0.6189555
AG        45           0.0870406        0.0625000        1.3926499
AT        14           0.0270793        0.0625000        0.4332689
CA        34           0.0657640        0.0625000        1.0522244
CC        43           0.0831721        0.0625000        1.3307544
CG        25           0.0483559        0.0625000        0.7736944
CT        37           0.0715667        0.0625000        1.1450677
GA        31           0.0599613        0.0625000        0.9593810
GC        43           0.0831721        0.0625000        1.3307544
GG        46           0.0889749        0.0625000        1.4235977
GT        28           0.0541586        0.0625000        0.8665377
TA        15           0.0290135        0.0625000        0.4642166
TC        33           0.0638298        0.0625000        1.0212766
TG        32           0.0618956        0.0625000        0.9903288
TT        26           0.0502901        0.0625000        0.8046422

Other     0            0.0000000        0.0000000        10000000000.0000000

Output files for usage example 2

File: result6.comp

#
# Output from 'compseq'
#
# Words with a frequency of zero are not reported.
# The Expected frequencies are calculated on the (false) assumption that every
# word has equal frequency.
#
# The input sequences are:
#       HSFAU

Word size       6
Total count     513

#
# Word    Obs Count    Obs Frequency    Exp Frequency    Obs/Exp Frequency
#
AAAAAA    6            0.0116959        0.0002441        47.9064327
AAAAAG    1            0.0019493        0.0002441        7.9844055
AAAAGC    1            0.0019493        0.0002441        7.9844055
AAAAGT    1            0.0019493        0.0002441        7.9844055
AAACAG    1            0.0019493        0.0002441        7.9844055
AAACGG    1            0.0019493        0.0002441        7.9844055
AAAGCC    1            0.0019493        0.0002441        7.9844055
AAAGTG    1            0.0019493        0.0002441        7.9844055
AAAGTT    1            0.0019493        0.0002441        7.9844055
AACAGG    1            0.0019493        0.0002441        7.9844055
AACCGG    1            0.0019493        0.0002441        7.9844055
AACGGT    1            0.0019493        0.0002441        7.9844055
AACGTT    1            0.0019493        0.0002441        7.9844055
AACTCT    1            0.0019493        0.0002441        7.9844055
AAGAAG    6            0.0116959        0.0002441        47.9064327
AAGACA    1            0.0019493        0.0002441        7.9844055
AAGATC    1            0.0019493        0.0002441        7.9844055
AAGCCA    1            0.0019493        0.0002441        7.9844055
AAGCGG    1            0.0019493        0.0002441        7.9844055
AAGGCT    1            0.0019493        0.0002441        7.9844055
AAGGGC    1            0.0019493        0.0002441        7.9844055
AAGGTG    1            0.0019493        0.0002441        7.9844055
AAGTAG    1            0.0019493        0.0002441        7.9844055
AAGTCG    1            0.0019493        0.0002441        7.9844055
AAGTCT    1            0.0019493        0.0002441        7.9844055
AAGTGA    1            0.0019493        0.0002441        7.9844055
AAGTTC    1            0.0019493        0.0002441        7.9844055
AATAAA    1            0.0019493        0.0002441        7.9844055
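
The frequency columns described under "Output file format" can be reproduced with a few lines of Python. The sketch below is an illustration only and is not how compseq itself is implemented. The function name compseq_like_table is hypothetical, and the code assumes overlapping windows that advance one base at a time, a plain ACGT alphabet, and the equal-frequency default for the expected value (no input file of expected frequencies).

```python
from collections import Counter
from itertools import product


def compseq_like_table(sequence, word_size, alphabet="ACGT"):
    """Count overlapping words and derive the frequency columns.

    Hypothetical helper for illustration; it is not part of EMBOSS.
    """
    sequence = sequence.upper()
    # Assumption: one window per start position, stepping one base at a time.
    total = len(sequence) - word_size + 1
    if total < 1:
        raise ValueError("sequence is shorter than the word size")
    counts = Counter(sequence[i:i + word_size] for i in range(total))

    # Equal-frequency assumption: 1 / (number of possible words).
    expected = 1.0 / len(alphabet) ** word_size

    rows = []
    for letters in product(alphabet, repeat=word_size):
        word = "".join(letters)
        observed = counts.get(word, 0)
        frequency = observed / total
        rows.append((word, observed, frequency, expected, frequency / expected))

    # Windows containing any character outside the alphabet are pooled as 'Other'.
    other = total - sum(row[1] for row in rows)
    return total, rows, other


# Small made-up sequence; the trailing N produces one 'Other' window.
total, rows, other = compseq_like_table("AACGTTN", 2)
print(f"Word size\t2\nTotal count\t{total}")
for word, observed, frequency, expected, ratio in rows:
    if observed:
        print(f"{word}\t{observed}\t{frequency:.7f}\t{expected:.7f}\t{ratio:.7f}")
print(f"\nOther\t{other}")
```

Under these assumptions a 518-base sequence yields 517, 516 and 513 windows for word sizes 2, 3 and 6, which agrees with the Total count lines in the listings above, and 1/4^2 = 0.0625 agrees with the Exp Frequency column of result3.comp.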