etandem

Function

Looks for tandem repeats in a nucleotide sequence

Description

etandem looks for tandem repeats in a sequence. It is normally used after equicktandem has been run to identify potential repeat sizes. It calculates a consensus for the repeat region and gives a score for how well the region matches that consensus: the number of matches minus the number of mismatches.

Input sequences are converted into ACGT or N (so ambiguity codes are ignored). The score is +1 for a match and -1 for a mismatch. The first copy of a repeat is not scored. The highest score is kept for each start position and repeat size.

The lowest score to be reported is set by the threshold score, which can be set on the command line with the -threshold qualifier; the default is 20. For a perfect repeat, the score is the length of the repeat region minus the length of its first copy. Reduce the threshold score a little if you wish to allow mismatches. Each mismatch scores -1 instead of +1, so it scores 2 less than a perfect match of the same number of bases.

Running with a wide range of repeat sizes is inefficient.
That is why equicktandem was written: to give a rapid estimate of the major repeat sizes.

Usage

Here is a sample session with etandem

The input sequence is the human herpesvirus tandem repeat.

   % etandem -noorigfile
   Looks for tandem repeats in a nucleotide sequence
   Input nucleotide sequence: tembl:hhtetra
   Minimum repeat size [10]: 6
   Maximum repeat size [6]:
   Output report [hhtetra.tan]:

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          sequence   Nucleotide sequence filename and optional
                                  format, or reference (input USA)
   -minrepeat          integer    [10] Minimum repeat size (Integer, 2 or
                                  higher)
   -maxrepeat          integer    [Same as -minrepeat] Maximum repeat size
                                  (Integer, same as -minrepeat or higher)
  [-outfile]           report     [*.etandem] Output report file name

   Additional (Optional) qualifiers: (none)

   Advanced (Unprompted) qualifiers:
   -threshold          integer    [20] Threshold score (Any integer value)
   -mismatch           boolean    Allow N as a mismatch
   -uniform            boolean    Allow uniform consensus
   -origfile           outfile    [*.etandem] Sanger Centre program tandem
                                  output file (optional)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -rformat2           string     Report format
   -rname2             string     Base file name
   -rextension2        string     File name extension
   -rdirectory2        string     Output directory
   -raccshow2          boolean    Show accession number in the report
   -rdesshow2          boolean    Show description in the report
   -rscoreshow2        boolean    Show the score in the report
   -rusashow2          boolean    Show the full USA in the report
   -rmaxall2           integer    Maximum total hits to report
   -rmaxseq2           integer    Maximum hits to report for one sequence

   "-origfile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

The input for etandem is a nucleotide sequence USA.

Input files for usage example

'tembl:hhtetra' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:hhtetra

ID   HHTETRA    standard; DNA; VRL; 1272 BP.
XX
AC   L46634; L46689;
XX
SV   L46634.1
XX
DT   06-NOV-1995 (Rel. 45, Created)
DT   04-MAR-2000 (Rel. 63, Last updated, Version 3)
XX
DE   Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
XX
KW   telomeric repeat.
XX
OS   Human herpesvirus 7
OC   Viruses; dsDNA viruses, no RNA stage; Herpesviridae; Betaherpesvirinae.
XX
RN   [1]
RP   1-1272
RX   MEDLINE; 96079055.
RA   Secchiero P., Nicholas J., Deng H., Xiaopeng T., van Loon N., Ruvolo V.R.,
RA   Berneman Z.N., Reitz M.S. Jr., Dewhurst S.;
RT   "Identification of human telomeric repeat motifs at the genome termini of
RT   human herpesvirus 7: structural analysis and heterogeneity";
RL   J. Virol. 69(12):8041-8045(1995).
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1272
FT                   /db_xref="taxon:10372"
FT                   /organism="Human herpesvirus 7"
FT                   /strain="JI"
FT                   /clone="ED132'1.2"
FT   repeat_region   207..928
FT                   /note="long and complex repeat region composed of various
FT                   direct repeats, including TAACCC (TRS), degenerate copies
FT                   of TRS motifs and a 14-bp repeat, TAGGGCTGCGGCCC"
FT   misc_signal     938..998
FT                   /note="pac2 motif"
FT   misc_feature    1009
FT                   /note="right genome terminus (...ACA)"
XX
SQ   Sequence 1272 BP; 346 A; 455 C; 222 G; 249 T; 0 other;
     aagcttaaac tgaggtcaca cacgacttta attacggcaa cgcaacagct gtaagctgca        60
     ggaaagatac gatcgtaagc aaatgtagtc ctacaatcaa gcgaggttgt agacgttacc       120
     tacaatgaac tacacctcta agcataacct gtcgggcaca gtgagacacg cagccgtaaa       180
     ttcaaaactc aacccaaacc gaagtctaag tctcacccta atcgtaacag taaccctaca       240
     actctaatcc tagtccgtaa ccgtaacccc aatcctagcc cttagcccta accctagccc       300
     taaccctagc tctaacctta gctctaactc tgaccctagg cctaacccta agcctaaccc       360
     taaccgtagc tctaagttta accctaaccc taaccctaac catgaccctg accctaaccc       420
     tagggctgcg gccctaaccc tagccctaac cctaacccta atcctaatcc tagccctaac       480
     cctagggctg cggccctaac cctagcccta accctaaccc taaccctagg gctgcggccc       540
     taaccctaac cctagggctg cggcccgaac cctaacccta accctaaccc taaccctagg       600
     gctgcggccc taaccctaac cctagggctg cggccctaac cctaacccta gggctgcggc       660
     ccgaacccta accctaaccc taaccctagg gctgcggccc taaccctaac cctagggctg       720
     cggccctaac cctaacccta actctagggc tgcggcccta accctaaccc taaccctaac       780
     cctagggctg cggcccgaac cctagcccta accctaaccc tgaccctgac cctaacccta       840
     accctaaccc taaccctaac cctaacccta accctaaccc taaccctaac cctaacccta       900
     accctaaccc taaccctaac cctaaccccg cccccactgg cagccaatgt cttgtaatgc       960
     cttcaaggca ctttttctgc gagccgcgcg cagcactcag tgaaaaacaa gtttgtgcac      1020
     gagaaagacg ctgccaaacc gcagctgcag catgaaggct gagtgcacaa ttttggcttt      1080
     agtcccataa aggcgcggct tcccgtagag tagaaaaccg cagcgcggcg cacagagcga      1140
     aggcagcggc tttcagactg tttgccaagc gcagtctgca tcttaccaat gatgatcgca      1200
     agcaagaaaa atgttctttc ttagcatatg cgtggttaat cctgttgtgg tcatcactaa      1260
     gttttcaagc tt                                                          1272
//

Output file format

The output is a standard EMBOSS report file.

The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable, simple, srs, table, tagseq.

See: http://emboss.sf.net/docs/themes/ReportFormats.html for further information on report formats.

By default etandem writes a 'table' report file.

Output files for usage example

File: hhtetra.tan

########################################
# Program: etandem
# Rundate: Sat 15 Jul 2006 12:00:00
# Commandline: etandem
#    -noorigfile
#    -sequence tembl:hhtetra
#    -minrepeat 6
# Report_format: table
# Report_file: hhtetra.tan
########################################

#=======================================
#
# Sequence: HHTETRA     from: 1   to: 1272
# HitCount: 5
#
# Threshold: 20
# Minrepeat: 6
# Maxrepeat: 6
# Mismatch: No
# Uniform: No
#
#=======================================

  Start     End   Score    Size   Count Identity Consensus
    793     936     120       6      24     93.8 acccta
    283     420      90       6      23     84.8 taaccc
    432     485      38       6       9     90.7 ccctaa
    494     529      26       6       6     94.4 ccctaa
    568     597      24       6       5    100.0 aaccct

#---------------------------------------
#---------------------------------------

Data files

None.

Notes

Running with a wide range of repeat sizes is inefficient. That is why equicktandem was written: to give a rapid estimate of the major repeat sizes.

References

None.

Warnings

None.

Diagnostics

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

   Program name     Description
   einverted        Finds DNA inverted repeats
   equicktandem     Finds tandem repeats
   palindrome       Looks for inverted repeats in a nucleotide sequence

Authors

This program was originally written by Richard Durbin (rd
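The scoring rule in the Description (+1 per match to the consensus, -1 per mismatch, first copy not scored) can be illustrated with a minimal Python sketch. It only scores a region you already have, such as the Start..End span of a reported hit; it does not search for repeats the way etandem does. Several details are assumptions rather than etandem's documented behaviour: the consensus is taken here as the majority base at each offset over all copies (first copy included, ties broken in A, C, G, T order), and N always scores as a mismatch, whereas the real program's treatment of N depends on -mismatch. The function names are hypothetical.

```python
from collections import Counter


def to_acgtn(seq: str) -> str:
    """Upper-case the sequence and map every non-ACGT character to N."""
    return "".join(b if b in "ACGT" else "N" for b in seq.upper())


def repeat_consensus(seq: str, period: int) -> str:
    """Majority base at each offset of the repeat unit.

    Assumption: all copies (including the first) vote, N never votes,
    and ties are broken in A, C, G, T order.
    """
    unit = []
    for offset in range(period):
        counts = Counter(b for b in seq[offset::period] if b != "N")
        unit.append(max("ACGT", key=lambda b: counts[b]) if counts else "N")
    return "".join(unit)


def score_region(region: str, period: int) -> tuple[str, int]:
    """Return (consensus, score) for a candidate tandem repeat region.

    Each base after the first copy scores +1 if it matches the consensus
    base at its offset and -1 otherwise; N always scores -1 in this sketch.
    """
    if period < 2:
        raise ValueError("repeat size must be 2 or higher")
    seq = to_acgtn(region)
    unit = repeat_consensus(seq, period)
    score = 0
    for i in range(period, len(seq)):  # skip the first copy
        base = seq[i]
        score += 1 if base != "N" and base == unit[i % period] else -1
    return unit, score


if __name__ == "__main__":
    # Bases 494..529 of tembl:hhtetra (sample hit with Score 26, Size 6).
    print(score_region("ccctaaccctagccctaaccctaaccctaaccctag", 6))
    # -> ('CCCTAA', 26)
```

Under these assumptions the sketch reproduces the Score and Consensus columns of the sample report for the hits at 432..485, 494..529 and 568..597 (etandem prints the consensus in lower case). Changing one base of the perfect 568..597 repeat lowers its score from 24 to 22, the "2 less" described above.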
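The Count and Identity columns of the table report are not defined in this document. The arithmetic below is inferred from the sample hhtetra.tan report and reproduces all five of its rows, but it is an inference, not etandem's documented formula. For the first hit, 793..936 is 144 bases, of which 138 (all but the first copy) are scored; matches - mismatches = 120 and matches + mismatches = 138 give 9 mismatches, and 100 x (144 - 9) / 144 = 93.75, printed as 93.8. The function name is hypothetical.

```python
def derived_columns(start: int, end: int, score: int, size: int) -> tuple[int, float]:
    """Recompute Count and Identity for one row of an etandem table report.

    Inferred from the sample report, not from etandem's documentation:
      length     = End - Start + 1
      Count      = length // Size
      scored     = length - Size   (the first copy is not scored)
      mismatches = (scored - Score) / 2, since Score = matches - mismatches
      Identity   = 100 * (length - mismatches) / length
    """
    length = end - start + 1
    scored = length - size
    if (scored - score) % 2:
        raise ValueError("score and region length have inconsistent parity")
    mismatches = (scored - score) // 2
    count = length // size
    identity = 100.0 * (length - mismatches) / length
    return count, identity


if __name__ == "__main__":
    # Start, End, Score, Size for each hit in the sample report.
    hits = [(793, 936, 120, 6), (283, 420, 90, 6), (432, 485, 38, 6),
            (494, 529, 26, 6), (568, 597, 24, 6)]
    for start, end, score, size in hits:
        count, identity = derived_columns(start, end, score, size)
        print(start, end, count, f"{identity:.1f}")
```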
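Because etandem writes a 'table' report by default, its hits can be read back for further filtering. The following is a minimal Python sketch that assumes the layout seen in the sample hhtetra.tan file: comment lines start with '#', the first non-comment line holds the column names, and values are separated by whitespace. The function name parse_table_report is hypothetical and is not part of EMBOSS; other report formats selected with -rformat would need a different parser.

```python
def parse_table_report(text: str) -> list[dict]:
    """Parse the hit rows of a 'table' report into dictionaries.

    Assumed layout (from the sample hhtetra.tan): '#' comment lines,
    one header line of column names, then whitespace-separated values.
    A repeated header line (e.g. one per reported sequence) is skipped.
    """
    numeric = {"Start": int, "End": int, "Score": int, "Size": int,
               "Count": int, "Identity": float}
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        if header is None or fields == header:
            header = fields
            continue
        row = dict(zip(header, fields))
        for name, convert in numeric.items():
            if name in row:
                row[name] = convert(row[name])
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = """#=======================================
# Sequence: HHTETRA     from: 1   to: 1272
#=======================================
  Start     End   Score    Size   Count Identity Consensus
    793     936     120       6      24     93.8 acccta
    283     420      90       6      23     84.8 taaccc
#---------------------------------------
"""
    for hit in parse_table_report(sample):
        print(hit["Start"], hit["End"], hit["Score"], hit["Consensus"])
```

Filtering the parsed rows, for example keeping hits with Score >= 30, is a quick way to see which hits a higher -threshold would have dropped without rerunning the program.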