helixturnhelix

Function

   Report nucleic acid binding motifs

Description

   helixturnhelix uses the method of Dodd and Egan to find helix-turn-helix
   nucleic acid binding motifs in proteins.

   The helix-turn-helix motif was originally identified as the DNA-binding
   domain of phage repressors. One alpha-helix lies in the major (wide)
   groove of DNA; the other lies at an angle across the DNA.

Usage

   Here is a sample session with helixturnhelix:

% helixturnhelix
Report nucleic acid binding motifs
Input protein sequence(s): tsw:laci_ecoli
Output report [laci_ecoli.hth]:

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Protein sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outfile]           report     [*.helixturnhelix] Output report file name

   Additional (Optional) qualifiers:
   -mean               float      [238.71] Mean value (Number from 1.000 to
                                  10000.000)
   -sd                 float      [293.61] Standard Deviation value (Number
                                  from 1.000 to 10000.000)
   -minsd              float      [2.5] Minimum SD (Number from 0.000 to
                                  100.000)
   -eightyseven        boolean    Use the old (1987) weight data

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -rformat2           string     Report format
   -rname2             string     Base file name
   -rextension2        string     File name extension
   -rdirectory2        string     Output directory
   -raccshow2          boolean    Show accession number in the report
   -rdesshow2          boolean    Show description in the report
   -rscoreshow2        boolean    Show the score in the report
   -rusashow2          boolean    Show the full USA in the report
   -rmaxall2           integer    Maximum total hits to report
   -rmaxseq2           integer    Maximum hits to report for one sequence

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   helixturnhelix reads one or more protein sequence USAs.

  Input files for usage example

   'tsw:laci_ecoli' is a sequence entry in the example protein database 'tsw'

  Database entry: tsw:laci_ecoli

ID   LACI_ECOLI     STANDARD;      PRT;   360 AA.
AC   P03023; P71309; Q47338; O09196;
DT   21-JUL-1986 (Rel. 01, Created)
DT   01-NOV-1997 (Rel. 35, Last sequence update)
DT   15-DEC-1998 (Rel. 37, Last annotation update)
DE   LACTOSE OPERON REPRESSOR.
GN   LACI.
OS   Escherichia coli.
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 78246991.
RA   FARABAUGH P.J.;
RT   "Sequence of the lacI gene.";
RL   Nature 274:765-769(1978).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   STRAIN=K12 / MG1655;
RX   MEDLINE; 97426617.
RA   BLATTNER F.R., PLUNKETT G. III, BLOCH C.A., PERNA N.T., BURLAND V.,
RA   RILEY M., COLLADO-VIDES J., GLASNER F.D., RODE C.K., MAYHEW G.F.,
RA   GREGOR J., DAVIS N.W., KIRKPATRICK H.A., GOEDEN M.A., ROSE D.J.,
RA   MAU B., SHAO Y.;
RT   "The complete genome sequence of Escherichia coli K-12.";
RL   Science 277:1453-1474(1997).
RN   [3]
RP   SEQUENCE FROM N.A.
RC   STRAIN=K12 / MG1655;
RA   DUNCAN M., ALLEN E., ARAUJO R., APARICIO A.M., CHUNG E., DAVIS K.,
RA   FEDERSPIEL N., HYMAN R., KALMAN S., KOMP C., KURDI O., LEW H.,
RA   LIN D., NAMATH A., OEFNER P., ROBERTS D., SCHRAMM S., DAVIS R.W.;
RL   Submitted (NOV-1996) to the EMBL/GenBank/DDBJ databases.
RN   [4]
RP   SEQUENCE FROM N.A.
RA   CHEN J., MATTHEWS K.K.S.M.;
RL   Submitted (MAY-1991) to the EMBL/GenBank/DDBJ databases.
RN   [5]
RP   SEQUENCE FROM N.A.
RA   MARSH S.;
RL   Submitted (JAN-1997) to the EMBL/GenBank/DDBJ databases.
RN   [6]
RP   SEQUENCE OF 1-147; 159-230 AND 233-360.
RX   MEDLINE; 76091932.
RA   BEYREUTHER K., ADLER K., FANNING E., MURRAY C., KLEMM A., GEISLER N.;
RT   "Amino-acid sequence of lac repressor from Escherichia coli.
RT   Isolation, sequence analysis and sequence assembly of tryptic
RT   peptides and cyanogen-bromide fragments.";
RL   Eur. J. Biochem. 59:491-509(1975).
RN   [7]


  [Part of this file has been deleted for brevity]

CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; V00294; CAA23569.1; -.
DR   EMBL; J01636; AAA24052.1; -.
DR   EMBL; AE000141; AAC73448.1; -.
DR   EMBL; U73857; AAB18069.1; ALT_INIT.
DR   EMBL; X58469; CAA41383.1; -.
DR   EMBL; U86347; AAB47270.1; ALT_INIT.
DR   EMBL; U72488; AAB36549.1; -.
DR   EMBL; U78872; AAB37348.1; -.
DR   EMBL; U78873; AAB37351.1; -.
DR   EMBL; U78874; AAB37354.1; -.
DR   PIR; A03558; RPECL.
DR   PIR; S02540; S02540.
DR   PDB; 1LCC; 31-JAN-94.
DR   PDB; 1LCD; 31-JAN-94.
DR   PDB; 1LTP; 31-OCT-93.
DR   PDB; 1TLF; 31-JUL-95.
DR   PDB; 1LBG; 11-JUL-96.
DR   PDB; 1LBH; 11-JUL-96.
DR   PDB; 1LBI; 11-JUL-96.
DR   PDB; 1LQC; 12-FEB-97.
DR   ECO2DBASE; H039.0; 6TH EDITION.
DR   ECOGENE; EG10525; LACI.
DR   PFAM; PF00356; lacI; 1.
DR   PFAM; PF00532; Peripla_BP_like; 1.
DR   PROSITE; PS00356; HTH_LACI_FAMILY; 1.
KW   Transcription regulation; DNA-binding; Repressor; 3D-structure.
FT   DNA_BIND      6     25       H-T-H MOTIF.
FT   MUTAGEN      17     17       Y->H: BROADENING OF SPECIFICITY.
FT   MUTAGEN      22     22       R->N: RECOGNIZE AN OPERATOR VARIANT.
FT   VARIANT     282    282       Y -> D (IN T41 MUTANT).
FT   CONFLICT    286    286       S -> L (IN AAA24052, REF. 2, 4 AND 5).
FT   HELIX         6     13
FT   TURN         14     14
FT   HELIX        17     24
FT   HELIX        32     44
FT   TURN         49     50
SQ   SEQUENCE   360 AA;  38564 MW;  4CA5A1D6 CRC32;
     MKPVTLYDVA EYAGVSYQTV SRVVNQASHV SAKTREKVEA AMAELNYIPN RVAQQLAGKQ
     SLLIGVATSS LALHAPSQIV AAIKSRADQL GASVVVSMVE RSGVEACKAA VHNLLAQRVS
     GLIINYPLDD QDAIAVEAAC TNVPALFLDV SDQTPINSII FSHEDGTRLG VEHLVALGHQ
     QIALLAGPLS SVSARLRLAG WHKYLTRNQI QPIAEREGDW SAMSGFQQTM QMLNEGIVPT
     AMLVANDQMA LGAMRAITES GLRVGADISV VGYDDTEDSS CYIPPSTTIK QDFRLLGQTS
     VDRLLQLSQG QAVKGNQLLP VSLVKRKTTL APNTQTASPR ALADSLMQLA RQVSRLESGQ
//

Output file format

   The output is a standard EMBOSS report file.

   The results can be output in one of several styles by using the
   command-line qualifier -rformat xxx, where 'xxx' is replaced by the name
   of the required format.
   The available format names are: embl, genbank, gff, pir, swiss, trace,
   listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable,
   simple, srs, table, tagseq.

   See: http://emboss.sf.net/docs/themes/ReportFormats.html for further
   information on report formats.

   By default helixturnhelix writes a 'motif' report file.

  Output files for usage example

  File: laci_ecoli.hth

########################################
# Program: helixturnhelix
# Rundate: Sat 15 Jul 2006 12:00:00
# Commandline: helixturnhelix
#    -sequence tsw:laci_ecoli
# Report_format: motif
# Report_file: laci_ecoli.hth
########################################

#=======================================
#
# Sequence: LACI_ECOLI     from: 1   to: 360
# HitCount: 1
#
# Hits above +2.50 SD (972.73)
#
#=======================================

Maximum_score_at at "*"

(1) Score 2160.000 length 22 at residues 4->25
            *
 Sequence:  VTLYDVAEYAGVSYQTVSRVVN
            |                    |
            4                    25
 Standard_deviations: 6.54

#---------------------------------------
#---------------------------------------

#---------------------------------------
# Total_sequences: 1
# Total_hitcount: 1
#---------------------------------------

Data files

   EMBOSS data files are distributed with the application and stored in the
   standard EMBOSS data directory, which is defined by the EMBOSS environment
   variable EMBOSS_DATA.

   To see the available EMBOSS data files, run:

% embossdata -showall

   To fetch one of the data files (for example 'Exxx.dat') into your current
   directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

   Users can provide their own data files in their own directories. Project
   specific files can be put in the current directory, or for tidier
   directory listings in a subdirectory called ".embossdata". Files for all
   EMBOSS runs can be put in the user's home directory, or again in a
   subdirectory called ".embossdata".

   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata

   The data files are stored in the standard EMBOSS data directory. The
   names are:
     * Ehth.dat      matrix file
     * Ehth87.dat    1987 shorter matrix file

   The old (1987) data has a motif length of 20 residues, whilst the default
   data (Ehth.dat) has a motif length of 22 residues.

   With care these can be replaced to suit your data sets. If the files are
   placed in the following directories they will be used in preference to
   the files in the EMBOSS distribution data directory:
     * . (your current directory)
     * .embossdata
     * ~/ (your home directory)
     * ~/.embossdata

   Here is the default file:

# Amino acid counts for 91 Helix-turn-helix (presumed) protein motifs
# from Dodd IB and Egan JB (1990) Nucl. Acids. Res. 18:5019-5026.
#
Sample: 91 aligned sequences
#
# R  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 Total   Exp
# - -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -----   ---
  A  2  1  3 14 10 12 75  6 15  9  1  1  4  3  8 15  4  4  4 11  0 10   212   995
  C  0  0  1  1  0  0  0  0  0  3  3  1  1  0  0  0  0  0  0  1  0  3    14   106
  D  0  1  0  1 14  0  0 14  1  0  5  0  1  2  0  0  0  0  1  1  0  2    43   556
  E  4  5  0 11 26  0  0 16  9  3  3  0  3 12 13  0  0  2  0  1 13  6   127   669
  F  4  0  4  0  0  4  0  1  0 10  0  0  0  0  1  0  0  1  1  1 22  0    49   358
  G  9  7  1  4  0  0  8  0  0  0 50  0  6  0  7  1  0  3  1  1  0  4   102   761
  H  4  3  1  1  2  0  0  3  2  0  5  0  3  3  0  2  0  2  4  5  0  2    42   225
  I 10  0 13  3  2 15  0  4  9  4  0 17  0  2  0  1 31  1  4  8 16  1   141   583
  K  4  4  6 11 12  1  1 14 11  0  5  2  2  7  2  1  0  5  8  4  5 15   120   516
  L 16  1 17  0  1 35  0  3 12 31  0 22  0  2  1  1 22  1  1 12 20  0   198   954
  M  7  0  2  1  1  1  0  0  5  7  1 10  0  0  2  0  2  0  0  2  0  1    42   275
  N  0  8  0  1  0  0  0  2  1  1 14  0  8  1  4  2  0  4  9  0  0 11    66   383
  P  1  6  0  1  0  0  0  0  0  0  0  0  3 13  7  0  0  0  0  0  0  3    34   403
  Q  2  1 21  9 11  0  0  9  8  0  0  2  1 17  7 12  0  3 12  5  3  9   132   437
  R  9 10 14  9  5  0  1 16 10  0  1  0  1 17  8  7  0 17 28  3  0 16   172   609
  S  2 17  0  8  4  1  6  1  2  2  3  0 37  1 25  5  0 29  3  0  1  5   152   552
  T  6 24  3 12  1  5  0  2  2  4  0  5 20  4  3 39  0  4  1  0  4  3   142   512
  V  7  3  1  1  2 16  0  0  2 12  0 29  0  5  3  3 32  0  7  8  7  0   138   724
  W  2  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  2 21  0  0    27   105
  Y  2  0  4  3  0  1  0  0  2  4  0  1  1  2  0  2  0 15  5  7  0  0    49   267

Notes

   None.

References

    1. Dodd I.B., Egan J.B. (1987) "Systematic method for the detection of
       potential lambda cro-like DNA-binding regions in proteins."
       J. Mol. Biol. 194: 557-564.
    2. Dodd I.B., Egan J.B. (1990) "Improved detection of helix-turn-helix
       DNA-binding motifs in protein sequences." Nucleic Acids Res. 18:
       5019-5026.

Warnings

   The program will warn you if the data file is not mathematically
   accurate.

Diagnostic Error Messages

   None.

Exit status

   It exits with status 0 unless an error is reported.

Known bugs

   None.

See also

   Program name     Description
   antigenic        Finds antigenic sites in proteins
   digest           Protein proteolytic enzyme or reagent cleavage digest
   epestfind        Finds PEST motifs as potential proteolytic cleavage sites
   fuzzpro          Protein pattern search
   fuzztran         Protein pattern search after translation
   garnier          Predicts protein secondary structure
   hmoment          Hydrophobic moment calculation
   oddcomp          Find protein sequence regions with a biased composition
   patmatdb         Search a protein sequence with a motif
   patmatmotifs     Search a PROSITE motif database with a protein sequence
   pepcoil          Predicts coiled coil regions
   pepnet           Displays proteins as a helical net
   pepwheel         Shows protein sequences as helices
   preg             Regular expression search of a protein sequence
   pscan            Scans proteins using PRINTS
   sigcleave        Reports protein signal cleavage sites
   tmap             Displays membrane spanning regions

Author(s)

   Alan Bleasby (ajb
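   The Warnings section says only that the program warns when the data file
   is not mathematically accurate; it does not list the checks. Because the
   data files can be replaced to suit your own data sets, checking a custom
   table before use can help. The Python sketch below is a hypothetical
   example of such a check, not the one helixturnhelix performs: it parses
   rows in the Ehth.dat layout shown under Data files and confirms that each
   row's counts add up to its Total column and that each motif position adds
   up to the sample size (91 for the default file). The names parse_counts
   and check_counts are illustrative, and the Exp column is parsed but not
   checked.

```python
# Hypothetical consistency check for a count table in the Ehth.dat layout:
# residue letter, one count per motif position, then Total and Exp columns.
# This is an illustration, NOT the check helixturnhelix itself performs.
from typing import Dict, List, Tuple

# Rows copied from the default Ehth.dat table (91 aligned motifs, 22 positions).
EHTH_COUNTS = """\
  A  2  1  3 14 10 12 75  6 15  9  1  1  4  3  8 15  4  4  4 11  0 10   212   995
  C  0  0  1  1  0  0  0  0  0  3  3  1  1  0  0  0  0  0  0  1  0  3    14   106
  D  0  1  0  1 14  0  0 14  1  0  5  0  1  2  0  0  0  0  1  1  0  2    43   556
  E  4  5  0 11 26  0  0 16  9  3  3  0  3 12 13  0  0  2  0  1 13  6   127   669
  F  4  0  4  0  0  4  0  1  0 10  0  0  0  0  1  0  0  1  1  1 22  0    49   358
  G  9  7  1  4  0  0  8  0  0  0 50  0  6  0  7  1  0  3  1  1  0  4   102   761
  H  4  3  1  1  2  0  0  3  2  0  5  0  3  3  0  2  0  2  4  5  0  2    42   225
  I 10  0 13  3  2 15  0  4  9  4  0 17  0  2  0  1 31  1  4  8 16  1   141   583
  K  4  4  6 11 12  1  1 14 11  0  5  2  2  7  2  1  0  5  8  4  5 15   120   516
  L 16  1 17  0  1 35  0  3 12 31  0 22  0  2  1  1 22  1  1 12 20  0   198   954
  M  7  0  2  1  1  1  0  0  5  7  1 10  0  0  2  0  2  0  0  2  0  1    42   275
  N  0  8  0  1  0  0  0  2  1  1 14  0  8  1  4  2  0  4  9  0  0 11    66   383
  P  1  6  0  1  0  0  0  0  0  0  0  0  3 13  7  0  0  0  0  0  0  3    34   403
  Q  2  1 21  9 11  0  0  9  8  0  0  2  1 17  7 12  0  3 12  5  3  9   132   437
  R  9 10 14  9  5  0  1 16 10  0  1  0  1 17  8  7  0 17 28  3  0 16   172   609
  S  2 17  0  8  4  1  6  1  2  2  3  0 37  1 25  5  0 29  3  0  1  5   152   552
  T  6 24  3 12  1  5  0  2  2  4  0  5 20  4  3 39  0  4  1  0  4  3   142   512
  V  7  3  1  1  2 16  0  0  2 12  0 29  0  5  3  3 32  0  7  8  7  0   138   724
  W  2  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  2 21  0  0    27   105
  Y  2  0  4  3  0  1  0  0  2  4  0  1  1  2  0  2  0 15  5  7  0  0    49   267
"""

Row = Tuple[List[int], int, int]  # (per-position counts, Total, Exp)


def parse_counts(text: str) -> Dict[str, Row]:
    """Parse data rows; skip comments, headers and the 'Sample:' line."""
    rows: Dict[str, Row] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a single amino-acid letter.
        if len(fields) < 3 or len(fields[0]) != 1 or not fields[0].isalpha():
            continue
        values = [int(v) for v in fields[1:]]
        rows[fields[0]] = (values[:-2], values[-2], values[-1])
    return rows


def check_counts(rows: Dict[str, Row], sample_size: int) -> List[str]:
    """Return a list of inconsistencies (an empty list means none found)."""
    lengths = {len(counts) for counts, _total, _exp in rows.values()}
    if len(lengths) != 1:
        return [f"expected one common motif length, found {sorted(lengths)}"]
    motif_length = lengths.pop()
    problems: List[str] = []
    # Each row's per-position counts should add up to its Total column.
    for residue, (counts, total, _exp) in rows.items():
        if sum(counts) != total:
            problems.append(f"{residue}: counts sum to {sum(counts)}, Total is {total}")
    # Each motif position should account for every aligned sequence once.
    for pos in range(motif_length):
        column = sum(counts[pos] for counts, _total, _exp in rows.values())
        if column != sample_size:
            problems.append(f"position {pos + 1}: sums to {column}, expected {sample_size}")
    return problems


if __name__ == "__main__":
    table = parse_counts(EHTH_COUNTS)
    print(len(table), "residues;", check_counts(table, sample_size=91))
```

   Run against the default table, check_counts returns an empty list; changing
   any single count produces one row message and one position message. The
   same functions accept the 20-position layout of Ehth87.dat, provided the
   matching sample size is passed in.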