📄 transeq.txt
GGPVVEVMHDDIRVGANRFGRLPADLEHAARIDFRDARRQAAVVVLALHPDRPLWRVDVDVVQVPLHVFHPPVACGLRGAAEQQGLPVGRLGPGGDGRVLREETMAGLDEGPAGGRIDAGEVRRDHHLPLCHVTLHLRHLRLAGGQAGDRRPPAVAVATGDGAIQLGGAGAHHGGEDHVGARLVDALDGALQVVVGGIQRNVDFLEHRAAVLAIQVAHHMVAFPRIDVVRADEHHPLAVVANQVRRQRRTVLVRRRTAVDDVRRILEALVGGRVAEQRVGALDHRHHRLARVRHVAAHEEPYPPVANEVLGAQPIAVRVAAGVLGQRFDRATADAALAVQLLDREQCAIRVRALDIGGDAGFGEQQADQRPLLVRSHPFPLL*LSLSAVRSEPARAPGSPGRSCSSRRPAGGDRADARPAVP Output files for usage example 7 File: amir.pep>PAAMIR_1 Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulationVPLAEHLLDHHQPGEASLEHERVRFVRR*ATVTGEETDGIAPGAAADRPAVLRNRRHRRYVRRGLHSQPGGTVPRGLLHVAHAQGGDAGGRARRRAALLPDPLRGLRVFAEHRLRRSGAEX Output files for usage example 8 File: mito.pep>NC_001321.1_1 Balaenoptera physalus mitochondrion, complete genomeVNY*SAHDHNMTEVSYIWYFFIFFGGLARTPLWP**VSSQSDKL*LGLDVFVIWLAQPTCAVKLMVTGHSTPLFPPGSKNCMS**TKPPSFHTMLTLCLDIHHPP*QARP*I*KPFYL*INTKSDTSPMMKMHERHPYPMRWCSLNTYKA*HWKCLDGSSQPHWH**FGPSLSISS*QTYTCKYPHPSENAL*IMKIK*SGYQAR*H*QLTTPRLATPPRDTAVMKIKL*TKVRLSHVNL*VGKLRASHRGHTIDPN**KHGVKSVKEPHEMKSNLN*AVKSPN*N*AKLRKWL*YNLITRQL*SKLGLDTPLCLVVNPNSHKT*LFA*VLLATA*NSKDLAVPHTHLEEPVL*PMNPDQPHQPLLLQSMYRHLQQTLKGEK*A*PSYMKTLGQGVTHGLGSNGLHFLS*EHPLYSHESFYET*KLKEDLVVNQEQSAWLNKAM*ARTHRPSPSSSTPAMNPSSLTQAKQLYE**QVVT**AYRKVCLDKT*YSLNKACSLHLEDSTARVYLELALAHTLPTSTTTNQSNKTFTIPSKY***KFKYQWRY*DSTV*KDE*KT*K**KAKLTTCTFCMMT*LVMN*Q*DLKLNYPKPDELLMSST*NELIYVAK*WEDL*VEVKSLTSLVMAGCPWKESQFNIK*Y*KPMPSLNVYLTVNLK*YSFLEMGTTLT*E*NQT*T*LA*KQPSIKKAFKLDNKMMF*FQH*VNQLLAWLLD*SMQM*KQYC*YE*QEIFLLAQAYTSNW*YTDN*QQMNKTQH*IIY*NTVNPTQACIKE*LKKVKGTRQTQTPPVYQKHHL*HNQY*STACPVTNR*TAAVSWPCKGSMITCSLI*DLYEWPHEGFTVSYF*SVKLTSPW*GGDNKM*REDPMELQLINPKTMTLNHQGMTKPYMGWQFRLGWPRSTKNPPSD*NLGPLAKVQYHLLIQSFDQRNKLP*G*QRNPILESMSTMGFTTSMLDQDILMVQLLL*VRLFND*SPTWSEF*PE*S*SVSIYYAFLPVRKDK*NKANFKQAPSNN*WPSLNLMIKRKQTCP*PGPCWGG*VR*LHKT*TFTP*GSNPLPNKMFMINILTLILPILLAVAFLTLVERKILGYMQFRKGPNIVGPHGLLQPFADAIKLFTKEPLRPATSSTTMFIIAPVLALTLALTMWSPLPMPYPLINMNLGVLFMLAMSSLAVYSILWSGWASNSKYALIGALRAVAQTISYEVTLA
IILLSVLLMNGSYTLSTLATTQEQLWLLFPSWPLAMMWFISTLAETNRAPFDLTEGESELVSGFNVEYAAGPFALFFLAEYANIIMMNMLTAILFLGTFHNPHNPELYTANLIIKTLLLTMSFLWIRASYPRFRYDQLMHLLWKNFLPLTLALCMWHISLPIMTASIPPQT*EMCLMKELLW*SK***PKSSYF*NN*NRTYP*EFKVLRATMLHYNLQ*GQLNKLSGPYPENVGSYPSHTNKPINPYYPPDNPYP*YNNGSHQLSLTISLNWLRNEHNSLHPYHNKKSYSPGH*SFYQVPPNTSHCFRTPHNSSHH*LNAL*PMNYYKTI*PNSIHTHNSSPSHQTGISPLPLLSS*SNT*YPPNH*PNPINMTKTSSLINPMPNFTIN*PTPNINHILTFHLN**L*WTKPNTTSKNHSLLINCPH*MNNGHSTM*SNPNITKSTNLHHNNLHHIYIIYPKLNYHYIVTVSNLK*NTRHHNPYHTHFTLD**TPTTIGVYTQMNNYS*TNKKWYTHCTNIHSHYSITQPMLLYTSYLLHSTNTISLHK*YKNKMTIQLHKTNSSPTNSNRNFHYATTPHTNTLNPTMGV*VKP*P*AFKALSKYNLLNSCPM*TA*LYLTSIECKSNALIKLNPH*IGGMHLPRIFS*QLNTLINWLQSTSPAA*KK*REKSRQDLKLLPWICNSKWSFTTGLGKK*TQPLSLDLQSNTYSAILPMFMNRWLFSTNHKDIGTLYLLFGAWAGMVGTGLSLLIRAELGQPGTLIGDDQVYNVLVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLMASSMIEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMTQYQTPLFVWSVLVTAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMVSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPALMWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFVHWFPLFSGYTLNTTWAKIHFMIMFVGVNLTFFPQHFLGLSGMPRRYSDYPDAYTTWNTISSMGSFISLTAVMLMIFIIWEAFTSKREVLAVDLTSTNLEWLNGCPPPYHTFEEPAFVNPKWS*KEGIEPSPIGFKPTS*LLCLSL*T*Y**NLM*LCQS*VTSENPVYLHGMSIPT*FP*CSITHH**APTLSRSYTNNRFSN*LFSSLHYYPNAYNQINTY*YN*RP*S*NCLNYPPSHYLNFNCLAFITDPLHN*RSQ*PLPHCKNN*SPMMLKLWVYRLR*PKLRLLYNPNI*PKA**TTII*S**PSCLTY*NNNPNISLI**RTPLMGRTLLGPKN*CNP*TPKPNNLNINTT*PILWTML*DLRLKPQFHTNCP*TSTP*SLWKMICINTMTSL*S*ISINLLS**L*VYNSP*WYATI*YINMTPYYSINTLNPLCIIPIKNLKALLFP*PQTSTYQNTKTTSSLKHHMNENLFAPFMIPVMLGIPITTLIIILPSMLFPAPNRLINNRTIAIQQWLTKLTSKQLMNVHSPKGQTWSLMLISLFLFIASTNLLGMLPHSFTPTTQLSMNVGMAIPLWAGTVTTGFRNKTKMSLAHLLPQGTPTFLIPMLVIIETISLFIQPVAWAVRLTANITAGHLLMHLIGETTLALMNINLFSAFITFTILALLTILEFAVALIQAYVFTLLVSLYLHDNT*WPTKPTHTT**TPALDPSPELYQPF**HQA*LYDFTSTQYSY*L*ACQQMF*QYTNDGEMSSEKAPSKAIMHQPSK*AYDTE*FYLSSQKSYFSQASSEPSTTQALPLLQN*ADVDHQQASAL*IP*KFPFSTPPYY*PLAYLLPEPTMAW*KETANTYFKHSSSQLH*ASTSPYYKHQSTTKPLSQ
SQTESTAPPSL*PQAFMGYM*SLDLLSLSSVSYVK*NSTSHQTTTLALNVPLDTDIS*TSYDYFFMYLSIDEVPSPFSINKYNWLPIS*FRCTPKKNNKPSTNTTNKYNTSPTTRIHRLLTSTTKRMR*KNKPMWMRIWPH*ISPPTLLHKILLGGHYFPSLWL*NRSLTPPSLSNSVKQPKHNTHNSLILNLPTSSQPSLWMNS**P*MSWMWYLV*DKTSDFDPLDCDQIHNYQMTLIHMNILMAFSMSLMGLLMYRSHLMSALLCLEGMMLSLFVLAALTILSSHFTLANMMPIILLVFAACEAAIGLALLVMVSNTYGTDYVQNLNLLQC*NLLFLQSY*YP*PDYQKMT*SELTPQPTVY*LASQAFSSSINSTTTALTTH*YSSPTPFLPHSWS*QYDSFP*Y**QVNPISSKNHQSEKNSTLRY*SHYKPS*L*HLLPLN*SYFMSYLKPH*SLPLSLSLAGATKQNDSMPDYTSYSMH*LDLSHY**H*YIYKMQQDP*TFYSYNTELNHYLRPDPTSSYD*PA**PS**KYLSMDYTFDCPKHT*KPPLQAP*SLQPYY*NLEAMAYYELHPYSIP*QNT*HTHFLYSLFEE*S*PALSVYVKQT*NH*LHIPQLVT*HSSSQLSSSKPPEAM*GPLP**LPTASHPPYYSVWQTRTTNAFMAEP*FCPEAYKSFYH**PVDDY*QA*QILHYPQPST*SENYS*SCRSSHDQIPLFS**EQML*LLLSTLYMY*S*HNVANTHTTSMMSPLPSHESMP**PYTLFPSCSYH*TLKSS*ALSTVSMV*K*R*FVKLTMEDQNFLLTEKVLQELLIHAPTPNSCGFFKLLQDSSYPLVLGAKKLVQLQMKVMNLFTSFTLLTLLILTTPIMMSHTGSHVNNKYQSYVKNIVFCAFITSLVPAMVYLHTNQETLISNWHWITIQTLKLTLSFKMDYFSLMFMPVALFITWSIMEFSMWYMHSDPYINQFFKYLLLFLITMLILVTANNLFQLFIGWEGVGIMSFLLIGWWFGRTDANTAALQAILYNRIGDIGLLASMAWFLSNMNTWDLEQIFMLNQNPLNFPLMGLVLAAAGKSAQFGLHPWLPSAMEGPTPVSALLHSSTMVVAGIFLLVRFYPLMENNKLIQTVTLCLGAITTLFTAICALTQNDIKKIIAFSTSSQLGLMMVTIGLNQPYLAFLHICTHAFFKAMLFLCSGSIIHNLNNEQDIRKMGGLFKALPFTTTALIIGCLALTGMPFLTGFYSKDPIIEAATSSYTNAWALLLTLIATSLTAVYSTRIIFFALLGQPRFPPSTTINENNPLLINPIKRLLVGSIFAGFILSNSIPPMTTPLMTMPLHLKLTALAMTTLGFIIAFEINLDTQNLKHKHPSNSFKFSTLLGYFPTIMHRLPPHLDLLMSQKLATSLLDLTWLETILPKTTALIQLKASTLTSNQQGLIKLYFLSFLITITLSMILFNYPE*SP**LQH**MKTNP*QSPTKHHNYMMPQSL*PPH*KPQNPQYHKQPSPLVHQTQT*SSPPHSSKHKSQLKTPPPTLKQMLLVQLY*KPKPQDTVQ*P*LLYNQMLPAFPPNKSKTPLTPKTNHQNSK*LHIQHHHPQSTLNPHK*VKALKKPPQN*LQK*YLKWKQYTLSLFSHGLQPWPMTWKIIVVIQLQEHQWPTSEKHTH**KSSTTHSSISPPHQMSLHDGTSAPYSASA*LYKS*QAYS*QYTTHQTQQPPSHQSHTSAETWITAELSDTYMQMGLLYSSSASTLT*DEAYTTAPTPSEKHEMLELFYYSQL*PPHS*ATSCPEDKYHSEAQL*SLTSYQQSHTLVPP*SNESEAVSL*MKQH*HAFLPFTLSSPSSS*H*QLSTLFSFTKQDPTTPQASHPT*MKSHSTPTTQLKTF*VPYY*S*SY*Y*PYSHPTYLETQTTMPQQTHSVPQHTLNQNGIFYSHTQSYDQSPTN*AES*PYYSQS*S*PSSQYSTHPINEA*YFDPLASSCSES*SQIY*P*HGSA
ANQ*NTPT*L*ANSHPSSISS*F*Y*YQ*LVLS*TNL*NEESL*YN*MPRFCKPEKET*HTSL*LKEEVLHSTISTQSWSST*TIPWKSMLYNNH*TTVLCPYWK*LALLDIIM*LVHACTST*LMASFHGYEQMYMLCMIVHSIIFTTSSWSSY*ILLILHIT*YVLMVQ*RMFLCIP*SI*IKWFLWPLH*ITSLVSMPRETSNPLG*DPSSRTGPITRGGSYLMIFM*HLVLTSGPY*LKIAHSFPLNKTSRW

One or more peptide sequences are written out. The names of the resulting protein sequences are formed from the name of the input nucleic acid sequence with '_' and the translation frame appended to it. Thus a nucleic acid sequence with the name 'XYZ' translated in all 6 frames would produce protein sequences with the names: 'XYZ_1', 'XYZ_2', 'XYZ_3', 'XYZ_4', 'XYZ_5', 'XYZ_6'. If regions are specified, they are taken to be translated in frame 1 and so the output name would be 'XYZ_1'.

Data files

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

The directories are searched in the following order:

* . (your current directory)
* .embossdata (under your current directory)
* ~/ (your home directory)
* ~/.embossdata

The EMBOSS REBASE restriction enzyme data files are stored in directory 'data/REBASE/*' under the EMBOSS installation directory.

These files must first be set up using the program 'rebaseextract'. Running 'rebaseextract' may be the job of your system manager.

The data files are stored in the REBASE directory of the standard EMBOSS data directory.
The names are:

* embossre.enz  Cleavage information
* embossre.ref  Reference/methylation information
* embossre.sup  Supplier information

The column information is described at the top of the data files.

The reported enzyme from any one group of isoschizomers (the prototype) is specified in the REBASE database and the information is held in the data file 'embossre.equ'. You may edit this file to set your own preferred prototype, if you wish.

The format of the file 'embossre.equ' is:

Enzyme-name Prototype-name

i.e. two columns of enzyme names separated by a space. The first name of the pair of enzymes is the name that is not preferred and the second is the preferred (prototype) name.

Notes

The reverse frame '-1' is defined as the translation you get when you use the reverse-complement of the sequence with the same codon phase as the codon in frame '1'. Thus the sequence ACTGG in frame 1 is the translation of the codons ACT,GG; the translation of frame -1 uses these same codons, reverse complemented:

forward sense          ACT GG
reverse sense          TGA CC
reverse-complement     CC AGT
frame -1 translation      S

Frame -1 is the translation of CCAGT (the reverse complement of ACTGG) using the codon 'AGT' (the first bases 'CC' are ignored). The result is the peptide 'S'.

Similarly, frame -2 is the phase used by frame 2, 'CAG T' (the first base 'C' is ignored). The last base cannot be successfully translated and is output as the unknown residue 'X'. The result is the peptide 'QX'.

Frame -3 is the phase used by frame 3, 'CCA GT'. The last two bases will translate to 'V' as it does not matter what the next base is (GTA, GTC, GTG, GTT all code for 'V'). The result is the peptide 'PV'.

The alternative way of generating the reverse translation frames, used by some people, is that frame -1 is made by taking frame '1' of the reverse complement. With that definition there is no correspondence between the codons used in frame 1 and -1, 2 and -2, 3 and -3; the codons used change with the sequence length modulo 3.
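The two reverse-frame conventions described in these notes can be sketched in Python. This is a minimal illustration, not the transeq source: the function names (translate, reverse_frame) are hypothetical, only the standard genetic code is used, and the handling of trailing partial codons (one base gives 'X'; two bases give a residue only when all four possible codons agree) is an assumption generalised from the ACTGG example above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, offset):
    """Translate seq starting at offset; bases before offset are ignored."""
    peptide = []
    for i in range(offset, len(seq), 3):
        codon = seq[i:i + 3]
        if len(codon) == 3:
            peptide.append(CODON_TABLE.get(codon, "X"))
        elif len(codon) == 2:
            # Assumption: a two-base tail is translated only if the third
            # base cannot change the residue (e.g. GTN is always 'V').
            options = {CODON_TABLE.get(codon + b, "X") for b in BASES}
            peptide.append(options.pop() if len(options) == 1 else "X")
        else:
            # A single trailing base cannot be translated.
            peptide.append("X")
    return "".join(peptide)


def reverse_frame(seq, frame, alternative=False):
    """Translate reverse frame -1, -2 or -3 under either convention."""
    if frame not in (-1, -2, -3):
        raise ValueError("frame must be -1, -2 or -3")
    k = -frame
    rc = reverse_complement(seq)
    if alternative:
        # Frame -k is simply frame k of the reverse complement.
        offset = k - 1
    else:
        # Default: reuse the codon phase of forward frame k, so the start
        # offset in the reverse complement depends on the sequence length.
        offset = (len(rc) - (k - 1)) % 3
    return translate(rc, offset)
```

For ACTGG, reverse_frame("ACTGG", -1), -2 and -3 give 'S', 'QX' and 'PV' under the default convention, as in the worked example. With alternative=True the same calls give 'PV', 'QX' and 'S', which shows how the codons used shift with the sequence length.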
There does not appear to be a convention on which definition to use. The Staden package uses the same convention as this program. The GCG package sneakily avoids the problem by naming the frames using letters (a, b, c, d, e, f).

If you really need to define frame -1 as the frame given when you reverse complement the sequence and then start translating at the first frame in the resulting sequence, then use the '-alternative' qualifier.

(Reverse sense translations are a biological nonsense, really, but can be very useful in practice.)

References

None.

Warnings

When translating using a non-standard genetic code table, always check the table carefully for deviations from your particular organism's code.

When using the '-regions' option, you should always leave the '-frames' option at the default of frame '1'. If you change the frame while specifying a region to translate, then the regions will be offset by 1 or 2 bases, which is not what you want.

Diagnostic Error Messages

Several warning messages about malformed region specifications:

* Non-digit found in region ...
* Unpaired start of a region found in ...
* Non-digit found in region ...
* The start of a pair of region positions must be smaller than the end in ...

Exit status

It exits with status 0, unless a region is badly constructed.

Known bugs

When using the '-regions' option, you should always leave the '-frames' option at the default of frame '1'.
If you change the frame while specifying a region to translate, then the regions will be offset by 1 or 2 bases, which is not what you want.

See also

Program name    Description
backtranambig   Back translate a protein sequence to ambiguous codons
backtranseq     Back translate a protein sequence
coderet         Extract CDS, mRNA and translations from feature tables
plotorf         Plot potential open reading frames
prettyseq       Output sequence with translated ranges
remap           Display sequence with restriction sites, translation etc
showorf         Pretty output of DNA translations
showseq         Display a sequence with features, translation etc
sixpack         Display a DNA sequence with 6-frame translation and ORFs

Author(s)

Gary Williams (gwilliam