properties of a feature that has been output, this lets you specify one or more tag names that should be added to the output sequence Description text, together with their values (if any). For example, if this is set to be 'gene', then if any output feature has the tag (for example) '/gene=BRCA1' associated with it, then the text '(gene=BRCA1)' will be added to the Description line.

Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Some of these tags also have values, for example '/gene' can have the value of the gene name.

By default no feature tag is displayed. You can set this to match any feature tag you wish to show. The tag may be wildcarded by using '*'. If you wish to extract more than one tag, separate their names with the character '|', eg: gene | label (Any string is accepted)

Advanced (Unprompted) qualifiers: (none)

Associated qualifiers:

"-sequence" associated qualifiers
-sbegin1            integer    Start of each sequence to be used
-send1              integer    End of each sequence to be used
-sreverse1          boolean    Reverse (if DNA)
-sask1              boolean    Ask for begin/end/reverse
-snucleotide1       boolean    Sequence is nucleotide
-sprotein1          boolean    Sequence is protein
-slower1            boolean    Make lower case
-supper1            boolean    Make upper case
-sformat1           string     Input sequence format
-sdbname1           string     Database name
-sid1               string     Entryname
-ufo1               string     UFO features
-fformat1           string     Features format
-fopenfile1         string     Features file name

"-outseq" associated qualifiers
-osformat2          string     Output seq format
-osextension2       string     File name extension
-osname2            string     Base file name
-osdirectory2       string     Output directory
-osdbname2          string     Database name to add
-ossingle2          boolean    Separate file for each entry
-oufo2              string     UFO features
-offormat2          string     Features format
-ofname2            string     Features file name
-ofdirectory2       string     Output directory

General qualifiers:
-auto               boolean    Turn off prompts
-stdout             boolean    Write standard output
-filter             boolean    Read standard input, write standard output
-options            boolean    Prompt for standard and additional values
-debug              boolean    Write debug output to program.dbg
-verbose            boolean    Report some/full command line options
-help               boolean    Report command line options. More
                               information on associated and general
                               qualifiers can be found with -help -verbose
-warning            boolean    Report warnings
-error              boolean    Report errors
-fatal              boolean    Report fatal errors
-die                boolean    Report dying program messages

Input file format

extractfeat reads normal sequences with features. Feature tables in Swissprot, EMBL, GFF, etc. format can be added using '-ufo featurefile' on the command line.
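The tag-matching rules given above for the Description text (names separated by '|', optional '*' wildcards, '(gene=BRCA1)' appended for a matching '/gene=BRCA1' tag) can be illustrated with a short Python sketch. This is not extractfeat's own code. The function name describe_text, the representation of tags as (name, value) pairs, the comma-separated layout used when several tags match, and the case-sensitive comparison are all assumptions made for illustration; only the single-tag output '(gene=BRCA1)' comes from the text above.

```python
import re
from typing import List, Optional, Tuple


def _wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    # Only '*' is treated as a wildcard; every other character is literal.
    return re.compile(".*".join(re.escape(piece) for piece in pattern.split("*")))


def describe_text(patterns: str, tags: List[Tuple[str, Optional[str]]]) -> str:
    """Return the extra Description text for one feature (hypothetical helper).

    patterns: tag names separated by '|', e.g. "gene | label"; '*' is a wildcard.
    tags:     (name, value) pairs for the feature; value is None if the tag
              carries no value.
    """
    wanted = [_wildcard_to_regex(p.strip()) for p in patterns.split("|") if p.strip()]
    parts = []
    for name, value in tags:
        if any(rx.fullmatch(name) for rx in wanted):
            # Assumption: tags without a value are shown by name only.
            parts.append(name if value is None else f"{name}={value}")
    # Assumption: several matching tags share one pair of parentheses.
    return f"({', '.join(parts)})" if parts else ""


if __name__ == "__main__":
    feature_tags = [("gene", "BRCA1"), ("note", "example"), ("pseudo", None)]
    print(describe_text("gene", feature_tags))
    print(describe_text("gene | no*", feature_tags))
```

With the default (no tag names given) the sketch returns an empty string, which mirrors the statement that no feature tag is displayed by default.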
Input files for usage example

'tembl:hsfau1' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:hsfau1

ID   HSFAU1     standard; DNA; HUM; 2016 BP.
XX
AC   X65921; S45242;
XX
SV   X65921.1
XX
DT   13-MAY-1992 (Rel. 31, Created)
DT   21-JUL-1993 (Rel. 36, Last updated, Version 5)
XX
DE   H.sapiens fau 1 gene
XX
KW   fau 1 gene.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-2016
RA   Kas K.;
RT   ;
RL   Submitted (29-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   K. Kas, University of Antwerp, Dept of Biochemistry T3.22,
RL   Universiteitsplein 1, 2610 Wilrijk, BELGIUM
XX
RN   [2]
RP   1-2016
RX   MEDLINE; 92412144.
RA   Kas K., Michiels L., Merregaert J.;
RT   "Genomic structure and expression of the human fau gene: encoding the
RT   ribosomal protein S30 fused to a ubiquitin-like protein.";
RL   Biochem. Biophys. Res. Commun. 187:927-933(1992).
XX
DR   SWISS-PROT; P35544; UBIM_HUMAN.
DR   SWISS-PROT; Q05472; RS30_HUMAN.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..2016
FT                   /db_xref="taxon:9606"
FT                   /organism="Homo sapiens"
FT                   /clone_lib="CML cosmid"
FT                   /clone="15.1"
FT   mRNA            join(408..504,774..856,951..1095,1557..1612,1787..>1912)
FT                   /gene="fau 1"
FT   exon            408..504
FT                   /number=1
FT   intron          505..773
FT                   /number=1
FT   exon            774..856

  [Part of this file has been deleted for brevity]

FT                   RAKRRMQYNRRFVNVVPTFGKKKGPNANS"
FT   intron          857..950
FT                   /number=2
FT   exon            951..1095
FT                   /number=3
FT   intron          1096..1556
FT                   /number=3
FT   exon            1557..1612
FT                   /number=4
FT   intron          1613..1786
FT                   /number=4
FT   exon            1787..>1912
FT                   /number=5
FT   polyA_signal    1938..1943
XX
SQ   Sequence 2016 BP; 421 A; 562 C; 538 G; 495 T; 0 other;
     ctaccatttt ccctctcgat tctatatgta cactcgggac aagttctcct gatcgaaaac        60
     ggcaaaacta aggccccaag taggaatgcc ttagttttcg gggttaacaa tgattaacac       120
     tgagcctcac acccacgcga tgccctcagc tcctcgctca gcgctctcac caacagccgt       180
     agcccgcagc cccgctggac accggttctc catccccgca gcgtagcccg gaacatggta       240
     gctgccatct ttacctgcta cgccagcctt ctgtgcgcgc aactgtctgg tcccgccccg       300
     tcctgcgcga gctgctgccc aggcaggttc gccggtgcga gcgtaaaggg gcggagctag       360
     gactgccttg ggcggtacaa atagcaggga accgcgcggt cgctcagcag tgacgtgaca       420
     cgcagcccac ggtctgtact gacgcgccct cgcttcttcc tctttctcga ctccatcttc       480
     gcggtagctg ggaccgccgt tcaggtaaga atggggcctt ggctggatcc gaagggcttg       540
     tagcaggttg gctgcggggt cagaaggcgc ggggggaacc gaagaacggg gcctgctccg       600
     tggccctgct ccagtcccta tccgaactcc ttgggaggca ctggccttcc gcacgtgagc       660
     cgccgcgacc accatcccgt cgcgatcgtt tctggaccgc tttccactcc caaatctcct       720
     ttatcccaga gcatttcttg gcttctctta caagccgtct tttctttact cagtcgccaa       780
     tatgcagctc tttgtccgcg cccaggagct acacaccttc gaggtgaccg gccaggaaac       840
     ggtcgcccag atcaaggtaa ggctgcttgg tgcgccctgg gttccatttt cttgtgctct       900
     tcactctcgc ggcccgaggg aacgcttacg agccttatct ttccctgtag gctcatgtag       960
     cctcactgga gggcattgcc ccggaagatc aagtcgtgct cctggcaggc gcgcccctgg      1020
     aggatgaggc cactctgggc cagtgcgggg tggaggccct gactaccctg gaagtagcag      1080
     gccgcatgct tggaggtgag tgagagagga atgttctttg aagtaccggt aagcgtctag      1140
     tgagtgtggg gtgcatagtc ctgacagctg agtgtcacac ctatggtaat agagtacttc      1200
     tcactgtctt cagttcagag tgattcttcc tgtttacatc cctcatgttg aacacagacg      1260
     tccatgggag actgagccag agtgtagttg tatttcagtc acatcacgag atcctagtct      1320
     ggttatcagc ttccacacta aaaattaggt cagaccaggc cccaaagtgc tctataaatt      1380
     agaagctgga agatcctgaa atgaaactta agatttcaag gtcaaatatc tgcaactttg      1440
     ttctcattac ctattgggcg cagcttctct ttaaaggctt gaattgagaa aagaggggtt      1500
     ctgctgggtg gcaccttctt gctcttacct gctggtgcct tcctttccca ctacaggtaa      1560
     agtccatggt tccctggccc gtgctggaaa agtgagaggt cagactccta aggtgagtga      1620
     gagtattagt ggtcatggtg ttaggacttt ttttcctttc acagctaaac caagtccctg      1680
     ggctcttact cggtttgcct tctccctccc tggagatgag cctgagggaa gggatgctag      1740
     gtgtggaaga caggaaccag ggcctgatta accttccctt ctccaggtgg ccaaacagga      1800
     gaagaagaag aagaagacag gtcgggctaa gcggcggatg cagtacaacc ggcgctttgt      1860
     caacgttgtg cccacctttg gcaagaagaa gggccccaat gccaactctt aagtcttttg      1920
     taattctggc tttctctaat aaaaaagcca cttagttcag tcatcgcatt gtttcatctt      1980
     tacttgcaag gcctcaggga gaggtgtgct tctcgg                                2016
//

Input files for usage example 5

'tsw:*' is a sequence entry in the example protein database 'tsw'

Output file format

The sequences of the specified features are written out.

The ID name of the sequence is formed from the original sequence name with the start and end positions of the feature appended to it. So if the feature came from a sequence with an ID name of 'XYZ' from positions 10 to 22, then the resulting ID name of the feature sequence will be 'XYZ_10_22'.

The name of the type of feature is added to the start of the description of the sequence in brackets, e.g.: '[exon]'.

The sequence is written out as a normal sequence.

If the feature is in the reverse sense of a nucleic acid sequence, then it is reverse-complemented before being written.

Data files

None.

Notes

If a feature is specified as being a part of a different sequence entry in a database, then this feature is ignored. If you are extracting 'joined' features and one or more of the component features is in a different sequence entry, then the whole joined feature is ignored.

References

None.

Warnings

None.

Diagnostic Error Messages

If the end position of the sequence to be written is less than the start position, then the warning message "Extraction region end less than start for feature type [start-end] in ID name" is written and no sequence is output.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name   Description
biosed         Replace or delete sequence sections
codcopy        Reads and writes a codon usage table
coderet        Extract CDS, mRNA and translations from feature tables
cutseq         Removes a specified section from a sequence
degapseq       Removes gap characters from sequences
descseq        Alter the name or description of a sequence
entret         Reads and writes (returns) flatfile entries
extractalign   Extract regions from a sequence alignment
extractseq     Extract regions from a sequence
listor         Write a list file of the logical OR of two sets of sequences
makenucseq     Creates random nucleotide sequences
makeprotseq    Creates random protein sequences
maskfeat       Mask off features of a sequence
maskseq        Mask off regions of a sequence
newseq         Type in a short new sequence
noreturn       Removes carriage return from ASCII files
notseq         Exclude a set of sequences and write out the remaining ones
nthseq         Writes one sequence from a multiple set of sequences
pasteseq       Insert one sequence into another
revseq         Reverse and complement a sequence
seqret         Reads and writes (returns) sequences
seqretsplit    Reads and writes (returns) sequences in individual files
showfeat       Show features of a sequence
skipseq        Reads and writes (returns) sequences, skipping first few
splitter       Split a sequence into (overlapping) smaller sequences
trimest        Trim poly-A tails off EST sequences
trimseq        Trim ambiguous bits off the ends of sequences
twofeat        Finds neighbouring pairs of features in sequences
union          Reads sequence fragments and builds one sequence
vectorstrip    Strips out DNA between a pair of vector sequences
yank           Reads a sequence range, appends the full USA to a list file

Author(s)

Gary Williams (gwilliam