                                  maskseq

Function

   Mask off regions of a sequence

Description

   This simple editing program allows you to mask off regions of a
   sequence with a specified letter.

   Why would you wish to do this? It is common for database searches to
   mask out low-complexity or biased composition regions of a sequence so
   that spurious matches do not occur. You may have a program that has
   reported such biased regions but has not masked the sequence itself.
   In that case, you can use this program to do the masking.

   You may find other uses for it.

   Some non-EMBOSS programs (for example FASTA) are capable of treating
   lower-case regions as if they are masked. maskseq can mask a region to
   lower-case instead of replacing the sequence with 'N's or 'X's if you
   use the qualifier '-tolower' or use a space character as the masking
   character.

Usage

   Here is a sample session with maskseq

   Mask off residues 10 to 12 from a sequence 'prot.fasta' and write to
   the new sequence file 'prot2.seq':

% maskseq prot.fasta prot2.seq -reg=10-12
Mask off regions of a sequence.

   Example 2

   Mask off residues 20 to 30 from a sequence 'prot.fasta' using the
   character 'x' and write to the new sequence file 'prot2.seq':

% maskseq prot.fasta prot2.seq -reg=20-30 -mask=x
Mask off regions of a sequence.

   Example 3

   Mask off the regions 20 to 23, 34 to 45 and 88 to 90 in 'prot.fasta':

% maskseq prot.fasta prot2.seq -reg=20-23,34-45,88-90
Mask off regions of a sequence.

   Example 4

   Change to lower-case the regions 20 to 23, 34 to 45 and 88 to 90 in
   'prot.fasta':

% maskseq prot.fasta prot2.seq -reg=20-23,34-45,88-90 -tolower
Mask off regions of a sequence.

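   The effect of the sample sessions above can be approximated in a few
   lines of Python. This is only a sketch of the documented behaviour, not
   the EMBOSS implementation: the function names 'parse_regions' and
   'mask_regions' are hypothetical, the exact conditions that trigger each
   error are assumptions, and clipping of positions that fall outside the
   sequence is also an assumption.

```python
import re


def parse_regions(spec):
    """Parse a region string such as '20-23,34-45' into (start, end) pairs.

    Positions are integers separated by any non-digit, non-alpha character.
    The error wording mirrors the maskseq diagnostics, but when each one
    fires is an assumption of this sketch.
    """
    if re.search(r"[A-Za-z]", spec):
        raise ValueError(f"Non-digit found in region {spec!r}")
    numbers = [int(n) for n in re.findall(r"\d+", spec)]
    if len(numbers) % 2:
        raise ValueError(f"Unpaired start of a region found in {spec!r}")
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    for start, end in pairs:
        if start > end:
            raise ValueError(
                f"The start of a pair of region positions must be "
                f"smaller than the end in {spec!r}"
            )
    return pairs


def mask_regions(seq, regions, maskchar="X", tolower=False):
    """Return seq with each 1-based, inclusive region masked.

    A space or empty mask character behaves like tolower=True, following
    the description of the mask character qualifier.
    """
    lower = tolower or maskchar in ("", " ")
    chars = list(seq)
    for start, end in regions:
        # Assumption: positions outside the sequence are silently clipped.
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            chars[i] = chars[i].lower() if lower else maskchar
    return "".join(chars)
```

   For instance, mask_regions(seq, parse_regions("20-23,34-45,88-90"),
   tolower=True) corresponds to Example 4, and passing maskchar="x" with
   the region "20-30" corresponds to Example 2.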
Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          sequence   Sequence filename and optional format, or
                                  reference (input USA)
   -regions            range      [None] Regions to mask.
                                  A set of regions is specified by a set of
                                  pairs of positions.
                                  The positions are integers.
                                  They are separated by any non-digit,
                                  non-alpha character.
                                  Examples of region specifications are:
                                  24-45, 56-78
                                  1:45, 67=99;765..888
                                  1,5,8,10,23,45,57,99
  [-outseq]            seqout     [<sequence>.<format>] Sequence filename and
                                  optional format (output USA)

   Additional (Optional) qualifiers (* if not always prompted):
   -tolower            toggle     [N] The region can be 'masked' by converting
                                  the sequence characters to lower-case, some
                                  non-EMBOSS programs e.g. fasta can
                                  interpret this as a masked region. The
                                  sequence is unchanged apart from the case
                                  change. You might like to ensure that the
                                  whole sequence is in upper-case before
                                  masking the specified regions to lower-case
                                  by using the '-supper' flag.
*  -maskchar           string     ['X' for protein, 'N' for nucleic] Character
                                  to use when masking.
                                  Default is 'X' for protein sequences, 'N'
                                  for nucleic sequences.
                                  If the mask character is set to be the SPACE
                                  character or a null character, then the
                                  sequence is 'masked' by changing it to
                                  lower-case, just as with the '-tolower'
                                  flag. (Any string from 1 to 1 characters)

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outseq" associated qualifiers
   -osformat2          string     Output seq format
   -osextension2       string     File name extension
   -osname2            string     Base file name
   -osdirectory2       string     Output directory
   -osdbname2          string     Database name to add
   -ossingle2          boolean    Separate file for each entry
   -oufo2              string     UFO features
   -offormat2          string     Features format
   -ofname2            string     Features file name
   -ofdirectory2       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and
                                  additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   maskseq reads in a single sequence USA.

  Input files for usage example

  File: prot.fasta

>FASTA F00001 FASTA FORMAT PROTEIN SEQUENCE
ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY
ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY

   You can specify a file of ranges to mask out by giving the '-regions'
   qualifier the value '@' followed by the name of the file containing
   the ranges (e.g. '-regions @myfile').

   The format of the range file is:
     * Comment lines start with '#' in the first column.
     * Comment lines and blank lines are ignored.
     * The line may start with white-space.
     * There are two positive (integer) numbers per line separated by one
       or more space or TAB characters.
     * The second number must be greater than or equal to the first
       number.
     * There can be optional text after the two numbers to annotate the
       line.
     * White-space before or after the text is removed.

   An example range file is:
     _________________________________________________________________

# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
     _________________________________________________________________

Output file format

   maskseq writes a single masked sequence file.

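   The range file rules listed under "Input file format" can be sketched
   as a small Python reader. This is an illustration only, not the EMBOSS
   parser: the function name 'parse_range_file' and the choice to raise
   ValueError on a malformed line are assumptions.

```python
def parse_range_file(text):
    """Parse range-file text into a list of (start, end, annotation) tuples.

    Follows the rules listed for the range file format: '#' in the first
    column marks a comment, blank lines are ignored, leading white-space
    is allowed, and optional annotation text may follow the two numbers.
    """
    ranges = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if raw.startswith("#"):      # comment: '#' in the first column
            continue
        line = raw.strip()           # the line may start with white-space
        if not line:                 # blank lines are ignored
            continue
        fields = line.split(None, 2)  # split on runs of spaces or TABs
        try:
            start, end = int(fields[0]), int(fields[1])
        except (IndexError, ValueError):
            raise ValueError(f"line {lineno}: expected two integers") from None
        if start < 1 or end < start:
            raise ValueError(f"line {lineno}: need 1 <= start <= end")
        note = fields[2].strip() if len(fields) > 2 else ""
        ranges.append((start, end, note))
    return ranges
```

   Applied to the example range file shown under "Input file format", this
   sketch yields three ranges: 12 to 23, 4 to 5 and 67 to 10348, the last
   two carrying their annotation text.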
  Output files for usage example

  File: prot2.seq

>FASTA F00001 FASTA FORMAT PROTEIN SEQUENCE
ACDEFGHIKXXXPQRSTVWYACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY
ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY

  Output files for usage example 2

  File: prot2.seq

>FASTA F00001 FASTA FORMAT PROTEIN SEQUENCE
ACDEFGHIKLMNPQRSTVWxxxxxxxxxxxMNPQRSTVWYACDEFGHIKLMNPQRSTVWY
ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY

  Output files for usage example 3

  File: prot2.seq

>FASTA F00001 FASTA FORMAT PROTEIN SEQUENCE
ACDEFGHIKLMNPQRSTVWXXXXEFGHIKLMNPXXXXXXXXXXXXGHIKLMNPQRSTVWY
ACDEFGHIKLMNPQRSTVWYACDEFGHXXXMNPQRSTVWY

  Output files for usage example 4

  File: prot2.seq

>FASTA F00001 FASTA FORMAT PROTEIN SEQUENCE
ACDEFGHIKLMNPQRSTVWyacdEFGHIKLMNPqrstvwyacdefGHIKLMNPQRSTVWY
ACDEFGHIKLMNPQRSTVWYACDEFGHiklMNPQRSTVWY

Data files

   None.

Notes

   None.

References

   None.

Warnings

   You can mask out a complete sequence.

Diagnostic Error Messages

   Several warning messages about malformed region specifications:
     * Non-digit found in region ...
     * Unpaired start of a region found in ...
     * The start of a pair of region positions must be smaller than the
       end in ...

Exit status

   It exits with status 0, unless a region is badly constructed.

Known bugs

   None.

See also

   Program name  Description
   biosed        Replace or delete sequence sections
   codcopy       Reads and writes a codon usage table
   cutseq        Removes a specified section from a sequence
   degapseq      Removes gap characters from sequences
   descseq       Alter the name or description of a sequence
   entret        Reads and writes (returns) flatfile entries
   extractalign  Extract regions from a sequence alignment
   extractfeat   Extract features from a sequence
   extractseq    Extract regions from a sequence
   listor        Write a list file of the logical OR of two sets of sequences
   makenucseq    Creates random nucleotide sequences
   makeprotseq   Creates random protein sequences
   maskfeat      Mask off features of a sequence
   newseq        Type in a short new sequence
   noreturn      Removes carriage return from ASCII files
   notseq        Exclude a set of sequences and write out the remaining ones
   nthseq        Writes one sequence from a multiple set of sequences
   pasteseq      Insert one sequence into another
   revseq        Reverse and complement a sequence
   seqret        Reads and writes (returns) sequences
   seqretsplit   Reads and writes (returns) sequences in individual files
   skipseq       Reads and writes (returns) sequences, skipping first few
   splitter      Split a sequence into (overlapping) smaller sequences
   trimest       Trim poly-A tails off EST sequences
   trimseq       Trim ambiguous bits off the ends of sequences
   union         Reads sequence fragments and builds one sequence
   vectorstrip   Strips out DNA between a pair of vector sequences
   yank          Reads a sequence range, appends the full USA to a list file

Author(s)

   Gary Williams (gwilliam