                                  diffseq

Function

   Find differences between nearly identical sequences

Description

   diffseq takes two overlapping, nearly identical sequences and reports
   the differences between them, together with any features that overlap
   with these regions. GFF files of the differences in each sequence are
   also produced.

   diffseq finds the region of overlap of the input sequences and then
   reports differences within this region, like a local alignment.
   The start and end positions of the overlap are reported.

   diffseq should be of value when looking for SNPs, differences between
   strains of an organism and anything else that requires the differences
   between sequences to be highlighted.

   The sequences can be very long. The program first finds all matching
   sequence words of size 10 (by default). It then reduces these matches
   to a minimal non-overlapping set: the matches are sorted by size
   (largest first) and, for each match in turn, any smaller matches that
   overlap it are removed. The result is a set of the longest ungapped
   alignments between the two sequences that do not overlap with each
   other. The mismatched regions between these matches are reported.
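   The word-match and reduction steps described above can be sketched
   in Python. This is an illustrative sketch only, not the EMBOSS
   implementation: the function names (word_matches, reduce_matches,
   differences) are hypothetical, positions are 0-based, and the sketch
   assumes that kept matches must also appear in the same order in both
   sequences, which the description does not state.

```python
from collections import defaultdict


def word_matches(a, b, wordsize=10):
    """Return maximal ungapped exact matches as (start_a, start_b, length)."""
    # Hash every word of sequence b by its text.
    index = defaultdict(list)
    for j in range(len(b) - wordsize + 1):
        index[b[j:j + wordsize]].append(j)

    matches = []
    for i in range(len(a) - wordsize + 1):
        for j in index.get(a[i:i + wordsize], ()):
            # A hit whose preceding bases also match is the continuation
            # of a match that was already recorded from its true start.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = wordsize
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches


def reduce_matches(matches):
    """Keep the largest matches first; drop any match that overlaps a kept one.

    Assumption of this sketch: a match is also dropped if it would cross
    a kept match (different order in the two sequences).
    """
    kept = []
    for i, j, n in sorted(matches, key=lambda m: -m[2]):
        compatible = all(
            (i + n <= ki and j + n <= kj) or (ki + kn <= i and kj + kn <= j)
            for ki, kj, kn in kept
        )
        if compatible:
            kept.append((i, j, n))
    return sorted(kept)


def differences(a, b, wordsize=10):
    """Return the mismatched regions between consecutive kept matches.

    Each region is (start_a, text_a, start_b, text_b), 0-based.
    """
    kept = reduce_matches(word_matches(a, b, wordsize))
    regions = []
    for (i1, j1, n1), (i2, j2, _) in zip(kept, kept[1:]):
        regions.append((i1 + n1, a[i1 + n1:i2], j1 + n1, b[j1 + n1:j2]))
    return regions


if __name__ == "__main__":
    seq_a = "GATTACAGGCTTAACCGGTTCATGCATG"
    seq_b = "GATTACAGGCTTCACCGGTTCATGCATG"  # one substitution at index 12
    print(differences(seq_a, seq_b, wordsize=6))  # [(12, 'A', 12, 'C')]
```

   Only the regions between matches are returned, so the unmatched ends
   of the sequences are ignored, as in the default behaviour described
   above. The quadratic overlap check is kept for readability and would
   not be suitable for megabase-scale input.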

   It should be possible to find differences between sequences that are
   megabases long.

Usage

   Here is a sample session with diffseq

% diffseq tembl:ap000504 tembl:af129756
Find differences between nearly identical sequences
Word size [10]:
Output report [ap000504.diffseq]:
Features output [AP000504.diffgff]:
Second features output [AF129756.diffgff]:

Command line arguments

   Standard (Mandatory) qualifiers:
  [-asequence]         sequence   Sequence filename and optional format, or
                                  reference (input USA)
  [-bsequence]         sequence   Sequence filename and optional format, or
                                  reference (input USA)
   -wordsize           integer    [10] The similar regions between the two
                                  sequences are found by creating a hash table
                                  of 'wordsize'd subsequences. 10 is a
                                  reasonable default. Making this value larger
                                  (20?) may speed up the program slightly,
                                  but will mean that any two differences
                                  within 'wordsize' of each other will be
                                  grouped as a single region of difference.
                                  This value may be made smaller (4?) to
                                  improve the resolution of nearby
                                  differences, but the program will go much
                                  slower. (Integer 2 or more)
  [-outfile]           report     [*.diffseq] Output report file name
  [-aoutfeat]          featout    [$(asequence.name).diffgff] File for output
                                  of first sequence's features
  [-boutfeat]          featout    [$(bsequence.name).diffgff] File for output
                                  of second sequence's features

   Additional (Optional) qualifiers:
   -globaldifferences  boolean    [N] Normally this program will find regions
                                  of identity that are the length of the
                                  specified word-size or greater and will then
                                  report the regions of difference between
                                  these matching regions. This works well and
                                  is what most people want if they are working
                                  with long overlapping nucleic acid
                                  sequences. You are usually not interested in
                                  the non-overlapping ends of these
                                  sequences. If you have protein sequences or
                                  short RNA sequences however, you will be
                                  interested in differences at the very ends.
                                  If this option is set to true then the
                                  differences at the ends will also be
                                  reported.

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2            integer    Start of the sequence to be used
   -send2              integer    End of the sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outfile" associated qualifiers
   -rformat3           string     Report format
   -rname3             string     Base file name
   -rextension3        string     File name extension
   -rdirectory3        string     Output directory
   -raccshow3          boolean    Show accession number in the report
   -rdesshow3          boolean    Show description in the report
   -rscoreshow3        boolean    Show the score in the report
   -rusashow3          boolean    Show the full USA in the report
   -rmaxall3           integer    Maximum total hits to report
   -rmaxseq3           integer    Maximum hits to report for one sequence

   "-aoutfeat" associated qualifiers
   -offormat4          string     Output feature format
   -ofopenfile4        string     Features file name
   -ofextension4       string     File name extension
   -ofdirectory4       string     Output directory
   -ofname4            string     Base file name
   -ofsingle4          boolean    Separate file for each entry

   "-boutfeat" associated qualifiers
   -offormat5          string     Output feature format
   -ofopenfile5        string     Features file name
   -ofextension5       string     File name extension
   -ofdirectory5       string     Output directory
   -ofname5            string     Base file name
   -ofsingle5          boolean    Separate file for each entry

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   This program reads in two nucleic acid sequence USAs or two protein
   sequence USAs.

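   When diffseq is driven from a script rather than interactively, the
   qualifiers listed under "Command line arguments" can be assembled
   into an argument list. The helper below is a hypothetical sketch
   (diffseq_command is not part of EMBOSS). It only builds the list and
   does not run the program. It uses only qualifiers documented above,
   and it assumes that a bare -globaldifferences switches that boolean
   option on.

```python
def diffseq_command(asequence, bsequence, wordsize=10, outfile=None,
                    aoutfeat=None, boutfeat=None, globaldifferences=False):
    """Build an argument list for a non-interactive diffseq run."""
    # The documentation states that wordsize must be an integer of 2 or more.
    if wordsize < 2:
        raise ValueError("wordsize must be an integer of 2 or more")

    cmd = ["diffseq",
           "-asequence", asequence,
           "-bsequence", bsequence,
           "-wordsize", str(wordsize)]

    # Output names are optional; diffseq has documented defaults for them.
    for flag, value in (("-outfile", outfile),
                        ("-aoutfeat", aoutfeat),
                        ("-boutfeat", boutfeat)):
        if value is not None:
            cmd += [flag, value]

    # Assumption: a bare boolean qualifier sets the option to true.
    if globaldifferences:
        cmd.append("-globaldifferences")

    # -auto turns off the interactive prompts.
    cmd.append("-auto")
    return cmd


if __name__ == "__main__":
    print(diffseq_command("tembl:ap000504", "tembl:af129756"))
```

   The resulting list could be passed to subprocess.run(cmd, check=True)
   on a machine where EMBOSS is installed and the 'tembl' test database
   is configured.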
  Input files for usage example

   'tembl:ap000504' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:ap000504

ID   AP000504   standard; DNA; HUM; 100000 BP.
XX
AC   AP000504; BA000025;
XX
SV   AP000504.1
XX
DT   28-SEP-1999 (Rel. 61, Created)
DT   22-AUG-2001 (Rel. 68, Last updated, Version 3)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section
DE   3/20.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-100000
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-SEP-1999) to the EMBL/GenBank/DDBJ databases.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   SWISS-PROT; O00299; CLI1_HUMAN.
DR   SWISS-PROT; O43196; MSH5_HUMAN.
DR   SWISS-PROT; O95445; APOM_HUMAN.
DR   SWISS-PROT; O95865; DDH2_HUMAN.
DR   SWISS-PROT; O95867; NG24_HUMAN.
DR   SWISS-PROT; P13862; KC2B_HUMAN.
XX
CC   This sequence is conducted by Tokai University as a JST sequencing
CC   Team.
CC   Principal Investigator: Hidetoshi Inoko Ph.D
CC   Phone:+81-463-93-1121, Fax:+81-463-94-8884,
CC   The sequence is submitted by Human Genome Sequencing in ALIS
CC   project of JST
CC   Japan Science and Technology Corporation (JST)
CC   5-3, Yonbancyo, Chiyoda-ku, Tokyo, 102-0081 Japan
CC   For further infomation about this sequences, please visit our
CC   sequence archive Web site (http://www-alis.tokyo.jst.go.jp/HGS/top.

  [Part of this file has been deleted for brevity]

     gggtggatca tgaggtcaag agatcgagac tatcctggct aacatgatga aaccccgtct     97080
     ctactaaaaa tacaaaaaat tagctgggca tggtggcggg cacctgtagt cccagctact     97140
     cgggaggctg agtcaggaga atggtgtgaa cccaggagac ggagcttgca gtgagctgag     97200
     gtcgcaccac tgcactccag cctgggtgat agagcgagac tctgtctcaa aaaaaaaaaa     97260
     aaaaaaaaaa aaaacaaaaa ttagccgggt gtggtggcag gcaacttaat cccagctact     97320
     tgggaggcag aggcaggaga atcgtttgaa cctgggaggc ggaggttgaa gagaatagaa     97380
     gctctgctgg tccagagaag gattgggcca gggctctggg agaccaggga gaaagagggc     97440
     acatgtggtc cctgttgact gtgagggtgg gaatctgagg aaggctttgg ctcattgccc     97500
     cttgggtttg tccacagcca tccttcccct gcggagtatg tcgaggtgct ccaggagcta     97560
     cagcggctgg agagtcgcct ccagcccttc ttgcagcgct actacgaggt tctgggtgct     97620
     gctgccacca cggactacaa taacaatgtg agccctttga tggccctgcc ctttctcctc     97680
     agccccagta ctcccaaaac agaacaggct gaaatacaga taactctttc cctccctgga     97740
     aaaacattgc aacagggcca ggtgcagtgg ctcacgcctg taatcccagc actttgggag     97800
     gccaaggtgg gcggatcatc tgagatcggg agtttgagac cagcctggcc aacatggtgc     97860
     aaccccatct ctactgaaaa tataaacatt agctggatgt agtggtgcac acctgtaatc     97920
     ccagctactc aggaggctga ggcaggagaa tcgctagaac tcgggaggag ggggttgcag     97980
     tgagccgaga ttgcactact gcactctagc ctgggtgaca gagcgagact gtctcaaaaa     98040
     acaaaacaaa acaaaaaaac acacattgca acaaaacaat ttctctctaa acctgtaagt     98100
     gattttgtcc tcccttacag agaaggtgat aatctttgct gtaagcactg tcctcgtatc     98160
     gtaccccttg tgcccctgaa tgaatttaga aaatgtaaag tacaggagat cagtatatga     98220
     tgacttactg attcatagta gtgttttaat aggatgttcc ttatgtgaat aagatataat     98280
     ttatttgcaa agatttggtc tacatgtaaa cttccaagga tataactgaa agttttggag     98340
     gacatggtat tctcagtagg cattattgct tttattagtg agatggactc cagcttgata     98400