                                megamerger

Function

   Merge two large overlapping nucleic acid sequences

Description

   megamerger takes two overlapping sequences and merges them into one
   sequence. It can thus be regarded as the opposite of what splitter
   does.

   The sequences can be very long. The program finds all matches of
   sequence words of size 20 (by default). It then reduces these to the
   minimum set of overlapping matches by sorting the matches in order of
   size (largest first) and then, for each match in turn, removing any
   smaller matches that overlap it. The result is a set of the longest
   ungapped alignments between the two sequences that do not overlap with
   each other. If the two sequences are identical in their region of
   overlap then there will be one region of match and no mismatches.

   It should be possible to merge sequences that are megabytes long.
   Compare this with the program merger, which does a more accurate
   alignment of more divergent sequences using the Needleman and Wunsch
   algorithm but which uses much more memory.

   The sequences should ideally be identical in their region of overlap.
   If there are any mismatches between the two sequences then megamerger
   will still attempt to create a merged sequence, but you should check
   that the result is what you required.

   A report of the actions of megamerger is written out. Any action that
   requires a choice between regions of the two sequences where they
   have a mismatch is marked with the word WARNING!. The sequence in
   these regions is written out in uppercase. All other regions of the
   output sequence are written in lowercase.
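
   The match-reduction step described above can be illustrated with a
   short Python sketch. This is not the EMBOSS source code: the function
   names (word_matches, overlaps, minimal_match_set) are hypothetical, a
   match is assumed to be a tuple (start_a, start_b, length) with 0-based
   positions, and two matches are assumed to "overlap" when they share
   positions in either sequence. The sketch simply follows the wording of
   the description: sort by size, then drop smaller overlapping matches.

```python
from collections import defaultdict


def word_matches(a, b, wordsize=20):
    """Return maximal ungapped exact matches as (start_a, start_b, length)."""
    # Index every word of sequence a by its start position.
    index = defaultdict(list)
    for i in range(len(a) - wordsize + 1):
        index[a[i:i + wordsize]].append(i)

    # A seed (i, j) means a[i:i+wordsize] == b[j:j+wordsize].
    seeds = set()
    for j in range(len(b) - wordsize + 1):
        for i in index.get(b[j:j + wordsize], ()):
            seeds.add((i, j))

    # Join consecutive seeds on the same diagonal into one longer match.
    matches = []
    for i, j in sorted(seeds):
        if (i - 1, j - 1) in seeds:
            continue  # this seed is inside a run that started earlier
        length = wordsize
        while (i + length - wordsize + 1, j + length - wordsize + 1) in seeds:
            length += 1
        matches.append((i, j, length))
    return matches


def overlaps(m, n):
    """True if two matches share positions in either sequence (assumption)."""
    (ia, ib, la), (ja, jb, lb) = m, n
    in_a = ia < ja + lb and ja < ia + la
    in_b = ib < jb + lb and jb < ib + la
    return in_a or in_b


def minimal_match_set(matches):
    """Keep the largest matches first; drop smaller ones that overlap them."""
    kept = []
    for m in sorted(matches, key=lambda m: m[2], reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept)


if __name__ == "__main__":
    seq_a = "ACGTTGCAAGCTTAGC"
    seq_b = "AGCTTAGCCCGGTATA"
    print(minimal_match_set(word_matches(seq_a, seq_b, wordsize=4)))
```

   With two sequences that are identical in their overlap, as in the toy
   example, a single match covering the whole overlap is expected to
   remain. The word index here holds every word of the first sequence in
   memory, which is only a convenience for the sketch.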

   Where there is a mismatch, the sequence chosen to supply the mismatch
   region in the final merged sequence is the one whose mismatch region
   is furthest from the start or end of that sequence.

Usage

   Here is a sample session with megamerger

% megamerger tembl:ap000504 tembl:af129756
Merge two large overlapping nucleic acid sequences
Word size [20]:
output sequence [ap000504.merged]:
Output file [ap000504.megamerger]: report

   Go to the input files for this example
   Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-asequence]         sequence   Nucleotide sequence filename and optional
                                  format, or reference (input USA)
  [-bsequence]         sequence   Nucleotide sequence filename and optional
                                  format, or reference (input USA)
   -wordsize           integer    [20] Word size (Integer 2 or more)
  [-outseq]            seqout     [.] Sequence filename and
                                  optional format (output USA)
  [-outfile]           outfile    [*.megamerger] Output file name

   Additional (Optional) qualifiers:
   -prefer             boolean    [N] When a mismatch between the two
                                  sequences is discovered, one or other of
                                  the two sequences must be used to create
                                  the merged sequence over this mismatch
                                  region. The default action is to create
                                  the merged sequence using the sequence
                                  where the mismatch is closest to that
                                  sequence's centre. If this option is used,
                                  then the first sequence (seqa) will always
                                  be used in preference to the other
                                  sequence when there is a mismatch.
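
   The default mismatch rule and the -prefer override described above can
   be sketched in Python as follows. This is an illustration only, not
   the EMBOSS implementation: the function names, the 0-based half-open
   (start, end) region convention, the way the distance to the sequence
   ends is measured, and the tie-break in favour of the first sequence
   are all assumptions made for the example.

```python
def distance_to_nearest_end(start, end, length):
    """Distance from the region [start, end) to the nearer sequence end."""
    return min(start, length - end)


def choose_source(region_a, len_a, region_b, len_b, prefer_a=False):
    """Return "a" or "b": which sequence supplies a mismatch region.

    region_a and region_b are (start, end) positions of the same mismatch
    as seen in sequence a and in sequence b respectively.
    """
    if prefer_a:
        # Behaviour described for -prefer: always use the first sequence.
        return "a"
    da = distance_to_nearest_end(region_a[0], region_a[1], len_a)
    db = distance_to_nearest_end(region_b[0], region_b[1], len_b)
    # Default: use the sequence whose mismatch lies furthest from its ends
    # (that is, closest to its centre). Ties go to "a" (assumption).
    return "a" if da >= db else "b"


if __name__ == "__main__":
    # A mismatch near the end of a 100000 bp sequence a that lies well
    # inside a 184666 bp sequence b.
    print(choose_source((99000, 99010), 100000, (5000, 5010), 184666))
    print(choose_source((99000, 99010), 100000, (5000, 5010), 184666,
                        prefer_a=True))
```

   The reasoning behind such a rule is presumably that the ends of a
   sequence are the least reliable part of it, so the copy of the
   mismatch region that sits deeper inside its sequence is favoured.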

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2            integer    Start of the sequence to be used
   -send2              integer    End of the sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outseq" associated qualifiers
   -osformat3          string     Output seq format
   -osextension3       string     File name extension
   -osname3            string     Base file name
   -osdirectory3       string     Output directory
   -osdbname3          string     Database name to add
   -ossingle3          boolean    Separate file for each entry
   -oufo3              string     UFO features
   -offormat3          string     Features format
   -ofname3            string     Features file name
   -ofdirectory3       string     Output directory

   "-outfile" associated qualifiers
   -odirectory4        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   megamerger reads any two Sequence USAs.

  Input files for usage example

   'tembl:ap000504' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:ap000504

ID   AP000504   standard; DNA; HUM; 100000 BP.
XX
AC   AP000504; BA000025;
XX
SV   AP000504.1
XX
DT   28-SEP-1999 (Rel. 61, Created)
DT   22-AUG-2001 (Rel. 68, Last updated, Version 3)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section
DE   3/20.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-100000
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-SEP-1999) to the EMBL/GenBank/DDBJ databases.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   SWISS-PROT; O00299; CLI1_HUMAN.
DR   SWISS-PROT; O43196; MSH5_HUMAN.
DR   SWISS-PROT; O95445; APOM_HUMAN.
DR   SWISS-PROT; O95865; DDH2_HUMAN.
DR   SWISS-PROT; O95867; NG24_HUMAN.
DR   SWISS-PROT; P13862; KC2B_HUMAN.
XX
CC   This sequence is conducted by Tokai University as a JST sequencing
CC   Team.
CC   Principal Investigator: Hidetoshi Inoko Ph.D
CC   Phone:+81-463-93-1121, Fax:+81-463-94-8884,
CC   The sequence is submitted by Human Genome Sequencing in ALIS
CC   project of JST
CC   Japan Science and Technology Corporation (JST)
CC   5-3, Yonbancyo, Chiyoda-ku, Tokyo, 102-0081 Japan
CC   For further infomation about this sequences, please visit our
CC   sequence archive Web site (http://www-alis.tokyo.jst.go.jp/HGS/top.

[Part of this file has been deleted for brevity]     gggtggatca tgaggtcaag agatcgagac tatcctggct aacatgatga aaccccgtct     97080     ctactaaaaa tacaaaaaat tagctgggca tggtggcggg cacctgtagt cccagctact     97140     cgggaggctg agtcaggaga atggtgtgaa cccaggagac ggagcttgca gtgagctgag     97200     gtcgcaccac tgcactccag cctgggtgat agagcgagac tctgtctcaa aaaaaaaaaa     97260     aaaaaaaaaa aaaacaaaaa ttagccgggt gtggtggcag gcaacttaat cccagctact     97320     tgggaggcag aggcaggaga atcgtttgaa cctgggaggc ggaggttgaa gagaatagaa     97380     gctctgctgg tccagagaag gattgggcca gggctctggg agaccaggga gaaagagggc     97440     acatgtggtc cctgttgact gtgagggtgg gaatctgagg aaggctttgg ctcattgccc     97500     cttgggtttg tccacagcca tccttcccct gcggagtatg tcgaggtgct ccaggagcta     97560     cagcggctgg agagtcgcct ccagcccttc ttgcagcgct actacgaggt tctgggtgct     97620     gctgccacca cggactacaa taacaatgtg agccctttga tggccctgcc ctttctcctc     97680     agccccagta ctcccaaaac agaacaggct gaaatacaga taactctttc cctccctgga     97740     aaaacattgc aacagggcca ggtgcagtgg ctcacgcctg taatcccagc actttgggag     97800     gccaaggtgg gcggatcatc tgagatcggg agtttgagac cagcctggcc aacatggtgc     97860     aaccccatct ctactgaaaa tataaacatt agctggatgt agtggtgcac acctgtaatc     97920     ccagctactc aggaggctga ggcaggagaa tcgctagaac tcgggaggag ggggttgcag     97980     tgagccgaga ttgcactact gcactctagc ctgggtgaca gagcgagact gtctcaaaaa     98040     acaaaacaaa acaaaaaaac acacattgca acaaaacaat ttctctctaa acctgtaagt     98100     gattttgtcc tcccttacag agaaggtgat aatctttgct gtaagcactg tcctcgtatc     98160     gtaccccttg tgcccctgaa tgaatttaga aaatgtaaag tacaggagat cagtatatga     98220     tgacttactg attcatagta gtgttttaat aggatgttcc ttatgtgaat aagatataat     98280     ttatttgcaa agatttggtc tacatgtaaa cttccaagga tataactgaa agttttggag     98340     gacatggtat tctcagtagg cattattgct tttattagtg agatggactc cagcttgata     98400     ttttctgcct ttttgtgttt ggctggttgt gcgcagcacg agggccggga ggaggatcag     98460     cggttgatca acttggtagg 
ggagagcctg cgactgctgg gcaacacctt tgttgcactg     98520     tctgacctgc gctgcaatct ggcctgcacg cccccacgac acctgcatgt ggtccggcct     98580     atgtctcact acaccacccc catggtgctc cagcaggcag ccattcccat acaggtgggt     98640     tagggggagt ctggcctgag ggagagtgag gggtgttgat agagtgaccc agggtagcta     98700     ctgggcctga aggaggttag gaaaggagga gactggaaac atggtgatga aggctggaga     98760     tactttagag gtttatcatg aggttttctt ggttaggctc ttgtattttt ctcacatctg     98820     cctgtccatc tgtctttttc agatcaatgt gggaaccact gtgaccatga caggaaatgg     98880     gactcggccc cccccaactc ccaatgcaga ggcacctccc cctggtcctg ggcaggcctc     98940     atccgtggct ccgtcttcta ccaatgtcga gtcctcagct gagggggctc ccccgccagg     99000     tccagctccc ccgccagcca ccagccaccc gagggtcatc cggatttccc accagagtgt     99060     ggaacccgtg gtcatgatgc acatgaacat tcaaggtgag aatagttgct ggcgagaaga     99120     gcaggatcag catgatgagg gaggttcatg ctgaggtgtg agggaacagg gtggggaagg     99180     gagaggcaca tgctggtggt ggtagcctgg ggaccagagc agaagcttaa gtagacagat     99240     gtggggggtg tgggggttgg tttgtctttg gaggtgtgtt tgtgtggtga agggagtacc     99300     tctccctgtt tagatggagg gaaaggcagg ctttctgatt gggggattat gggcctgaag     99360     tatgcctgat ctcagaagga tatagttagg ccttggccct acctacctca gggccactgt     99420     ctctgtctcc ctgcccagat tctggcacac agcctggtgg tgttccgagt gctcccactg     99480     gccccctggg accccctggt catggccaaa ccctgggtaa gagtgagggc atcagggcag     99540     gctgagctct gggtagagaa agggaagggc tgagtgggtg ggttgaaggg gtccaggttc     99600     aaggttacat cagacccgcc ccccaggctc caccctcatc cagctgccct ccctgccccc     99660     tgagttcatg cacgccgtcg cccaccagat cactcatcag gccatggtgg cagctgttgc     99720     ctccgcggcc gcaggtaatg acctggaagg ggaggcttgg gaggtagggc acagtccatg     99780     gtggcagctg gctggcaagg gcctggccct cagccctctt cggtctgtct cttctgccac     99840     ccacaggaca gcaggtgcca ggcttcccaa cagctccaac ccgggtggtg attgcccggc     99900     ccactcctcc acaggctcgg ccttcccatc ctggagggcc cccagtctct gggacactgg     99960     tgagcaaggg tcggggagtt 
ctagtgcgta acagtctagg                          100000
//

  Database entry: tembl:af129756

ID   AF129756   standard; DNA; HUM; 184666 BP.
XX
AC   AF129756;
XX
SV   AF129756.1
XX
DT   12-MAR-1999 (Rel. 59, Created)
DT   29-OCT-1999 (Rel. 61, Last updated, Version 2)
XX
DE   Homo sapiens MSH55 gene, partial cds; and CLIC1, DDAH, G6b, G6c, G5b, G6d,
DE   G6e, G6f, BAT5, G5b, CSK2B, BAT4, G4, Apo M, BAT3, BAT2, AIF-1, 1C7, LST-1,
DE   LTB, TNF, and LTA genes, complete cds.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-184666
RA   Rowen L., Madan A., Qin S., Shaffer T., James R., Ratcliffe A., Abbasi N.,
RA   Dickhoff R., Loretz C., Madan A., Dors M., Young J., Lasky S., Hood L.;
RT   "Sequence of the human major histocompatibility complex class III region";
RL   Unpublished.
XX
RN   [2]
RP   1-184666
RA   Rowen L.;
RT   ;
RL   Submitted (22-FEB-1999) to the EMBL/GenBank/DDBJ databases.
RL   Department of Molecular Biotechnology, Box 357730 University of Washington,
RL   Seattle, WA 98195, USA
XX
RN   [3]
RP   1-184666
RA   Rowen L.;
RT   ;
RL   Submitted (28-OCT-1999) to the EMBL/GenBank/DDBJ databases.
RL   Multimegabase Sequencing Center, University of Washington, PO Box 357730,
RL   Seattle, WA 98195, USA
XX
DR   EPD; EP11158; HS_TNFA.
DR   EPD; EP11159; HS_TNFB.
DR   SPTREMBL; O00452; O00452.
DR   SPTREMBL; O14931; O14931.
DR   SPTREMBL; O95866; O95866.
DR   SPTREMBL; O95868; O95868.
DR   SPTREMBL; O95869; O95869.
DR   SPTREMBL; O95870; O95870.


  [Part of this file has been deleted for brevity]

     aaaccagttt accaccactc ctaacactaa acttaaatct gactctaaat gtaagtccaa    181740
     tctgagccac aagcctaaag ttgaacttta tcctgcttta tgaattattc atccattcct    181800
     ccatttagtg agtatctgcg tgcctaacac atgctgggca ttgtcctaag gcaggaggga    181860