                                  merger

Function

   Merge two overlapping sequences

Description

   This joins two overlapping nucleic acid sequences into one merged
   sequence.

   It uses a global alignment algorithm (Needleman & Wunsch) to optimally
   align the sequences and then it creates the merged sequence from the
   alignment. When there is a mismatch in the alignment between the two
   sequences, the correct base to include in the resulting sequence is
   chosen by using the base from the sequence which has the best local
   sequence quality score. The following heuristic is used to find the
   sequence quality score:

   If one of the bases is an 'N', then the other sequence's base is used,
   else:

   A window around the disputed base is used to find the local quality
   score. The window size is increased from 5 to 10 to 20 bases, stopping
   as soon as there is a clear decision on the best choice. If there is no
   best choice after using a window of 20, then the base in the first
   sequence is used.

   To calculate the quality of a window of a sequence around a base:
     * quality = sequence value/length under window either side of the
       base
     * sequence value = sum of points in that window
     * unambiguous bases (ACGTU) score 2 points
     * ambiguous bases (MRWSYKVHDB) score 1 point
     * Ns score 0 points
     * off end of the sequence scores 0 points

   N.B. This heavily discriminates against the iffy bits at the end of
   sequence reads.

   This program was originally written to aid in the reconstruction of
   mRNA sequences which had been sequenced from both ends as a 5' and 3'
   EST (cDNA), e.g. joining two reads produced by primer walking
   sequencing.

   Care should be taken to reverse one of the sequences (e.g. using the
   qualifier '-sreverse2') if this is required to get them both in the
   correct orientation.
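
   The quality heuristic described above can be sketched in Python as
   follows. This is an illustrative reading of the rules, not the EMBOSS
   source code. The function names are hypothetical, and three details
   are assumptions: the window covers `window` positions on each side of
   the disputed base and excludes the base itself, the divisor is always
   2 * window (so positions off the end count as 0 points), and a "clear
   decision" means the two quality scores differ.

```python
UNAMBIGUOUS = set("ACGTU")       # 2 points each
AMBIGUOUS = set("MRWSYKVHDB")    # 1 point each


def base_points(base):
    """Points for one base; N (and anything unrecognised) scores 0."""
    b = base.upper()
    if b in UNAMBIGUOUS:
        return 2
    if b in AMBIGUOUS:
        return 1
    return 0


def window_quality(seq, pos, window):
    """Quality of the bases within `window` positions either side of pos.

    Assumption: the disputed base itself is excluded, and positions that
    fall off either end of the sequence contribute 0 points while still
    counting towards the window length.
    """
    total = 0
    for offset in range(1, window + 1):
        for j in (pos - offset, pos + offset):
            if 0 <= j < len(seq):
                total += base_points(seq[j])
    return total / (2 * window)


def choose_base(seq_a, pos_a, seq_b, pos_b, windows=(5, 10, 20)):
    """Pick the base to keep at a mismatch between two aligned sequences."""
    base_a, base_b = seq_a[pos_a], seq_b[pos_b]
    # An N always loses to the other sequence's base.
    if base_a.upper() == "N":
        return base_b
    if base_b.upper() == "N":
        return base_a
    # Widen the window until one sequence is clearly better.
    for w in windows:
        qa = window_quality(seq_a, pos_a, w)
        qb = window_quality(seq_b, pos_b, w)
        if qa != qb:
            return base_a if qa > qb else base_b
    # No clear decision after the largest window: keep the first sequence.
    return base_a
```

   Because off-end positions score 0 but still count in the divisor, a
   base near the end of a read gets a lower quality than the same base in
   the middle of a read, which matches the N.B. above.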

   Because it uses a Needleman & Wunsch alignment the required memory may
   be greater than the available memory when attempting to merge large
   (cosmid-sized or greater) sequences.

   The gap open and gap extension penalties have been set at a higher
   level than is usual (50 and 5). This was experimentally determined to
   give the best results with a set of poor quality EST test sequences.

Usage

   Here is a sample session with merger

% merger
Merge two overlapping sequences
Input sequence: tembl:eclacy
Second sequence: tembl:eclaca
Output alignment [eclacy.merger]:
output sequence [eclacy.fasta]:

   Typically, one of the sequences will need to be reverse-complemented
   to put it into the correct orientation to make it join. For example:

% merger file1.seq file2.seq -sreverse2 -outseq merged.seq

Command line arguments

   Standard (Mandatory) qualifiers:
  [-asequence]         sequence   Sequence filename and optional format, or
                                  reference (input USA)
  [-bsequence]         sequence   Sequence filename and optional format, or
                                  reference (input USA)
  [-outfile]           align      [*.merger] Output alignment file name
  [-outseq]            seqout     [.] Sequence filename and
                                  optional format (output USA)

   Additional (Optional) qualifiers:
   -datafile           matrixf    [EBLOSUM62 for protein, EDNAFULL for DNA]
                                  This is the scoring matrix file used when
                                  comparing sequences. By default it is the
                                  file 'EBLOSUM62' (for proteins) or the file
                                  'EDNAFULL' (for nucleic sequences). These
                                  files are found in the 'data' directory of
                                  the EMBOSS installation.
   -gapopen            float      [@($(acdprotein)? 50.0 : 50.0 )] Gap opening
                                  penalty (Number from 0.000 to 100.000)
   -gapextend          float      [@($(acdprotein)? 5.0 : 5.0)] Gap extension
                                  penalty (Number from 0.000 to 10.000)

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2            integer    Start of the sequence to be used
   -send2              integer    End of the sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outfile" associated qualifiers
   -aformat3           string     Alignment format
   -aextension3        string     File name extension
   -adirectory3        string     Output directory
   -aname3             string     Base file name
   -awidth3            integer    Alignment width
   -aaccshow3          boolean    Show accession number in the header
   -adesshow3          boolean    Show description in the header
   -ausashow3          boolean    Show the full USA in the alignment
   -aglobal3           boolean    Show the full sequence in alignment

   "-outseq" associated qualifiers
   -osformat4          string     Output seq format
   -osextension4       string     File name extension
   -osname4            string     Base file name
   -osdirectory4       string     Output directory
   -osdbname4          string     Database name to add
   -ossingle4          boolean    Separate file for each entry
   -oufo4              string     UFO features
   -offormat4          string     Features format
   -ofname4            string     Features file name
   -ofdirectory4       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   merger reads any two sequence USAs of the same type (protein or
   nucleic acid.)
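
   For scripted use, the qualifiers listed under "Command line arguments"
   can be assembled into a non-interactive call. The sketch below only
   builds the argument list; it uses qualifier names taken from the table
   in this document, while the helper name `build_merger_command` and the
   file names are hypothetical. Actually running the command assumes an
   EMBOSS installation with `merger` on the PATH.

```python
def build_merger_command(a_seq, b_seq, outfile, outseq,
                         reverse_second=False, gapopen=50.0, gapextend=5.0):
    """Return an argument list for a prompt-free merger run.

    The range checks mirror the limits documented for -gapopen
    (0 to 100) and -gapextend (0 to 10).
    """
    if not 0.0 <= gapopen <= 100.0:
        raise ValueError("gapopen must be between 0 and 100")
    if not 0.0 <= gapextend <= 10.0:
        raise ValueError("gapextend must be between 0 and 10")

    cmd = [
        "merger",
        "-asequence", a_seq,
        "-bsequence", b_seq,
        "-outfile", outfile,
        "-outseq", outseq,
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-auto",                      # turn off prompts
    ]
    if reverse_second:
        cmd.append("-sreverse2")      # reverse the second sequence (if DNA)
    return cmd


if __name__ == "__main__":
    cmd = build_merger_command("file1.seq", "file2.seq",
                               "merged.merger", "merged.seq",
                               reverse_second=True)
    print(" ".join(cmd))
    # To execute (requires EMBOSS to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

   Passing the arguments as a list, rather than as one shell string,
   avoids quoting problems when file names contain spaces.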

Input files for usage example

   'tembl:eclacy' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:eclacy

ID   ECLACY     standard; DNA; PRO; 1500 BP.
XX
AC   V00295;
XX
SV   V00295.1
XX
DT   09-JUN-1982 (Rel. 01, Created)
DT   07-JUL-1995 (Rel. 44, Last updated, Version 4)
XX
DE   E. coli lacY gene (codes for lactose permease).
XX
KW   membrane protein.
XX
OS   Escherichia coli
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
XX
RN   [1]
RP   1-1500
RX   MEDLINE; 80120651.
RA   Buechel D.E., Gronenborn B., Mueller-Hill B.;
RT   "Sequence of the lactose permease gene";
RL   Nature 283:541-545(1980).
XX
DR   SWISS-PROT; P00722; BGAL_ECOLI.
DR   SWISS-PROT; P02920; LACY_ECOLI.
DR   SWISS-PROT; P07464; THGA_ECOLI.
XX
CC   lacZ is a beta-galactosidase and lacA is transacetylase.
CC   KST ECO.LACY
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1500
FT                   /db_xref="taxon:562"
FT                   /organism="Escherichia coli"
FT   CDS             <1..54
FT                   /codon_start=1
FT                   /db_xref="SWISS-PROT:P00722"
FT                   /note="reading frame (lacZ)"
FT                   /transl_table=11
FT                   /protein_id="CAA23570.1"
FT                   /translation="FQLSAGRYHYQLVWCQK"
FT   CDS             106..1359
FT                   /db_xref="SWISS-PROT:P02920"
FT                   /note="reading frame (lacY)"
FT                   /transl_table=11
FT                   /protein_id="CAA23571.1"
FT                   /translation="MYYLKNTNFWMFGLFFFFYFFIMGAYFPFFPIWLHDINHISKSDT
FT                   GIIFAAISLFSLLFQPLFGLLSDKLGLRKYLLWIITGMLVMFAPFFIFIFGPLLQYNIL
FT                   VGSIVGGIYLGFCFNAGAPAVEAFIEKVSRRSNFEFGRARMFGCVGWALCASIVGIMFT
FT                   INNQFVFWLGSGCALILAVLLFFAKTDAPSSATVANAVGANHSAFSLKLALELFRQPKL
FT                   WFLSLYVIGVSCTYDVFDQQFANFFTSFFATGEQGTRVFGYVTTMGELLNASIMFFAPL
FT                   IINRIGGKNALLLAGTIMSVRIIGSSFATSALEVVILKTLHMFEVPFLLVGCFKYITSQ
FT                   FEVRFSATIYLVCFCFFKQLAMIFMSVLAGNMYESIGFQGAYLVLGLVALGFTLISVFT
FT                   LSGPGPLSLLRRQVNEVA"
FT   CDS             1423..>1500
FT                   /db_xref="SWISS-PROT:P07464"
FT                   /note="reading frame (lacA)"
FT                   /transl_table=11
FT                   /protein_id="CAA23572.1"
FT                   /translation="MNMPMTERIRAGKLFTDMCEGLPEKR"
XX
SQ   Sequence 1500 BP; 315 A; 342 C; 357 G; 486 T; 0 other;
     ttccagctga gcgccggtcg ctaccattac cagttggtct ggtgtcaaaa ataataataa        60
     ccgggcaggc catgtctgcc cgtatttcgc gtaaggaaat ccattatgta ctatttaaaa       120
     aacacaaact tttggatgtt cggtttattc tttttctttt acttttttat catgggagcc       180
     tacttcccgt ttttcccgat ttggctacat gacatcaacc atatcagcaa aagtgatacg       240
     ggtattattt ttgccgctat ttctctgttc tcgctattat tccaaccgct gtttggtctg       300
     ctttctgaca aactcgggct gcgcaaatac ctgctgtgga ttattaccgg catgttagtg       360
     atgtttgcgc cgttctttat ttttatcttc gggccactgt tacaatacaa cattttagta       420
     ggatcgattg ttggtggtat ttatctaggc ttttgtttta acgccggtgc gccagcagta       480
     gaggcattta ttgagaaagt cagccgtcgc agtaatttcg aatttggtcg cgcgcggatg       540
     tttggctgtg ttggctgggc gctgtgtgcc tcgattgtcg gcatcatgtt caccatcaat       600
     aatcagtttg ttttctggct gggctctggc tgtgcactca tcctcgccgt tttactcttt       660
     ttcgccaaaa cggatgcgcc ctcttctgcc acggttgcca atgcggtagg tgccaaccat       720
     tcggcattta gccttaagct ggcactggaa ctgttcagac agccaaaact gtggtttttg       780
     tcactgtatg ttattggcgt ttcctgcacc tacgatgttt ttgaccaaca gtttgctaat       840
     ttctttactt cgttctttgc taccggtgaa cagggtacgc gggtatttgg ctacgtaacg       900
     acaatgggcg aattacttaa cgcctcgatt atgttctttg cgccactgat cattaatcgc       960
     atcggtggga aaaacgccct gctgctggct ggcactatta tgtctgtacg tattattggc      1020
     tcatcgttcg ccacctcagc gctggaagtg gttattctga aaacgctgca tatgtttgaa      1080
     gtaccgttcc tgctggtggg ctgctttaaa tatattacca gccagtttga agtgcgtttt      1140
     tcagcgacga tttatctggt ctgtttctgc ttctttaagc aactggcgat gatttttatg      1200
     tctgtactgg cgggcaatat gtatgaaagc atcggtttcc agggcgctta tctggtgctg      1260
     ggtctggtgg cgctgggctt caccttaatt tccgtgttca cgcttagcgg ccccggcccg      1320
     ctttccctgc tgcgtcgtca ggtgaatgaa gtcgcttaag caatcaatgt cggatgcggc      1380
     gcgacgctta tccgaccaac atatcataac ggagtgatcg cattgaacat gccaatgacc      1440
     gaaagaataa gagcaggcaa gctatttacc gatatgtgcg aaggcttacc ggaaaaaaga      1500
//

  Database entry: tembl:eclaca

ID   ECLACA     standard; DNA; PRO; 1832 BP.
XX
AC   X51872;
XX
SV   X51872.1
XX
DT   17-APR-1990 (Rel. 23, Created)
DT   05-JUL-1999 (Rel. 60, Last updated, Version 5)
XX
DE   Escherichia coli lacA gene for thiogalactoside transacetylase
XX
KW   lac operon; lacA gene; lacY gene; thiogalactoside transacetylase.
XX
OS   Escherichia coli
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
XX
RN   [1]
RC   (1-1832)
RP   1-1832