est2genome.txt

XX
CC   IMPORTANT:  This sequence is not the entire insert of clone
CC   NFG9.  It may be shorter because we only sequence overlapping
CC   sections once, or longer because we arrange for a small
CC   overlap between neighbouring submissions.
XX
CC   The true left end of clone NFG9 is at 1 in this sequence.
CC   The true left end of clone RA36 is at 25872.
XX
CC   NFG9 is from a 280kb clone contig extending from the telomere of 16p.
CC   Higgs D.R., Flint J. unpublished. MRC Molecular Haematology Unit,
CC   Institute of Molecular Medicine, Oxford.
CC   NFG9 is from the library CV007K. Choo et al.,(1986) Gene 46. 277-286.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..33760
FT                   /chromosome="16"
FT                   /db_xref="taxon:9606"
FT                   /organism="Homo sapiens"
FT                   /map="16p13.3"
FT                   /clone_lib="CV007K"


  [Part of this file has been deleted for brevity]

     gagacagcag agtgctcagc tcatgaagga ggcaccagcc gccatgcctc tacatccagg     30840
     tctcctgggg ttcccacctc cacaaaaacc cccactgcta ggagtgcagg caggagggga     30900
     cctgagaacc gacagttata ggtcctgcgg gtgggcagtg ctgggtgttc tggtctgccc     30960
     cacccctgtg tgcctagatc cccatctggg cctcaagtgg gtgggattcc aaaggaagag     31020
     ccggagtagg cgtggggagg ggcaggccca ggctggacaa agagtctggc cagggagcgg     31080
     cacattgccc tcccagagac agtggctcag tgtccaggcc ttccccaggc gcacagtggg     31140
     ctcttgttcc cagaaagccc ctcgggggga tccaaacagt gtctccccca ccccgctgac     31200
     ccctcagtgt atggggaaac cgtggcccac ggaaggcctc actgcctggg gtcacacagc     31260
     atctgagtca ctgcagcagc ctcacagctg ccagcccagg cccagcccca tcaggagaca     31320
     cccaaagcca cagtgcatcc caggaccagc tgggggggct gcgggcagga ctctcgatga     31380
     ggctgaggga cgaggagggt caagggagcc actggcgcca tgcatgctga cgtcccctct     31440
     ggctgcctgc agagcctggt gtggaagggc tgagtggggg atggtggaga gtcctgttaa     31500
     ctcaggtttc tgctctgggg atgtctgggc acccatcaag ctggccgcgt gcacaggtgc     31560
     agggagagcc agaaagcagg agccgatgca gggaggccac tggggacagc ccaggctgat     31620
     gcttgggccc catgtgtctc caccacctac aaccctaagc aagcctcagc tttcccatct     31680
     ggaaatcagg ggtcacagca gtgcctggca cagtagcagc ggctgactcc atcacagggt     31740
     ggtgtagcct gtgggtactt ggcactctct gaggggcagg agctgggggg tgaaaggacc     31800
     ctagagcata tgcaacaaga gggcagccct ggggacacct ggggacagaa ccctccaaag     31860
     gtgtcgagtt tgggaagaga ctagagagaa gctctggcca gtccaggcat agacagtggc     31920
     cacagccagt ggagagctgc atcctcaggt gtgagcagca accacctctg tactcaggcc     31980
     tgccctgcac actcacagga ccatgctggc agggacaact ggcggcggag ttgactgcca     32040
     accccggggc cagaaccatc aagcctgggc tctgctccgc ccaaggaact gcctgctgcc     32100
     gaggtcagct ggagcaaggg gcctcacccc gggacacctt cccagacgtg tcctcagctc     32160
     acatgagcct catcccaggg ggatgtggct cctccagcat ccccacccac acgctgctct     32220
     ctgaccctca gtcttctgtt tgactcctaa tctgaagctc aatcctagat ctcccttgag     32280
     aagggggtca ccagctgtct ggcagcccag cctccaggtc ttctggatta atgaagggaa     32340
     agtcacctgg cctctctgcc ttgtctatta atggcatcat gctgagaatg atatttgcta     32400
     ggccctttgc aaaccccaaa gtgctcttca accctcccag tgaagcctct tcttttctgt     32460
     ggaagaaatg aggttcaggg tggagcaggg caggcctgag acctttgcag ggttctctcc     32520
     aggtccccag caggacagac tggcaccctg cctcccctca tcaccctaga caaggagaca     32580
     gaacaagagg ttccctgcta caggccatct gtgagggaag ccgccctagg gcctgtagac     32640
     acaggaatcc ctgaggacct gacctgtgag ggtagtgcac aaaggggcca gcacttggca     32700
     ggaggggggg gggcactgcc ccaaggctca gctagcaaat gtggcacagg ggtcaccaga     32760
     gctaaacccc tgactcagtt gggtctgaca ggggctgaca tggcagacac acccaggaat     32820
     caggggacac caagtgcagc tcagggcacc tgtccaggcc acacagtcag aaaggggatg     32880
     gcagcaagga cttagctaca ctagattctg ggggtaaact gcctggtatg ctggtcactg     32940
     ctagtcccca gtctggagtc tagctgggtc tcaggagtta ggcgaaaaca ccctccccag     33000
     gctgcaggtg ggagaggccc acatcccctg cacacgtctg gccagaggac agatgggcag     33060
     cccagtcacc agtcagagcc ctccagaggt gtccctgact gaccctacac acatgcaccc     33120
     aggtgcccag gcacccttgg gctcagcaac cctgcaaccc cctcccagga cccaccagaa     33180
     gcaggatagg actagagagg ccacaggagg gaaaccaagt cagagcagaa atggcttcgg     33240
     tcctcagcag cctggctcag cttcctcaaa ccagatcctg actgatcaca ctggtctgtc     33300
     taacccctgg gaggggtcct ctgtatccat cttacagata aggaaactga ggctcagaga     33360
     agcccatcac tgcctaaggt cccagggcct ataagggagc tcaaagcctt gggccaggtc     33420
     tgcccaggag ctgcagtgga agggaccctg tctgcagacc cccagaagac aaggcagacc     33480
     acctgggttc ttcagccttg tggctgtgga cggctgtcag acccttctaa gaccccttgc     33540
     cacctgctcc atcaggggca tctcagttga agaaggaagg actcaccccc aaaatcgtcc     33600
     aactcagaaa aaaaggcaga agccaaggaa tccaatcact gggcaaaatg tgatcctggc     33660
     acagacactg aggtggggga actggagccg gtgtggcgga ggccctcaca gccaagagca     33720
     actgggggtg ccctgggcag ggactgtagc tgggaagatc                           33760
//

Output file format

  Output files for usage example

  File: hs989235.est2genome

Note Best alignment is between forward est and forward genome, but splice sites imply REVERSED GENE

Exon       163  91.8 25685 25874 HSNFG9           1   193 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
-Intron    -20   0.0 25875 26278 HSNFG9
Exon       207  98.1 26279 26492 HSNFG9         194   407 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
-Intron    -20   0.0 26493 27390 HSNFG9
Exon        63  86.4 27391 27476 HSNFG9         408   494 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.

Span       393  93.6 25685 27476 HSNFG9           1   494 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.

Segment     14  83.3 25685 25702 HSNFG9           1    18 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment     28  85.7 25703 25737 HSNFG9          20    54 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment      4 100.0 25738 25741 HSNFG9          56    59 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment     13 100.0 25742 25754 HSNFG9          61    73 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment      4 100.0 25756 25759 HSNFG9          74    77 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment    110  97.4 25760 25874 HSNFG9          79   193 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment     37 100.0 26279 26315 HSNFG9         194   230 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment    162  98.8 26317 26480 HSNFG9         231   394 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment     12 100.0 26481 26492 HSNFG9         396   407 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment     16 100.0 27391 27406 HSNFG9         408   423 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment     10  91.7 27407 27418 HSNFG9         425   436 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment     19  95.2 27419 27439 HSNFG9         438   458 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
Segment     24  80.6 27441 27476 HSNFG9         459   494 HS989235      yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone
IMAGE:177794 3', mRNA sequence.

  MSP type segments

   There are four types of segment:
    1. each gapped Exon
    2. each Intron (marked with a ? if it does not start GT and end AG)
    3. the complete alignment Span
    4. individual ungapped matching Segments.

   The score for Exon segments is the alignment score excluding flanking
   intron penalties. The Span score is the total including the intron
   costs.

   The coordinates of the genomic sequence always refer to the positive
   strand, but are swapped if the EST has been reversed. The splice
   direction of each Intron is indicated as +Intron (forward, splice sites
   GT/AG), -Intron (reverse, splice sites CT/AC), or ?Intron (unknown
   direction). Segment entries give the alignment as a series of ungapped
   matching segments.

  Full alignment

   You get the alignment if the -align switch is set. The alignment
   includes the first and last 5 bases of each intron, together with the
   intron width. The direction of splicing is indicated by >>>> (forward)
   or <<<< (reverse) or ???? (unknown).

Data files

   None

Notes

   est2genome uses a linear-space dynamic-programming algorithm. It has
   the following parameters:

parameter               default         description
match                   1               score for matching two bases
mismatch                1               cost for mismatching two bases
gap_penalty             2               cost for deleting a single base in
                                        either sequence,
                                        excluding introns
intron_penalty          40              cost for an intron, independent of
                                        length.
splice_penalty          20              cost for an intron, independent of
                                        length and starting/ending on
                                        donor-acceptor sites.
space                   10              Space threshold (in megabytes)
                                        for linear-space recursion. If the
                                        product of the two sequence
                                        lengths divided by 4 exceeds this then
                                        a divide-and-conquer strategy is used
                                        to control the memory requirements.
                                        In this way very long sequences can
                                        be aligned.
                                        If you have a machine with plenty of
                                        memory you can raise this parameter
                                        (but do not exceed the machine's
                                        physical RAM).
                                        However, normally you should not need
                                        to change this parameter.

   There is no gap initiation cost for short gaps, just a penalty
   proportional to the length of the gap.
   Thus the cost of inserting a gap of length L in the EST is

L*gap_penalty

   and the cost in the genome is

min { L*gap_penalty, intron_penalty } or
min { L*gap_penalty, splice_penalty } if the gap starts with GT and ends with AG
                                     (or CT/AC if splice direction reversed)

   Introns are not allowed in the EST. The difference between the
   intron_penalty and splice_penalty allows for some slack in marking the
   intron end-points. It is often the case that the best intron
   boundaries, from the point of view of minimising mismatches, will not
   coincide exactly with the splice consensus, so provided the difference
   between the intron/splice penalties outweighs the extra mismatch/indel
   costs the alignment will respect the proper boundaries. If the
   alignment still prefers boundaries which don't start and end with the
   splice consensus then this may indicate errors in the sequences.

   The default parameters work well, except for very short exons (length
   less than the splice_penalty, approx) which may be skipped. The intron
   penalties should not be set to less than the maximum expected random
   match between the sequences (typically 10-15 bp) in order to avoid
   spurious matches.

   The algorithm has the following steps:
    1. A first-pass Smith-Waterman scan is done to locate the score,
       start and end of the maximal scoring segment (including introns of
       course). No other alignment information is retained.
    2. Subsequences corresponding to the maximal-scoring segments are
       extracted. If the product of these subsequences' lengths is less
       than the area parameter then the segments are re-aligned using the
       Needleman-Wunsch algorithm, which in this instance will give the
       same result as the Smith-Waterman since they are guaranteed to
       align end-to-end.
    3. If the product of lengths exceeds the area threshold then the
       alignment is recursively broken down by splitting the EST in half
       and finding the genome position which aligns with the EST
       mid-point. The problem then reduces to aligning the left-hand and
       right-hand portions of the sequences separately and merging the
       result.

   The worst-case run-time for the algorithm is about 3 times as long as
   would be taken to align using a quadratic-space program. In practice
   the maximal-scoring segment is often much shorter than the full genome
   length so the program runs only about 1.5 times slower.

References

    1. Mott R. (1997) EST_GENOME: a program to align spliced DNA
       sequences to unspliced genomic DNA. Comput. Applic. Biosci.
       13:477-478.
    2. Huang X (1994) On global sequence alignment. Comput. Applic.
       Biosci. 10:227-235.
    3. Myers, EW and Miller, W (1988) Optimal alignments in linear space.
       Comput. Applic. Biosci. 4:11-17.
    4. Smith, TF and Waterman, MS (1981) Identification of common
       molecular subsequences. J. Mol. Biol. 147:195-197.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It returns 0 unless an error occurs.

Known bugs

   None.

See also

   Program name Description
   needle       Needleman-Wunsch global alignment
   stretcher    Finds the best global alignment between two sequences

Author(s)

   This application was modified for inclusion in EMBOSS by Peter Rice
   (pmr
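
Illustrative sketch (not part of est2genome)

   The gap-cost rule given in the Notes section can be tried out with a
   few lines of Python. This is only a sketch of the stated formula, not
   the program's actual code: the names Penalties, est_gap_cost and
   genome_gap_cost are hypothetical, the defaults are copied from the
   parameter table, and the requirement that a spliced gap be at least 4
   bases long (so that the donor and acceptor dinucleotides do not
   overlap) is an assumption made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Penalties:
    """Default costs copied from the parameter table in the Notes section."""
    gap_penalty: int = 2
    intron_penalty: int = 40
    splice_penalty: int = 20


def est_gap_cost(length: int, p: Penalties = Penalties()) -> int:
    """Cost of a gap of `length` bases in the EST (introns are not allowed there)."""
    return length * p.gap_penalty


def genome_gap_cost(gap: str, p: Penalties = Penalties(), reverse: bool = False) -> int:
    """Cost of skipping the genomic bases in `gap`.

    The gap is charged the cheaper of a linear indel cost and a flat intron
    cost. The flat cost is splice_penalty when the gap starts and ends on the
    splice consensus (GT..AG, or CT..AC for a reversed gene), otherwise
    intron_penalty.
    """
    linear = len(gap) * p.gap_penalty
    donor, acceptor = ("ct", "ac") if reverse else ("gt", "ag")
    g = gap.lower()
    # Assumption: require at least 4 bases so donor and acceptor cannot overlap.
    spliced = len(g) >= 4 and g.startswith(donor) and g.endswith(acceptor)
    flat = p.splice_penalty if spliced else p.intron_penalty
    return min(linear, flat)


if __name__ == "__main__":
    consensus = "gt" + "a" * 30 + "ag"   # 34 bases, GT..AG
    other = "cc" + "a" * 30 + "cc"       # 34 bases, no consensus
    print(genome_gap_cost("aaaaa"))      # short gap: linear cost
    print(genome_gap_cost(consensus))    # flat splice_penalty
    print(genome_gap_cost(other))        # flat intron_penalty
```

   With the default values, a genomic gap on the splice consensus becomes
   cheaper than a plain indel once it is longer than 10 bases
   (L*2 > 20), and a gap off the consensus once it is longer than 20
   bases (L*2 > 40).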