                                est2genome

Function

   Align EST and genomic DNA sequences

Description

   est2genome is a software tool to aid the prediction of genes by
   sequence homology. The program will align a set of spliced nucleotide
   sequences (ESTs, cDNAs or mRNAs) to an unspliced genomic DNA sequence,
   inserting introns of arbitrary length when needed. In addition, where
   feasible, introns start and stop at the splice consensus dinucleotides
   GT and AG.

   Unless instructed otherwise, the program makes three alignments: First
   it compares both strands of the spliced sequence against the forward
   strand of the genomic sequence, assuming the splice consensus GT/AG
   (i.e. in the forward gene direction). The maximum-scoring orientation
   is then realigned assuming the splice consensus CT/AC (i.e. in the
   reversed gene direction). Only the overall maximum-scoring alignment
   is reported.

   The program outputs a list of the exons and introns it has found. The
   format is like that of MSPcrunch, i.e. a list of matching segments.
   This format is easy to parse into other software. The program also
   indicates, based on the splice site information, the gene's predicted
   direction of transcription. Optionally the full sequence alignment is
   printed as well (see the example).

Algorithm

   The program uses a linear-space divide-and-conquer strategy (Myers and
   Miller, 1988; Huang, 1994) to limit memory use:

   1. A first pass Smith-Waterman local alignment scan is done to find
   the start and end of the maximally scoring segments.

   2. Subsequences corresponding to these segments are extracted.

   3a. If the product of the subsequences' lengths is less than a
   user-defined threshold (i.e. they will fit in memory) the segments are
   realigned using the Needleman-Wunsch global alignment algorithm, which
   will give the same result as the Smith-Waterman since the subsequences
   are guaranteed to align end-to-end.

   3b. If the product of the lengths exceeds the threshold (a full
   alignment will not fit in memory) the alignment is made recursively by
   splitting the spliced (EST) sequence in half and finding the genome
   sequence position which aligns with the mid-point. The process is
   repeated until the product of the lengths is less than the threshold.
   The divided sequences are aligned separately and then merged.

   4. The genome sequence is searched against the forward and reverse
   strands of the spliced (EST) sequence, assuming a forward gene
   splicing direction (i.e. GT/AG consensus).

   5. Then the best-scoring orientation is realigned assuming reverse
   splicing (CT/AC consensus). The overall best alignment is reported.

Usage

   Here is a sample session with est2genome

% est2genome
Align EST and genomic DNA sequences
Spliced EST nucleotide sequence(s): tembl:hs989235
Unspliced genomic nucleotide sequence: tembl:hsnfg9
Output file [hs989235.est2genome]:

Command line arguments

   Standard (Mandatory) qualifiers:
  [-estsequence]       seqall     Spliced EST nucleotide sequence(s)
  [-genomesequence]    sequence   Unspliced genomic nucleotide sequence
  [-outfile]           outfile    [*.est2genome] Output file name

   Additional (Optional) qualifiers:
   -match              integer    [1] Score for matching two bases (Any
                                  integer value)
   -mismatch           integer    [1] Cost for mismatching two bases (Any
                                  integer value)
   -gappenalty         integer    [2] Cost for deleting a single base in
                                  either sequence, excluding introns (Any
                                  integer value)
   -intronpenalty      integer    [40] Cost for an intron, independent of
                                  length.
                                  (Any integer value)
   -splicepenalty      integer    [20] Cost for an intron, independent of
                                  length and starting/ending on donor-acceptor
                                  sites (Any integer value)
   -minscore           integer    [30] Exclude alignments with scores below
                                  this threshold score. (Any integer value)

   Advanced (Unprompted) qualifiers:
   -reverse            boolean    Reverse the orientation of the EST sequence
   -[no]splice         boolean    [Y] Use donor and acceptor splice sites. If
                                  you want to ignore donor-acceptor sites then
                                  set this to be false.
   -mode               menu       [both] This determines the comparison mode.
                                  The default value is 'both', in which case
                                  both strands of the EST are compared
                                  assuming a forward gene direction (i.e. GT/AG
                                  splice sites), and the best comparison
                                  redone assuming a reversed (CT/AC) gene
                                  splicing direction. The other allowed modes
                                  are 'forward', when just the forward strand
                                  is searched, and 'reverse', ditto for the
                                  reverse strand. (Values: both (Both
                                  strands); forward (Forward strand only);
                                  reverse (Reverse strand only))
   -[no]best           boolean    [Y] You can print out all comparisons
                                  instead of just the best one by setting this
                                  to be false.
   -space              float      [10.0] For linear-space recursion. If
                                  product of sequence lengths divided by 4
                                  exceeds this then a divide-and-conquer
                                  strategy is used to control the memory
                                  requirements. In this way very long
                                  sequences can be aligned.
                                  If you have a machine with plenty of memory
                                  you can raise this parameter (but do not
                                  exceed the machine's physical RAM) (Any
                                  numeric value)
   -shuffle            integer    [0] Shuffle (Any integer value)
   -seed               integer    [20825] Random number seed (Any integer
                                  value)
   -align              boolean    Show the alignment. The alignment includes
                                  the first and last 5 bases of each intron,
                                  together with the intron width. The
                                  direction of splicing is indicated by angle
                                  brackets (forward or reverse) or ????
                                  (unknown).
   -width              integer    [50] Alignment width (Any integer value)

   Associated qualifiers:

   "-estsequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-genomesequence" associated qualifiers
   -sbegin2            integer    Start of the sequence to be used
   -send2              integer    End of the sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard
                                  and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   est2genome reads two nucleotide sequences. The first is an EST
   sequence (a single read or a finished cDNA). The second is a genomic
   finished sequence.

  Input files for usage example

   'tembl:hs989235' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:hs989235

ID   HS989235   standard; RNA; EST; 495 BP.
XX
AC   H45989;
XX
SV   H45989.1
XX
DT   18-NOV-1995 (Rel. 45, Created)
DT   04-MAR-2000 (Rel. 63, Last updated, Version 2)
XX
DE   yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone
DE   IMAGE:177794 3', mRNA sequence.
XX
KW   EST.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-495
RA   Hillier L., Clark N., Dubuque T., Elliston K., Hawkins M., Holman M.,
RA   Hultman M., Kucaba T., Le M., Lennon G., Marra M., Parsons J., Rifkin L.,
RA   Rohlfing T., Soares M., Tan F., Trevaskis E., Waterston R., Williamson A.,
RA   Wohldmann P., Wilson R.;
RT   "The WashU-Merck EST Project";
RL   Unpublished.
XX
DR   RZPD; IMAGp998F03326; IMAGp998F03326.
XX
CC   On May 8, 1995 this sequence version replaced gi:800819.
CC   Contact: Wilson RK
CC   Washington University School of Medicine
CC   4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
CC   Tel: 314 286 1800
CC   Fax: 314 286 1810
CC   Email: est@watson.wustl.edu
CC   Insert Size: 544
CC   High quality sequence stops: 265
CC   Source: IMAGE Consortium, LLNL
CC   This clone is available royalty-free through LLNL ; contact the
CC   IMAGE Consortium (info@image.llnl.gov) for further information.
CC   Possible reversed clone: polyT not found
CC   Insert Length: 544   Std Error: 0.00
CC   Seq primer: SP6
CC   High quality sequence stop: 265.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..495
FT                   /db_xref="taxon:9606"
FT                   /db_xref="ESTLIB:300"
FT                   /db_xref="RZPD:IMAGp998F03326"
FT                   /note="Organ: brain; Vector: pT7T3D (Pharmacia) with a
FT                   modified polylinker; Site_1: Not I; Site_2: Eco RI; 1st
FT                   strand cDNA was primed with a Not I - oligo(dT) primer [5'
FT                   TGTTACCAATCTGAAGTGGGAGCGGCCGCGCTTTTTTTTTTTTTTTTTTT 3'],
FT                   double-stranded cDNA was size selected, ligated to Eco RI
FT                   adapters (Pharmacia), digested with Not I and cloned into
FT                   the Not I and Eco RI sites of a modified pT7T3 vector
FT                   (Pharmacia). Library went through one round of
FT                   normalization to a Cot = 53. Library constructed by Bento
FT                   Soares and M.Fatima Bonaldo. The adult brain RNA was
FT                   provided by Dr. Donald H. Gilden. Tissue was acquired 17-18
FT                   hours after death which occurred in consequence of a
FT                   ruptured aortic aneurysm. RNA was prepared from a pool of
FT                   tissues representing the following areas of the brain:
FT                   frontal, parietal, temporal and occipital cortex from the
FT                   left and right hemispheres, subcortical white matter, basal
FT                   ganglia, thalamus, cerebellum, midbrain, pons and medulla."
FT                   /sex="Male"
FT                   /organism="Homo sapiens"
FT                   /clone="IMAGE:177794"
FT                   /clone_lib="Soares adult brain N2b5HB55Y"
FT                   /dev_stage="55-year old"
FT                   /lab_host="DH10B (ampicillin resistant)"
XX
SQ   Sequence 495 BP; 73 A; 135 C; 169 G; 104 T; 14 other;
     ccggnaagct cancttggac caccgactct cgantgnntc gccgcgggag ccggntggan        60
     aacctgagcg ggactggnag aaggagcaga gggaggcagc acccggcgtg acggnagtgt       120
     gtggggcact caggccttcc gcagtgtcat ctgccacacg gaaggcacgg ccacgggcag       180
     gggggtctat gatcttctgc atgcccagct ggcatggccc cacgtagagt ggnntggcgt       240
     ctcggtgctg gtcagcgaca cgttgtcctg gctgggcagg tccagctccc ggaggacctg       300
     gggcttcagc ttcccgtagc gctggctgca gtgacggatg ctcttgcgct gccatttctg       360
     ggtgctgtca ctgtccttgc tcactccaaa ccagttcggc ggtccccctg cggatggtct       420
     gtgttgatgg acgtttgggc tttgcagcac cggccgccga gttcatggtn gggtnaagag       480
     atttgggttt tttcn                                                        495
//

  Database entry: tembl:hsnfg9

ID   HSNFG9     standard; DNA; HUM; 33760 BP.
XX
AC   Z69719;
XX
SV   Z69719.1
XX
DT   26-FEB-1996 (Rel. 46, Created)
DT   22-NOV-1999 (Rel. 61, Last updated, Version 3)
XX
DE   Human DNA sequence from cosmid NFG9 from a contig from the tip of the short
DE   arm of chromosome 16, spanning 2Mb of 16p13.3. Contains Interleukin 9
DE   Receptor Pseudogene, repeat polymorphism, ESTs, CpG islands and endogenous
DE   retroviral DNA.
XX
KW   16p13.3; CpG island; Interleukin 9 Receptor Pseudogene;
KW   repeat polymorphism.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-33760
RA   Kershaw J.;
RT   ;
RL   Submitted (22-FEB-1996) to the EMBL/GenBank/DDBJ databases.
RL   Sanger Centre, Hinxton, Cambridgeshire, CB10 1RQ, England. E-mail enquires:
RL   humquery@sanger.ac.uk
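
   As a rough illustration of the -intronpenalty and -splicepenalty
   qualifiers listed under Command line arguments, the Python sketch below
   charges the lower splice penalty when an intron starts and ends on the
   consensus dinucleotides (GT/AG in the forward gene direction, CT/AC in
   the reversed direction) and the full intron penalty otherwise. The
   function name intron_cost and its arguments are hypothetical and are not
   part of EMBOSS; only the default values 40 and 20 and the consensus
   dinucleotides come from this document.

```python
# Hypothetical helper, not EMBOSS code: illustrates the two intron costs.
INTRON_PENALTY = 40   # default of -intronpenalty
SPLICE_PENALTY = 20   # default of -splicepenalty

# Consensus (start, end) dinucleotides as read on the forward genomic strand.
SPLICE_SITES = {"forward": ("GT", "AG"), "reverse": ("CT", "AC")}


def intron_cost(genome, start, end, direction="forward", use_splice=True):
    """Cost of an intron covering genome[start:end] (0-based, end exclusive).

    The cost does not depend on the intron length. use_splice=False mimics
    -nosplice, where donor-acceptor sites are ignored.
    """
    intron = genome[start:end].upper()
    if use_splice and len(intron) >= 4:
        first, last = SPLICE_SITES[direction]
        if intron.startswith(first) and intron.endswith(last):
            return SPLICE_PENALTY
    return INTRON_PENALTY


genome = "ACGGTAAGTCCCTTTCAGGATC"
print(intron_cost(genome, 3, 18))   # intron GTAAGTCCCTTTCAG: GT...AG
print(intron_cost(genome, 4, 18))   # intron TAAGTCCCTTTCAG: no consensus
```

   With the defaults, the first call is charged the splice penalty and the
   second the full intron penalty, which is why alignments tend to place
   intron boundaries on consensus sites where feasible.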

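   The -space test described under Command line arguments can be
   illustrated with the lengths from the usage example (a 495 BP EST and a
   33760 BP genomic sequence). The sketch follows the wording of the
   qualifier description (product of sequence lengths divided by 4).
   Treating the -space value as megabytes, i.e. multiplying it by 1e6, is
   an assumption that this document does not state, and
   needs_divide_and_conquer is a hypothetical name.

```python
def needs_divide_and_conquer(est_len, genome_len, space=10.0):
    """Return True if the full alignment is assumed not to fit in memory.

    ASSUMPTION: -space is measured in megabytes, so the limit is
    space * 1e6. The document only says the product of the sequence
    lengths divided by 4 is compared against the -space value.
    """
    area = est_len * genome_len / 4      # product of lengths divided by 4
    return area > space * 1_000_000


# Lengths taken from the example entries tembl:hs989235 and tembl:hsnfg9.
print(needs_divide_and_conquer(495, 33760))        # default -space 10.0
print(needs_divide_and_conquer(495, 33760, 2.0))   # a smaller -space value
```

   Under that assumption the example pair (495 * 33760 / 4 = 4177800) stays
   below the default limit, so the recursive splitting of step 3b would
   only be triggered by a smaller -space value or longer sequences.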