marscan

Function

Finds MAR/SAR sites in nucleic sequences

Description

Matrix/scaffold attachment regions (MARs/SARs) are genomic elements thought to delineate the structural and functional organisation of the eukaryotic genome. Originally, MARs and SARs were identified through their ability to bind to the nuclear matrix or scaffold. Binding cannot be assigned to a unique sequence element, but is dispersed over a region of several hundred base pairs. These elements are found flanking a gene or a small cluster of genes and are often located in the vicinity of cis-regulatory sequences. This has led to the suggestion that they contribute to higher order regulation of transcription by defining boundaries of independently controlled chromatin domains. There is indirect evidence to support this notion. In transgenic experiments MARs/SARs dampen position effects by shielding the transgene from the effects of the chromatin structure at the site of integration. Furthermore, they may act as boundary elements for enhancers, restricting their long range effect to only the promoters that are located in the same chromatin domain.

marscan finds a bipartite sequence element that is unique for a large group of eukaryotic MARs/SARs. This MAR/SAR recognition signature (MRS) comprises two individual sequence elements that are <200 bp apart and may be aligned on positioned nucleosomes in MARs. The MRS can be used to correctly predict the position of MARs/SARs in plants and animals, based on genomic DNA sequence information alone. Experimental evidence from the analysis of >300 kb of sequence data from several eukaryotic organisms shows that wherever an MRS is observed in the DNA sequence, the corresponding genomic fragment is a biochemically identifiable SAR.

The MRS is a bipartite sequence element that consists of two individual sequences of 8 bp (AATAAYAA) and 16 bp (AWWRTAANNWWGNNNC) within a 200 bp distance from each other. One mismatch is allowed in the 16 bp pattern.
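The two patterns above use IUPAC ambiguity codes (Y = C or T, W = A or T, R = A or G, N = any base). The following is a minimal Python sketch of how such a search could look. It is not the marscan source code. The names find_sites and find_mrs_pairs are hypothetical. The sketch assumes that the 200 bp distance is measured between the start positions of the two sites, searches the forward strand only, and returns every qualifying pair without any further filtering.

```python
# Illustrative sketch only; not the EMBOSS marscan implementation.
# IUPAC ambiguity codes used by the two MRS patterns.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "N": "ACGT",
}

PATTERN_8 = "AATAAYAA"
PATTERN_16 = "AWWRTAANNWWGNNNC"
MAX_DISTANCE = 200  # assumption: measured between site start positions


def find_sites(seq, pattern, max_mismatches=0):
    """Return 0-based forward-strand starts where pattern matches seq."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        mismatches = 0
        for base, code in zip(window, pattern):
            if base not in IUPAC[code]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Loop finished without exceeding the mismatch limit.
            hits.append(start)
    return hits


def find_mrs_pairs(seq):
    """Return (start_8bp, start_16bp) pairs lying within MAX_DISTANCE."""
    sites_8 = find_sites(seq, PATTERN_8)                     # exact match
    sites_16 = find_sites(seq, PATTERN_16, max_mismatches=1)  # one mismatch
    return [
        (s8, s16)
        for s8 in sites_8
        for s16 in sites_16
        if abs(s8 - s16) <= MAX_DISTANCE
    ]


if __name__ == "__main__":
    # Made-up test sequence: an 8 bp site and a 16 bp site in poly-C filler.
    demo = "C" * 10 + "AATAACAA" + "C" * 30 + "AATATAACCATGCCCC" + "C" * 10
    print(find_mrs_pairs(demo))
```

The demo sequence is invented for illustration and is not taken from any database entry. A fuller version would also need to scan the reverse complement and decide which of several candidate pairs to report.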

The patterns can occur on either strand of the DNA with respect to each other. The 8 bp and the 16 bp sites can overlap.

Where many 8 bp and/or 16 bp pattern sites lie within 200 bp of each other, giving many possible MRS sites, only the 8 bp site and the 16 bp site that occur closest to each other are reported. Once an MRS has been reported, no more sites are looked for within 200 bp of that site. This reduces (but may not totally eliminate) over-reporting of the clusters of MRSs that tend to occur within a MAR/SAR.

Not all SARs contain an MRS. Analysis of >300 kb of genomic sequence from a variety of eukaryotic organisms shows that the MRS faithfully predicts 80% of MARs and SARs, suggesting that at least one other type of MAR/SAR may exist which does not contain an MRS.

It is still not at all clear whether MARs/SARs are real biological phenomena or just experimental artefacts. The problem of how to define and find MARs is still being actively investigated. For a recent evaluation of this method and others, see reference 3.

Usage

Here is a sample session with marscan

    % marscan
    Finds MAR/SAR sites in nucleic sequences
    Input nucleotide sequence(s): tembl:u01317
    Output report [hshbb.marscan]:

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outfile]           report     [*.marscan] File for output of MAR/SAR
                                  recognition signature (MRS) regions. This
                                  contains details of the MRS in normal GFF
                                  format. The MRS consists of two recognition
                                  sites, one of 8 bp and one of 16 bp on
                                  either sense strand of the genomic DNA,
                                  within 200 bp of each other.

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -rformat2           string     Report format
   -rname2             string     Base file name
   -rextension2        string     File name extension
   -rdirectory2        string     Output directory
   -raccshow2          boolean    Show accession number in the report
   -rdesshow2          boolean    Show description in the report
   -rscoreshow2        boolean    Show the score in the report
   -rusashow2          boolean    Show the full USA in the report
   -rmaxall2           integer    Maximum total hits to report
   -rmaxseq2           integer    Maximum hits to report for one sequence

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

marscan reads a normal genomic DNA USA.

Input files for usage example

'tembl:u01317' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:u01317

ID   HSHBB      standard; DNA; HUM; 73308 BP.
XX
AC   U01317; J00093; J00094; J00096; J00158; J00159; J00160; J00161; J00162;
AC   J00163; J00164; J00165; J00166; J00167; J00168; J00169; J00170; J00171;
AC   J00172; J00173; J00174; J00175; J00177; J00178; J00179; K01239; K01890;
AC   K02544; M18047; M19067; M24868; M24886;
XX
SV   U01317.1
XX
DT   19-MAR-1994 (Rel. 39, Created)
DT   31-MAR-2001 (Rel. 67, Last updated, Version 28)
XX
DE   Human beta globin region on chromosome 11.
XX
KW   allelic variation; alternate cap site; Alu repeat; beta-1 pseudogene;
KW   beta-globin; delta-globin; epsilon-globin; gamma-globin; gene duplication;
KW   globin; HPFH; KpnI repetitive sequence; polymorphism; promoter mutation;
KW   pseudogene; repetitive sequence; RNA polymerase III; thalassemia.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   62409-62631, 63482-63610
RX   MEDLINE; 74275150.
RA   Marotta C.A., Forget B.G., Weissman S.M., Verma I.M., McCaffrey R.P.,
RA   Baltimore D.;
RT   "Nucleotide sequences of human globin messenger RNA";
RL   Proc. Natl. Acad. Sci. U.S.A. 71:2300-2304(1974).
XX
RN   [2]
RP   63602-63646
RX   MEDLINE; 76053173.
RA   Forget B.G., Marotta C.A., Weissman S.M., Cohen-Solal M.M.;
RT   "Nucleotide sequences of the 3'-terminal untranslated region of messenger
RT   RNA for human beta globin chain";
RL   Proc. Natl. Acad. Sci. U.S.A. 72:3614-3618(1975).
XX
RN   [3]
RP   63593-63626
RX   MEDLINE; 77022845.
RA   Proudfoot N.J., Brownlee G.G.;
RT   "Nucleotide sequences of globin messenger RNA";
RL   Br. Med. Bull. 32:251-256(1976).
XX
RN   [4]
RP   63673-63743
RX   MEDLINE; 77114153.
RA   Proudfoot N.J., Longley J.I.;