in percent of the size (number of molecules) of the corresponding subset. That is, the minimal support in the focus is a percentage of the molecules in the focus set, and the maximal support in the complement is a percentage of the molecules in the complement set. Because the bases of the two percentages differ, they cannot be compared directly. Alternatively, these support values may be specified as absolute numbers of molecules by entering negative numbers.</p>
<p>The last two fields allow constraining the reported substructures by their size, which is measured as the number of atoms contained in a substructure. A maximal size of 0 means that substructures of arbitrary size may be reported. Restricting the substructure sizes is not recommended for novice users, as this can have surprising horizon effects.</p>
<p>The options on the third tab control how atoms and bonds are matched:</p>
<p><img src="match.png"></p>
<p>By default, aromatic bonds are treated as an extra type, but they may also be downgraded (that is, treated as equivalent to single bonds) or upgraded (that is, treated as equivalent to double bonds). The type of bonds is usually taken into account, but it may be ignored in rings or even for all bonds. Similarly, the type of atoms (chemical element) is usually taken into account, but it may be ignored in rings or even for all atoms. Other properties of atoms that may be matched (apart from their type, i.e., chemical element) are their charge and their aromaticity (i.e., whether they are part of an aromatic ring, which is determined by checking whether they are incident to an aromatic bond).</p>
<p>One may exclude certain atom types from the search (meaning that no substructures containing them are reported). They can be listed in the next input field and have to be specified as a molecule, using the notation language that was specified for the seed structure (see first tab).
The bonds that are used to specify the molecule are ignored. Nevertheless it may be best to use the null bond "." for all bonds.</p>
<p>Finally, atom types that should not be considered as seeds (when searching without an explicit seed) can be listed. These types are not generally excluded from the search, as they may appear with other seeds. Only substructures that consist exclusively of the atom types listed here (and no other types) are discarded.</p>
<p>On the fourth tab, parameters connected to ring and chain extensions can be specified:</p>
<p><img src="rings.png"></p>
<p>There are two different representations of aromatic rings, namely using actual aromatic bonds or so-called Kekule representations, which use alternating single and double bonds. Since having two representations is, of course, a hindrance to matching aromatic rings, it is recommended to convert Kekule representations to actual aromatic bonds. Deactivate this option only if you know what you are doing.</p>
<p>It can be useful to distinguish bonds in rings from other bonds, namely if one wants to be sure that a bond in a fragment is either a ring bond or a non-ring bond in all supporting molecules (and not a ring bond in some, but a non-ring bond in others). To distinguish ring bonds from other bonds, a range of ring sizes (measured as the number of bonds of the ring) has to be specified, since only rings in the specified range will actually be considered as rings. For most chemical applications a range from 5 to 6 bonds is best.</p>
<p>Ring extensions can be very useful if one is not interested in fragments that contain only part of a ring. There are three variants of ring extension, which differ only in the efficiency of the search, not in the result. Usually full ring extensions are by far the fastest method, and it is therefore recommended to use them, if any.
(Note that distinguishing ring bonds from other bonds also has an effect if ring extensions are not used.)</p>
<p>Finding variable-length carbon chains can be useful if substructures that differ only in the length of the carbon chains between their parts should be considered equivalent.</p>
<p>The next tab contains miscellaneous options, which are of real importance only for an expert user:</p>
<p><img src="misc.png"></p>
<p>The options on this tab do not affect the output of the program (provided there is no bug), but only affect the performance in terms of execution time and memory usage. It is recommended not to change the default settings. Explanations about the pruning methods can be found in <a href="#borgelt_et_al_2004">[Borgelt et al. 2004]</a> and <a href="#borgelt_2005">[Borgelt 2005]</a>.</p>
<p>The last tab only states the version of the MoSS program, and provides contact information and copying rules:</p>
<p><img src="about.png"></p>
<table width="100%" border=0 cellpadding=0 cellspacing=0><tr><td width="95%" align=right> <a href="#top">back to the top</a> </td> <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr></table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="invoke">Command Line Program</a></h3>
<p>The general form of invoking the command line version of the MoSS program is:</p>
<pre>java -classpath moss.jar moss.Miner [options] &lt;seed&gt; [&lt;in&gt;] [&lt;out&gt;] [&lt;ids&gt;]</pre>
for the Java archive or
<pre>java moss.Miner [options] &lt;seed&gt; [&lt;in&gt;] [&lt;out&gt;] [&lt;ids&gt;]</pre>
<p>for the compiled source code (assuming in the latter case that the current working directory is the parent directory of the <tt>moss</tt> source directory or that the <tt>CLASSPATH</tt> environment variable has been set appropriately).</p>
<p>The meaning of the arguments is:</p>
<p><table border=0 cellpadding=0 cellspacing=0>
<tr><td valign="top"><tt>&lt;seed&gt;</tt> </td> <td>is the description of a seed
structure to start the search from. It is the only mandatory argument. By default it is expected to be in SMILES format, but this may be changed with the option <tt>-f</tt> to SLN format (see <a href="#langs">Molecule Description Languages</a> and <a href="#options">Options</a>). An empty seed may be specified by using an empty string or the special seeds "<tt>~</tt>" or "<tt>*</tt>".</td></tr>
<tr><td valign="top"><tt>&lt;in&gt;</tt></td> <td>is the input file with the molecular database to analyze. It is expected to be in the format described <a href="#input">here</a>. By default (i.e., if no input file name is given) the name "moss.dat" will be used for this file.</td></tr>
<tr><td valign="top"><tt>&lt;out&gt;</tt></td> <td>is the output file into which the found substructures are written, together with some additional information about them. By default (i.e., if no output file name is given) this file will be named "moss.sub". The format of this file is described <a href="#output">here</a>.</td></tr>
<tr><td valign="top"><tt>&lt;ids&gt;</tt></td> <td>is the output file into which lists of containing molecules are written for each found substructure. It is optional and is not written if no file name is provided for it. The format of this file is described <a href="#output">here</a>.</td></tr>
</table></p>
<p>The possible options, by which the search can be controlled, are described <a href="#options">here</a>.
If MoSS is invoked without any arguments, it prints a list of possible options together with a short description of their meaning.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0><tr><td width="95%" align=right> <a href="#top">back to the top</a> </td> <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr></table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="options">Command Line Options</a></h3>
<p>MoSS supports a variety of options, with which the search for frequent substructures can be controlled. For the command line version, these options may be specified in any place on the command line (before, between, or after the normal program arguments). In the graphical user interface, which is described <a href="#gui">here</a>, these options are specified in the dialog window.</p>
<p><b>Format Options</b> control the format of the molecule descriptions for the seed and in the input and output files (see <a href="#langs">Molecule Description Languages</a>).</p>
<p><table>
<tr><td valign="top"><tt>-f# </tt></td><td> </td> <td>seed format (smiles or sln) (default: smiles)</td></tr>
<tr><td valign="top"><tt>-i#</tt></td><td> </td> <td>input format (smiles or sln) (default: smiles)</td></tr>
<tr><td valign="top"><tt>-o#</tt></td><td> </td> <td>output format (smiles or sln) (default: smiles)</td></tr>
</table></p>
<p><b>Split Options</b> control how the molecular database is split into the focus part and the complement part.</p>
<p><table>
<tr><td valign="top"><tt>-t# </tt></td><td> </td> <td>threshold value for the split (default: 0.5)</td></tr>
<tr><td valign="top"><tt>-z</tt></td><td> </td> <td>invert split (&gt; versus ≤ instead of ≤ versus &gt;) </td></tr>
</table></p>
<p>If one does not want to find discriminative fragments, the complement part should be empty.
This can be achieved by specifying a threshold that is larger than any value associated with a molecule in the database.</p>
<p><b>Support Options</b> control the minimum frequency for the focus and the maximum frequency for the complement with which a substructure must occur to be reported.</p>
<p><table>
<tr><td valign="top"><tt>-s# </tt></td><td> </td> <td>minimum support in focus (default: 10.0%)</td></tr>
<tr><td valign="top"><tt>-S#</tt></td><td> </td> <td>maximum support in complement (default: 2.0%)</td></tr>
</table></p>
<p><b>Size Options</b> control the size a substructure must have in order to be reported.</p>
<p><table>
<tr><td valign="top"><tt>-m# </tt></td><td> </td> <td>minimum size of a substructure (default: 1 atom(s))</td></tr>
<tr><td valign="top"><tt>-n#</tt></td><td> </td> <td>maximum size of a substructure (default: no limit)</td></tr>
</table></p>
<p><b>Matching Options</b> control how atoms and bonds in the molecules are matched by substructures. In particular, they control which atom and bond types are seen as equivalent.</p>
<p><table>
<tr><td valign="top"><tt>+/-a </tt></td><td> </td> <td>match/ignore aromaticity of atoms (default: ignore)</td></tr>
<tr><td valign="top"><tt>+/-c</tt></td><td> </td> <td>match/ignore charge of atoms (default: ignore)</td></tr>
<tr><td valign="top"><tt>+/-d</tt></td><td> </td> <td>match/ignore atom type (default: match)</td></tr>
<tr><td valign="top"><tt>+/-D</tt></td><td> </td> <td>match/ignore atom type in rings (default: match)</td></tr>
<tr><td valign="top"><tt>+/-:</tt></td><td> </td> <td>upgrade/downgrade aromatic bonds (default: extra type)</td></tr>
<tr><td valign="top"><tt>+/-b</tt></td><td> </td> <td>match/ignore bond type (default: match)</td></tr>
<tr><td valign="top"><tt>+/-B</tt></td><td> </td> <td>match/ignore bond type in rings (default: match)</td></tr>
</table></p>
<p>An atom is aromatic if it is part of an aromatic ring.
Downgrading an aromatic bond means treating it as a single bond, upgrading means treating it as a double bond. The option <tt>-B</tt> only has an effect if rings are marked with the option <tt>-r</tt> (see below) and then only for the marked rings.</p>
<p><b>Exclusion Options</b> allow restricting the set of atom types (chemical elements) that are considered in the search.</p>
<p><table>
<tr><td valign="top"><tt>-x# </tt></td><td> </td> <td>atom types to exclude (as a molecule, in seed format)</td></tr>
<tr><td valign="top"><tt>-y#</tt></td><td> </td> <td>atom types to exclude as seeds (as a molecule, in seed format)</td></tr>
</table></p>
<p><b>Ring Mining Options</b> lead to rings (or at least ring bonds) being treated differently from other bonds. In addition, they allow switching to extensions by full rings rather than individual bonds in one step (see <a href="#hofer_et_al_2004">[Hofer et al. 2004]</a>).</p>
<p><table>
<tr><td valign="top"><tt>-r#:#</tt></td><td> </td> <td>mark rings of size # to # bonds (default: no marking)</td></tr>
<tr><td valign="top"><tt>-R</tt></td><td> </td> <td>extend with rings of marked sizes (default: indiv. bonds)</td></tr>