Conversion of MDL SDF files to DOCKing databases
D.A. Gschwend

Overview

This conversion scheme was developed to be easy to use and accurate in its
atom typing and partial charge computation. To date, it has been tested
(visually) on several thousand compounds. Because of its ease of use and
robustness, this route will likely supplant the dbprocess scheme which has
been used in the past (here at UCSF, anyway). We provide you with both
versions so that you may choose the method that suits you, and for backward
compatibility.

The conversion scheme described here consists of two programs run
sequentially, one written in Fortran (sdf2mol2) and the other (sybdb) in
Sybyl's programming language, SPL. Taken together, they convert an MDL
SDF-format database into a SYBYL MOL2-format database which has appropriate
Sybyl atom types assigned, hydrogens added, and partial charges computed.

NOTE: The second phase of the conversion requires Tripos' SYBYL for
hydrogen addition and charge computation. The first program, sdf2mol2, may
still be of some use to users who do not have Sybyl but have molecular
modeling packages that can read MOL2 format (e.g. Insight). Hydrogen
addition, substructure removal, and charge computation must then be
performed within the context of your own modeling package.

sdf2mol2

Description

This program takes as input an MDL SDF database file and writes out a Sybyl
multi-MOL2 file. This step types all atoms using Sybyl's fairly versatile
and descriptive atom types.

Usage

    sdf2mol2 -i <SDfile> -o <MOL2file> [-b <start_at> <stop_at>]

where <SDfile> is the name of the input SDF file,
      <MOL2file> is the name of the output multi-MOL2 file, and
      <start_at> and <stop_at> are optional bounds for starting and
      ending structure numbers,

e.g. to process only the first hundred structures, use

    sdf2mol2 -i <SDfile> -o <MOL2file> -b 1 100

Speed

~10 min for 100,000 structures (R4400 Indigo2)

Method

0. Read in SDF structure. Obtain connectivity and bond orders.
   Define hybridization of each atom based on the highest bond order.

1. Search for rings - a breadth-first search is used to find the smallest
   number of smallest rings.

2. Assign generic atom types. Atoms other than C, O, N, S, and P receive
   an atom type the same as the atom name - this is useful for atoms which
   have only one possible hybridization state (e.g. halogens) and atoms
   such as metals. All phosphorus atoms become P.3, as this is the only
   possible phosphorus atom type. Atoms C, O, N, and S are assigned types
   based on their hybridization as inferred from the bond orders, e.g.
   doubly-bonded nitrogens become N.2, triply-bonded carbons become C.1,
   etc. Nitrogens with either zero or 4 neighbors are automatically
   assigned N.4.

3. Detect aromaticity. All rings which contain only X.2 or X.ar atoms are
   considered aromatic, subject to the following constraint: rings which
   have all X.2 atoms but only as a result of exocyclic double bonds, e.g.
   quinones, are not assigned as aromatic. Please note that the Huckel
   4n+2 rule is NOT employed.

4. Treat specific functionalities.

   o Carboxylate-like oxygens are typed. This includes carboxylates,
     sulf(on,in)ates, phosph(on,in)ates, and nitros. For purposes of this
     discussion, a singly-connected atom refers to one which has only one
     neighbor atom, regardless of the bond order of that bond.
       A carbon with 2 or more singly-connected oxygens, or,
       a sulfur with 3 or more singly-connected oxygens, or,
       a terminal sulfur with 2 or more singly-connected oxygens, or,
       a phosphorus with 2 or more singly-connected oxygens:
     the oxygens in these groups are all considered O.co2's with single
     bonds. A nitrogen with two or more singly-connected oxygens is
     considered a nitro - the oxygens are both assigned O.2 with double
     bonds, and the nitrogen is given N.pl3 status (this is the way Sybyl
     does it).

   o Nitrogen functionalities. Any nitrogen alpha to an olefin is N.pl3.
     Any nitrogen alpha to an X=O or X=S group is considered an N.am
     (amide).
   o Sulfur functionalities. If the number of singly-connected oxygens is
     one, this is a sulfoxide (S given S.O type); if two, this is a sulfone
     (S given S.O2 type); otherwise, the sulfur becomes S.2 if it has a
     double bond, S.3 in all other cases.

5. Consider functionalities which may depend on completion of step 4
   entirely.

   o Guanidyls and amidines. A carbon with three neighbors is considered
     an amidine if at least one neighbor is a C.ar and the other two
     neighbors are non-aromatic nitrogens. If the central carbon is in a
     ring, the two nitrogens may not be members of this ring. A guanidyl
     is any carbon with three non-aromatic nitrogen neighbors. Nitrogens
     in amidines and guanidyls are given N.pl3, and the central carbon is
     assigned C.cat to ensure a formal charge of +1 (again, this is the
     way Sybyl does it; cf. arginine). Finally, for both amidines and
     guanidyls, if any of the nitrogens themselves have heteroatom
     neighbors, this functionality is considered too electron-deficient
     to be charged and hence NOT an amidine or guanidyl.

6. Ensure correct protonation state. Tetrahedral (N.3) nitrogens which do
   not have heteroatoms or olefins as neighbors are considered protonated
   and hence promoted to N.4.

7. Lone atoms are removed (e.g. single-atom counterions).

8. Atoms are renumbered sequentially and atom names made uppercase.
   Spaces in the molecule name are converted to underscores.

9. The structure is written out.

10. Return to 0.

sybdb

Description

This program is a shell script which creates a Sybyl command file which
creates and runs a Sybyl SPL macro. Input is an sdf2mol2 output multi-MOL2
file, output is a multi-MOL2 file. The program removes all but the largest
covalently bonded substructure, adds hydrogens, adjusts formal charges as
appropriate, and computes partial charges using the method of Gasteiger &
Marsili.

Usage

Customization: before the first use, please update the location of Sybyl
inside the sybdb script.
This will require modifying the variables TA_ROOT, which specifies the root
directory for your version of Sybyl, and sybcommand, which stores the
actual command used to access Sybyl at your site.

    sybdb <inputMOL2file> <outputMOL2file>

where <inputMOL2file> is the output from sdf2mol2 and
      <outputMOL2file> is the cleaned-up multi-MOL2 format file

A log file called sybdb.log is also written which includes the name of each
molecule processed, formal charges, modifications to formal charges by the
script, and any other warnings that Sybyl may have generated. The file
sybdb.out contains a record of the entire Sybyl session so that all actions
may be examined.

NOTE 1: Due to memory limitations, you will in all likelihood need to run
sybdb on multi-MOL2 files containing fewer molecules (e.g. 1000). Please
use the accompanying "chunks" script for this purpose. This script will
allow you to process a database of any size by splitting it into manageable
pieces, processing each piece, then catenating all results.

NOTE 2: The Gasteiger-Marsili charges within Sybyl do NOT have parameters
for the very common sulfoxide and sulfone functional groups. If you do not
supply parameters for these types of sulfurs, charges on the sulfur and
accompanying oxygens will be 0! What we do here at UCSF is to copy the S.3
parameters to S.O and S.O2 so that at least *something* gets used. To do
this, you should edit the file

    $TA_ROOT/sybylbase/tables/gastpar.tab

and add the following 4 lines exactly as shown here between dotted lines:

.............................................................................
S 29 2.3900 10.1400 20.6500 SO copied from S3
P 29 0.0000 6.6000 20.6400 SO copied from S3
S 30 2.3900 10.1400 20.6500 SO2 copied from S3
P 30 0.0000 6.6000 20.6400 SO2 copied from S3
.............................................................................

You may use an altered gastpar.tab for temporary use only by placing it in
your working directory.
If no gastpar.tab is found in the working directory, the default file
specified above will be used. For further details, see "Appendix 1:
Parameter Tables: Charges" in the Sybyl Theory manual. (This is section
A-1.7 beginning on page A-444 in the Sybyl 6.1 8/94 documentation.)

Speed

~3-4 hours for 100,000 structures (R4400 Indigo2)

Method

0. Setup:

   Add new bond parameters. A bogus N.ar-H bond type and length is
   assigned. Sybyl cannot seem to accommodate a tertiary (and hence
   charged) aromatic nitrogen. It would prefer to add a hydrogen and so
   needs bonding information for the N.ar-H bond. By setting the bond type
   to nc (not connected), this hydrogen atom never really gets added, but
   the +1 formal charge is indeed now recognized. A bogus N.1-H bond type
   and length is also assigned, for similar reasons.

   Set up to use Gasteiger-Marsili pi charges (off by default).

   Load metal parameters. This is to ensure that Sybyl does not assign
   dummy types to unrecognizable atoms.

1. Read in all molecules, then loop over each one as follows.

2. Remove all but the largest substructure.

3. Add hydrogens.

4. Remove any dummy atoms.

5. Rename atoms sequentially - this ensures that added hydrogens will have
   names, as the "fillvalence" command adds hydrogens without giving them
   names.

6. Check for functionalities for which Sybyl incorrectly adds hydrogens.
   Tri-alkyl phosphines incorrectly get an additional hydrogen on the
   phosphorus. These hydrogens are stripped. Isocyanides (isonitriles,
   -N%C) should be net neutral (i.e. +1 on nitrogen, -1 on carbon), so the
   hydrogen normally added to the carbon is removed.

7. Adjust formal charges. Sulf(on)ates and phosph(on)ates get their formal
   charge distributed evenly about the O.co2 oxygens.

8. Compute partial charges with the modified formal charges.

9. Return to 2 using next structure.
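
Step 7 above (even distribution of a group's formal charge over its O.co2
oxygens) can be illustrated with a minimal Python sketch. This is NOT the
SPL code used by sybdb; the function name, the atom ids, and the net
charges passed in are all hypothetical and chosen only for illustration.

```python
def distribute_formal_charge(net_charge, oxygen_ids):
    """Split a group's net formal charge evenly over its O.co2 oxygens.

    net_charge -- total formal charge of the group (an assumption supplied
                  by the caller, e.g. -1 for a sulfonate)
    oxygen_ids -- ids of the O.co2 oxygens in the group (hypothetical ids)
    Returns a dict mapping each oxygen id to its share of the charge.
    """
    if not oxygen_ids:
        raise ValueError("group has no O.co2 oxygens")
    share = net_charge / len(oxygen_ids)
    return {atom_id: share for atom_id in oxygen_ids}


# Illustration: a sulfonate with an assumed net charge of -1 spread over
# three O.co2 oxygens; each oxygen receives one third of the charge.
sulfonate = distribute_formal_charge(-1.0, [5, 6, 7])

# Illustration: a phosphonate with an assumed net charge of -2.
phosphonate = distribute_formal_charge(-2.0, [11, 12, 13])
```

The point of the even split is that the O.co2 oxygens in such a group are
equivalent, so no single oxygen should carry the whole formal charge into
the partial charge computation of step 8.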

Further processing

mol2db

To convert the multi-MOL2 database resulting from sybdb to a dock database,
run mol2db but be sure to say NO to charge adjustment, as this has already
been done within the sybdb program.
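
NOTE 1 in the sybdb section recommends running sybdb on pieces of roughly
1000 molecules, and the accompanying "chunks" script is the supported way
to do that. Purely as an illustration of the splitting idea, and not as a
replacement for that script, the following Python sketch splits multi-MOL2
text at the "@<TRIPOS>MOLECULE" record that begins each molecule. The
function name and the default chunk size are assumptions.

```python
MARKER = "@<TRIPOS>MOLECULE"  # record that starts each molecule in MOL2


def split_mol2(text, chunk_size=1000):
    """Split multi-MOL2 text into chunks of at most chunk_size molecules.

    Any lines before the first molecule record (e.g. comments) stay
    attached to the first molecule. Returns a list of strings which,
    joined together, reproduce the input text.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    molecules = []
    current = []
    seen_marker = False
    for line in text.splitlines(keepends=True):
        if line.startswith(MARKER):
            if seen_marker:
                # A new molecule begins, so close the previous one.
                molecules.append("".join(current))
                current = []
            seen_marker = True
        current.append(line)
    if current:
        molecules.append("".join(current))
    return [
        "".join(molecules[i:i + chunk_size])
        for i in range(0, len(molecules), chunk_size)
    ]


# Illustration with five dummy molecule headers and a chunk size of 2.
demo = "".join(f"{MARKER}\nmol_{i}\n" for i in range(5))
chunks = split_mol2(demo, chunk_size=2)
```

Each chunk would then be written to its own file, run through sybdb as
described under Usage, and the resulting output files catenated in order
before running mol2db.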