\chapter{DM Tools}\label{DUA:dmtools}

\pgm{DM tools} are a set of shell scripts which provide some simple bulk
data management functions using DAP. The tools have the following
characteristics.
\begin{itemize}
\item They obviate such skulduggery as editing EDB files on the DSA machine.
\item They provide some ability to add, modify and delete entries and
attributes.
\item They will play a part in managing data from multiple sources, but
there are several limitations (see the caveats later in this chapter).
\item They will not handle many thousands of entries in one go, but have
been used successfully with a few thousand entries.
\item They are based on DISH commands with lashings of shell and [gn]awk.
\end{itemize}

\section{How the Tools Work}
The tools are driven by data in a syntax very similar to that of EDB files.
A special-purpose difference tool (\verb|dmdiff|) works out the differences
between the current version of the data and the previous version. Another
tool (\verb|crmods|) processes the resulting differences (which, the first
time round, may simply be the original file) and translates them into a
shell script of the DISH commands required to update the Directory
appropriately. Run the resulting shell script to apply the modifications.

\section{The Bulk Data Format --- dmformat}
This format is very similar to the EDB format. The differences are as
follows:

%%%\renewcommand{\arraystretch}{2}
\begin{tabular}{|p{0.42\textwidth}|p{0.48\textwidth}|}
\hline
EDB & DMFORMAT \\
\hline\hline
DIT hierarchy mapped onto UNIX directory structure &
Flat file with embedded information saying where entries should be
loaded in the DIT \\
\hline
Files start with \texttt{MASTER} and the date in UTC format &
Files do not start with these lines; instead, a file contains
``rootedAt'' information \\
\hline
 &
Syntax includes a mechanism for specifying deletion of an entry or
attribute \\
\hline
Can only represent one set of sibling entries &
Can represent information in an entire subtree or a collection of
subtrees \\
\hline
\end{tabular}

Comments may be interspersed throughout the file.
A comment line begins with a ``\#'' character.

A rootedAt line indicates the parent node in the DIT for subsequent
entries in the file. Separate a rootedAt line from the entries that follow
it by one or more blank lines.

A set of entries follows a rootedAt line. These are formatted in the same
way as in an EDB file: i.e., an entry is a sequence of attribute
type-value pairs, where the first pair is the RDN for the entry. Entries
are separated from one another by blank lines.

In addition to the conventional syntax, it is possible to specify deletion
of entries and attributes.
\begin{itemize}
\item Specify entry deletion by prefixing the RDN with the ``!'' character.
\item Specify attribute value deletion by prefixing the attribute
type=value line with a ``!'' character.
\end{itemize}
A file can contain information for many DIT subtrees by including more
rootedAt lines.

\section{dmformat --- An Example}
\begin{verbatim}
#subsequent entries are relative to this point
# in the DIT
rootedAt= c=gb@o=UCL@ou=CS

# add this entry with these attributes
# if it doesn't already exist
# try to add in these attribute values if
# the entry already exists
cn=Paul Barker
surname=Barker
telephoneNumber=+44 71 380 7366
objectClass=organizationalPerson & quipuObject & ...

# Add the first telephone number attribute
# value and delete the second
cn=Steve Kille
telephoneNumber=+44 71 380 7294
!telephoneNumber=+44 71 380 1234

# Delete this entry
!cn=Colin Robbins
# don't have to supply attributes, but can
# if you like!
telephoneNumber=+44 71 387 7050 x3688

#subsequent entries are relative to this point
# in the DIT
rootedAt= c=gb@o=UCL@ou=Physics
\end{verbatim}

\section{Using the Tools}
The tools can be used to load the database initially as follows:
\begin{itemize}
\item Produce a file ``newfile'' of entries to be loaded.
\item Make a file of DISH operations to effect the update (crmods writes
this shell script to {\tt modfile}):
\verb|crmods < newfile|
\item Apply the updates:
\verb|sh modfile|
\end{itemize}
The tools can also be used for subsequent amendments:
\begin{itemize}
\item Create a file of difference
data:
\verb|dmdiff oldfile newfile > difffile|
\item Create a shell/DISH script to do the update:
\verb|crmods < difffile|
\item Apply the updates:
\verb|sh modfile|
\end{itemize}
There are examples of using the tools, and sample Makefiles, in the README
file accompanying the software.

\section{Preparing Data for use with DM Tools}
The tools will work more efficiently if the following guidelines are
followed:
\begin{itemize}
\item Attribute type strings in DM files should be the same as those
written out by DISH when using ``showentry -edb''. In practice this means
using the abbreviated attribute names as specified in
\$(ETCDIR)/oidtable.at; e.g., use ``cn'' rather than ``commonName'', and
``mail'' rather than ``rfc822Mailbox''.
\item Be consistent with capitalisation, and with case in general, between
DM files produced from the various sources.
\item Attribute values with DN syntax should have the country name part
represented in capitals, as in ``c=GB'', because QUIPU always writes them
out that way. In all other cases, QUIPU preserves the case with which
entries' attributes are created.
\end{itemize}

\section{Some Specific Shortcomings of the DM Tools}
\begin{itemize}
\item Scale: the shell script {\tt modfile}, which crmods produces, becomes
very large for substantial amounts of data or data differences. It may be
more manageable to split the data into a set of department files, as for
EDBs, and apply a separate set of updates for each.
\item Matching of attribute types and attribute values is case-sensitive,
whereas it should almost always be case-independent. In practice this is
not much of a problem:
\begin{itemize}
\item at worst, it means that too many ``differences'' are discovered;
\item QUIPU does the ``right thing'' anyway.
\end{itemize}
\item There is no explicit mechanism for renaming entries. Renaming is
achieved by deleting the entry with the old name and creating a new entry,
so you may discard attribute information which has been loaded from
another source.
\item The tools have no knowledge that entries may be mastered by more
than one source. If
an entry is deleted from one source, it will be deleted from the
Directory even if the entry still exists in another source. This may, or
may not, be what you want!
\item There is no explicit support for maintenance of seeAlso,
roleOccupant and other attributes which have DN syntax. All necessary
management to avoid ``dangling pointers'' must be achieved externally.
\item There is no support for management of aliases.
\item Updating over DAP can be rather slow for entries with large numbers
of siblings (in QUIPU terms, entries in a large EDB file). One solution is
to use the TURBO\_DISK option when compiling QUIPU, which makes use of
GNU's gdbm package. Consider this if you do a lot of updating and you have
large EDB files.
\item There are some known bugs. Inherited attributes are not always
handled correctly, and problems with eDBInfo have been reported.
%%% Fixes gratefully received --- send them to \verb+<p.barker@cs.ucl.ac.uk>.+
\end{itemize}

\section{General Data Management Problems Not Catered For}
\begin{itemize}
\item Management of data from multiple sources is very difficult: there is
no support for merging data from different sources, or for consistent
deletion.
\item There is no framework for discriminating between data sources of
differing quality; this must be handled manually.
\item Relying on diffs is not really satisfactory; the database needs to
be rebuilt periodically from the source data.
\item Naming of entries: the DM tools offer no help with naming to the
person maintaining the Directory database.
This administrator should be aware of at least the following problems.
\begin{itemize}
\item Two sources may name an entity differently:
\begin{verbatim}
source one: P Barker
source two: Paul Barker
\end{verbatim}
\item Care is needed to ensure that no duplicate RDNs are formed when
processing the source data into EDBs or DM files.
\begin{itemize}
\item If building EDBs, QUIPU will detect duplicate RDNs as it loads its
database.
\item The DM tools, however, will perform multiple updates on a single
entry.
\end{itemize}
\item Even where one is loading from a single source, the name which is
systematically derivable may be unsatisfactory; e.g.,
\verb|PHYS & ASTRO| rather than \verb|Physics and Astronomy|.
\item A source's view of what constitutes a department may be parochial,
suiting particular requirements. For example, the UCL telephone directory
database has the following two departments:
\begin{verbatim}
BIOLOGY (DARWIN)
BIOLOGY (MEDAWAR)
\end{verbatim}
whereas the University view, which must be represented, is that there is
just a single ``Biology'' department.
\item When joining departments in this way, care is needed to ensure that
no RDN clashes occur. If they do occur, one solution is to name entries
with a multi-valued RDN, such as
\verb|cn=Fred Bloggs%ou=Biology (Medawar)|.
\end{itemize}
\end{itemize}
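
The RDN clash problem described above can be checked mechanically before
any data is turned into DM files. The following is a minimal sketch in
Python, not part of the DM tools: it assumes a hypothetical list of
(common name, source department) pairs for people being merged into a
single department, compares names case-independently, and gives each
clashing entry a multi-valued RDN of the \verb|cn=...%ou=...| form. The
function name \verb|build_rdns| and the sample names are illustrative only.

\begin{verbatim}
from collections import Counter


def build_rdns(people):
    """Hypothetical helper: choose an RDN for each person.

    people is a list of (cn, source_department) pairs that are being
    merged into one department. Names are compared case-independently.
    """
    cn_counts = Counter(cn.casefold() for cn, _ in people)

    rdns = []
    for cn, dept in people:
        if cn_counts[cn.casefold()] > 1:
            # Clash: qualify the name with its source department,
            # giving a multi-valued RDN (components joined by '%').
            rdns.append(f"cn={cn}%ou={dept}")
        else:
            rdns.append(f"cn={cn}")

    # A multi-valued RDN can still clash, e.g. the same name twice in
    # the same source department; report these for manual resolution.
    rdn_counts = Counter(r.casefold() for r in rdns)
    clashes = sorted({r for r in rdns if rdn_counts[r.casefold()] > 1})
    return rdns, clashes


if __name__ == "__main__":
    merged_biology = [
        ("Fred Bloggs", "Biology (Darwin)"),
        ("Fred Bloggs", "Biology (Medawar)"),
        ("Jane Smith", "Biology (Darwin)"),
    ]
    rdns, clashes = build_rdns(merged_biology)
    for rdn in rdns:
        print(rdn)
    print("unresolved clashes:", clashes)
\end{verbatim}

Run as a script, this prints the two qualified Fred Bloggs RDNs followed by
\verb|cn=Jane Smith|. Any RDNs that still clash after qualification are
returned separately so that they can be resolved by hand, since the DM
tools themselves would simply apply multiple updates to the same entry.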