archive. This policy should be both comprehensive and understandable by anyone who
will participate in the creation of the archive or the modification of its contents.</P>
<P>A policy for change can be as formal as you like. In general, the following items
should be considered when designing the policy:
<UL>
<LI>Creation--The process of creating new elements or the hierarchy (directories)
within the archive.
<P>
<LI>Updating--The process of revising the archive's elements or hierarchy.
<P>
<LI>History--Retaining older versions of documents for future reference or consideration,
including the ability to revert to a previous version.
<P>
<LI>Accountability--Who installed or updated the document, when the action was
performed, and why.
</UL>
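<P>As a minimal sketch of the accountability item, the following Perl fragment formats one change record (who, when, what, and why) as a single tab-separated line. The helper names <TT>change_record</TT> and <TT>log_change</TT>, the field order, and the log file name are my own assumptions for illustration, not part of any standard tool.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format one change record as a tab-separated line.
# Fields: UTC timestamp, user, action (create/update), path, reason.
sub change_record {
    my ($time, $user, $action, $path, $reason) = @_;
    my $stamp = strftime('%Y-%m-%d %H:%M:%S', gmtime($time));
    $reason =~ s/[\t\r\n]+/ /g;    # keep each record on a single line
    return join("\t", $stamp, $user, $action, $path, $reason) . "\n";
}

# Hypothetical helper: append a record to the archive's change log.
sub log_change {
    my ($logfile, @fields) = @_;
    open my $log, '>>', $logfile or die "Cannot append to $logfile: $!";
    print {$log} change_record(time, @fields);
    close $log or die "Cannot close $logfile: $!";
}

# Example: show the record that would be logged for an update.
my $user = getlogin() || getpwuid($<) || 'unknown';
print change_record(time, $user, 'update', 'docs/index.html',
                    'Fixed broken link to FAQ');
```

<P>To keep a permanent history, call <TT>log_change('CHANGES.log', $user, 'update', $path, $reason)</TT> instead of printing. A revision control system records the same information for you, which is one reason to prefer one over a home-grown log.</P>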
<P>The level of complexity in a policy for change that can arise out of just these
four items might surprise you. For instance, when dealing with source code and documentation
for a given software product, some organizations implement a multi-tiered structure
of committees, forms, and checklists which any change or addition must pass through
before being applied to the released document(s) or code. At some point, productivity
may suffer if the process becomes too complex. The idea is to find a happy medium
between no policy at all and one that bogs you down.</P>
<P><B><TT>A Policy for Change--Motivation</TT></B></P>
<P>
Plenty of things can go wrong when you're populating an archive or updating its elements.
In the best cases, the Webmaster or Web team is immediately made aware of the problem
and is able to deal with it. On the other hand, some minor errors in documents or
functionality may go unnoticed for a long time, potentially becoming a permanent
problem due to the number of copies of the documents that were distributed with the
error. With the number of indexers, auto-mirrors, archiving proxies, and other forms
of duplication that exist today on the Web, the proliferation of errors can be almost
immediate and quite difficult to overcome.</P>
<P>Obviously, the best policy is one in which no documents would be distributed with
errors or misrepresentations. However, implementing such a policy is quite difficult,
even if you are already using a sound policy for change. If you don't have a change
policy, it becomes practically impossible. A number of situations
can lead to errors; let's consider a few of them.
<UL>
<LI>Multiple Versions--A document, perhaps an image, script, or applet, may exist
in several locations within the archive. A copy is updated, but the changes may not
propagate to the other installed copies. Because no single copy is designated as
the master copy, changes also may occur independently to the copies, causing additional
problems.
<P>
<LI>Simultaneous Updates--An archive or one of its components may be managed by a
group of people. This inevitably leads to simultaneous changes in some element, if
some form of revision control is not used. Suppose one person copies a document,
starts making changes to it, and before he/she is finished, someone else makes another
copy of the same document and starts making changes. The inevitable outcome is that
one or the other's work will be lost, depending on who copies the changed document
back into the archive last.
<P>
<LI>Security/Access--Some operating systems provide a means to restrict access to
files based on the UserID or group. These mechanisms lack the functionality necessary
for a dynamic, effective policy for allowing a particular person to perform a particular
task on a given element in an archive. Such tasks may need to be performed on a repetitive
basis, or possibly only once, by a given Web team member or other individual. A need
also may exist to allow certain types of access (for example, reading), but disallow
others (such as updating) on a given element in the archive based on the local UserID.
Some file systems have this sort of functionality built in, via Access Control Lists
(ACLs), but these mechanisms may still be inadequate and are rarely enforceable across
networked file systems or different architectures.
<P>
<LI>Accountability/Audits--If more than one person has the ability to make changes
to the archive, then tracking the changes and who made them becomes difficult. In
case of an error or omission, it may be desirable to learn who made the error. Most
ordinary file systems don't give you the ability to track changes and who made them.
Ideally, each element should have its own history or record of changes made to it,
and who made the changes to it during its life in the archive.
<P>
<LI>Creation/Population--If you, as the Webmaster, have carefully thought through
the issues outlined in this chapter and have implemented a policy for change, and
then someone decides to create a new directory or other element in the archive without
being aware of the policy, you might find this action a bit irksome. The lack of
consideration of the plan on the part of this person probably means that you will
have to go in and fix things to restore the original order. Allowing others to create
new elements in the archive should imply that they understand the issues involved
and practices/policies for doing so.
</UL>
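<P>The Multiple Versions problem described above can at least be detected mechanically. The following sketch walks an archive, groups files by name, and reports every name whose copies no longer have identical contents. It assumes that two files with the same name in different directories are meant to be copies, which is only a heuristic: names such as <TT>index.html</TT> will produce false alarms. The function names are hypothetical.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);
use Digest::MD5;

# Return the MD5 hex digest of a file's contents.
sub file_digest {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Group files under $root by basename, then report each name whose
# copies do not all share the same digest.
sub divergent_copies {
    my ($root) = @_;
    my %copies;    # basename => { digest => [paths] }
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;
        push @{ $copies{ basename($_) }{ file_digest($_) } }, $_;
    } }, $root);

    my %report;    # basename => [all paths carrying that name]
    for my $name (keys %copies) {
        my $by_digest = $copies{$name};
        next if scalar(keys %$by_digest) < 2;    # single copy, or all identical
        $report{$name} = [ sort map { @$_ } values %$by_digest ];
    }
    return \%report;
}

# Usage: perl divergent.pl /path/to/archive
if (@ARGV) {
    my $report = divergent_copies($ARGV[0]);
    for my $name (sort keys %$report) {
        print "$name differs between copies:\n";
        print "  $_\n" for @{ $report->{$name} };
    }
}
```

<P>A report like this tells you only that the copies have drifted apart, not which one is correct. Deciding that still requires a designated master copy.</P>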
<P>There are other potential problem scenarios I haven't mentioned, but these should
give you the general idea. In order to properly maintain an archive, especially as
a group or team representing a company, it's essential to use some form of revision
control, and to have a well-understood policy for change.</P>
<P><B><TT>A Policy for Change--Solutions</TT></B></P>
<P>
A variety of tools and systems are available to implement revision control. Some
of them are available for free, and others are commercially available and well-supported.
An organization may also wish to implement a home-grown solution, perhaps using Perl
and some other tool or tools. We're not going to attempt to implement such a tool
in this book, but the following list should give you an idea of what tools are available.
I'm also not going to try to give a comprehensive overview here; I'll just cover
some of the most popular solutions.</P>
<TT>ftp://ftp.cs.purdue.edu/pub/RCS/</TT>
<TABLE BORDER="0">
<TR ALIGN="LEFT" rowspan="1">
<TD WIDTH="54" ALIGN="LEFT" VALIGN="TOP">RCS/CVS</TD>
<TD ALIGN="LEFT">This toolset is probably the most widely used tool for revision control on UNIX operating
systems. It's a GNU tool, originally created at Purdue University. Like other GNU software,
RCS/CVS has received contributions, bug fixes, and patches from individuals
all across the Internet. RCS stands for Revision Control System. CVS is a front end
to RCS that extends it by providing the ability to create a private copy of an
entire suite of documents, and then optionally lock, modify, and check in a given
document. Each document's changes (deltas) are kept in a storage container corresponding
to the name of the document. Ports of RCS/CVS are also available for Macintosh and
DOS/Windows. It is freely available and well understood, and help is fairly easy to
find via the documentation, Usenet, or mailing lists. It operates primarily on text
files. RCS/CVS is available at most standard Usenet sources archive sites and always
at Purdue: <A HREF="ftp://ftp.cs.purdue.edu/pub/RCS"><TT>ftp://ftp.cs.purdue.edu/pub/RCS</TT></A>.</TD>
</TR>
</TABLE>
<A HREF="http://www.atria.com/"><TT>http://www.atria.com</TT></A>
<TABLE BORDER="0">
<TR ALIGN="LEFT" rowspan="1">
<TD ALIGN="LEFT" VALIGN="TOP">ClearCase</TD>
<TD ALIGN="LEFT">This toolset, available through Atria, Inc., actually implements a complete file
system and is possibly the most powerful, complex, and configurable
configuration management tool available. It's primarily used for source code control
and software project management but makes a very nice archive management tool as
well. I keep the chapters and sample code that make up this book under ClearCase.
ClearCase lacks a Macintosh interface, but it can export its files via NFS. It operates
on text files, binaries, images, and even directories, along with any other file type
you wish to configure. It is available through the Pure-Atria sales staff at Pure-Atria,
Inc., and through the Web site at <A HREF="http://www.pureatria.com/"><TT>http://www.pureatria.com</TT></A>.</TD>
</TR>
</TABLE>
<A HREF="http://www.microsoft.com/SSAFE/Default.html"><TT>http://www.microsoft.com/SSAFE/Default.html</TT></A>
<TABLE BORDER="0">
<TR ALIGN="LEFT" rowspan="1">
<TD WIDTH="53" ALIGN="LEFT">SourceSafe</TD>
<TD ALIGN="LEFT">SourceSafe is another toolset available as a commercial product. In terms of functionality,
it looks and feels much like CVS, but it uses a database for its internal references
to revisions and history and has additional features and user interfaces. SourceSafe
operates on text, binaries, and images. I haven't used the SourceSafe toolset in
the role of archive management, but it seems to have the necessary functionality.
Microsoft also seems to have been actively adding functionality and support since it acquired
the SourceSafe product. Implementations of SourceSafe are available for UNIX, Macintosh,
and Windows.</TD>
</TR>
</TABLE>
<A HREF="http://www.mks.com/"><TT>http://www.mks.com</TT></A>
<TABLE BORDER="0">
<TR ALIGN="LEFT" rowspan="1">
<TD WIDTH="55" ALIGN="LEFT">MKS Toolkit</TD>
<TD ALIGN="LEFT">The MKS Source Integrity toolset is another revision control system. I haven't actually
seen this implementation, but because it's from MKS, you can bet it has an implementation
for Windows. Contact MKS through its Web page at <A HREF="http://www.mks.com/"><TT>http://www.mks.com</TT></A>.</TD>
</TR>
</TABLE>
<P>Each of these tools has advantages and disadvantages, and there are certainly other
tools available that I'm not aware of. Investigate as many systems as you can, then
choose one and stick with it. The process of checking the archive's elements in and
out each time you wish to update them might seem a bit rigorous at first, especially
for those who've never used a revision control system, but in the long run, revision
control pays off, and you'll be glad you took the time to implement it.
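<P>To make the check-out and check-in cycle concrete, here is a minimal sketch of a Perl wrapper around the standard RCS commands <TT>co</TT>, <TT>ci</TT>, and <TT>rlog</TT>. It assumes that RCS is installed and that the file has already been placed under RCS control with an initial <TT>ci</TT>. The wrapper's function names and its <TT>out</TT>/<TT>in</TT>/<TT>log</TT> actions are hypothetical.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build argument lists for the basic RCS cycle. Returning lists (not one
# shell string) lets system() run them without shell quoting problems.

# co -l: fetch the latest revision and lock it for editing.
sub checkout_cmd { my ($file) = @_; return ('co', '-l', $file); }

# ci -u -mMSG: store a new revision with log message MSG, release the
# lock, and leave a read-only working copy in place.
sub checkin_cmd {
    my ($file, $msg) = @_;
    return ('ci', '-u', "-m$msg", $file);
}

# rlog: print the revision history (who, when, and log messages).
sub history_cmd { my ($file) = @_; return ('rlog', $file); }

sub run {
    my @cmd = @_;
    system(@cmd) == 0 or die "Command failed (@cmd): exit status $?\n";
}

# Usage: perl rcswrap.pl out|in|log FILE [MESSAGE]
if (@ARGV) {
    my ($action, $file, $msg) = @ARGV;
    die "Usage: $0 out|in|log FILE [MESSAGE]\n" unless defined $file;
    if    ($action eq 'out') { run(checkout_cmd($file)) }
    elsif ($action eq 'in')  { run(checkin_cmd($file, defined $msg ? $msg : 'No message given')) }
    elsif ($action eq 'log') { run(history_cmd($file)) }
    else                     { die "Unknown action '$action'\n" }
}
```

<P>Because <TT>co -l</TT> takes a lock, a second person who tries to check out the same file for editing is refused until the first person checks it back in. That addresses the Simultaneous Updates problem described earlier.</P>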
<CENTER>
<H4><A NAME="Heading6"></A><FONT COLOR="#000077">Summary of Archive Management Issues</FONT></H4>
</CENTER>
<P>The topics we've discussed so far form the basis of the important issues and considerations
for managing an archive on the Net. As I mentioned earlier, the needs and requirements
for a configuration management plan vary from site to site. Even the simplest plan
should include instructions and policy for the following actions, derived from the
policies and problems listed earlier in this chapter:
<UL>
<LI>Naming Conventions--Choosing long names versus ISO 9660 names, and implementing
appropriate and uniform extension naming.
<P>
<LI>Archive Layout--Creating a hierarchy that is easy to navigate and useful
for classifying elements.
<P>
<LI>Version Control--The ability to reconstruct any previous version of any of the
archive's elements, at any given time.
<P>
<LI>Access Control--Access to elements based on dynamic needs and changing personnel
duties, possibly allowing for off-site modification. Configuring the elements' permissions
appropriately.
<P>
<LI>Sequential Changes--One and only one change to an archive element at a time.
<P>
<LI>Updates--A specific process to merge changed elements back into the archive as
the new release. May include approvals or a consensus, and ideally should be automated.
<P>
<LI>Creation--Based on the layout plan and organization of the current elements and
directory structure.
<P>
<LI>Accountability--The ability to verify who changed or created an element,
what was changed, and when, where, and why the change or creation occurred.
<P>
<LI>Verification/Testing--Manual or automated verification of the correctness of
the new release, and that it hasn't affected any other component of the archive's
functionality.
<P>
<LI>Reporting--The ability to report to anyone who might be interested, and has a
right to know, regarding the usage and access of, and changes to, the archive. This
task could be automated, sending out reports on a nightly basis, for instance.
</UL>
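<P>A naming convention is easy to check automatically. As a sketch, the following code lists every file or directory under an archive whose name would not fit a simplified reading of ISO 9660 Level 1: at most eight characters, optionally followed by a dot and at most three more, using only letters, digits, and underscores. Accepting lowercase letters is my own assumption (on the grounds that names can be uppercased when a disc image is made), and the function names are hypothetical.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Simplified ISO 9660 Level 1 test: 1-8 characters, then optionally a dot
# and 1-3 characters, drawn from A-Z, 0-9, and underscore.
# Assumption: case is ignored.
sub iso9660_level1_ok {
    my ($name) = @_;
    return $name =~ /\A[A-Z0-9_]{1,8}(?:\.[A-Z0-9_]{1,3})?\z/i ? 1 : 0;
}

# Return the sorted paths under $root whose final component fails the test.
sub nonconforming_names {
    my ($root) = @_;
    my @bad;
    find({ no_chdir => 1, wanted => sub {
        return if $_ eq $root;    # don't judge the archive root itself
        push @bad, $_ unless iso9660_level1_ok(basename($_));
    } }, $root);
    @bad = sort @bad;
    return @bad;
}

# Usage: perl checknames.pl /path/to/archive
if (@ARGV) {
    print "$_\n" for nonconforming_names($ARGV[0]);
}
```

<P>A check like this could run as part of the Verification/Testing step, so that a new element with a nonconforming name is caught before it is released.</P>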
<P>The rest of this chapter focuses on the specific tasks within these topics that
you might face in the day-to-day management of the archive. The focus also now returns
to how you can use Perl to implement these tasks.
<CENTER>
<H3><A NAME="Heading7"></A><FONT COLOR="#000077">Parsing, Converting, Editing, and
Verifying HTML with Perl</FONT></H3>
</CENTER>
<P>One of the more important, but less well-documented duties of the Webmaster is
updating and verifying the HTML in the archive's documents. Aside from the need for
revision control, which we've already mentioned, how does one actually go about making
changes, potentially en masse, to the archive's HTML documents? Once the changes
have been performed, how does the Webmaster verify that they have not affected any
other component of the archive? Fortunately, text management is one of the great
strengths of Perl, and there are a number of modules and tools for accomplishing
this task.
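<P>As a first taste of what an en masse change looks like, here is a minimal sketch that applies one edit to every <TT>.htm</TT> or <TT>.html</TT> file under a directory, keeps a <TT>.bak</TT> copy of each file it changes, and returns the list of changed files so they can be verified afterward. The function name <TT>edit_html_files</TT> and the example host names are hypothetical, and a plain text substitution like this knows nothing about HTML structure, so treat it as an illustration rather than a robust tool.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Copy qw(copy);

# Apply $edit (a code ref taking the old text and returning the new text)
# to every .htm/.html file under $root. Files whose text changes are
# backed up to FILE.bak first. Returns the paths that were changed.
sub edit_html_files {
    my ($root, $edit) = @_;
    my @changed;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_ && /\.html?\z/i;
        my $path = $_;
        open my $in, '<', $path or die "Cannot read $path: $!";
        my $old = do { local $/; <$in> };    # slurp the whole file
        close $in;
        $old = '' unless defined $old;
        my $new = $edit->($old);
        return if $new eq $old;              # nothing to do for this file
        copy($path, "$path.bak") or die "Cannot back up $path: $!";
        open my $out, '>', $path or die "Cannot write $path: $!";
        print {$out} $new;
        close $out or die "Cannot close $path: $!";
        push @changed, $path;
    } }, $root);
    return @changed;
}

# Usage: perl bulkedit.pl /path/to/archive
# Example edit: point links at a new host name (both names are made up).
if (@ARGV) {
    my @changed = edit_html_files($ARGV[0], sub {
        my ($html) = @_;
        $html =~ s{http://old\.example\.com/}{http://www.example.com/}g;
        return $html;
    });
    print "Updated $_ (backup in $_.bak)\n" for @changed;
}
```

<P>If the archive is under revision control, check the files out before running an edit like this and check them in afterward, so that the change is recorded and can be reverted.</P>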
<CENTER>
<H4><A NAME="Heading8"></A><FONT COLOR="#000077">General Parsing Issues</FONT></H4>