<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<!-- This document was created from RTF source by rtftohtml version 3.0.1 -->
<META NAME="GENERATOR" Content="Symantec Visual Page 1.0">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=iso-8859-1">
<TITLE>Without a title - Title</TITLE>
</HEAD>
<BODY BACKGROUND="r2harch.gif" tppabs="http://210.32.137.15/ebook/Web%20Programming%20with%20Perl%205/r2harch.gif" TEXT="#000000" BGCOLOR="#FFFFFF">
<H2 ALIGN="CENTER"><A HREF="ch13.htm" tppabs="http://210.32.137.15/ebook/Web%20Programming%20with%20Perl%205/ch13.htm"><IMG SRC="blanprev.gif" tppabs="http://210.32.137.15/ebook/Web%20Programming%20with%20Perl%205/blanprev.gif" WIDTH="37" HEIGHT="37"
ALIGN="BOTTOM" BORDER="2"></A><A HREF="index-1.htm" tppabs="http://210.32.137.15/ebook/Web%20Programming%20with%20Perl%205/index-1.htm"><IMG SRC="blantoc.gif" tppabs="http://210.32.137.15/ebook/Web%20Programming%20with%20Perl%205/blantoc.gif" WIDTH="42"
HEIGHT="37" ALIGN="BOTTOM" BORDER="2"></A><A HREF="ch15.htm" tppabs="http://210.32.137.15/ebook/Web%20Programming%20with%20Perl%205/ch15.htm"><IMG SRC="blannext.gif" tppabs="http://210.32.137.15/ebook/Web%20Programming%20with%20Perl%205/blannext.gif"
WIDTH="45" HEIGHT="37" ALIGN="BOTTOM" BORDER="2"></A><BR>
<BR>
<FONT COLOR="#0000AA">14</FONT><BR>
<A NAME="Heading1"></A><FONT COLOR="#000077">Archive and Document Management<BR>
</FONT>
<HR>
</H2>
<UL>
<LI><A HREF="#Heading1">Archive and Document Management</A>
<UL>
<LI><A HREF="#Heading2">General Archive Management Considerations</A>
<UL>
<LI><A HREF="#Heading3">Planning, Design, and Layout</A>
<LI><A HREF="#Heading5">Revision Control</A>
<LI><A HREF="#Heading6">Summary of Archive Management Issues</A>
</UL>
<LI><A HREF="#Heading7">Parsing, Converting, Editing, and Verifying HTML with Perl</A>
<UL>
<LI><A HREF="#Heading8">General Parsing Issues</A>
</UL>
<LI><A HREF="#Heading9">Listing 14.1. simpleparse.</A>
<LI><A HREF="#Heading10">Listing 14.2. simpleparse-net.</A>
<UL>
<LI><A HREF="#Heading11">Editing and Verifying HTML</A>
</UL>
<LI><A HREF="#Heading12">Listing 14.3. relativize.</A>
<UL>
<LI><A HREF="#Heading13">Parsing HTTP Logfiles</A>
</UL>
<LI><A HREF="#Heading14">Listing 14.4. GD_Logfile.pm.</A>
<LI><A HREF="#Heading15">Listing 14.5. GD_Logfile.test.</A>
<UL>
<LI><A HREF="#Heading16">Converting Existing Documentation to HTML</A>
<LI><A HREF="#Heading18">Converting HTML to Other Formats</A>
<LI><A HREF="#Heading19">Making Existing Archives Available via HTTP</A>
</UL>
<LI><A HREF="#Heading21">Summary</A>
</UL>
</UL>
<P>
<HR>
</P>
<UL>
<LI>General Archive Management Considerations
<LI>Parsing, Converting, Editing, and Verifying HTML with Perl
</UL>
<P>A Webmaster's work rarely ends with creating HTML or writing CGI programs. He
or she must also be familiar with the techniques and practices commonly used to
build and maintain a networked archive and its components. In this chapter, we'll
discuss a number of those tasks and provide you with some tools to help accomplish
them.</P>
<H3 ALIGN="CENTER"><A NAME="Heading2"></A><FONT COLOR="#000077">General Archive Management
Considerations</FONT></H3>
<P>The art and philosophy of archive management on a network predates the Web by
a long time. One of the primary intents of the Internet was, and still is, to allow
the sharing of documents. Some of the early protocols and tools for sharing electronic
resources are still in wide use today, including FTP, NFS, and even Gopher.</P>
<P>When making resources available via any type of server, you need to consider a
number of tactics and practices. Some of these are related to security and are explored
in Chapter 3, "Security on the Web." There are many others, and as far
as I know, no single document covers them all. The collective experience of the
many thousands of administrators who have contributed to and defined this body of
knowledge would be difficult to summarize in a library, much less in a single chapter
of a book.</P>
<P>There are, however, a number of general issues that you become aware of as you
develop an archive and explore the work that others have done. I hope to cover many
of the important issues and their associated tasks in this chapter. As always, you
can explore other resources, including Usenet, various Web sites, and possibly even
individual administrators whose approach looks as though it might work for you. If
you find such a site, try dropping a line to the administrator and asking him or her
to share a few tips. You may be completely ignored, but you may also be rewarded
with a buried bone or two that saves you time and energy in the future.</P>
<P>You'll notice in this chapter that Perl isn't the primary topic on every page.
As we've said, the intent of this book is to show and teach you how to use Perl in
your Web programming duties and tasks. On the other hand, the other works we've studied
give the issues and topics in this chapter rather minimal coverage. I'm covering
some of the topics in this chapter primarily for the sake of completeness.</P>
<H4 ALIGN="CENTER"><A NAME="Heading3"></A><FONT COLOR="#000077">Planning, Design,
and Layout</FONT></H4>
<P>The structure and layout of your archive is one of the most important decisions
you'll make if you're just starting out. After you've made it, changing the structure
or layout won't be easy. You should plan carefully and try to consider all of the
possibilities for what may happen to your archive in the future--before you ever
create the first directory or file. Let's consider some of the most important issues
now.</P>
<P><B><TT>Document Naming</TT></B></P>
<P>The names that you give to your documents and directories are important for several
reasons. First, and possibly most useful to you as the archive maintainer, a name
should give you some notion of what's inside the document or directory. Another
consideration is whether the files and directories you'll create must be usable on
DOS or other architectures that don't support long filenames.</P>
<H3 ALIGN="CENTER">
<HR WIDTH="82%">
<BR>
<FONT COLOR="#000077">NOTE:</FONT></H3>
<BLOCKQUOTE>
<P>There are essentially two schools of thought on naming file system elements. The
first holds that names within an archive should let you determine the contents of
a file or directory from its name. The second, also known as the ISO9660 specification,
stipulates that names should follow the 8.3 format and use only alphanumeric characters.
Obviously, limiting the primary component of the name to eight characters restricts
your ability to assign names based on contents. You should consider whether your
archive will ever need to reside on an operating system that requires the 8.3 format
(DOS), or whether you'll ever make it available via CD-ROM. In either case, you'll
probably want to choose the ISO9660 naming conventions. Don't forget the possibility
that you may someday want your archive mirrored to a system that doesn't support
long names. If you're already running under a file system that handles long names
and need to migrate or mirror your archive to a system that handles only short names,
you might have to make some major changes in order for everything to work. We'll
discuss how to perform this transition later in this chapter, in the section entitled
"Moving an Entire Archive," but it's definitely nontrivial.<BR>
<HR>
</BLOCKQUOTE>
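<P>As a rough illustration of the 8.3 rule described in the note, the following
Perl sketch reports names that would have to change before a move to a short-name
file system. The helper name <TT>is_8_3_name</TT> is invented for this example, and
the pattern encodes only the rule as stated above (at most eight alphanumeric characters,
then an optional extension of at most three). The ISO9660 standard itself has further
requirements that this check does not attempt to cover.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if $name fits the 8.3 pattern described in the note:
# 1-8 alphanumeric characters, optionally followed by a dot and
# 1-3 alphanumeric characters. Return 0 otherwise.
# (Hypothetical helper for illustration only.)
sub is_8_3_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z0-9]{1,8}(?:\.[A-Za-z0-9]{1,3})?\z/ ? 1 : 0;
}

# Print every name given on the command line that would need renaming.
foreach my $name (@ARGV) {
    print "$name\n" unless is_8_3_name($name);
}
```

<P>Saved under a name of your choosing, the script can be run from a shell inside
an archive directory with the directory's filenames as arguments. Any name it prints
is one you would have to shorten or rename.</P>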
<P>In any case, you'll need to reserve the extension component of the filenames for
MIME typing, which lets your server tell the browser how to handle the document.
See Chapter 5, "Putting It All Together," for more details. Be sure to
check that your server's mime.types and srm.conf files follow the standard conventions
for extensions, and add configuration entries to your server for any additional types
that you define.</P>
<P><B><TT>Archive Hierarchical Organization</TT></B></P>
<P>The directory tree that makes up your archive is one that you'll be "climbing"
up and down quite often. You should make its branches easy to remember and intuitive
to understand. Each new resource in your archive will have to be stored somewhere
in this tree. If you classify resources by storage location using an unambiguous,
comprehensive structure, deciding where to place things (and where to find them later)
will be a lot easier.</P>
<P>After you've decided on a naming convention, you'll want to spend some time planning
the structure of the directories. Naturally, if you're using long names, you can
be pretty creative with your layout; if not, then I recommend that you use some sort
of simple mapping from an ordered list of eight-character names to corresponding
groups or classifications.</P>
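<P>One way to keep such a mapping is a small Perl hash that pairs each short directory
name with the classification it stands for. This is only a sketch: the directory
names, the descriptions, and the helper <TT>classification_for</TT> are all invented
for illustration.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping of short (eight characters or fewer) directory
# names to the classification each one stands for. All names are invented.
my %classification = (
    prodspec => 'Product specifications',
    usrguide => 'User guides',
    relnotes => 'Release notes',
    pressrel => 'Press releases',
);

# Return the classification for a directory name, or undef if unknown.
sub classification_for {
    my ($dir) = @_;
    return $classification{$dir};
}

# Print the table in sorted order, for example to paste into a
# description file kept at the top of the archive.
foreach my $dir (sort keys %classification) {
    printf "%-8s  %s\n", $dir, $classification{$dir};
}
```

<P>Keeping the table in one place means that a maintainer who meets an unfamiliar
eight-character name can look up what belongs there before adding to it.</P>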
<P>You might point out, and you'd be correct, that the structure of the HTML document
already gives the notion of hierarchy to the resources to which it refers. However,
this helps only the person browsing and gives no advantage to the maintainer of the
documents. Creating structure, in the form of directories (or folders) in your archive,
makes the HTML a bit more complicated to write but relieves the confusion and intimidation
of having all the files reside in one location.</P>
<P><B><TT>Configuration Management</TT></B>
<BLOCKQUOTE>
<P>"A set of procedures for tracking and managing software throughout its lifecycle"
(Configuration Management for Software, Compton &amp; Connor, 1994, ISBN 0-442-01746-4).</P>
</BLOCKQUOTE>
<P>This notion of structure also arises from the science of configuration management
in general. We'll be discussing another aspect of configuration management, revision
control, later in this chapter.</P>
<P><B><TT>Access and Security</TT></B></P>
<P>Another advantage of creating a structured archive is that most HTTP servers can
restrict access on a per-directory basis. Configuring the server to do this has been
covered elsewhere, and I won't go into how it's done here. I point it out only to
highlight the added value of planning and creating a sound directory structure for
your archive. Of course, to use this feature selectively you must have laid out the
directories with it in mind, which is another consideration for the planning stage.</P>
<P><B><TT>Top-Level Documentation</TT></B></P>
<P>Every archive directory should have some sort of explanation of its purpose and,
ideally, a description of its contents. Whether this description is intended only
for the maintainer or also for public access is up to you. Ideally, this file would
be located within the directory that it describes. It could be the <TT>index.html</TT>
and thus serve the dual purpose of describing the contents to the browser and the
maintainer. This document (probably just a text file) will help the person considering
a change to the archive's contents decide whether this location is appropriate for
the change or addition.</P>
<H4 ALIGN="CENTER"><A NAME="Heading5"></A><FONT COLOR="#000077">Revision Control</FONT></H4>
<P>The process (and rigor) of revision control is often overlooked or even ignored
when an administrator manages an archive. However, there are some very good reasons
to use some sort of version control when creating and updating your resources.</P>
<P><B><TT>A Policy for Change--Description</TT></B></P>
<P>The process of making your documents available via the Web is really one of publishing.
When you, as a representative of a company, make a document available, you're making
a statement on your company's behalf. While some of the issues and legalities are
still murky, you should consider the liability that you or your company assumes when
making documents available. The information within the documents should be correct,
free of misrepresentations, and, insofar as is possible, verifiable.</P>
<P>Such considerations give rise to the need for a policy for the management of the