<html><head><title>Converting XML to HTML with XSLT  (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch10_02.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch10_04.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">10.3. Converting XML to HTML with XSLT </h2>
<p>If <a name="INDEX-777" />you<a name="INDEX-778" />'ve done any web hacking with Perl before, then you've kinda-sorta used XML, since HTML isn't too far off from the well-formedness goals of XML, at least in theory. In practice, HTML is used more frequently as a combination of markup, punctuation, embedded scripts, and a dozen other things that make web pages act nutty (with most popular web browsers being rather forgiving about syntax).</p>
<p>Currently, and probably for a long time to come, the language of the Web remains HTML.
While you can use bona fide XML in your web pages by clinging to the W3C's XHTML,<a href="#FOOTNOTE-40">[40]</a> it's far more likely that you'll need to turn it into HTML when you want to apply your XML to the Web.</p>
<blockquote class="footnote"><a name="FOOTNOTE-40" /><p>[40] XHTML comes in two flavors. We prefer the less pedantic "transitional" flavor, which chooses to look the other way when one commits egregious sins (such as using the <tt class="literal">&lt;font&gt;</tt> tag instead of the preferred method of applying cascading stylesheets).</p></blockquote>
<p>You can go about this in many ways. The most sledgehammery of these involves parsing your document and tossing out the results in a CGI script. This example reads a local MonkeyML file of my pet monkeys' names, and prints a web page to standard output (using Lincoln Stein's ubiquitous CGI module to add a bit of syntactic sugar):</p>
<blockquote><pre class="code">#!/usr/bin/perl

use warnings;
use strict;
use CGI qw(:standard);
use XML::LibXML;

# Parse the MonkeyML file into a DOM tree.
my $parser = XML::LibXML-&gt;new;
my $doc = $parser-&gt;parse_file('monkeys.xml');

print header;
print start_html("My Pet Monkeys");
print h1("My Pet Monkeys");
print p("I have the following monkeys in my house:");

# Emit one list item per name element found in the document.
print "&lt;ul&gt;\n";
foreach my $name_node ($doc-&gt;documentElement-&gt;findnodes("//mm:name")) {
    print "&lt;li&gt;" . $name_node-&gt;firstChild-&gt;getData ."&lt;/li&gt;\n";
}
print "&lt;/ul&gt;\n";
print end_html;</pre></blockquote>
<p>Another approach involves XSLT.</p>
<p>XSLT is used to translate one type of XML into another. XSLT factors in strongly here because using XML and the Web often requires that you extract all the presentable pieces of information from an XML document and wrap them up in HTML.
One very high-level XML-using application, Matt Sergeant's AxKit (<a href="http://www.axkit.org">http://www.axkit.org</a>), bases an entire application server framework around this notion, letting you set up a web site that uses XML as its source files, but whose final output to web browsers is HTML (and whose final output to other devices is whatever format best applies to them).</p>
<a name="perlxml-CHP-10-SECT-3.1" /><div class="sect2"><h3 class="sect2">10.3.1. Example: Apache::DocBook</h3>
<p>Let's make a little module that converts DocBook files into HTML on the fly. Though our goals are not as ambitious as AxKit's, we'll still take a cue from that program by basing our code around the Apache <tt class="literal">mod_perl</tt> module. <tt class="literal">mod_perl</tt> drops a Perl interpreter inside the Apache web server, and thus allows one to write Perl code that makes all sorts of interesting things happen with requests to the server.</p>
<p>We'll use a couple of <tt class="literal">mod_perl</tt>'s basic features here by writing a Perl module with a <tt class="literal">handler</tt> subroutine, the standard name for <tt class="literal">mod_perl</tt> callbacks; it will be passed an object representing the Apache request, and from this object, we'll determine what (if anything) the user sees.</p>
<a name="ch10-8-fm2xml" /><blockquote><b>WARNING:</b> A frequent source of frustration for people running Perl and XML programs in an Apache environment comes from Apache itself, or at least the way it behaves if it's not given a few extra configuration directives when compiled.
The standard Apache distribution comes with a version of the Expat C libraries, which it will bake into its binary if not explicitly told otherwise. Unfortunately, these libraries often conflict with <tt class="literal">XML::Parser</tt>'s calls to Expat libraries elsewhere on the system, resulting in nasty errors (such as segmentation faults on Unix) when they collide.</p>
<p>The Apache development community has reportedly considered quietly removing this feature in future releases, but currently, it may be necessary for Perl hackers wishing to invoke Expat (usually by way of <tt class="literal">XML::Parser</tt>) to recompile Apache without it (by setting the <tt class="literal">EXPAT</tt> configuration option to <tt class="literal">no</tt>).</p>
<p>The cheaper workaround involves using a low-level parsing module that doesn't use Expat, such as <tt class="literal">XML::LibXML</tt> or members of the newer <tt class="literal">XML::SAX</tt> family.</p></blockquote>
<p>We begin by doing the "starting to type in the module" dance, and then digging into that callback sub:</p>
<blockquote><pre class="code">package Apache::DocBook;

use warnings;
use strict;

use Apache::Constants qw(:common);

use XML::LibXML;
use XML::LibXSLT;

our $xml_path;                        # Document source directory
our $base_path;                       # HTML output directory
our $xslt_file;                       # path to DocBook-to-HTML XSLT stylesheet
our $icon_dir;                        # path to icons used in index pages

sub handler {
  my $r = shift;                # Apache request object

  # Get config info from Apache config
  $xml_path = $r-&gt;dir_config('doc_dir') or die "doc_dir variable not set.\n";
  $base_path = $r-&gt;dir_config('html_dir') or die "html_dir variable not set.\n";
  $icon_dir = $r-&gt;dir_config('icon_dir') or die "icon_dir variable not set.\n";
  unless (-d $xml_path) {
    $r-&gt;log_reason("Can't use an xml_path of $xml_path: $!", $r-&gt;filename);
    die;
  }

  my $filename = $r-&gt;filename;

  # Strip the HTML base directory; \Q...\E keeps any regex
  # metacharacters in the path from being interpreted.
  $filename =~ s/\Q$base_path\E\/?//;
  # Add in path info (the file might not actually exist... YET)
  $filename .= $r-&gt;path_info;

  $xslt_file = $r-&gt;dir_config('xslt_file') or die "xslt_file Apache variable not set.\n";

  # The subroutines we'll call after this will take care of printing
  # stuff at the client.

  # Is this an index request?
  if ( (-d "$xml_path/$filename") or ($filename =~ /index\.html?$/) ) {
    # Why yes! We whip up an index page from the floating aethers.
    # The first group is non-greedy so that a trailing index.html
    # lands in $2 instead of being swallowed into $dir.
    my ($dir) = $filename =~ /^(.*?)(\/?index\.html?)?$/;
    # Semi-hack: stick trailing slash on URI, maybe.
    if (not($2) and $r-&gt;uri !~ /\/$/) {
      $r-&gt;uri($r-&gt;uri . '/');
    }
    make_index_page($r, $dir);
    return $r-&gt;status;
  } else {
    # No, it's a request for some other page.
    make_doc_page($r, $filename);
    return $r-&gt;status;
  }
  return $r-&gt;status;
}</pre></blockquote>
<p>This subroutine performs the actual XSLT transformation, given a filename of the original XML source and another filename to which it should write the transformed HTML output:</p>
<blockquote><pre class="code">sub transform {
  my ($filename, $html_filename) = @_;

  # make sure there's a home for this file.
  maybe_mkdir($filename);

  my $parser = XML::LibXML-&gt;new;
  my $xslt = XML::LibXSLT-&gt;new;

  # Because libxslt seems a little broken, we have to chdir to the
  # XSLT file's directory, else its file includes won't work. ;b
  use Cwd;                      # so we can get the current working dir
  my $original_dir = cwd;
  my $xslt_dir = $xslt_file;
  $xslt_dir =~ s/^(.*)\/.*$/$1/;
  chdir($xslt_dir) or die "Can't chdir to $xslt_dir: $!";

  my $source = $parser-&gt;parse_file("$xml_path/$filename");
  my $style_doc = $parser-&gt;parse_file($xslt_file);

  my $stylesheet = $xslt-&gt;parse_stylesheet($style_doc);

  my $results = $stylesheet-&gt;transform($source);

  # Write the transformed document, failing loudly if the file can't be opened.
  open (my $html_out, '&gt;', "$base_path/$html_filename")
    or die "Can't write $base_path/$html_filename: $!";
  print $html_out $stylesheet-&gt;output_string($results);
  close ($html_out);

  # Go back to original dir
  chdir($original_dir) or die "Can't chdir to $original_dir: $!";
}</pre></blockquote>
<p>Now we have a pair of subroutines to generate index pages. Unlike the document pages, which are the product of an XSLT transformation, we make the index pages from scratch, the bulk of their content being a table filled with information we grab from the document via XPath (looking first in the appropriate metadata element if present, and falling back to other bits of information if not).</p>
<blockquote><pre class="code">sub make_index_page {
  my ($r, $dir) = @_;

  # If there's no corresponding dir in the XML source, the request
  # goes splat
  my $xml_dir = "$xml_path/$dir";
  unless (-r $xml_dir) {
    unless (-d $xml_dir) {
      # Whoops, this ain't a directory.
      $r-&gt;status( NOT_FOUND );
      return;
    }
    # It's a directory, but we can't read it. Whatever.
    $r-&gt;status( FORBIDDEN );