validationconsumer.java

来自「kaffe Java 解释器语言,源码,Java的子集系统,开放源代码」· Java 代码 · 共 1,912 行 · 第 1/4 页

JAVA
1,912
字号
/* * Copyright (C) 1999-2001 David Brownell *  * This file is part of GNU JAXP, a library. * * GNU JAXP is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. *  * GNU JAXP is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the * GNU General Public License for more details. *  * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA * * As a special exception, if you link this library with other files to * produce an executable, this library does not by itself cause the * resulting executable to be covered by the GNU General Public License. * This exception does not however invalidate any other reasons why the * executable file might be covered by the GNU General Public License.  */package gnu.xml.pipeline;import java.io.*;import java.util.EmptyStackException;import java.util.Enumeration;import java.util.Hashtable;import java.util.Stack;import java.util.StringTokenizer;import java.util.Vector;import org.xml.sax.*;import org.xml.sax.ext.*;import org.xml.sax.helpers.XMLReaderFactory;/** * This class checks SAX2 events to report validity errors; it works as * both a filter and a terminus on an event pipeline.  It relies on the * producer of SAX events to:  </p> <ol> * *	<li> Conform to the specification of a non-validating XML parser that *	reads all external entities, reported using SAX2 events. </li> * *	<li> Report ignorable whitespace as such (through the ContentHandler *	interface).  This is, strictly speaking, optional for nonvalidating *	XML processors.  </li> * *	<li> Make SAX2 DeclHandler callbacks, with default *	attribute values already normalized (and without "&lt;").</li> * *	<li> Make SAX2 LexicalHandler startDTD() and endDTD () *	callbacks. </li> * *	<li> Act as if the <em>(URI)/namespace-prefixes</em> property were *	set to true, by providing XML 1.0 names and all <code>xmlns*</code> *	attributes (rather than omitting either or both). </li> * *	</ol> * * <p> At this writing, the major SAX2 parsers (such as &AElig;lfred2, * Crimson, and Xerces) meet these requirements, and this validation * module is used by the optional &AElig;lfred2 validation support. * </p> * * <p> Note that because this is a layered validator, it has to duplicate some * work that the parser is doing; there are also other cost to layering. * However, <em>because of layering it doesn't need a parser</em> in order * to work! You can use it with anything that generates SAX events, such * as an application component that wants to detect invalid content in * a changed area without validating an entire document, or which wants to * ensure that it doesn't write invalid data to a communications partner.</p> * * <p> Also, note that because this is a layered validator, the line numbers * reported for some errors may seem strange.  For example, if an element does * not permit character content, the validator * will use the locator provided to it. * That might reflect the last character of a <em>characters</em> event * callback, rather than the first non-whitespace character. </p> * * <hr /> * * <!-- * <p> Of interest is the fact that unlike most currently known XML validators, * this one can report some cases of non-determinism in element content models. * It is a compile-time option, enabled by default.  This will only report * such XML errors if they relate to content actually appearing in a document; * content models aren't aggressively scanned for non-deterministic structure. * Documents which trigger such non-deterministic transitions may be handled * differently by different validating parsers, without losing conformance * to the XML specification. </p> * --> * * <p> Current limitations of the validation performed are in roughly three * categories.  </p> * * <p> The first category represents constraints which demand violations * of software layering:  exposing lexical details, one of the first things * that <em>application</em> programming interfaces (APIs) hide.  These * invariably relate to XML entity handling, and to historical oddities * of the XML validation semantics.  Curiously, * recent (Autumn 1999) conformance testing showed that these constraints are * among those handled worst by existing XML validating parsers.  Arguments * have been made that each of these VCs should be turned into WFCs (most * of them) or discarded (popular for the standalone declaration); in short, * that these are bugs in the XML specification (not all via SGML): </p><ul> * *	<li> The <em>Proper Declaration/PE Nesting</em> and *	<em>Proper Group/PE Nesting</em> VCs can't be tested because they *	require access to particularly low level lexical level information. *	In essence, the reason XML isn't a simple thing to parse is that *	it's not a context free grammar, and these constraints elevate that *	SGML-derived context sensitivity to the level of a semantic rule. * *	<li> The <em>Standalone Document Declaration</em> VC can't be *	tested.  This is for two reasons.  First, this flag isn't made *	available through SAX2.  Second, it also requires breaking that *	lexical layering boundary.  (If you ever wondered why classes *	in compiler construction or language design barely mention the *	existence of context-sensitive grammars, it's because of messy *	issues like these.) * *	<li> The <em>Entity Declared</em> VC can't be tested, because it *	also requires breaking that lexical layering boundary!  There's also *	another issue: the VC wording (and seemingly intent) is ambiguous. *	(This is still true in the "Second edition" XML spec.) *	Since there is a WFC of the same name, everyone's life would be *	easier if references to undeclared parsed entities were always well *	formedness errors, regardless of whether they're parameter entities *	or not.  (Note that nonvalidating parsers are not required *	to report all such well formedness errors if they don't read external *	parameter entities, although currently most XML parsers read them *	in an attempt to avoid problems from inconsistent parser behavior.) * *	</ul> * * <p> The second category of limitations on this validation represent * constraints associated with information that is not guaranteed to be * available (or in one case, <em>is guaranteed not to be available</em>, * through the SAX2 API: </p><ul> * *	<li> The <em>Unique Element Type Declaration</em> VC may not be *	reportable, if the underlying parser happens not to expose *	multiple declarations.   (&AElig;lfred2 reports these validity *	errors directly.)</li> * *	<li> Similarly, the <em>Unique Notation Name</em> VC, added in the *	14-January-2000 XML spec errata to restrict typing models used by *	elements, may not be reportable.  (&AElig;lfred reports these *	validity errors directly.) </li> * *	</ul> * * <p> A third category relates to ease of implementation.  (Think of this * as "bugs".)  The most notable issue here is character handling.  Rather * than attempting to implement the voluminous character tables in the XML * specification (Appendix B), Unicode rules are used directly from * the java.lang.Character class.  Recent JVMs have begun to diverge from * the original specification for that class (Unicode 2.0), meaning that * different JVMs may handle that aspect of conformance differently. * </p> * * <p> Note that for some of the validity errors that SAX2 does not * expose, a nonvalidating parser is permitted (by the XML specification) * to report validity errors.  When used with a parser that does so for * the validity constraints mentioned above (or any other SAX2 event * stream producer that does the same thing), overall conformance is * substantially improved. * * @see gnu.xml.aelfred2.SAXDriver * @see gnu.xml.aelfred2.XmlReader * * @author David Brownell */public final class ValidationConsumer extends EventFilter{    // report error if we happen to notice a non-deterministic choice?    // we won't report buggy content models; just buggy instances    private static final boolean	warnNonDeterministic = false;    // for tracking active content models    private String		rootName;    private Stack		contentStack = new Stack ();    // flags for "saved DTD" processing    private boolean		disableDeclarations;    private boolean		disableReset;    //    // most VCs get tested when we see element start tags.  the per-element    // info (including attributes) recorded here duplicates that found inside    // many nonvalidating parsers, hence dual lookups etc ... that's why a    // layered validator isn't going to be as fast as a non-layered one.    //    // key = element name; value = ElementInfo    private Hashtable		elements = new Hashtable ();    // some VCs relate to ID/IDREF/IDREFS attributes    // key = id; value = boolean true (defd) or false (refd)    private Hashtable		ids = new Hashtable ();    // we just record declared notation and unparsed entity names.    // the implementation here is simple/slow; these features    // are seldom used, one hopes they'll wither away soon    private Vector		notations = new Vector (5, 5);    private Vector		nDeferred = new Vector (5, 5);    private Vector		unparsed = new Vector (5, 5);    private Vector		uDeferred = new Vector (5, 5);		// note: DocBk 3.1.7 XML defines over 2 dozen notations,	// used when defining unparsed entities for graphics	// (and maybe in other places)        /**     * Creates a pipeline terminus which consumes all events passed to     * it; this will report validity errors as if they were fatal errors,     * unless an error handler is assigned.     *     * @see #setErrorHandler     */	// constructor used by PipelineFactory	    // ... and want one taking system ID of an external subset    public ValidationConsumer ()    {	this (null);    }    /**     * Creates a pipeline filter which reports validity errors and then     * passes events on to the next consumer if they were not fatal.     *     * @see #setErrorHandler     */	// constructor used by PipelineFactory	    // ... and want one taking system ID of an external subset	    // (which won't send declaration events)    public ValidationConsumer (EventConsumer next)    {	super (next);	setContentHandler (this);	setDTDHandler (this);	try { setProperty (DECL_HANDLER, this); }	catch (Exception e) { /* "can't happen" */ }	try { setProperty (LEXICAL_HANDLER, this); }	catch (Exception e) { /* "can't happen" */ }    }        private static final String	fakeRootName	= ":Nobody:in:their_Right.Mind_would:use:this-name:1x:";        /**     * Creates a validation consumer which is preloaded with the DTD provided.     * It does this by constructing a document with that DTD, then parsing     * that document and recording its DTD declarations.  Then it arranges     * not to modify that information.     *     * <p> The resulting validation consumer will only validate against     * the specified DTD, regardless of whether some other DTD is found     * in a document being parsed.     *     * @param rootName The name of the required root element; if this is     *	null, any root element name will be accepted.     * @param publicId If non-null and there is a non-null systemId, this     *	identifier provides an alternate access identifier for the DTD's     *	external subset.     * @param systemId If non-null, this is a URI (normally URL) that     *	may be used to access the DTD's external subset.     * @param internalSubset If non-null, holds literal markup declarations     *	comprising the DTD's internal subset.     * @param resolver If non-null, this will be provided to the parser for     *	use when resolving parameter entities (including any external subset).     * @param resolver If non-null, this will be provided to the parser for     *	use when resolving parameter entities (including any external subset).     * @param minimalElement If non-null, a minimal valid document.     *     * @exception SAXNotSupportedException If the default SAX parser does     *	not support the standard lexical or declaration handlers.     * @exception SAXParseException If the specified DTD has either     *	well-formedness or validity errors     * @exception IOException If the specified DTD can't be read for     *	some reason     */    public ValidationConsumer (	String		rootName,	String		publicId,	String		systemId,	String		internalSubset,	EntityResolver	resolver,	String		minimalDocument    ) throws SAXException, IOException    {	this (null);	disableReset = true;	if (rootName == null)	    rootName = fakeRootName;	//	// Synthesize document with that DTD; is it possible to do	// better for the declaration of the root element?	//	// NOTE:  can't use SAX2 to write internal subsets.	//	StringWriter	writer = new StringWriter ();	writer.write ("<!DOCTYPE ");	writer.write (rootName);	if (systemId != null) {	    writer.write ("\n  ");	    if (publicId != null) {		writer.write ("PUBLIC '");		writer.write (publicId);		writer.write ("'\n\t'");	    } else		writer.write ("SYSTEM '");	    writer.write (systemId);	    writer.write ("'");	}	writer.write (" [ ");	if (rootName == fakeRootName) {	    writer.write ("\n<!ELEMENT ");	    writer.write (rootName);	    writer.write (" EMPTY>");	}	if (internalSubset != null)	    writer.write (internalSubset);	writer.write ("\n ]>");	if (minimalDocument != null) {	    writer.write ("\n");	    writer.write (minimalDocument);	    writer.write ("\n");	} else {	    writer.write (" <");	    writer.write (rootName);	    writer.write ("/>\n");	}	minimalDocument = writer.toString ();	//	// OK, load it	//	XMLReader	producer;	producer = XMLReaderFactory.createXMLReader ();	bind (producer, this);	if (resolver != null)	    producer.setEntityResolver (resolver);	InputSource	in;		in = new InputSource (new StringReader (minimalDocument));	producer.parse (in);	disableDeclarations = true;	if (rootName == fakeRootName)	    this.rootName = null;    }    private void resetState ()    {	if (!disableReset) {	    rootName = null;	    contentStack.removeAllElements ();	    elements.clear ();	    ids.clear ();	    notations.removeAllElements ();	    nDeferred.removeAllElements ();	    unparsed.removeAllElements ();	    uDeferred.removeAllElements ();	}    }    private void warning (String description)    throws SAXException    {	ErrorHandler		errHandler = getErrorHandler ();	Locator			locator = getDocumentLocator ();	SAXParseException	err;	if (errHandler == null)	    return;	if (locator == null)	    err = new SAXParseException (description, null, null, -1, -1);	else	    err = new SAXParseException (description, locator);	errHandler.warning (err);    }    // package private (for ChildrenRecognizer)    private void error (String description)    throws SAXException    {	ErrorHandler		errHandler = getErrorHandler ();	Locator			locator = getDocumentLocator ();	SAXParseException	err;	if (locator == null)	    err = new SAXParseException (description, null, null, -1, -1);	else	    err = new SAXParseException (description, locator);	if (errHandler != null)	    errHandler.error (err);	else	// else we always treat it as fatal!	    throw err;    }    private void fatalError (String description)    throws SAXException    {	ErrorHandler		errHandler = getErrorHandler ();	Locator			locator = getDocumentLocator ();	SAXParseException	err;	if (locator != null)	    err = new SAXParseException (description, locator);	else	    err = new SAXParseException (description, null, null, -1, -1);	if (errHandler != null)	    errHandler.fatalError (err);	// we always treat this as fatal, regardless of the handler	throw err;    }    private static boolean isExtender (char c)    {	// [88] Extender ::= ...	return c == 0x00b7 || c == 0x02d0 || c == 0x02d1 || c == 0x0387	       || c == 0x0640 || c == 0x0e46 || c == 0x0ec6 || c == 0x3005	       || (c >= 0x3031 && c <= 0x3035)	       || (c >= 0x309d && c <= 0x309e)	       || (c >= 0x30fc && c <= 0x30fe);    }    // use augmented Unicode rules, not full XML rules    private boolean isName (String name, String context, String id)    throws SAXException    {	char	buf [] = name.toCharArray ();	boolean	pass = true;	if (!Character.isUnicodeIdentifierStart (buf [0])		&& ":_".indexOf (buf [0]) == -1)	    pass = false;	else {	    int max = buf.length;	    for (int i = 1; pass && i < max; i++) {		char c = buf [i];		if (!Character.isUnicodeIdentifierPart (c)			&& ":-_.".indexOf (c) == -1			&& !isExtender (c))		    pass = false;	    }	}	if (!pass)	    error ("In " + context + " for " + id		+ ", '" + name + "' is not a name");	return pass;	// true == OK    }

⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?