xmlreader.java

This program extracts and analyzes the information on a page.
Java
Page 1 of 2
// HTMLParser Library $Name: v1_6_20060319 $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Derrick Oswald
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax/XMLReader.java,v $
// $Author: derrickoswald $
// $Date: 2005/05/13 10:44:15 $
// $Revision: 1.3 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.sax;

import java.io.IOException;

import org.htmlparser.lexer.Lexer;
import org.htmlparser.lexer.Page;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.NamespaceSupport;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.Remark;
import org.htmlparser.Tag;
import org.htmlparser.Text;
import org.htmlparser.util.DefaultParserFeedback;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.util.ParserFeedback;

/**
 * SAX parser.
 * Generates callbacks on the {@link ContentHandler} based on encountered nodes.
 * <br><em>Preliminary</em>.
 * <pre>
 * org.xml.sax.XMLReader reader = org.xml.sax.helpers.XMLReaderFactory.createXMLReader ("org.htmlparser.sax.XMLReader");
 * org.xml.sax.ContentHandler content = new MyContentHandler ();
 * reader.setContentHandler (content);
 * org.xml.sax.ErrorHandler errors = new MyErrorHandler ();
 * reader.setErrorHandler (errors);
 * reader.parse ("http://cbc.ca");
 * </pre>
 */
public class XMLReader
    implements
        org.xml.sax.XMLReader
{
    /**
     * Determines if namespace handling is on.
     * All XMLReaders are required to recognize the feature names:
     * <ul>
     * <li><code>http://xml.org/sax/features/namespaces</code> -
     *     a value of "true" indicates namespace URIs and unprefixed
     *     local names for element and attribute names will be available</li>
     * <li><code>http://xml.org/sax/features/namespace-prefixes</code> -
     *     a value of "true" indicates that XML qualified names (with
     *     prefixes) and attributes (including xmlns* attributes) will
     *     be available.</li>
     * </ul>
     */
    protected boolean mNameSpaces; // namespaces

    /**
     * Determines if namespace prefix handling is on.
     * @see #mNameSpaces
     */
    protected boolean mNameSpacePrefixes; // namespace-prefixes

    /**
     * <em> not implemented</em>
     */
    protected EntityResolver mEntityResolver;

    /**
     * <em> not implemented</em>
     */
    protected DTDHandler mDTDHandler;

    /**
     * The content callback object.
     */
    protected ContentHandler mContentHandler;

    /**
     * The error handler object.
     */
    protected ErrorHandler mErrorHandler;

    /**
     * The underlying DOM parser.
     */
    protected Parser mParser;

    /**
     * Namespace utility object.
     */
    protected NamespaceSupport mSupport;

    /**
     * Qualified name parts.
     */
    protected String mParts[];

    /**
     * Create a SAX parser.
     */
    public XMLReader ()
    {
        mNameSpaces = true;
        mNameSpacePrefixes = false;

        mEntityResolver = null;
        mDTDHandler = null;
        mContentHandler = null;
        mErrorHandler = null;
        mSupport = new NamespaceSupport ();
        mSupport.pushContext ();
        mSupport.declarePrefix ("", "http://www.w3.org/TR/REC-html40");
        // todo:
        //    xmlns:html='http://www.w3.org/TR/REC-html40'
        // or xmlns:html='http://www.w3.org/1999/xhtml'
        mParts = new String[3];
    }

    ////////////////////////////////////////////////////////////////////
    // Configuration.
    ////////////////////////////////////////////////////////////////////

    /**
     * Look up the value of a feature flag.
     *
     * <p>The feature name is any fully-qualified URI.  It is
     * possible for an XMLReader to recognize a feature name but
     * temporarily be unable to return its value.
     * Some feature values may be available only in specific
     * contexts, such as before, during, or after a parse.
     * Also, some feature values may not be programmatically accessible.
     * (In the case of an adapter for SAX1 {@link org.xml.sax.Parser}, there is no
     * implementation-independent way to expose whether the underlying
     * parser is performing validation, expanding external entities,
     * and so forth.)
     * </p>
     *
     * <p>All XMLReaders are required to recognize the
     * http://xml.org/sax/features/namespaces and the
     * http://xml.org/sax/features/namespace-prefixes feature names.</p>
     *
     * <p>Typical usage is something like this:</p>
     *
     * <pre>
     * XMLReader r = new MySAXDriver();
     *
     *                         // try to activate validation
     * try {
     *   r.setFeature("http://xml.org/sax/features/validation", true);
     * } catch (SAXException e) {
     *   System.err.println("Cannot activate validation.");
     * }
     *
     *                         // register event handlers
     * r.setContentHandler(new MyContentHandler());
     * r.setErrorHandler(new MyErrorHandler());
     *
     *                         // parse the first document
     * try {
     *   r.parse("http://www.foo.com/mydoc.xml");
     * } catch (IOException e) {
     *   System.err.println("I/O exception reading XML document");
     * } catch (SAXException e) {
     *   System.err.println("XML exception reading document.");
     * }
     * </pre>
     *
     * <p>Implementors are free (and encouraged) to invent their own features,
     * using names built on their own URIs.</p>
     *
     * @param name The feature name, which is a fully-qualified URI.
     * @return The current value of the feature (true or false).
     * @exception org.xml.sax.SAXNotRecognizedException If the feature
     *            value can't be assigned or retrieved.
     * @exception org.xml.sax.SAXNotSupportedException When the
     *            XMLReader recognizes the feature name but
     *            cannot determine its value at this time.
     * @see #setFeature
     */
    public boolean getFeature (String name)
        throws SAXNotRecognizedException, SAXNotSupportedException
    {
        boolean ret;

        if (name.equals ("http://xml.org/sax/features/namespaces"))
            ret = mNameSpaces;
        else if (name.equals ("http://xml.org/sax/features/namespace-prefixes"))
            ret = mNameSpacePrefixes;
        else
            throw new SAXNotSupportedException (name + " not yet understood");

        return (ret);
    }

    /**
     * Set the value of a feature flag.
     *
     * <p>The feature name is any fully-qualified URI.  It is
     * possible for an XMLReader to expose a feature value but
     * to be unable to change the current value.
     * Some feature values may be immutable or mutable only
     * in specific contexts, such as before, during, or after
     * a parse.</p>
     *
     * <p>All XMLReaders are required to support setting
     * http://xml.org/sax/features/namespaces to true and
     * http://xml.org/sax/features/namespace-prefixes to false.</p>
     *
     * @param name The feature name, which is a fully-qualified URI.
     * @param value The requested value of the feature (true or false).
     * @exception org.xml.sax.SAXNotRecognizedException If the feature
     *            value can't be assigned or retrieved.
     * @exception org.xml.sax.SAXNotSupportedException When the
     *            XMLReader recognizes the feature name but
     *            cannot set the requested value.
     * @see #getFeature
     */
    public void setFeature (String name, boolean value)
        throws SAXNotRecognizedException, SAXNotSupportedException
    {
        if (name.equals ("http://xml.org/sax/features/namespaces"))
            mNameSpaces = value;
        else if (name.equals ("http://xml.org/sax/features/namespace-prefixes"))
            mNameSpacePrefixes = value;
        else
            throw new SAXNotSupportedException (name + " not yet understood");
    }

    /**
     * Look up the value of a property.
     *
     * <p>The property name is any fully-qualified URI.  It is
     * possible for an XMLReader to recognize a property name but
     * temporarily be unable to return its value.
     * Some property values may be available only in specific
     * contexts, such as before, during, or after a parse.</p>
     *
     * <p>XMLReaders are not required to recognize any specific
     * property names, though an initial core set is documented for
     * SAX2.</p>
     *
     * <p>Implementors are free (and encouraged) to invent their own properties,
     * using names built on their own URIs.</p>
     *
     * @param name The property name, which is a fully-qualified URI.
     * @return The current value of the property.
     * @exception org.xml.sax.SAXNotRecognizedException If the property
     *            value can't be assigned or retrieved.
     * @exception org.xml.sax.SAXNotSupportedException When the
     *            XMLReader recognizes the property name but
     *            cannot determine its value at this time.
     * @see #setProperty
     */
    public Object getProperty (String name)
        throws SAXNotRecognizedException, SAXNotSupportedException
    {
        throw new SAXNotSupportedException (name + " not yet understood");
    }

    /**
     * Set the value of a property.
     *
     * <p>The property name is any fully-qualified URI.  It is
     * possible for an XMLReader to recognize a property name but
     * to be unable to change the current value.
     * Some property values may be immutable or mutable only
     * in specific contexts, such as before, during, or after
     * a parse.</p>
     *
     * <p>XMLReaders are not required to recognize setting
     * any specific property names, though a core set is defined by
     * SAX2.</p>
     *
     * <p>This method is also the standard mechanism for setting
     * extended handlers.</p>
     *
     * @param name The property name, which is a fully-qualified URI.
     * @param value The requested value for the property.
     * @exception org.xml.sax.SAXNotRecognizedException If the property
     *            value can't be assigned or retrieved.
     * @exception org.xml.sax.SAXNotSupportedException When the
     *            XMLReader recognizes the property name but
     *            cannot set the requested value.
     */
    public void setProperty (String name, Object value)
        throws SAXNotRecognizedException, SAXNotSupportedException
    {
        throw new SAXNotSupportedException (name + " not yet understood");
    }

    ////////////////////////////////////////////////////////////////////
    // Event handlers.
    ////////////////////////////////////////////////////////////////////

    /**
     * Allow an application to register an entity resolver.
     *
     * <p>If the application does not register an entity resolver,
     * the XMLReader will perform its own default resolution.</p>
     *
     * <p>Applications may register a new or different resolver in the
     * middle of a parse, and the SAX parser must begin using the new
     * resolver immediately.</p>
     *
     * @param resolver The entity resolver.
     * @see #getEntityResolver
     */
    public void setEntityResolver (EntityResolver resolver)
    {
        mEntityResolver = resolver;
    }
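
The class comment at the top of the listing drives the reader through the standard SAX2 calls: setContentHandler, setErrorHandler, then parse. The sketch below shows the same callback pattern. It uses the SAX parser bundled with the JDK instead of org.htmlparser.sax.XMLReader, so it runs without the HTMLParser jar. Because of that substitution, it accepts only well-formed markup. SaxCallbackDemo and TagRecorder are illustrative names and are not part of HTMLParser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative driver: registers handlers on a SAX2 XMLReader and records callbacks.
class SaxCallbackDemo {

    // Acts as both ContentHandler and ErrorHandler (DefaultHandler implements both).
    static class TagRecorder extends DefaultHandler {
        final List<String> tags = new ArrayList<>();

        // Called once per start tag; records "{namespaceURI}localName".
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            tags.add("{" + uri + "}" + localName);
        }

        // Recoverable errors are ignored by default; rethrow so they are not lost.
        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    // Parses a string with the JDK's namespace-aware SAX parser (a stand-in for HTMLParser).
    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // equivalent to the "namespaces" feature being true
        XMLReader reader = factory.newSAXParser().getXMLReader();
        TagRecorder recorder = new TagRecorder();
        reader.setContentHandler(recorder);
        reader.setErrorHandler(recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder.tags;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>hi</p></body></html>";
        System.out.println(parse(doc));
    }
}
```

Fatal errors, such as a missing end tag, still surface as a SAXParseException from parse, because DefaultHandler.fatalError rethrows by default.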
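
getFeature and setFeature in the listing dispatch on the two feature URIs that every SAX2 reader must recognize. Their defaults match the constructor: namespaces is true and namespace-prefixes is false. The minimal sketch below follows the exception contract quoted in the javadoc, under which an unknown name raises SAXNotRecognizedException. The listing instead throws SAXNotSupportedException for unknown names. NamespaceFeatures is a hypothetical helper class and is not part of HTMLParser.

```java
import org.xml.sax.SAXNotRecognizedException;

// Hypothetical helper mirroring mNameSpaces / mNameSpacePrefixes from the listing.
class NamespaceFeatures {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    // Same defaults as the XMLReader constructor.
    private boolean namespaces = true;
    private boolean namespacePrefixes = false;

    boolean getFeature(String name) throws SAXNotRecognizedException {
        if (NAMESPACES.equals(name)) {
            return namespaces;
        } else if (NAMESPACE_PREFIXES.equals(name)) {
            return namespacePrefixes;
        }
        // Per the SAX2 javadoc, unknown names are "not recognized" rather than "not supported".
        throw new SAXNotRecognizedException(name);
    }

    void setFeature(String name, boolean value) throws SAXNotRecognizedException {
        if (NAMESPACES.equals(name)) {
            namespaces = value;
        } else if (NAMESPACE_PREFIXES.equals(name)) {
            namespacePrefixes = value;
        } else {
            throw new SAXNotRecognizedException(name);
        }
    }

    public static void main(String[] args) throws SAXNotRecognizedException {
        NamespaceFeatures features = new NamespaceFeatures();
        features.setFeature(NAMESPACE_PREFIXES, true);
        System.out.println(features.getFeature(NAMESPACES) + " " + features.getFeature(NAMESPACE_PREFIXES));
        try {
            features.getFeature("http://xml.org/sax/features/validation");
        } catch (SAXNotRecognizedException e) {
            System.out.println("not recognized: " + e.getMessage());
        }
    }
}
```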
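
The constructor does three things: it pushes a NamespaceSupport context, declares http://www.w3.org/TR/REC-html40 as the default namespace, and allocates the three-element mParts array. The code that consumes mParts is not on this page. The sketch below assumes the array is meant for NamespaceSupport.processName, which fills a String[3] with the namespace URI, the local name, and the qualified name. HtmlNamespaceDemo, newSupport, and split are hypothetical names.

```java
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical demo: resolves names against a NamespaceSupport set up like the constructor.
class HtmlNamespaceDemo {
    // Default namespace declared by the XMLReader constructor.
    static final String HTML40 = "http://www.w3.org/TR/REC-html40";

    // Same three calls as the constructor: new context, then the default-namespace declaration.
    static NamespaceSupport newSupport() {
        NamespaceSupport support = new NamespaceSupport();
        support.pushContext();
        support.declarePrefix("", HTML40);
        return support;
    }

    // Returns {namespaceURI, localName, qName}, or null if the prefix is undeclared.
    static String[] split(NamespaceSupport support, String qName, boolean isAttribute) {
        return support.processName(qName, new String[3], isAttribute);
    }

    public static void main(String[] args) {
        NamespaceSupport support = newSupport();
        String[] element = split(support, "p", false);
        String[] attribute = split(support, "class", true);
        System.out.println(String.join(" | ", element));
        System.out.println("[" + attribute[0] + "] " + attribute[1]);
        System.out.println(split(support, "svg:rect", false)); // undeclared prefix
    }
}
```

The three cases resolve differently. Unprefixed element names pick up the declared default namespace. Unprefixed attribute names get the empty URI, because default namespaces do not apply to attributes. A name with an undeclared prefix makes processName return null.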
