// Jericho HTML Parser - Java based library for analysing and manipulating HTML
// Version 3.0
// Copyright (C) 2007 Martin Jericho
// http://jerichohtml.sourceforge.net/
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of either one of the following licences:
//
// 1. The Eclipse Public License (EPL) version 1.0,
// included in this distribution in the file licence-epl-1.0.html
// or available at http://www.eclipse.org/legal/epl-v10.html
//
// 2. The GNU Lesser General Public License (LGPL) version 2.1 or later,
// included in this distribution in the file licence-lgpl-2.1.txt
// or available at http://www.gnu.org/licenses/lgpl.txt
//
// This library is distributed on an "AS IS" basis,
// WITHOUT WARRANTY OF ANY KIND, either express or implied.
// See the individual licence texts for more details.

package net.htmlparser.jericho;

import java.util.*;

/**
 * Defines the syntax for a tag type that can be recognised by the parser.
 * <p>
 * This class is the root abstract class common to all tag types, and contains methods to {@linkplain #register() register}
 * and {@linkplain #deregister() deregister} tag types as well as various methods to aid in their implementation.
 * <p>
 * Every tag type is represented by a singleton instance of a class that must be a subclass of either 
 * {@link StartTagType} or {@link EndTagType}.  These two abstract classes, the only direct descendants of this class,
 * represent the two major classifications under which every tag type exists.
 * <p>
 * Because all <code>TagType</code> instances must be singletons, the '<code>==</code>' operator can be used to test for a particular tag type
 * instead of the <code>equals(Object)</code> method.
 * <p>
 * The term <i><a name="Predefined">predefined tag type</a></i> refers to any of the tag types defined in this library,
 * including both <a href="#Standard">standard</a> and <a href="#Extended">extended</a> tag types.
 * <p>
 * The term <i><a name="Standard">standard tag type</a></i> refers to any of the tag types represented by instances
 * in static fields of the {@link StartTagType} and {@link EndTagType} subclasses.
 * Standard tag types are registered by default, and define the tags most commonly found in HTML documents.
 * <p>
 * The term <i><a name="Extended">extended tag type</a></i> refers to any <a href="#Predefined">predefined</a> tag type
 * that is not a <a href="#Standard">standard</a> tag type.
 * The {@link PHPTagTypes} and {@link MasonTagTypes} classes contain extended tag types related to their respective server platforms.
 * The tag types defined within them must be registered by the user before they are recognised by the parser.
 * <p>
 * The term <i><a name="Custom">custom tag type</a></i> refers to any user-defined tag type, or any tag type that is
 * not a <a href="#Predefined">predefined</a> tag type.
 * <p>
 * The tag recognition process of the parser gives each tag type a <i><a name="Precedence">precedence</a></i> level,
 * which is primarily determined by the length of its {@linkplain #getStartDelimiter() start delimiter}.
 * A tag type with a more specific start delimiter is chosen in preference to one with a less specific start delimiter,
 * assuming they both share the same prefix.  If two tag types have exactly the same start delimiter, the one which was
 * {@linkplain #register() registered} later has the higher precedence.
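 * <p>
 * The precedence rule can be modelled in a few lines of plain Java. The following is a hypothetical sketch, not the
 * way the parser implements it internally. A tag type is reduced to a description and a start delimiter. The class
 * <code>PrecedenceSketch</code> and its methods <code>register</code>, <code>deregister</code> and <code>candidatesAt</code>
 * exist only for illustration. <code>candidatesAt</code> returns every registered tag type whose start delimiter
 * matches the text at a given position, ordered from highest to lowest precedence. The longest start delimiter comes
 * first, and a tie goes to the later registration.
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of tag type precedence; not part of the Jericho API.
class PrecedenceSketch {
    // A registered tag type reduced to the parts the precedence rule needs.
    static final class Registered {
        final String description;
        final String startDelimiter; // lower case, begins with '<'
        final int order;             // position in the registration sequence

        Registered(String description, String startDelimiter, int order) {
            this.description = description;
            this.startDelimiter = startDelimiter;
            this.order = order;
        }
    }

    private final List<Registered> registered = new ArrayList<>();
    private int nextOrder = 0;

    // Stand-in for register(): each call gets a later position than all previous ones.
    void register(String description, String startDelimiter) {
        registered.add(new Registered(description, startDelimiter, nextOrder++));
    }

    // Stand-in for deregister(), identifying the tag type by its description.
    void deregister(String description) {
        registered.removeIf(r -> r.description.equals(description));
    }

    // Descriptions of the tag types whose start delimiter matches text at pos,
    // from highest to lowest precedence.
    List<String> candidatesAt(String text, int pos) {
        List<Registered> matches = new ArrayList<>();
        for (Registered r : registered) {
            // Start delimiters are lower case, so compare case-insensitively ("<!DOCTYPE" matches "<!doctype").
            if (text.regionMatches(true, pos, r.startDelimiter, 0, r.startDelimiter.length())) {
                matches.add(r);
            }
        }
        // Longest delimiter first; for equal lengths, the later registration first.
        matches.sort(Comparator.comparingInt((Registered r) -> r.startDelimiter.length())
                .thenComparingInt(r -> r.order)
                .reversed());
        List<String> result = new ArrayList<>();
        for (Registered r : matches) {
            result.add(r.description);
        }
        return result;
    }

    public static void main(String[] args) {
        PrecedenceSketch p = new PrecedenceSketch();
        p.register("normal", "<");
        p.register("markup declaration", "<!");
        p.register("doctype declaration", "<!doctype");
        p.register("XML processing instruction", "<?");
        p.register("PHP short tag", "<?"); // same delimiter, registered later
        System.out.println(p.candidatesAt("<!DOCTYPE html>", 0));
        System.out.println(p.candidatesAt("<?= $x ?>", 0));
    }
}
```
 * Running <code>main</code> prints <code>[doctype declaration, markup declaration, normal]</code> followed by
 * <code>[PHP short tag, XML processing instruction, normal]</code>. Only tag types that match at the same position are
 * compared, so each shorter candidate delimiter is a prefix of every longer one. That is the shared-prefix condition
 * stated above.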
 * <p>
 * The two special tag types {@link StartTagType#UNREGISTERED} and {@link EndTagType#UNREGISTERED} represent
 * tags that do not match the syntax of any other tag type.  They have the lowest <a href="#Precedence">precedence</a> 
 * of all the tag types.  The {@link Tag#isUnregistered()} method provides a detailed explanation of unregistered tags.
 * <p>
 * See the documentation of the <a href="Tag.html#ParsingProcess">tag parsing process</a> for more information
 * on how each tag is identified by the parser.
 * <p>
 * <a name="Normal"></a>Note that the standard {@linkplain HTMLElementName HTML element names} do not represent different
 * tag <i>types</i>.  All standard HTML tags have a tag type of {@link StartTagType#NORMAL} or {@link EndTagType#NORMAL},
 * and are also referred to as <i>normal</i> tags.
 * <p>
 * Apart from the <a href="#RegistrationRelated">registration related</a> methods, all of the methods in this class and its
 * subclasses relate to the implementation of <a href="#Custom">custom tag types</a> and are not relevant to the majority of users 
 * who just use the <a href="#Predefined">predefined tag types</a>.
 * <p>
 * For performance reasons, this library only allows tag types that {@linkplain #getStartDelimiter() start}
 * with a '<code>&lt;</code>' character.
 * The character that follows it determines the immediate subclass of the tag type.
 * An {@link EndTagType} always has a slash ('<code>/</code>') as the second character, while a {@link StartTagType}
 * has any character other than a slash as the second character.
 * This definition means that tag types which are not intuitively classified as either start tag types or end tag types
 * (such as an HTML {@linkplain StartTagType#COMMENT comment}) are mostly classified as start tag types.
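 * <p>
 * The classification rule above can be sketched as follows. The sketch also shows how the <code>TagType</code>
 * constructor computes <code>namePrefix</code> by stripping the start delimiter prefix ('<code>&lt;</code>' or
 * '<code>&lt;/</code>') from the start delimiter. The class <code>DelimiterClassifier</code> and its methods are
 * hypothetical helpers written for illustration only. They are not part of this library.
```java
// Hypothetical helpers modelling the classification rule; not part of the Jericho API.
class DelimiterClassifier {
    // "</" for an end tag type, "<" for a start tag type, decided only by the second character.
    static String startDelimiterPrefix(String startDelimiter) {
        if (startDelimiter.isEmpty() || startDelimiter.charAt(0) != '<') {
            throw new IllegalArgumentException("start delimiter must begin with '<': " + startDelimiter);
        }
        boolean isEndTagType = startDelimiter.length() > 1 && startDelimiter.charAt(1) == '/';
        return isEndTagType ? "</" : "<";
    }

    // Mirrors the constructor: namePrefix is whatever follows the "<" or "</" prefix.
    static String namePrefix(String startDelimiter) {
        return startDelimiter.substring(startDelimiterPrefix(startDelimiter).length());
    }

    public static void main(String[] args) {
        for (String d : new String[] {"<", "<!--", "<?xml", "</", "</%", "<&|"}) {
            String kind = startDelimiterPrefix(d).equals("</") ? "end" : "start";
            System.out.println(d + " -> " + kind + " tag type, namePrefix \"" + namePrefix(d) + "\"");
        }
    }
}
```
 * For example, <code>&lt;!--</code> is classified as a start tag type with the name prefix <code>!--</code>.
 * By contrast, <code>&lt;/%</code> is an end tag type with the name prefix <code>%</code>.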
 * <p>
 * Every method in this and the {@link StartTagType} and {@link EndTagType} abstract classes can be categorised
 * as one of the following:
 * <dl>
 *  <dt><a name="Property">Properties:</a>
 *   <dd>Simple properties (marked final) that were either specified as parameters
 *    during construction or are derived from those parameters.
 *  <dt><a name="AbstractImplementation">Abstract implementation methods:</a>
 *   <dd>Methods that must be implemented in a subclass.
 *  <dt><a name="DefaultImplementation">Default implementation methods:</a>
 *   <dd>Methods (not marked final) that implement common behaviour, but may be overridden in a subclass.
 *  <dt><a name="ImplementationAssistance">Implementation assistance methods:</a>
 *   <dd>Protected methods that provide low-level functionality and are only of use within other implementation methods.
 *  <dt><a name="RegistrationRelated">Registration related methods:</a>
 *   <dd>Utility methods (marked final) relating to the {@linkplain #register() registration} of tag type instances.
 * </dl>
 */
public abstract class TagType {
	private final String description;
	private final String startDelimiter;
	private final char[] startDelimiterCharArray;
	private final String closingDelimiter;
	private final boolean isServerTag;
	private final String namePrefix;
	final String startDelimiterPrefix;

	TagType(final String description, final String startDelimiter, final String closingDelimiter, final boolean isServerTag, final String startDelimiterPrefix) {
		// startDelimiterPrefix is either "<" or "</"
		this.description=description;
		this.startDelimiter=startDelimiter;
		startDelimiterCharArray=startDelimiter.toCharArray();
		this.closingDelimiter=closingDelimiter;
		this.isServerTag=isServerTag;
		this.namePrefix=startDelimiter.substring(startDelimiterPrefix.length());
		this.startDelimiterPrefix=startDelimiterPrefix;
	}

	/**
	 * Registers this tag type for recognition by the parser.
	 * <br />(<a href="TagType.html#RegistrationRelated">registration related</a> method)
	 * <p>
	 * The order of registration affects the <a href="TagType.html#Precedence">precedence</a> of the tag type when a potential tag is being parsed.
	 *
	 * @see #deregister()
	 */
	public final void register() {
		TagTypeRegister.add(this);
	}
	
	/**
	 * Deregisters this tag type.
	 * <br />(<a href="TagType.html#RegistrationRelated">registration related</a> method)
	 *
	 * @see #register()
	 */
	public final void deregister() {
		TagTypeRegister.remove(this);
	}

	/**
	 * Returns a list of all the currently registered tag types in order of lowest to highest <a href="TagType.html#Precedence">precedence</a>.
	 * <br />(<a href="TagType.html#RegistrationRelated">registration related</a> method)
	 * @return a list of all the currently registered tag types in order of lowest to highest <a href="TagType.html#Precedence">precedence</a>.
	 */
	public static final List<TagType> getRegisteredTagTypes() {
		return TagTypeRegister.getList();
	}

	/**
	 * Returns a description of this tag type useful for debugging purposes. 
	 * <br />(<a href="TagType.html#Property">property</a> method)
	 *
	 * @return a description of this tag type useful for debugging purposes.
	 */
	public final String getDescription() {
		return description;
	}

	/**
	 * Returns the character sequence that marks the start of the tag.
	 * <br />(<a href="TagType.html#Property">property</a> method)
	 * <p>
	 * The character sequence must be all in lower case.
	 * <p>
	 * The first character in this property <b>must</b> be '<code>&lt;</code>'.
	 * This is a deliberate limitation of the system which is necessary to retain reasonable performance.
	 * <p>
	 * The second character in this property must be '<code>/</code>' if the implementing class is an {@link EndTagType}.
	 * It must <b>not</b> be '<code>/</code>' if the implementing class is a {@link StartTagType}.
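	 * <p>
	 * When writing a custom tag type, these constraints can be checked mechanically, for example in a unit test.
	 * The helper below is a hypothetical sketch and is not a method of this library. It encodes only the three rules
	 * stated above:
	 * <ul>
	 *  <li>the delimiter is in lower case;
	 *  <li>it starts with '<code>&lt;</code>';
	 *  <li>its second character is '<code>/</code>' exactly when the tag type is an {@link EndTagType}.
	 * </ul>
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical checker for the start delimiter constraints of a custom tag type.
class StartDelimiterRules {
    // Returns the violated constraints; an empty list means the delimiter is acceptable.
    static List<String> violations(String startDelimiter, boolean isEndTagType) {
        List<String> problems = new ArrayList<>();
        if (!startDelimiter.equals(startDelimiter.toLowerCase(Locale.ROOT))) {
            problems.add("not all lower case");
        }
        if (startDelimiter.isEmpty() || startDelimiter.charAt(0) != '<') {
            problems.add("first character is not '<'");
        }
        boolean slashSecond = startDelimiter.length() > 1 && startDelimiter.charAt(1) == '/';
        if (isEndTagType && !slashSecond) {
            problems.add("end tag type without '/' as second character");
        } else if (!isEndTagType && slashSecond) {
            problems.add("start tag type with '/' as second character");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(violations("<!doctype", false)); // []
        System.out.println(violations("<!DOCTYPE", false)); // [not all lower case]
        System.out.println(violations("</%", false));       // [start tag type with '/' as second character]
    }
}
```
	 * For instance, <code>violations("&lt;!DOCTYPE", false)</code> reports only that the delimiter is not all in lower case.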
	 * <p>
	 * <dl>
	 *  <dt>Standard Tag Type Values:</dt>
	 *   <dd>
	 *    <table class="bordered" style="margin: 15px" cellspacing="0">
	 *     <tr><th>Tag Type<th>Start Delimiter
	 *     <tr><td>{@link StartTagType#UNREGISTERED}<td><code>&lt;</code>
	 *     <tr><td>{@link StartTagType#NORMAL}<td><code>&lt;</code>
	 *     <tr><td>{@link StartTagType#COMMENT}<td><code>&lt;!--</code>
	 *     <tr><td>{@link StartTagType#XML_DECLARATION}<td><code>&lt;?xml</code>
	 *     <tr><td>{@link StartTagType#XML_PROCESSING_INSTRUCTION}<td><code>&lt;?</code>
	 *     <tr><td>{@link StartTagType#DOCTYPE_DECLARATION}<td><code>&lt;!doctype</code>
	 *     <tr><td>{@link StartTagType#MARKUP_DECLARATION}<td><code>&lt;!</code>
	 *     <tr><td>{@link StartTagType#CDATA_SECTION}<td><code>&lt;![cdata[</code>
	 *     <tr><td>{@link StartTagType#SERVER_COMMON}<td><code>&lt;%</code>
	 *     <tr><td>{@link EndTagType#UNREGISTERED}<td><code>&lt;/</code>
	 *     <tr><td>{@link EndTagType#NORMAL}<td><code>&lt;/</code>
	 *    </table>
	 * </dl>
	 * <dl>
	 *  <dt>Extended Tag Type Values:</dt>
	 *   <dd>
	 *    <table class="bordered" style="margin: 15px" cellspacing="0">
	 *     <tr><th>Tag Type<th>Start Delimiter
	 *     <tr><td>{@link MicrosoftTagTypes#DOWNLEVEL_REVEALED_CONDITIONAL_COMMENT}<td><code>&lt;![</code>
	 *     <tr><td>{@link PHPTagTypes#PHP_SCRIPT}<td><code>&lt;script</code>
	 *     <tr><td>{@link PHPTagTypes#PHP_SHORT}<td><code>&lt;?</code>
	 *     <tr><td>{@link PHPTagTypes#PHP_STANDARD}<td><code>&lt;?php</code>
	 *     <tr><td>{@link MasonTagTypes#MASON_COMPONENT_CALL}<td><code>&lt;&amp;</code>
	 *     <tr><td>{@link MasonTagTypes#MASON_COMPONENT_CALLED_WITH_CONTENT}<td><code>&lt;&amp;|</code>
	 *     <tr><td>{@link MasonTagTypes#MASON_COMPONENT_CALLED_WITH_CONTENT_END}<td><code>&lt;/&amp;</code>
	 *     <tr><td>{@link MasonTagTypes#MASON_NAMED_BLOCK}<td><code>&lt;%</code>
	 *     <tr><td>{@link MasonTagTypes#MASON_NAMED_BLOCK_END}<td><code>&lt;/%</code>
	 *    </table>
	 * </dl>
	 *
	 * @return the character sequence that marks the start of the tag.
	 */
	public final String getStartDelimiter() {
		return startDelimiter;
	}

	/**
	 * Returns the character sequence that marks the end of the tag.
	 * <br />(<a href="TagType.html#Property">property</a> method)
	 * <p>
	 * The character sequence must be all in lower case.
	 * <p>
	 * In a {@link StartTag} of a {@linkplain StartTagType type} that {@linkplain StartTagType#hasAttributes() has attributes},
	 * characters appearing inside a quoted attribute value are ignored when determining the location of the closing delimiter.
	 * <p>
	 * Note that the optional '<code>/</code>' character preceding the closing '<code>&gt;</code>' in an
	 * {@linkplain StartTag#isEmptyElementTag() empty-element tag} is not considered part of the closing delimiter.
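	 * <p>
	 * The following is a simplified illustration of both notes, not a description of the actual implementation.
	 * The hypothetical helper below scans a start tag for its closing '<code>&gt;</code>' and skips characters inside
	 * single- or double-quoted attribute values. It then checks separately whether a '<code>/</code>' immediately
	 * precedes that character. It assumes that every quote is terminated, and it ignores any recovery the parser may
	 * perform on malformed attributes.
```java
// Hypothetical sketch: find the closing '>' of a start tag, skipping quoted attribute values.
class ClosingDelimiterSketch {
    // Index of the first '>' at or after from that is not inside quotes, or -1 if there is none.
    static int findClosingDelimiter(String text, int from) {
        char quote = 0; // 0 outside a quoted value, otherwise the quote character that opened it
        for (int i = from; i < text.length(); i++) {
            char c = text.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // end of the quoted value
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // start of a quoted value
            } else if (c == '>') {
                return i;
            }
        }
        return -1; // no closing delimiter, e.g. because a quote was never terminated
    }

    // The optional '/' before '>' marks an empty-element tag but is not part of the closing delimiter.
    static boolean isEmptyElementTag(String text, int closingIndex) {
        return closingIndex > 0 && text.charAt(closingIndex - 1) == '/';
    }

    public static void main(String[] args) {
        String tag = "<img alt=\"a > b\" src='x.png' />";
        int end = findClosingDelimiter(tag, 0);
        System.out.println(end + " " + isEmptyElementTag(tag, end)); // 30 true
    }
}
```
	 * For <code>&lt;img alt="a &gt; b" src='x.png' /&gt;</code> the sketch returns index 30, skipping the quoted
	 * '<code>&gt;</code>'. The preceding '<code>/</code>' marks an empty-element tag without being part of the closing delimiter.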
