// Jericho HTML Parser - Java based library for analysing and manipulating HTML
// Version 3.0
// Copyright (C) 2007 Martin Jericho
// http://jerichohtml.sourceforge.net/
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of either one of the following licences:
//
// 1. The Eclipse Public License (EPL) version 1.0,
// included in this distribution in the file licence-epl-1.0.html
// or available at http://www.eclipse.org/legal/epl-v10.html
//
// 2. The GNU Lesser General Public License (LGPL) version 2.1 or later,
// included in this distribution in the file licence-lgpl-2.1.txt
// or available at http://www.gnu.org/licenses/lgpl.txt
//
// This library is distributed on an "AS IS" basis,
// WITHOUT WARRANTY OF ANY KIND, either express or implied.
// See the individual licence texts for more details.
package net.htmlparser.jericho;
import java.util.*;
/**
* Defines the syntax for a tag type that can be recognised by the parser.
* <p>
* This class is the root abstract class common to all tag types, and contains methods to {@linkplain #register() register}
* and {@linkplain #deregister() deregister} tag types as well as various methods to aid in their implementation.
* <p>
* Every tag type is represented by a singleton instance of a class that must be a subclass of either
* {@link StartTagType} or {@link EndTagType}. These two abstract classes, the only direct descendants of this class,
* represent the two major classifications under which every tag type exists.
* <p>
* Because all <code>TagType</code> instances must be singletons, the '<code>==</code>' operator can be used to test for a particular tag type
* instead of the <code>equals(Object)</code> method.
* <p>
* The term <i><a name="Predefined">predefined tag type</a></i> refers to any of the tag types defined in this library,
* including both <a href="#Standard">standard</a> and <a href="#Extended">extended</a> tag types.
* <p>
* The term <i><a name="Standard">standard tag type</a></i> refers to any of the tag types represented by instances
* in static fields of the {@link StartTagType} and {@link EndTagType} subclasses.
* Standard tag types are registered by default, and define the tags most commonly found in HTML documents.
* <p>
* The term <i><a name="Extended">extended tag type</a></i> refers to any <a href="#Predefined">predefined</a> tag type
* that is not a <a href="#Standard">standard</a> tag type.
* The {@link PHPTagTypes} and {@link MasonTagTypes} classes contain extended tag types related to their respective server platforms.
* The tag types defined within them must be registered by the user before they are recognised by the parser.
* <p>
* The term <i><a name="Custom">custom tag type</a></i> refers to any user-defined tag type, or any tag type that is
* not a <a href="#Predefined">predefined</a> tag type.
* <p>
* The tag recognition process of the parser gives each tag type a <i><a name="Precedence">precedence</a></i> level,
* which is primarily determined by the length of its {@linkplain #getStartDelimiter() start delimiter}.
* A tag type with a more specific start delimiter is chosen in preference to one with a less specific start delimiter,
* assuming they both share the same prefix. If two tag types have exactly the same start delimiter, the one which was
* {@linkplain #register() registered} later has the higher precedence.
* <p>
* The two special tag types {@link StartTagType#UNREGISTERED} and {@link EndTagType#UNREGISTERED} represent
* tags that do not match the syntax of any other tag type. They have the lowest <a href="#Precedence">precedence</a>
* of all the tag types. The {@link Tag#isUnregistered()} method provides a detailed explanation of unregistered tags.
* <p>
* See the documentation of the <a href="Tag.html#ParsingProcess">tag parsing process</a> for more information
* on how each tag is identified by the parser.
* <p>
* <a name="Normal"></a>Note that the standard {@linkplain HTMLElementName HTML element names} do not represent different
* tag <i>types</i>. All standard HTML tags have a tag type of {@link StartTagType#NORMAL} or {@link EndTagType#NORMAL},
* and are also referred to as <i>normal</i> tags.
* <p>
* Apart from the <a href="#RegistrationRelated">registration related</a> methods, all of the methods in this class and its
* subclasses relate to the implementation of <a href="#Custom">custom tag types</a> and are not relevant to the majority of users
* who just use the <a href="#Predefined">predefined tag types</a>.
* <p>
* For performance reasons, this library only allows tag types that {@linkplain #getStartDelimiter() start}
* with a '<code>&lt;</code>' character.
* The character following this defines the immediate subclass of the tag type.
* An {@link EndTagType} always has a slash ('<code>/</code>') as the second character, while a {@link StartTagType}
* has any character other than a slash as the second character.
* This definition means that tag types which are not intuitively classified as either start tag types or end tag types
* (such as an HTML {@linkplain StartTagType#COMMENT comment}) are mostly classified as start tag types.
* <p>
* Every method in this and the {@link StartTagType} and {@link EndTagType} abstract classes can be categorised
* as one of the following:
* <dl>
* <dt><a name="Property">Properties:</a>
* <dd>Simple properties (marked final) that were either specified as parameters
* during construction or are derived from those parameters.
* <dt><a name="AbstractImplementation">Abstract implementation methods:</a>
* <dd>Methods that must be implemented in a subclass.
* <dt><a name="DefaultImplementation">Default implementation methods:</a>
* <dd>Methods (not marked final) that implement common behaviour, but may be overridden in a subclass.
* <dt><a name="ImplementationAssistance">Implementation assistance methods:</a>
* <dd>Protected methods that provide low-level functionality and are only of use within other implementation methods.
* <dt><a name="RegistrationRelated">Registration related methods:</a>
* <dd>Utility methods (marked final) relating to the {@linkplain #register() registration} of tag type instances.
* </dl>
*/
public abstract class TagType {
private final String description;
private final String startDelimiter;
private final char[] startDelimiterCharArray;
private final String closingDelimiter;
private final boolean isServerTag;
private final String namePrefix;
final String startDelimiterPrefix;
TagType(final String description, final String startDelimiter, final String closingDelimiter, final boolean isServerTag, final String startDelimiterPrefix) {
// startDelimiterPrefix is either "<" or "</"
this.description=description;
this.startDelimiter=startDelimiter;
startDelimiterCharArray=startDelimiter.toCharArray();
this.closingDelimiter=closingDelimiter;
this.isServerTag=isServerTag;
this.namePrefix=startDelimiter.substring(startDelimiterPrefix.length());
this.startDelimiterPrefix=startDelimiterPrefix;
}
/**
* Registers this tag type for recognition by the parser.
* <br />(<a href="TagType.html#RegistrationRelated">registration related</a> method)
* <p>
* The order of registration affects the <a href="TagType.html#Precedence">precedence</a> of the tag type when a potential tag is being parsed.
*
* @see #deregister()
*/
public final void register() {
TagTypeRegister.add(this);
}
/**
* Deregisters this tag type.
* <br />(<a href="TagType.html#RegistrationRelated">registration related</a> method)
*
* @see #register()
*/
public final void deregister() {
TagTypeRegister.remove(this);
}
/**
* Returns a list of all the currently registered tag types in order of lowest to highest <a href="TagType.html#Precedence">precedence</a>.
* <br />(<a href="TagType.html#RegistrationRelated">registration related</a> method)
* @return a list of all the currently registered tag types in order of lowest to highest <a href="TagType.html#Precedence">precedence</a>.
*/
public static final List<TagType> getRegisteredTagTypes() {
return TagTypeRegister.getList();
}
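/*
 * Illustration only, not part of the library: a minimal, self-contained sketch of the
 * precedence rule described in the class documentation (a longer matching start delimiter
 * wins; for identical delimiters, the tag type registered later wins).
 * PrecedenceSketch, Entry and candidatesAt are hypothetical names, and the labels are plain
 * strings borrowed from the start delimiter table rather than real TagType instances.
 * The real parser also checks whether each candidate forms a valid tag at that position,
 * which this sketch deliberately leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PrecedenceSketch {
    // Hypothetical stand-in for a registered tag type.
    static final class Entry {
        final String label;
        final String startDelimiter;
        final int registrationOrder;
        Entry(String label, String startDelimiter, int registrationOrder) {
            this.label = label;
            this.startDelimiter = startDelimiter;
            this.registrationOrder = registrationOrder;
        }
    }

    private final List<Entry> registered = new ArrayList<Entry>();

    // Each call gets a higher registration order than the previous one.
    public void register(String label, String startDelimiter) {
        registered.add(new Entry(label, startDelimiter, registered.size()));
    }

    // Labels whose start delimiter matches the text at pos, highest precedence first.
    public List<String> candidatesAt(String source, int pos) {
        // Start delimiters are lower case, so compare against lower-cased text.
        String lower = source.toLowerCase(Locale.ROOT);
        List<Entry> matches = new ArrayList<Entry>();
        for (Entry e : registered) {
            if (lower.startsWith(e.startDelimiter, pos)) matches.add(e);
        }
        // Longer delimiter first; on a tie, the later registration first.
        matches.sort((a, b) -> {
            int byLength = Integer.compare(b.startDelimiter.length(), a.startDelimiter.length());
            return byLength != 0 ? byLength : Integer.compare(b.registrationOrder, a.registrationOrder);
        });
        List<String> labels = new ArrayList<String>();
        for (Entry e : matches) labels.add(e.label);
        return labels;
    }

    public static void main(String[] args) {
        PrecedenceSketch sketch = new PrecedenceSketch();
        sketch.register("NORMAL", "<");
        sketch.register("MARKUP_DECLARATION", "<!");
        sketch.register("COMMENT", "<!--");
        sketch.register("XML_PROCESSING_INSTRUCTION", "<?");
        sketch.register("PHP_SHORT", "<?"); // same delimiter, registered later
        System.out.println(sketch.candidatesAt("<!-- note -->", 0));
        System.out.println(sketch.candidatesAt("<? echo 1 ?>", 0));
    }
}
```

 * In the second call, PHP_SHORT is listed ahead of XML_PROCESSING_INSTRUCTION only because
 * it was registered later with the same start delimiter.
 */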
/**
* Returns a description of this tag type useful for debugging purposes.
* <br />(<a href="TagType.html#Property">property</a> method)
*
* @return a description of this tag type useful for debugging purposes.
*/
public final String getDescription() {
return description;
}
/**
* Returns the character sequence that marks the start of the tag.
* <br />(<a href="TagType.html#Property">property</a> method)
* <p>
* The character sequence must be all in lower case.
* <p>
* The first character in this property <b>must</b> be '<code>&lt;</code>'.
* This is a deliberate limitation of the system which is necessary to retain reasonable performance.
* <p>
* The second character in this property must be '<code>/</code>' if the implementing class is an {@link EndTagType}.
* It must <b>not</b> be '<code>/</code>' if the implementing class is a {@link StartTagType}.
* <p>
* <dl>
* <dt>Standard Tag Type Values:</dt>
* <dd>
* <table class="bordered" style="margin: 15px" cellspacing="0">
* <tr><th>Tag Type<th>Start Delimiter
* <tr><td>{@link StartTagType#UNREGISTERED}<td><code>&lt;</code>
* <tr><td>{@link StartTagType#NORMAL}<td><code>&lt;</code>
* <tr><td>{@link StartTagType#COMMENT}<td><code>&lt;!--</code>
* <tr><td>{@link StartTagType#XML_DECLARATION}<td><code>&lt;?xml</code>
* <tr><td>{@link StartTagType#XML_PROCESSING_INSTRUCTION}<td><code>&lt;?</code>
* <tr><td>{@link StartTagType#DOCTYPE_DECLARATION}<td><code>&lt;!doctype</code>
* <tr><td>{@link StartTagType#MARKUP_DECLARATION}<td><code>&lt;!</code>
* <tr><td>{@link StartTagType#CDATA_SECTION}<td><code>&lt;![cdata[</code>
* <tr><td>{@link StartTagType#SERVER_COMMON}<td><code>&lt;%</code>
* <tr><td>{@link EndTagType#UNREGISTERED}<td><code>&lt;/</code>
* <tr><td>{@link EndTagType#NORMAL}<td><code>&lt;/</code>
* </table>
* </dl>
* <dl>
* <dt>Extended Tag Type Values:</dt>
* <dd>
* <table class="bordered" style="margin: 15px" cellspacing="0">
* <tr><th>Tag Type<th>Start Delimiter
* <tr><td>{@link MicrosoftTagTypes#DOWNLEVEL_REVEALED_CONDITIONAL_COMMENT}<td><code>&lt;![</code>
* <tr><td>{@link PHPTagTypes#PHP_SCRIPT}<td><code>&lt;script</code>
* <tr><td>{@link PHPTagTypes#PHP_SHORT}<td><code>&lt;?</code>
* <tr><td>{@link PHPTagTypes#PHP_STANDARD}<td><code>&lt;?php</code>
* <tr><td>{@link MasonTagTypes#MASON_COMPONENT_CALL}<td><code>&lt;&amp;</code>
* <tr><td>{@link MasonTagTypes#MASON_COMPONENT_CALLED_WITH_CONTENT}<td><code>&lt;&amp;|</code>
* <tr><td>{@link MasonTagTypes#MASON_COMPONENT_CALLED_WITH_CONTENT_END}<td><code>&lt;/&amp;</code>
* <tr><td>{@link MasonTagTypes#MASON_NAMED_BLOCK}<td><code>&lt;%</code>
* <tr><td>{@link MasonTagTypes#MASON_NAMED_BLOCK_END}<td><code>&lt;/%</code>
* </table>
* </dl>
*
* @return the character sequence that marks the start of the tag.
*/
public final String getStartDelimiter() {
return startDelimiter;
}
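/*
 * Illustration only, not part of the library: a small sketch of the start delimiter rules
 * documented above (first character '<', all lower case, and a '/' in second position for
 * end tag types only), together with the name prefix derivation used in the constructor
 * (the start delimiter with its "<" or "</" prefix removed).
 * DelimiterKindSketch, Kind, classify and namePrefix are hypothetical names. Throwing
 * IllegalArgumentException for an invalid delimiter is an assumption of this sketch; the
 * source does not show how, or whether, the library validates these rules.

```java
import java.util.Locale;

public class DelimiterKindSketch {
    public enum Kind { START_TAG_TYPE, END_TAG_TYPE }

    // Classifies a start delimiter by its second character.
    public static Kind classify(String startDelimiter) {
        if (startDelimiter.isEmpty() || startDelimiter.charAt(0) != '<') {
            throw new IllegalArgumentException("start delimiter must begin with '<'");
        }
        if (!startDelimiter.equals(startDelimiter.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("start delimiter must be lower case");
        }
        boolean slashSecond = startDelimiter.length() > 1 && startDelimiter.charAt(1) == '/';
        return slashSecond ? Kind.END_TAG_TYPE : Kind.START_TAG_TYPE;
    }

    // Strips the "<" or "</" prefix, mirroring the substring call in the constructor.
    public static String namePrefix(String startDelimiter) {
        String prefix = classify(startDelimiter) == Kind.END_TAG_TYPE ? "</" : "<";
        return startDelimiter.substring(prefix.length());
    }

    public static void main(String[] args) {
        String[] delimiters = { "<", "<!--", "<?xml", "</", "</%" };
        for (String d : delimiters) {
            System.out.println(d + " -> " + classify(d) + ", name prefix \"" + namePrefix(d) + "\"");
        }
    }
}
```

 * Under this rule a comment delimiter such as "<!--" counts as a start tag type even though
 * it is not intuitively a start tag, which matches the class documentation.
 */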
/**
* Returns the character sequence that marks the end of the tag.
* <br />(<a href="TagType.html#Property">property</a> method)
* <p>
* The character sequence must be all in lower case.
* <p>
* In a {@link StartTag} of a {@linkplain StartTagType type} that {@linkplain StartTagType#hasAttributes() has attributes},
* characters appearing inside a quoted attribute value are ignored when determining the location of the closing delimiter.
* <p>
* Note that the optional '<code>/</code>' character preceding the closing '<code>&gt;</code>' in an
* {@linkplain StartTag#isEmptyElementTag() empty-element tag} is not considered part of the end delimiter.