abort the parse.  A Pod parser may allow a way for particular
applications to add to the above list of known commands, and to
stipulate, for each additional command, whether formatting
codes should be processed.

Future versions of this specification may add additional
commands.

=head1 Pod Formatting Codes

(Note that in previous drafts of this document and of perlpod,
formatting codes were referred to as "interior sequences", and
this term may still be found in the documentation for Pod parsers,
and in error messages from Pod processors.)

There are two syntaxes for formatting codes:

=over

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by a "<", then any number of characters, and ends with the
first matching ">".  Examples:

    That's what I<you> think!

    What's C<dump()> for?

    X<C<chmod> and C<unlink()> Under Different Operating Systems>

=item *

A formatting code starts with a capital letter (just US-ASCII [A-Z])
followed by two or more "<"'s, one or more whitespace characters,
any number of characters, one or more whitespace characters,
and ends with the first matching sequence of two or more ">"'s, where
the number of ">"'s equals the number of "<"'s in the opening of this
formatting code.  Examples:

    That's what I<< you >> think!

    C<<< open(X, ">>thing.dat") || die $! >>>

    B<< $foo->bar(); >>

With this syntax, the whitespace character(s) after the "CE<lt><<"
(or whatever letter) and before the ">>" are I<not> renderable -- they
do not signify whitespace, but are merely part of the formatting code
itself.  That is, these are all synonymous:

    C<thing>
    C<< thing >>
    C<<      thing    >>
    C<<<   thing >>>
    C<<<< thing >>>>

and so on.

=back

In parsing Pod, a notably tricky part is the correct parsing of
(potentially nested!) formatting codes.
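
As a rough illustration of the two syntaxes described above, the
following Perl sketch turns the text of one paragraph into a nested
list of text strings and formatting codes.  It is not the routine
from Pod::Parser; the names parse_codes, add_text and close_code are
invented for this example.  It assumes that its input is a single
paragraph, it does nothing special for the content of L or E codes,
and it does not handle an empty multiple-bracket code.

```perl
use strict;
use warnings;

# Append text to a child list, merging with a preceding text node.
sub add_text {
    my ($kids, $str) = @_;
    if (@$kids && !ref $kids->[-1]) { $kids->[-1] .= $str }
    else                            { push @$kids, $str }
}

# Pop the innermost open code and attach it to its parent
# as [ $letter, @children ].
sub close_code {
    my ($stack) = @_;
    my $done = pop @$stack;
    push @{ $stack->[-1]{kids} }, [ $done->{letter}, @{ $done->{kids} } ];
}

# Parse the text of ONE paragraph.  Returns (\@tree, \@warnings).
sub parse_codes {
    my ($text) = @_;
    my @stack = ({ n => 0, kids => [] });   # frame 0 is the paragraph itself
    my @warnings;
    pos($text) = 0;
    while (pos($text) < length $text) {
        my $top    = $stack[-1];
        my $closer = '>' x $top->{n};       # closing delimiter of the open code
        if ($text =~ /\G([A-Z])(<<+)\s+/gc) {            # multiple-bracket opener
            push @stack, { letter => $1, n => length $2, kids => [] };
        }
        elsif ($text =~ /\G([A-Z])</gc) {                # single-bracket opener
            push @stack, { letter => $1, n => 1, kids => [] };
        }
        elsif ($top->{n} >= 2 && $text =~ /\G\s+\Q$closer\E(?!>)/gc) {
            close_code(\@stack);    # the delimiter whitespace is not content
        }
        elsif ($top->{n} == 1 && $text =~ /\G>/gc) {
            close_code(\@stack);
        }
        else {
            $text =~ /\G(.)/gcs;                         # one literal character
            add_text($top->{kids}, $1);
        }
    }
    while (@stack > 1) {            # codes cannot span paragraphs
        push @warnings, "Unterminated $stack[-1]{letter} code";
        close_code(\@stack);
    }
    return ($stack[0]{kids}, \@warnings);
}

my ($tree, $warn) = parse_codes('B<< $foo->bar(); >>');
# $tree is [ [ 'B', '$foo->bar();' ] ] and @$warn is empty
```
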
Implementors should
consult the code in the C<parse_text> routine in Pod::Parser as an
example of a correct implementation.

=over

=item C<IE<lt>textE<gt>> -- italic text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<BE<lt>textE<gt>> -- bold text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<CE<lt>codeE<gt>> -- code text

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<FE<lt>filenameE<gt>> -- style for filenames

See the brief discussion in L<perlpod/"Formatting Codes">.

=item C<XE<lt>topic nameE<gt>> -- an index entry

See the brief discussion in L<perlpod/"Formatting Codes">.

This code is unusual in that most formatters completely discard
this code and its content.  Other formatters will render it with
invisible codes that can be used in building an index of
the current document.

=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code

Discussed briefly in L<perlpod/"Formatting Codes">.

This code is unusual in that it should have no content.  That is,
a processor may complain if it sees C<ZE<lt>potatoesE<gt>>.  Whether
or not it complains, the I<potatoes> text should be ignored.

=item C<LE<lt>nameE<gt>> -- a hyperlink

The complicated syntaxes of this code are discussed at length in
L<perlpod/"Formatting Codes">, and implementation details are
discussed below, in L</"About LE<lt>...E<gt> Codes">.  Parsing the
contents of LE<lt>content> is tricky.  Notably, the content has to be
checked for whether it looks like a URL, or whether it has to be split
on literal "|" and/or "/" (in the right order!), and so on,
I<before> EE<lt>...> codes are resolved.

=item C<EE<lt>escapeE<gt>> -- a character escape

See L<perlpod/"Formatting Codes">, and several points in
L</Notes on Implementing Pod Processors>.

=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces

This formatting code is syntactically simple, but semantically
complex.  What it means is that each space in the printable
content of this code signifies a non-breaking space.

Consider:

    C<$x ? $y : $z>

    S<C<$x ? $y : $z>>

Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, "$y", one space, ":", one space,
"$z".  The difference is that in the latter, with the S code, those
spaces are not "normal" spaces, but instead are non-breaking spaces.

=back

If a Pod processor sees any formatting code other than the ones
listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
processor must by default treat this as an error.
A Pod parser may allow a way for particular
applications to add to the above list of known formatting codes;
a Pod parser might even allow a way to stipulate, for each additional
code, whether it requires some form of special processing, as
LE<lt>...> does.

Future versions of this specification may add additional
formatting codes.

Historical note:  A few older Pod processors would not see a ">" as
closing a "CE<lt>" code, if the ">" was immediately preceded by
a "-".  This was so that this:

    C<$foo->bar>

would parse as equivalent to this:

    C<$foo-E<gt>bar>

instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code.  This
problem has since been solved by the addition of syntaxes like this:

    C<< $foo->bar >>

Compliant parsers must not treat "->" as special.

Formatting codes absolutely cannot span paragraphs.  If a code is
opened in one paragraph, and no closing code is found by the end of
that paragraph, the Pod parser must close that formatting code,
and should complain (as in "Unterminated I code in the paragraph
starting at line 123: 'Time objects are not...'").  So these
two paragraphs:

    I<I told you not to do this!

    Don't make me say it again!>

...must I<not> be parsed as two paragraphs in italics (with the I
code starting in one paragraph and ending in another.)
Instead,
the first paragraph should generate a warning, but that aside, the
above code must parse as if it were:

    I<I told you not to do this!>

    Don't make me say it again!E<gt>

(In SGMLish jargon, all Pod commands are like block-level
elements, whereas all Pod formatting codes are like inline-level
elements.)

=head1 Notes on Implementing Pod Processors

The following is a long section of miscellaneous requirements
and suggestions to do with Pod processing.

=over

=item *

Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them (possibly several
times, for very long lines) to avoid text running off the side of the
page.  Pod formatters may warn of such line-breaking.  Such warnings
are particularly appropriate for lines that are over 100 characters
long, which are usually not intentional.

=item *

Pod parsers must recognize I<all> of the three well-known newline
formats: CR, LF, and CRLF.  See L<perlport|perlport>.

=item *

Pod parsers should accept input lines that are of any length.

=item *

Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same.  Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
valid as a UTF-8 sequence, or otherwise as Latin-1.

Future versions of this specification may specify
how Pod can accept other encodings.  Presumably treatment of other
encodings in Pod parsing would be as in XML parsing: whatever the
encoding declared by a particular Pod file, content is to be
stored in memory as Unicode characters.

=item *

The well-known Unicode Byte Order Marks are as follows:  if the
file begins with the two literal byte values 0xFE 0xFF, this is
the BOM for big-endian UTF-16.  If the file begins with the two
literal byte values 0xFF 0xFE, this is the BOM for little-endian
UTF-16.
If the file begins with the three literal byte values
0xEF 0xBB 0xBF, this is the BOM for UTF-8.

=for comment
 use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
 0xEF 0xBB 0xBF

=for comment
 If toke.c is modified to support UTF-32, add mention of those here.

=item *

A naive but sufficient heuristic for testing the first highbit
byte-sequence in a BOM-less file (whether in code or in Pod!), to see
whether that sequence is valid as UTF-8 (RFC 2279), is to check whether
the first byte in the sequence is in the range 0xC0 - 0xFD
I<and> whether the next byte is in the range
0x80 - 0xBF.  If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8.  Otherwise the parser should treat the file as being
in Latin-1.  In the unlikely circumstance that the first highbit
sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
can cater to our heuristic (as well as any more intelligent heuristic)
by prefacing that line with a comment line containing a highbit
sequence that is clearly I<not> valid as UTF-8.  A line consisting
of simply "#", an e-acute, and any non-highbit byte,
is sufficient to establish this file's encoding.

=for comment
 If/WHEN some brave soul makes these heuristics into a generic
 text-file class (or PerlIO layer?), we can presumably delete
 mention of these icky details from this file, and can instead
 tell people to just use appropriate class/layer.
 Auto-recognition of newline sequences would be another desirable
 feature of such a class/layer.
 HINT HINT HINT.

=for comment
 "The probability that a string of characters
 in any other encoding appears as valid UTF-8 is low" - RFC2279

=item *

This document's requirements and suggestions about encodings
do not apply to Pod processors running on non-ASCII platforms,
notably EBCDIC platforms.

=item *

Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph.  (The parser may conflate these two
constructs, or may leave them distinct, in the expectation that the
formatter will nevertheless treat them the same.)

=item *

When rendering Pod to a format that allows comments (i.e., to nearly
any format other than plaintext), a Pod formatter must insert comment
text identifying its name and version number, and the name and
version numbers of any modules it might be using to process the Pod.
Minimal examples:

    %% POD::Pod2PS v3.14159, using POD::Parser v1.92

    <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

    {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

    .\" Pod::Man version 3.14159, using POD::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the Pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of the
input file, the formatting options in effect, the version of Perl
used, etc.

Formatters may also choose to note errors/warnings as comments,
besides or instead of emitting them otherwise (as in messages to
STDERR, or C<die>ing).

=item *

Pod parsers I<may> emit warnings or error messages ("Unknown E code
EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
suppressing all such STDERR output, and instead allow an option for
reporting errors/warnings
in some other way, whether by triggering a callback, or noting errors
in some attribute of the document object, or some similarly unobtrusive
mechanism -- or even by appending a "Pod Errors" section to the end of
the parsed form of the document.

=item *

In cases of exceptionally aberrant documents, Pod parsers may abort the
parse.  Even then, using C<die>ing/C<croak>ing is to be avoided; where
possible, the parser library may simply close the input file
and add text like "*** Formatting Aborted ***" to the end of the
(partial) in-memory document.

=item *

In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
are understood (i.e., I<not> verbatim paragraphs, but I<including>
ordinary paragraphs, and command paragraphs that produce renderable
text, like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as any
(nonzero) number of literal spaces, literal newlines, and literal tabs
(as long as this produces no blank lines, since those would terminate
the paragraph).  Pod parsers should compact literal whitespace in each
processed paragraph, but may provide an option for overriding this
(since some processing tasks do not require it), or may follow
additional special rules (for example, specially treating
period-space-space or period-newline sequences).

=item *

Pod parsers should not, by default, try to coerce apostrophe (') and
quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
turn backtick (`) into anything else but a single backtick character
(distinct from an open quote character!), nor "--" into anything but
two minus signs.  They I<must never> do any of those things to text