📄 balanced.pm
would return the same result, since all sets of both types of specified
delimiter brackets are correctly nested and balanced.

However, the call in:

        @result = extract_bracketed( $text, '{([<' );

would fail, returning:

        ( undef , "{ an '[irregularly :-(] {} parenthesized >:-)' string }" );

because the embedded pairs of C<'(..)'>s and C<'[..]'>s are "cross-nested" and
the embedded C<'E<gt>'> is unbalanced. (In a scalar context, this call would
return an empty string. In a void context, C<$text> would be unchanged.)

Note that the embedded single-quotes in the string don't help in this
case, since they have not been specified as acceptable delimiters and are
therefore treated as non-delimiter characters (and ignored).

However, if a particular species of quote character is included in the
delimiter specification, then that type of quote will be correctly handled.
For example, if C<$text> is:

        $text = '<A HREF=">>>>">link</A>';

then

        @result = extract_bracketed( $text, '<">' );

returns:

        ( '<A HREF=">>>>">', 'link</A>', "" )

as expected. Without the specification of C<"> as an embedded quoter:

        @result = extract_bracketed( $text, '<>' );

the result would be:

        ( '<A HREF=">', '>>>">link</A>', "" )

In addition to the quote delimiters C<'>, C<">, and C<`>, full Perl quote-like
quoting (i.e. q{string}, qq{string}, etc) can be specified by including the
letter 'q' as a delimiter.
Hence:

        @result = extract_bracketed( $text, '<q>' );

would correctly match something like this:

        $text = '<leftop: conj /and/ conj>';

See also: C<"extract_quotelike"> and C<"extract_codeblock">.

=head2 C<extract_variable>

C<extract_variable> extracts any valid Perl variable or
variable-involved expression, including scalars, arrays, hashes, array
accesses, hash look-ups, method calls through objects, subroutine calls
through subroutine references, etc.

The subroutine takes up to two optional arguments:

=over 4

=item 1.

A string to be processed (C<$_> if the string is omitted or C<undef>)

=item 2.

A string specifying a pattern to be matched as a prefix (which is to be
skipped). If omitted, optional whitespace is skipped.

=back

On success in a list context, an array of 3 elements is returned. The
elements are:

=over 4

=item [0]

the extracted variable, or variablish expression

=item [1]

the remainder of the input text,

=item [2]

the prefix substring (if any),

=back

On failure, all of these values (except the remaining text) are C<undef>.

In a scalar context, C<extract_variable> returns just the complete
substring that matched a variablish expression. C<undef> is returned on
failure. In addition, the original input text has the returned substring
(and any prefix) removed from it.

In a void context, the input text just has the matched substring (and
any specified prefix) removed.

=head2 C<extract_tagged>

C<extract_tagged> extracts and segments text between (balanced)
specified tags.

The subroutine takes up to five optional arguments:

=over 4

=item 1.

A string to be processed (C<$_> if the string is omitted or C<undef>)

=item 2.

A string specifying a pattern to be matched as the opening tag.
If the pattern string is omitted (or C<undef>) then a pattern
that matches any standard XML tag is used.

=item 3.

A string specifying a pattern to be matched at the closing tag.
If the pattern string is omitted (or C<undef>) then the closing
tag is constructed by inserting a C</> after any leading bracket
characters in the actual opening tag that was matched (I<not> the pattern
that matched the tag). For example, if the opening tag pattern
is specified as C<'{{\w+}}'> and actually matched the opening tag
C<"{{DATA}}">, then the constructed closing tag would be C<"{{/DATA}}">.

=item 4.

A string specifying a pattern to be matched as a prefix (which is to be
skipped). If omitted, optional whitespace is skipped.

=item 5.

A hash reference containing various parsing options (see below)

=back

The various options that can be specified are:

=over 4

=item C<reject =E<gt> $listref>

The list reference contains one or more strings specifying patterns
that must I<not> appear within the tagged text.

For example, to extract
an HTML link (which should not contain nested links) use:

        extract_tagged($text, '<A>', '</A>', undef, {reject => ['<A>']} );

=item C<ignore =E<gt> $listref>

The list reference contains one or more strings specifying patterns
that are I<not> to be treated as nested tags within the tagged text
(even if they would match the start tag pattern).

For example, to extract an arbitrary XML tag, but ignore "empty" elements:

        extract_tagged($text, undef, undef, undef, {ignore => ['<[^>]*/>']} );

(also see L<"gen_delimited_pat"> below).

=item C<fail =E<gt> $str>

The C<fail> option indicates the action to be taken if a matching end
tag is not encountered (i.e. before the end of the string or some
C<reject> pattern matches). By default, a failure to match a closing
tag causes C<extract_tagged> to immediately fail.

However, if the string value associated with C<fail> is "MAX", then
C<extract_tagged> returns the complete text up to the point of failure.
If the string is "PARA", C<extract_tagged> returns only the first paragraph
after the tag (up to the first line that is either empty or contains
only whitespace characters).
If the string is "", the default behaviour (i.e.
failure) is reinstated.

For example, suppose the start tag "/para" introduces a paragraph, which then
continues until the next "/endpara" tag or until another "/para" tag is
encountered:

        $text = "/para line 1\n\nline 3\n/para line 4";

        extract_tagged($text, '/para', '/endpara', undef,
                                {reject => '/para', fail => 'MAX'} );

        # EXTRACTED: "/para line 1\n\nline 3\n"

Suppose instead, that if no matching "/endpara" tag is found, the "/para"
tag refers only to the immediately following paragraph:

        $text = "/para line 1\n\nline 3\n/para line 4";

        extract_tagged($text, '/para', '/endpara', undef,
                                {reject => '/para', fail => 'PARA'} );

        # EXTRACTED: "/para line 1\n"

Note that the specified C<fail> behaviour applies to nested tags as well.

=back

On success in a list context, an array of 6 elements is returned. The elements are:

=over 4

=item [0]

the extracted tagged substring (including the outermost tags),

=item [1]

the remainder of the input text,

=item [2]

the prefix substring (if any),

=item [3]

the opening tag

=item [4]

the text between the opening and closing tags

=item [5]

the closing tag (or "" if no closing tag was found)

=back

On failure, all of these values (except the remaining text) are C<undef>.

In a scalar context, C<extract_tagged> returns just the complete
substring that matched a tagged text (including the start and end
tags). C<undef> is returned on failure. In addition, the original input
text has the returned substring (and any prefix) removed from it.

In a void context, the input text just has the matched substring (and
any specified prefix) removed.

=head2 C<gen_extract_tagged>

(Note: This subroutine is only available under Perl 5.005 or later)

C<gen_extract_tagged> generates a new anonymous subroutine which
extracts text between (balanced) specified tags.
In other words,
it generates a function that behaves identically to C<extract_tagged>.

The difference between C<extract_tagged> and the anonymous
subroutines generated by
C<gen_extract_tagged> is that those generated subroutines:

=over 4

=item *

do not have to reparse the tag specification or parsing options every time
they are called (whereas C<extract_tagged> has to effectively rebuild
its tag parser on every call);

=item *

make use of the new qr// construct to pre-compile the regexes they use
(whereas C<extract_tagged> uses standard string variable interpolation
to create tag-matching patterns).

=back

The subroutine takes up to four optional arguments (the same set as
C<extract_tagged> except for the string to be processed). It returns
a reference to a subroutine which in turn takes a single argument (the text to
be extracted from).

In other words, the implementation of C<extract_tagged> is exactly
equivalent to:

        sub extract_tagged
        {
                my $text = shift;
                my $extractor = gen_extract_tagged(@_);
                return $extractor->($text);
        }

(although C<extract_tagged> is not currently implemented that way, in order
to preserve pre-5.005 compatibility).

Using C<gen_extract_tagged> to create extraction functions for specific tags
is a good idea if those functions are going to be called more than once, since
their performance is typically twice as good as the more general-purpose
C<extract_tagged>.

=head2 C<extract_quotelike>

C<extract_quotelike> attempts to recognize, extract, and segment any
one of the various Perl quotes and quotelike operators (see
L<perlop(3)>). Nested backslashed delimiters, embedded balanced bracket
delimiters (for the quotelike operators), and trailing modifiers are
all caught. For example, in:

        extract_quotelike 'q # an octothorpe: \# (not the end of the q!) #'

        extract_quotelike ' "You said, \"Use sed\"." '

        extract_quotelike ' s{([A-Z]{1,8}\.[A-Z]{3})} /\L$1\E/; '

        extract_quotelike ' tr/\\\/\\\\/\\\//ds; '

the full Perl quotelike operations are all extracted correctly.

Note too that, when using the /x modifier on a regex, any comment
containing the current pattern delimiter will cause the regex to be
immediately terminated. In other words:

        'm /
                (?i)            # CASE INSENSITIVE
                [a-z_]          # LEADING ALPHABETIC/UNDERSCORE
                [a-z0-9]*       # FOLLOWED BY ANY NUMBER OF ALPHANUMERICS
           /x'

will be extracted as if it were:

        'm /
                (?i)            # CASE INSENSITIVE
                [a-z_]          # LEADING ALPHABETIC/'

This behaviour is identical to that of the actual compiler.

C<extract_quotelike> takes two arguments: the text to be processed and
a prefix to be matched at the very beginning of the text. If no prefix
is specified, optional whitespace is the default. If no text is given,
C<$_> is used.

In a list context, an array of 11 elements is returned. The elements are:

=over 4

=item [0]

the extracted quotelike substring (including trailing modifiers),

=item [1]

the remainder of the input text,

=item [2]

the prefix substring (if any),

=item [3]

the name of the quotelike operator (if any),

=item [4]

the left delimiter of the first block of the operation,

=item [5]

the text of the first block of the operation
(that is, the contents of
a quote, the regex of a match or substitution or the target list of a
translation),

=item [6]

the right delimiter of the first block of the operation,

=item [7]

the left delimiter of the second block of the operation
(that is, if it is a C<s>, C<tr>, or C<y>),

=item [8]

the text of the second block of the operation
(that is, the replacement of a substitution or the translation list
of a translation),

=item [9]

the right delimiter of the second block of the operation (if any),

=item [10]

the trailing modifiers on the operation (if any).

=back

For each of the fields marked "(if any)" the default value on success is
an empty string.
On failure, all of these values (except the remaining text) are C<undef>.

In a scalar context, C<extract_quotelike>
returns just the complete substring
that matched a quotelike operation (or C<undef> on failure). In a scalar or
void context, the input text has the same substring (and any specified
prefix) removed.

Examples:

        # Remove the first quotelike literal that appears in text

                $quotelike = extract_quotelike($text,'.*?');

        # Replace one or more leading whitespace-separated quotelike
        # literals in $_ with "<QLL>"

                do { $_ = join '<QLL>', (extract_quotelike)[2,1] } until $@;

        # Isolate the search pattern in a quotelike operation from $text

                ($op,$pat) = (extract_quotelike $text)[3,5];
                if ($op =~ /[ms]/)
                {
                        print "search pattern: $pat\n";
                }
                else
                {
                        print "$op is not a pattern matching operation\n";
                }

=head2 C<extract_quotelike> and "here documents"

C<extract_quotelike> can successfully extract "here documents" from an input
string, but with an important caveat in list contexts.

Unlike other types of quote-like literals, a here document is rarely
a contiguous substring. For example, a typical piece of code using
a here document might look like this:

        <<'EOMSG' || die;
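
Independently of the fragment above, the following is a minimal sketch of
calling C<extract_quotelike> on a here document in list context. The sample
input in C<$text> and the variable name C<@fields> are illustrative
assumptions, not taken from the original documentation. The sketch only prints
the operator, left delimiter, body, and remainder fields from the field list
described for C<extract_quotelike>, so the pieces can be inspected directly:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_quotelike);

# Hypothetical input: the line that opens the here document also carries
# trailing code, followed by the body, the terminator, and more code.
my $text = "<<'EOMSG' || die;\nThis is the message.\nEOMSG\nexit;\n";

# List context: ask for every field rather than just the matched substring.
my @fields = extract_quotelike($text);

if (defined $fields[0]) {
    print "operator:       [$fields[3]]\n";
    print "left delimiter: [$fields[4]]\n";
    print "body:           [$fields[5]]\n";
    print "remainder:      [$fields[1]]\n";
}
else {
    # On failure Text::Balanced leaves a diagnostic in $@.
    print "no quotelike found: $@\n";
}
```

Printing the remainder is a quick way to see how the code that shared a line
with the opening C<E<lt>E<lt>'EOMSG'> is treated.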