    This is the message.
    EOMSG
    exit;

Given this as an input string in a scalar context, C<extract_quotelike>
would correctly return the string "<<'EOMSG'\nThis is the message.\nEOMSG",
leaving the string " || die;\nexit;" in the original variable. In other words,
the two separate pieces of the here document are successfully extracted and
concatenated.

In a list context, C<extract_quotelike> would return the list

=over 4

=item [0]

"<<'EOMSG'\nThis is the message.\nEOMSG\n" (i.e. the full extracted here document,
including fore and aft delimiters),

=item [1]

" || die;\nexit;" (i.e. the remainder of the input text, concatenated),

=item [2]

"" (i.e. the prefix substring -- trivial in this case),

=item [3]

"<<" (i.e. the "name" of the quotelike operator)

=item [4]

"'EOMSG'" (i.e. the left delimiter of the here document, including any quotes),

=item [5]

"This is the message.\n" (i.e. the text of the here document),

=item [6]

"EOMSG" (i.e. the right delimiter of the here document),

=item [7..10]

"" (a here document has no second left delimiter, second text, second right
delimiter, or trailing modifiers).

=back

However, the matching position of the input variable would be set to
"exit;" (i.e. I<after> the closing delimiter of the here document),
which would cause the earlier " || die;\nexit;" to be skipped in any
sequence of code fragment extractions.

To avoid this problem, when it encounters a here document whilst
extracting from a modifiable string, C<extract_quotelike> silently
rearranges the string to an equivalent piece of Perl:

    <<'EOMSG'
    This is the message.
    EOMSG
    || die;
    exit;

in which the here document I<is> contiguous.
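The list-context return values described above are easier to see with a simpler quotelike operator. The following is a minimal sketch using a C<q{...}> string; the input text is purely illustrative, and only a few of the eleven returned elements are printed:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_quotelike);

# Illustrative input that begins with a q{...} quotelike operator.
my $text = q{q{hello world} . $rest};

# List context: extracted operator, remainder, prefix, then the
# individual components (name, delimiters, text, ...).
my @parts = extract_quotelike($text);

print "extracted: $parts[0]\n";    # whole operator, delimiters included
print "remainder: $parts[1]\n";    # text following the operator
print "name:      $parts[3]\n";    # name of the quotelike operator
print "text:      $parts[5]\n";    # text between the delimiters
```

Because C<q> has only one delimited part, the "second" elements ([7] to [9]) and the modifiers ([10]) carry nothing useful here.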
This rearrangement still leaves the
matching position after the here document, but now the rest of the line
on which the here document starts is not skipped.

To prevent C<extract_quotelike> from mucking about with the input in this way
(this is the only case where a list-context C<extract_quotelike> does so),
you can pass the input variable as an interpolated literal:

    $quotelike = extract_quotelike("$var");

=head2 C<extract_codeblock>

C<extract_codeblock> attempts to recognize and extract a balanced
bracket-delimited substring that may contain unbalanced brackets
inside Perl quotes or quotelike operations. That is, C<extract_codeblock>
is like a combination of C<extract_bracketed> and
C<extract_quotelike>.

C<extract_codeblock> takes the same initial three parameters as C<extract_bracketed>:
a text to process, a set of delimiter brackets to look for, and a prefix to
match first. It also takes an optional fourth parameter, which allows the
outermost delimiter brackets to be specified separately (see below).

Omitting the first argument (input text) means process C<$_> instead.
Omitting the second argument (delimiter brackets) indicates that only C<'{'> is to be used.
Omitting the third argument (prefix argument) implies optional whitespace at the start.
Omitting the fourth argument (outermost delimiter brackets) indicates that the
value of the second argument is to be used for the outermost delimiters.

Once the prefix and the outermost opening delimiter bracket have been
recognized, code blocks are extracted by stepping through the input text and
trying the following alternatives in sequence:

=over 4

=item 1.

Try to match a closing delimiter bracket. If the bracket was the same
species as the last opening bracket, return the substring to that
point. If the bracket was mismatched, return an error.

=item 2.

Try to match a quote or quotelike operator. If found, call
C<extract_quotelike> to eat it. If C<extract_quotelike> fails, return
the error it returned.
Otherwise go back to step 1.

=item 3.

Try to match an opening delimiter bracket. If found, call
C<extract_codeblock> recursively to eat the embedded block. If the
recursive call fails, return an error. Otherwise, go back to step 1.

=item 4.

Unconditionally match a bareword or any other single character, and
then go back to step 1.

=back

Examples:

    # Find a while loop in the text

    if ($text =~ s/.*?while\s*\{/{/) {
        $loop = "while " . extract_codeblock($text);
    }

    # Remove the first round-bracketed list (which may include
    # round- or curly-bracketed code blocks or quotelike operators)

    extract_codeblock $text, "(){}", '[^(]*';

The ability to specify a different outermost delimiter bracket is useful
in some circumstances. For example, in the Parse::RecDescent module,
parser actions which are to be performed only on a successful parse
are specified using a C<E<lt>defer:...E<gt>> directive. For example:

    sentence: subject verb object
                    <defer: {$::theVerb = $item{verb}} >

Parse::RecDescent uses C<extract_codeblock($text, '{}E<lt>E<gt>')> to extract the code
within the C<E<lt>defer:...E<gt>> directive, but there's a problem.

A deferred action like this:

                    <defer: {if ($count>10) {$count--}} >

will be incorrectly parsed as:

                    <defer: {if ($count>

because the "greater than" operator is interpreted as a closing delimiter.

But, by extracting the directive using
S<C<extract_codeblock($text, '{}', undef, 'E<lt>E<gt>')>>
the '>' character is only treated as a delimiter at the outermost
level of the code block, so the directive is parsed correctly.

=head2 C<extract_multiple>

The C<extract_multiple> subroutine takes a string to be processed and a
list of extractors (subroutines or regular expressions) to apply to that string.

In a list context C<extract_multiple> returns an array of substrings
of the original string, as extracted by the specified extractors.
In a scalar context, C<extract_multiple> returns the first
substring successfully extracted from the original string.
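A minimal sketch of the two calling contexts, assuming that balanced round-bracketed groups are the only thing worth extracting (the input string is illustrative). In scalar context the call also removes the extracted text from the variable, so each call works on its own copy of the input:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_bracketed);

# Illustrative input: two bracketed groups followed by plain text.
my $source = '(a)(b)c';

# A single extractor: one balanced round-bracketed group.
my @extractors = ( sub { extract_bracketed($_[0], '()') } );

# List context: every field, including the unmatched trailing "c".
my $list_copy = $source;
my @fields = extract_multiple($list_copy, \@extractors);

# Scalar context: only the first field, which is removed from the variable.
my $scalar_copy = $source;
my $first = extract_multiple($scalar_copy, \@extractors);

print join('|', @fields), "\n";
print "$first / $scalar_copy\n";
```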
In both
scalar and void contexts the original string has the first successfully
extracted substring removed from it. In all contexts
C<extract_multiple> starts at the current C<pos> of the string, and
sets that C<pos> appropriately after it matches.

Hence, the aim of a call to C<extract_multiple> in a list context
is to split the processed string into as many non-overlapping fields as
possible, by repeatedly applying each of the specified extractors
to the remainder of the string. Thus C<extract_multiple> is
a generalized form of Perl's C<split> function.

The subroutine takes up to four optional arguments:

=over 4

=item 1.

A string to be processed (C<$_> if the string is omitted or C<undef>)

=item 2.

A reference to a list of subroutine references and/or qr// objects and/or
literal strings and/or hash references, specifying the extractors
to be used to split the string. If this argument is omitted (or
C<undef>) the list:

    [
        sub { extract_variable($_[0], '') },
        sub { extract_quotelike($_[0],'') },
        sub { extract_codeblock($_[0],'{}','') },
    ]

is used.

=item 3.

A number specifying the maximum number of fields to return. If this
argument is omitted (or C<undef>), split continues as long as possible.

If the third argument is I<N>, then extraction continues until I<N> fields
have been successfully extracted, or until the string has been completely
processed.

Note that in scalar and void contexts the value of this argument is
automatically reset to 1 (under C<-w>, a warning is issued if the argument
has to be reset).

=item 4.

A value indicating whether unmatched substrings (see below) within the
text should be skipped or returned as fields. If the value is true,
such substrings are skipped. Otherwise, they are returned.

=back

The extraction process works by applying each extractor in
sequence to the text string.

If the extractor is a subroutine it is called in a list context and is
expected to return a list of a single element, namely the extracted
text.
It may optionally also return two further values: a string
representing the text left after extraction (like $' for a pattern
match), and a string representing any prefix skipped before the
extraction (like $` in a pattern match). Note that this is designed
to facilitate the use of other Text::Balanced subroutines with
C<extract_multiple>. Note too that the value returned by an extractor
subroutine need not bear any relationship to the corresponding substring
of the original text (see examples below).

If the extractor is a precompiled regular expression or a string,
it is matched against the text in a scalar context with a leading
'\G' and the gc modifiers enabled. The extracted value is either
$1 if that variable is defined after the match, or else the
complete match (i.e. $&).

If the extractor is a hash reference, it must contain exactly one element.
The value of that element is one of the
above extractor types (subroutine reference, regular expression, or string).
The key of that element is the name of a class into which the successful
return value of the extractor will be blessed.

If an extractor returns a defined value, that value is immediately
treated as the next extracted field and pushed onto the list of fields.
If the extractor was specified in a hash reference, the field is also
blessed into the appropriate class.

If the extractor fails to match (in the case of a regex extractor), or
returns an empty list or an undefined value (in the case of a subroutine
extractor), it is assumed to have failed to extract.
If none of the extractor subroutines succeeds, then one
character is extracted from the start of the text and the extraction
subroutines reapplied.
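The following minimal sketch combines a hash-reference extractor with a bare qr// extractor. The input, the class name C<Word> and the helper C<make_extractors> are all hypothetical. A fresh extractor list is built for each call, and each call gets its own copy of the input because the first call leaves C<pos> at the end of its string. In our reading of the module, a blessed field comes back as a reference to the extracted string, so the sketch dereferences it with C<${...}>:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple);

# Illustrative input mixing lowercase words and runs of digits.
my $source = 'abc 123 def 45';

# Hypothetical helper: a hash-ref extractor whose matches are blessed into
# the hypothetical class 'Word', plus a bare qr// extractor whose $1 is
# returned unblessed.
sub make_extractors {
    return [ { Word => qr/([a-z]+)/ }, qr/(\d+)/ ];
}

# Fourth argument true: unmatched characters (the spaces) are discarded.
my $skip_copy = $source;
my @fields = extract_multiple($skip_copy, make_extractors(), undef, 1);

# Fourth argument omitted: each unmatched space is returned as a field.
my $keep_copy = $source;
my @all = extract_multiple($keep_copy, make_extractors());

for my $field (@fields) {
    if (ref $field) {
        # Blessed fields are scalar references to the extracted text.
        print ref($field), ": ${$field}\n";
    }
    else {
        print "plain: $field\n";
    }
}
```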
Characters which are thus removed are accumulated and
eventually become the next field (unless the fourth argument is true, in which
case they are discarded).

For example, the following extracts substrings that are valid Perl variables:

    @fields = extract_multiple($text,
                               [ sub { extract_variable($_[0]) } ],
                               undef, 1);

This example separates a text into fields which are quote delimited,
curly bracketed, and anything else. The delimited and bracketed
parts are also blessed to identify them (the "anything else" is unblessed):

    @fields = extract_multiple($text,
               [
                  { Delim => sub { extract_delimited($_[0],q{'"}) } },
                  { Brack => sub { extract_bracketed($_[0],'{}') } },
               ]);

This call extracts the next single substring that is a valid Perl quotelike
operator (and removes it from $text):

    $quotelike = extract_multiple($text,
                                  [
                                    sub { extract_quotelike($_[0]) },
                                  ], undef, 1);

Finally, here is yet another way to do comma-separated value parsing
(the regex captures only up to the next comma, because a trailing
C<(.*)> would advance the match position past the rest of the string):

    @fields = extract_multiple($csv_text,
                               [
                                  sub { extract_delimited($_[0],q{'"}) },
                                  qr/([^,]+)/,
                               ],
                               undef,1);

The list in the second argument means:
I<"Try and extract a ' or " delimited string, otherwise extract anything up to a comma...">.
The undef third argument means:
I<"...as many times as possible...">,
and the true value in the fourth argument means
I<"...discarding anything else that appears (i.e. the commas)">.

If you wanted the commas preserved as separate fields (i.e. like split
does if your split pattern has capturing parentheses), you would
just make the last parameter undefined (or remove it).

=head2 C<gen_delimited_pat>

The C<gen_delimited_pat> subroutine takes a single (string) argument and
builds a Friedl-style optimized regex that matches a string delimited
by any one of the characters in the single argument.
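A minimal sketch of using the generated pattern directly in an ordinary match. The sample line is arbitrary; since the generated pattern consists only of non-capturing groups, it is wrapped in a capturing group so that a global match collects every delimited string, including one with an escaped quote inside it:

```perl
use strict;
use warnings;
use Text::Balanced qw(gen_delimited_pat);

# Pattern for strings delimited by ' or ", with the default '\' escape.
my $pat = gen_delimited_pat(q{'"});

# Arbitrary sample line containing an escaped double quote.
my $line = q{print "say \"hi\"", 'x';};

# Collect every delimited string in the line.
my @strings = $line =~ /($pat)/g;

print "$_\n" for @strings;
```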
For example:

    gen_delimited_pat(q{'"})

returns the regex:

    (?:\"(?:\\\"|(?!\").)*\"|\'(?:\\\'|(?!\').)*\')

Note that the specified delimiters are automatically quotemeta'd.

A typical use of C<gen_delimited_pat> would be to build special purpose tags
for C<extract_tagged>. For example, to properly ignore "empty" XML elements
(which might contain quoted strings):

    my $empty_tag = '<(' . gen_delimited_pat(q{'"}) . '|.)+/>';

    extract_tagged($text, undef, undef, undef, {ignore => [$empty_tag]} );

C<gen_delimited_pat> may also be called with an optional second argument,
which specifies the "escape" character(s) to be used for each delimiter.
For example to match a Pascal-style string (where ' is the delimiter
and '' is a literal ' within the string):

    gen_delimited_pat(q{'},q{'});

Different escape characters can be specified for different delimiters.
For example, to specify that '/' is the escape for single quotes
and '%' is the escape for double quotes:

    gen_delimited_pat(q{'"},q{/%});

If more delimiters than escape chars are specified, the last escape char
is used for the remaining delimiters.
If no escape char is specified for a given specified delimiter, '\' is used.

=head2 C<delimited_pat>

Note that C<gen_delimited_pat> was previously called C<delimited_pat>.
That name may still be used, but is now deprecated.

=head1 DIAGNOSTICS

In a list context, all the functions return C<(undef,$original_text)>
on failure.
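A minimal sketch of detecting such a failure in list context, using an illustrative input whose opening bracket is never closed. On failure the functions also set C<$@>, so the sketch copies it immediately after the call:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Illustrative input whose opening bracket is never closed.
my $text = '(unclosed';

# List context: on failure the first element is undef and the second is
# the original text.
my ($extracted, $remainder) = extract_bracketed($text, '()');

# Copy $@ straight away; it holds the diagnostic for this failure.
my $error = $@;

print defined $extracted ? "matched\n" : "failed: $error\n";
```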
In a scalar context, failure is indicated by returning C<undef>
(in this case the input text is not modified in any way).

In addition, on failure in I<any> context, the C<$@> variable is set.
Accessing C<$@-E<gt>{error}> returns one of the error diagnostics listed
below.
Accessing C<$@-E<gt>{pos}> returns the offset into the original string at
which the error was detected (although not necessarily where it occurred!).
Printing C<$@> directly produces the error message, with the offset appended.
On success, the C<$@> variable is guaranteed to be C<undef>.

The available diagnostics are:

=over 4

=item C<Did not find a suitable bracket: "%s">

The delimiter provided to C<extract_bracketed> was not one of
C<'()[]E<lt>E<gt>{}'>.

=item C<Did not find prefix: /%s/>

A non-optional prefix was specified but wasn't found at the start of the text.

=item C<Did not find opening bracket after prefix: "%s">

C<extract_bracketed> or C<extract_codeblock> was expecting a
particular kind of bracket at the start of the text, and didn't find it.

=item C<No quotelike operator found after prefix: "%s">

C<extract_quotelike> didn't find one of the quotelike operators C<q>,
C<qq>, C<qw>, C<qx>, C<s>, C<tr> or C<y> at the start of the substring
it was extracting.

=item C<Unmatched closing bracket: "%c">

C<extract_bracketed>, C<extract_quotelike> or C<extract_codeblock> encountered
a closing bracket where none was expected.

=item C<Unmatched opening bracket(s): "%s">

C<extract_bracketed>, C<extract_quotelike> or C<extract_codeblock> ran
out of characters in the text before closing one or more levels of nested
brackets.

=item C<Unmatched embedded quote (%s)>

C<extract_bracketed> attempted to match an embedded quoted substring, but
failed to find a closing quote to match it.

=item C<Did not find closing delimiter to match '%s'>

C<extract_quotelike> was unable to find a closing delimiter to match the
one that opened the quote-like operation.

=item C<Mismatched closing bracket: expected "%c" but found "%s">

C<extract_bracketed>,
C<extract_quotelike> or C<extract_codeblock> found
a valid bracket delimiter, but it was the wrong species. This usually
indicates a nesting error, but may indicate incorrect quoting or escaping.

=item C<No block delimiter found after quotelike "%s">

C<extract_quotelike> or C<extract_codeblock> found one of the
quotelike operators C<q>, C<qq>, C<qw>, C<qx>, C<s>, C<tr> or C<y>
without a suitable block after it.

=item C<Did not find leading dereferencer>

C<extract_variable> was expecting one of '$', '@', or '%' at the start of
a variable, but didn't find any of them.

=item C<Bad identifier after dereferencer>

C<extract_variable> found a '$', '@', or '%' indicating a variable, but that
character was not followed by a legal Perl identifier.

=item C<Did not find expected opening bracket at %s>