% re_syntax.tex, from the ``Wxpython Implemented on Windows CE, Sou'' source listing (661 lines)
Match lengths are measured in characters,
not collating elements. An empty string is considered longer than no match
at all. For example, {\bf bb*} matches the three middle characters
of `{\bf abbbc}'; {\bf (week$|$wee)(night$|$knights)}
matches all ten characters of `{\bf weeknights}'; when {\bf (.*).*} is matched against
{\bf abc}, the parenthesized subexpression matches all three characters; and when
{\bf (a*)*} is matched against {\bf bc}, both the whole RE and the parenthesized subexpression
match an empty string. 
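Python's {\tt re} module uses Perl-style first-match semantics, so it does not always report the match described above. The following minimal sketch recovers the overall leftmost-longest match by brute force. The helper {\tt leftmost\_longest} is hypothetical. It assumes the pattern contains no anchors or lookaround, because each candidate substring is tested on its own with {\tt re.fullmatch}. It also does not reproduce the rules for what each parenthesized subexpression captures.

\begin{verbatim}
import re


def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost-longest match, or None.

    For each start position, from the left, try the longest candidate
    substring first and accept the first one the whole RE matches.
    Assumes no anchors or lookaround: each candidate is tested in isolation.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text[start:end]):
                return start, end
    return None  # no match at all (an empty match would be returned above)


pat, text = r"(week|wee)(night|knights)", "weeknights"
print(re.search(pat, text).group())   # weeknight  (Perl-style first match)
start, end = leftmost_longest(pat, text)
print(text[start:end])                # weeknights (leftmost-longest match)
\end{verbatim}

Trying every end position makes this quadratic in the length of the subject string. It is therefore only suitable for checking small examples such as the ones above.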

If case-independent matching is specified, the effect
is much as if all case distinctions had vanished from the alphabet. When
an alphabetic character that exists in multiple cases appears as an ordinary character
outside a bracket expression, it is effectively transformed into a bracket
expression containing both cases, so that {\bf x} becomes `{\bf $[xX]$}'. When it appears
inside a bracket expression, all case counterparts of it are added to the
bracket expression, so that {\bf $[x]$} becomes {\bf $[xX]$} and {\bf $[$\caret x$]$} becomes `{\bf $[$\caret xX$]$}'. 
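The transformation can be sketched as a rewriting pass over the RE. This is an illustration and not the library's implementation. The function names {\tt fold\_case} and {\tt \_counterparts} are hypothetical. The sketch assumes that bracket expressions contain only single characters (no ranges, classes or collating elements) and that escapes are copied through unchanged.

\begin{verbatim}
import re


def _counterparts(ch):
    """Return the single-character case variants of ch, ch itself first."""
    variants = []
    for v in (ch, ch.lower(), ch.upper()):
        if len(v) == 1 and v not in variants:
            variants.append(v)
    return variants


def fold_case(pattern):
    """Rewrite an RE so that it matches case-independently (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            out.append(pattern[i:i + 2])  # copy an escape through unchanged
            i += 2
        elif c == "[":
            end = pattern.index("]", i + 2)  # assumes a well-formed [...]
            body = pattern[i + 1:end]
            negate = body.startswith("^")
            members = []
            # Add every case counterpart of each member, keeping order.
            for ch in (body[1:] if negate else body):
                for v in _counterparts(ch):
                    if v not in members:
                        members.append(v)
            out.append("[" + ("^" if negate else "") + "".join(members) + "]")
            i = end + 1
        else:
            variants = _counterparts(c)
            # A letter with two cases becomes a bracket holding both cases.
            out.append("[" + "".join(variants) + "]" if len(variants) > 1 else c)
            i += 1
    return "".join(out)


print(fold_case("x"), fold_case("[x]"), fold_case("[^x]"))  # [xX] [xX] [^xX]
print(bool(re.fullmatch(fold_case("ab*c"), "ABbC")))          # True
\end{verbatim}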

If newline-sensitive
matching is specified, {\bf .} and bracket expressions using {\bf \caret} will never match
the newline character (so that matches will never cross newlines unless
the RE explicitly arranges it), and {\bf \caret} and {\bf \$} will match the empty string after
and before a newline respectively, in addition to matching at the beginning
and end of the string respectively. The ARE escapes {\bf $\backslash$A} and {\bf $\backslash$Z} continue to match the beginning
or end of the string {\it only}. 

If partial newline-sensitive matching is specified,
this affects {\bf .} and bracket expressions as with newline-sensitive matching,
but not {\bf \caret} and `{\bf \$}'. 

If inverse partial newline-sensitive matching is specified,
this affects {\bf \caret} and {\bf \$} as with newline-sensitive matching, but not {\bf .} and bracket
expressions. This isn't very useful but is provided for symmetry. 
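As a rough illustration, the four combinations can be approximated with Python's {\tt re} flags. Where newlines must be excluded, a negative lookahead is added before each complemented bracket expression. The mode names and {\tt compile\_with\_mode} are hypothetical. The rewrite assumes {\tt [\caret} never appears escaped or nested inside another bracket expression. Python's {\tt \$} also matches just before a string-final newline, which an ARE {\bf \$} does not, so this is only an approximation.

\begin{verbatim}
import re

# Hypothetical mode names -> (Python flags, exclude newline from [^...]?)
MODES = {
    "none": (re.DOTALL, False),                    # no newline sensitivity
    "newline": (re.MULTILINE, True),               # full newline sensitivity
    "partial": (0, True),                          # only . and [^...] affected
    "inverse": (re.MULTILINE | re.DOTALL, False),  # only ^ and $ affected
}


def compile_with_mode(pattern, mode):
    flags, exclude_newline = MODES[mode]
    if exclude_newline:
        # Put a negative lookahead in front of each complemented bracket so
        # it can no longer match a newline.  Assumes "[^" is never escaped
        # or nested inside another bracket expression.
        pattern = re.sub(r"(?<!\\)\[\^", lambda m: r"(?!\n)[^", pattern)
    return re.compile(pattern, flags)


text = "a\nb"
for mode in MODES:
    hits = [p for p in ("a.b", "a[^x]b", "^b", r"\Ab")
            if compile_with_mode(p, mode).search(text)]
    print(mode, hits)
\end{verbatim}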

\subsection{Limits And Compatibility}\label{relimits}

\helpref{Syntax of the builtin regular expression library}{wxresyn}

No particular limit is imposed on the length of REs. Programs
intended to be highly portable should not employ REs longer than 256 bytes,
as a POSIX-compliant implementation can refuse to accept such REs. 

The only
feature of AREs that is actually incompatible with POSIX EREs is that {\bf $\backslash$}
does not lose its special significance inside bracket expressions. All other
ARE features use syntax which is illegal or has undefined or unspecified
effects in POSIX EREs; the {\bf ***} syntax of directors likewise is outside
the POSIX syntax for both BREs and EREs. 
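A portability check along these lines can be sketched as follows. The function {\tt portability\_warnings} is hypothetical and heuristic. It looks only for the points named above: a length over 256 bytes, a leading {\bf ***} director, and {\bf $\backslash$} inside a bracket expression. It does not parse the RE fully.

\begin{verbatim}
def portability_warnings(are):
    """List reasons an ARE may not behave the same under a POSIX ERE engine."""
    warnings = []
    if len(are.encode("utf-8")) > 256:
        warnings.append("longer than 256 bytes; a POSIX implementation "
                        "may reject it")
    if are.startswith("***"):
        warnings.append("starts with a *** director, which is outside "
                        "POSIX syntax")
    i, n = 0, len(are)
    while i < n:
        if are[i] == "\\":
            i += 2  # an escape outside brackets: skip the escaped character
        elif are[i] == "[":
            j = i + 1
            if j < n and are[j] == "^":
                j += 1
            if j < n and are[j] == "]":
                j += 1  # a leading ] is a literal member
            while j < n and are[j] != "]":
                if are[j] == "\\":
                    warnings.append(f"backslash at offset {j} is an escape in "
                                    "an ARE but literal inside [] in an ERE")
                    j += 2  # in an ARE the backslash escapes the next char
                elif are.startswith(("[:", "[.", "[="), j):
                    # skip a [:class:], [.element.] or [=equiv=] as a unit
                    close = are.find(are[j + 1] + "]", j + 2)
                    j = n if close == -1 else close + 2
                else:
                    j += 1
            i = j + 1
        else:
            i += 1
    return warnings


for w in portability_warnings(r"***:[\d]+"):
    print(w)
\end{verbatim}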

Many of the ARE extensions are
borrowed from Perl, but some have been changed to clean them up, and a
few Perl extensions are not present. Incompatibilities of note include `{\bf $\backslash$b}',
`{\bf $\backslash$B}', the lack of special treatment for a trailing newline, the addition of
complemented bracket expressions to the things affected by newline-sensitive
matching, the restrictions on parentheses and back references in lookahead
constraints, and the longest/shortest-match (rather than first-match) matching
semantics. 

The matching rules for REs containing both normal and non-greedy
quantifiers have changed since early beta-test versions of this package.
(The new rules are much simpler and cleaner, but don't work as hard at guessing
the user's real intentions.) 

Henry Spencer's original 1986 {\it regexp} package, still in widespread use,
%(e.g., in pre-8.1 releases of Tcl),
implemented an early version of today's EREs. There are four incompatibilities between {\it regexp}'s
near-EREs (`RREs' for short) and AREs. In roughly increasing order of significance:
{\itemize
\item In AREs, {\bf $\backslash$} followed by an alphanumeric character is either an escape or
an error, while in RREs, it was just another way of writing the  alphanumeric.
This should not be a problem because there was no reason to write such
a sequence in RREs. 

\item {\bf \{} followed by a digit in an ARE is the beginning of
a bound, while in RREs, {\bf \{} was always an ordinary character. Such sequences
should be rare, and will often result in an error because following characters
will not look like a valid bound. 

\item In AREs, {\bf $\backslash$} remains a special character
within `{\bf $[]$}', so a literal {\bf $\backslash$} within {\bf $[]$} must be
written `{\bf $\backslash\backslash$}'. {\bf $\backslash\backslash$} also gives a literal
 {\bf $\backslash$} within {\bf $[]$} in RREs, but only truly paranoid programmers routinely doubled
the backslash. 

\item AREs report the longest/shortest match for the RE, rather
than the first found in a specified search order. This may affect some RREs
which were written in the expectation that the first match would be reported.
The careful crafting of RREs to optimize the search order for fast matching
is obsolete, since AREs examine all possible matches in parallel and their performance
is largely insensitive to their complexity. However, cases where the search
order was exploited to deliberately find a match which was {\it not} the longest/shortest
will need rewriting.
}
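The first three differences are purely syntactic, so an RRE can be rewritten mechanically for an ARE engine. A minimal sketch follows. {\tt rre\_to\_are} is a hypothetical name. The fourth difference (first match versus longest/shortest match) cannot be fixed by rewriting and still needs a manual check.

\begin{verbatim}
import re


def rre_to_are(rre):
    """Rewrite an RRE so that an ARE engine reads it the same way (sketch)."""
    out = []
    i, n = 0, len(rre)
    while i < n:
        c = rre[i]
        if c == "\\" and i + 1 < n:
            nxt = rre[i + 1]
            # RRE: backslash + alphanumeric was just the alphanumeric.
            out.append(nxt if nxt.isalnum() else "\\" + nxt)
            i += 2
        elif c == "{" and i + 1 < n and rre[i + 1].isdigit():
            out.append("\\{")  # RRE: "{" was always an ordinary character
            i += 1
        elif c == "[":
            j = i + 1
            if j < n and rre[j] == "^":
                j += 1
            if j < n and rre[j] == "]":
                j += 1  # a leading ] is a literal member
            while j < n and rre[j] != "]":
                if rre.startswith(("[:", "[.", "[="), j):
                    close = rre.find(rre[j + 1] + "]", j + 2)
                    j = n if close == -1 else close + 2
                else:
                    j += 1
            # Inside [], an RRE backslash was literal; an ARE needs it doubled.
            out.append(rre[i:j + 1].replace("\\", "\\\\"))
            i = j + 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(rre_to_are(r"a\bc{1}[\]"))                          # abc\{1}[\\]
print(re.fullmatch(rre_to_are("x{2}"), "x{2}") is not None)  # True
\end{verbatim}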

\subsection{Basic Regular Expressions}\label{wxresynbre}

\helpref{Syntax of the builtin regular expression library}{wxresyn}

BREs differ from EREs in
several respects.  `{\bf $|$}', `{\bf +}', and {\bf ?} are ordinary characters and there is no equivalent
for their functionality. The delimiters for bounds
are {\bf $\backslash$\{} and `{\bf $\backslash$\}}', with {\bf \{} and
 {\bf \}} by themselves ordinary characters. The parentheses for nested subexpressions
are {\bf $\backslash$(} and `{\bf $\backslash$)}', with {\bf (} and {\bf )} by themselves
ordinary characters. {\bf \caret} is an ordinary
character except at the beginning of the RE or the beginning of a parenthesized
subexpression, {\bf \$} is an ordinary character except at the end of the RE or
the end of a parenthesized subexpression, and {\bf *} is an ordinary character
if it appears at the beginning of the RE or the beginning of a parenthesized
subexpression (after a possible leading `{\bf \caret}'). Finally, single-digit back references
are available, and {\bf $\backslash<$} and {\bf $\backslash>$} are synonyms
for {\bf $[[:<:]]$} and {\bf $[[:>:]]$} respectively;
no other escapes are available.  
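The rules above are mechanical enough that a BRE can be translated into ARE syntax with a single left-to-right scan. The sketch below is illustrative only. {\tt bre\_to\_are} is a hypothetical name. It assumes the BRE is well formed and does not validate bounds. Its output may use ARE-only constructs such as {\bf $[[:<:]]$}.

\begin{verbatim}
def bre_to_are(bre):
    """Translate a POSIX BRE into equivalent ARE syntax (sketch)."""
    out = []
    i, n = 0, len(bre)
    caret_is_anchor = True  # at the start of the RE or of a \( group
    star_is_literal = True  # ... or just after that leading ^
    while i < n:
        c = bre[i]
        if c == "\\" and i + 1 < n:
            nxt = bre[i + 1]
            i += 2
            if nxt in "(){}":
                out.append(nxt)  # \( \) \{ \} are the BRE operators
                caret_is_anchor = star_is_literal = (nxt == "(")
                continue
            if nxt == "<":
                out.append("[[:<:]]")
            elif nxt == ">":
                out.append("[[:>:]]")
            else:
                out.append("\\" + nxt)  # back references, escaped specials
            caret_is_anchor = star_is_literal = False
            continue
        if c == "^" and caret_is_anchor:
            out.append("^")
            caret_is_anchor = False  # star_is_literal stays True
            i += 1
            continue
        if c == "*" and star_is_literal:
            out.append("\\*")
        elif c == "$":
            at_end = i + 1 == n or bre.startswith("\\)", i + 1)
            out.append("$" if at_end else "\\$")
        elif c in "^|+?{}()":
            out.append("\\" + c)  # ordinary characters in a BRE
        elif c == "[":
            j = i + 1
            if j < n and bre[j] == "^":
                j += 1
            if j < n and bre[j] == "]":
                j += 1  # a leading ] is a literal member
            while j < n and bre[j] != "]":
                if bre.startswith(("[:", "[.", "[="), j):
                    close = bre.find(bre[j + 1] + "]", j + 2)
                    j = n if close == -1 else close + 2
                else:
                    j += 1
            # A backslash is literal in a BRE bracket but special in an ARE.
            out.append(bre[i:j + 1].replace("\\", "\\\\"))
            i = j
        else:
            out.append(c)
        caret_is_anchor = star_is_literal = False
        i += 1
    return "".join(out)


print(bre_to_are(r"^*\(ab\)\{2\}$"))  # ^\*(ab){2}$
\end{verbatim}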

\subsection{Regular Expression Character Names}\label{wxresynchars}

\helpref{Syntax of the builtin regular expression library}{wxresyn}

Note that the character names are case-sensitive.

\begin{twocollist}
\twocolitem{NUL}{'$\backslash$0'}
\twocolitem{SOH}{'$\backslash$001'}
\twocolitem{STX}{'$\backslash$002'}
\twocolitem{ETX}{'$\backslash$003'}
\twocolitem{EOT}{'$\backslash$004'}
\twocolitem{ENQ}{'$\backslash$005'}
\twocolitem{ACK}{'$\backslash$006'}
\twocolitem{BEL}{'$\backslash$007'}
\twocolitem{alert}{'$\backslash$007'}
\twocolitem{BS}{'$\backslash$010'}
\twocolitem{backspace}{'$\backslash$b'}
\twocolitem{HT}{'$\backslash$011'}
\twocolitem{tab}{'$\backslash$t'}
\twocolitem{LF}{'$\backslash$012'}
\twocolitem{newline}{'$\backslash$n'}
\twocolitem{VT}{'$\backslash$013'}
\twocolitem{vertical-tab}{'$\backslash$v'}
\twocolitem{FF}{'$\backslash$014'}
\twocolitem{form-feed}{'$\backslash$f'}
\twocolitem{CR}{'$\backslash$015'}
\twocolitem{carriage-return}{'$\backslash$r'}
\twocolitem{SO}{'$\backslash$016'}
\twocolitem{SI}{'$\backslash$017'}
\twocolitem{DLE}{'$\backslash$020'}
\twocolitem{DC1}{'$\backslash$021'}
\twocolitem{DC2}{'$\backslash$022'}
\twocolitem{DC3}{'$\backslash$023'}
\twocolitem{DC4}{'$\backslash$024'}
\twocolitem{NAK}{'$\backslash$025'}
\twocolitem{SYN}{'$\backslash$026'}
\twocolitem{ETB}{'$\backslash$027'}
\twocolitem{CAN}{'$\backslash$030'}
\twocolitem{EM}{'$\backslash$031'}
\twocolitem{SUB}{'$\backslash$032'}
\twocolitem{ESC}{'$\backslash$033'}
\twocolitem{IS4}{'$\backslash$034'}
\twocolitem{FS}{'$\backslash$034'}
\twocolitem{IS3}{'$\backslash$035'}
\twocolitem{GS}{'$\backslash$035'}
\twocolitem{IS2}{'$\backslash$036'}
\twocolitem{RS}{'$\backslash$036'}
\twocolitem{IS1}{'$\backslash$037'}
\twocolitem{US}{'$\backslash$037'}
\twocolitem{space}{' '}
\twocolitem{exclamation-mark}{'!'}
\twocolitem{quotation-mark}{'"'}
\twocolitem{number-sign}{'\#'}
\twocolitem{dollar-sign}{'\$'}
\twocolitem{percent-sign}{'\%'}
\twocolitem{ampersand}{'\&'}
\twocolitem{apostrophe}{'$\backslash$''}
\twocolitem{left-parenthesis}{'('}
\twocolitem{right-parenthesis}{')'}
\twocolitem{asterisk}{'*'}
\twocolitem{plus-sign}{'+'}
\twocolitem{comma}{','}
\twocolitem{hyphen}{'-'}
\twocolitem{hyphen-minus}{'-'}
\twocolitem{period}{'.'}
\twocolitem{full-stop}{'.'}
\twocolitem{slash}{'/'}
\twocolitem{solidus}{'/'}
\twocolitem{zero}{'0'}
\twocolitem{one}{'1'}
\twocolitem{two}{'2'}
\twocolitem{three}{'3'}
\twocolitem{four}{'4'}
\twocolitem{five}{'5'}
\twocolitem{six}{'6'}
\twocolitem{seven}{'7'}
\twocolitem{eight}{'8'}
\twocolitem{nine}{'9'}
\twocolitem{colon}{':'}
\twocolitem{semicolon}{';'}
\twocolitem{less-than-sign}{'<'}
\twocolitem{equals-sign}{'='}
\twocolitem{greater-than-sign}{'>'}
\twocolitem{question-mark}{'?'}
\twocolitem{commercial-at}{'@'}
\twocolitem{left-square-bracket}{'$[$'}
\twocolitem{backslash}{'$\backslash$'}
\twocolitem{reverse-solidus}{'$\backslash$'}
\twocolitem{right-square-bracket}{'$]$'}
\twocolitem{circumflex}{'\caret'}
\twocolitem{circumflex-accent}{'\caret'}
\twocolitem{underscore}{'\_'}
\twocolitem{low-line}{'\_'}
\twocolitem{grave-accent}{'`'}
\twocolitem{left-brace}{'\{'}
\twocolitem{left-curly-bracket}{'\{'}
\twocolitem{vertical-line}{'$|$'}
\twocolitem{right-brace}{'\}'}
\twocolitem{right-curly-bracket}{'\}'}
\twocolitem{tilde}{'\destruct{}'}
\twocolitem{DEL}{'$\backslash$177'}
\end{twocollist}
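A name lookup can be sketched as shown below. The sketch assumes, as in the Tcl implementation of AREs, that a name from the table is written as a collating element {\bf $[$.name.$]$} inside a bracket expression (for example {\bf $[[$.space.$]]$}). The dictionary is only a hypothetical excerpt of the table. {\tt expand\_char\_names} is an illustrative helper and not part of the library.

\begin{verbatim}
import re

# Hypothetical excerpt of the table above; keys are case-sensitive.
CHAR_NAMES = {
    "NUL": "\0", "BEL": "\a", "alert": "\a", "HT": "\t", "tab": "\t",
    "LF": "\n", "newline": "\n", "ESC": "\x1b", "space": " ",
    "hyphen": "-", "left-square-bracket": "[", "right-square-bracket": "]",
    "backslash": "\\", "circumflex": "^", "tilde": "~", "DEL": "\x7f",
}

# Matches a collating element such as [.space.] and captures the name.
_ELEMENT = re.compile(r"\[\.([^.\]]+)\.\]")


def expand_char_names(text):
    """Replace each [.name.] element with the character it names."""
    def lookup(m):
        name = m.group(1)
        if len(name) == 1:
            return name  # a single character names itself
        if name not in CHAR_NAMES:
            raise KeyError("unknown character name: " + name)
        return CHAR_NAMES[name]
    return _ELEMENT.sub(lookup, text)


print(repr(expand_char_names("[[.space.][.tab.]]")))  # '[ \t]'
\end{verbatim}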
