Match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example, {\bf bb*} matches the three middle characters of `{\bf abbbc}', {\bf (week$|$wee)(night$|$knights)} matches all ten characters of `{\bf weeknights}', when {\bf (.*).*} is matched against {\bf abc} the parenthesized subexpression matches all three characters, and when {\bf (a*)*} is matched against {\bf bc} both the whole RE and the parenthesized subexpression match an empty string.

If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, so that {\bf x} becomes `{\bf $[xX]$}'. When it appears inside a bracket expression, all case counterparts of it are added to the bracket expression, so that {\bf $[x]$} becomes {\bf $[xX]$} and {\bf $[$\caret{}x$]$} becomes `{\bf $[$\caret{}xX$]$}'.

If newline-sensitive matching is specified, {\bf .} and bracket expressions using {\bf \caret} will never match the newline character (so that matches will never cross newlines unless the RE explicitly arranges it) and {\bf \caret} and {\bf \$} will match the empty string after and before a newline respectively, in addition to matching at beginning and end of string respectively. ARE {\bf $\backslash$A} and {\bf $\backslash$Z} continue to match beginning or end of string {\it only}.

If partial newline-sensitive matching is specified, this affects {\bf .} and bracket expressions as with newline-sensitive matching, but not {\bf \caret} and `{\bf \$}'.

If inverse partial newline-sensitive matching is specified, this affects {\bf \caret} and {\bf \$} as with newline-sensitive matching, but not {\bf .} and bracket expressions. This isn't very useful but is provided for symmetry.

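The longest-match examples given earlier in this section can be made concrete with a small Python sketch. It rests on several assumptions: Python's {\bf re} module is a Perl-style first-match engine and is not the library documented here, the helper names {\bf first\_match} and {\bf leftmost\_longest} are hypothetical, and the second helper emulates the longest-match rule by brute force for REs that contain no non-greedy quantifiers. It is meant only to show how the two rules can disagree on the same RE.

```python
import re


def first_match(pattern, text):
    """Result of a first-match engine (here: Python's own re module)."""
    m = re.compile(pattern).search(text)
    return m.group(0) if m else None


def leftmost_longest(pattern, text):
    """Brute-force emulation of the longest-match rule (illustration only).

    Tries each start position from the left and, for that start, each end
    position from the right. The first substring that the whole pattern
    matches exactly is the earliest-starting, longest match.
    """
    rx = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if rx.fullmatch(text, start, end):
                return text[start:end]
    return None  # no match at all, which is different from an empty match


pattern = r"(week|wee)(night|knights)"
print(first_match(pattern, "weeknights"))       # weeknight  (nine characters)
print(leftmost_longest(pattern, "weeknights"))  # weeknights (all ten characters)
print(leftmost_longest(r"bb*", "abbbc"))        # bbb
print(repr(leftmost_longest(r"a*", "bc")))      # '' (an empty match beats no match)
```

The nested loops make the emulation slow on long inputs, so it serves only to make the rule visible and is not a model of how a real matcher works.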
\subsection{Limits And Compatibility}\label{relimits}

\helpref{Syntax of the builtin regular expression library}{wxresyn}

No particular limit is imposed on the length of REs. Programs intended to be highly portable should not employ REs longer than 256 bytes, as a POSIX-compliant implementation can refuse to accept such REs.

The only feature of AREs that is actually incompatible with POSIX EREs is that {\bf $\backslash$} does not lose its special significance inside bracket expressions. All other ARE features use syntax which is illegal or has undefined or unspecified effects in POSIX EREs; the {\bf ***} syntax of directors likewise is outside the POSIX syntax for both BREs and EREs.

Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a few Perl extensions are not present. Incompatibilities of note include `{\bf $\backslash$b}', `{\bf $\backslash$B}', the lack of special treatment for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-sensitive matching, the restrictions on parentheses and back references in lookahead constraints, and the longest/shortest-match (rather than first-match) matching semantics.

The matching rules for REs containing both normal and non-greedy quantifiers have changed since early beta-test versions of this package. (The new rules are much simpler and cleaner, but don't work as hard at guessing the user's real intentions.)

Henry Spencer's original 1986 {\it regexp} package, still in widespread use,
%(e.g., in pre-8.1 releases of Tcl),
implemented an early version of today's EREs. There are four incompatibilities between {\it regexp}'s near-EREs (`RREs' for short) and AREs.
In roughly increasing order of significance:

{\itemize
\item In AREs, {\bf $\backslash$} followed by an alphanumeric character is either an escape or an error, while in RREs, it was just another way of writing the alphanumeric. This should not be a problem because there was no reason to write such a sequence in RREs.
\item {\bf \{} followed by a digit in an ARE is the beginning of a bound, while in RREs, {\bf \{} was always an ordinary character. Such sequences should be rare, and will often result in an error because following characters will not look like a valid bound.
\item In AREs, {\bf $\backslash$} remains a special character within `{\bf $[]$}', so a literal {\bf $\backslash$} within {\bf $[]$} must be written `{\bf $\backslash\backslash$}'. {\bf $\backslash\backslash$} also gives a literal {\bf $\backslash$} within {\bf $[]$} in RREs, but only truly paranoid programmers routinely doubled the backslash.
\item AREs report the longest/shortest match for the RE, rather than the first found in a specified search order. This may affect some RREs which were written in the expectation that the first match would be reported. (The careful crafting of RREs to optimize the search order for fast matching is obsolete (AREs examine all possible matches in parallel, and their performance is largely insensitive to their complexity) but cases where the search order was exploited to deliberately find a match which was {\it not} the longest/shortest will need rewriting.)
}

\subsection{Basic Regular Expressions}\label{wxresynbre}

\helpref{Syntax of the builtin regular expression library}{wxresyn}

BREs differ from EREs in several respects. `{\bf $|$}', `{\bf +}', and {\bf ?} are ordinary characters and there is no equivalent for their functionality. The delimiters for bounds are {\bf $\backslash$\{} and `{\bf $\backslash$\}}', with {\bf \{} and {\bf \}} by themselves ordinary characters.
The parentheses for nested subexpressions are {\bf $\backslash$(} and `{\bf $\backslash$)}', with {\bf (} and {\bf )} by themselves ordinary characters. {\bf \caret} is an ordinary character except at the beginning of the RE or the beginning of a parenthesized subexpression, {\bf \$} is an ordinary character except at the end of the RE or the end of a parenthesized subexpression, and {\bf *} is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression (after a possible leading `{\bf \caret}'). Finally, single-digit back references are available, and {\bf $\backslash<$} and {\bf $\backslash>$} are synonyms for {\bf $[[:<:]]$} and {\bf $[[:>:]]$} respectively; no other escapes are available.

\subsection{Regular Expression Character Names}\label{wxresynchars}

\helpref{Syntax of the builtin regular expression library}{wxresyn}

Note that the character names are case sensitive.

\begin{twocollist}
\twocolitem{NUL}{'$\backslash$0'}
\twocolitem{SOH}{'$\backslash$001'}
\twocolitem{STX}{'$\backslash$002'}
\twocolitem{ETX}{'$\backslash$003'}
\twocolitem{EOT}{'$\backslash$004'}
\twocolitem{ENQ}{'$\backslash$005'}
\twocolitem{ACK}{'$\backslash$006'}
\twocolitem{BEL}{'$\backslash$007'}
\twocolitem{alert}{'$\backslash$007'}
\twocolitem{BS}{'$\backslash$010'}
\twocolitem{backspace}{'$\backslash$b'}
\twocolitem{HT}{'$\backslash$011'}
\twocolitem{tab}{'$\backslash$t'}
\twocolitem{LF}{'$\backslash$012'}
\twocolitem{newline}{'$\backslash$n'}
\twocolitem{VT}{'$\backslash$013'}
\twocolitem{vertical-tab}{'$\backslash$v'}
\twocolitem{FF}{'$\backslash$014'}
\twocolitem{form-feed}{'$\backslash$f'}
\twocolitem{CR}{'$\backslash$015'}
\twocolitem{carriage-return}{'$\backslash$r'}
\twocolitem{SO}{'$\backslash$016'}
\twocolitem{SI}{'$\backslash$017'}
\twocolitem{DLE}{'$\backslash$020'}
\twocolitem{DC1}{'$\backslash$021'}
\twocolitem{DC2}{'$\backslash$022'}
\twocolitem{DC3}{'$\backslash$023'}
\twocolitem{DC4}{'$\backslash$024'}
\twocolitem{NAK}{'$\backslash$025'}
\twocolitem{SYN}{'$\backslash$026'}
\twocolitem{ETB}{'$\backslash$027'}
\twocolitem{CAN}{'$\backslash$030'}
\twocolitem{EM}{'$\backslash$031'}
\twocolitem{SUB}{'$\backslash$032'}
\twocolitem{ESC}{'$\backslash$033'}
\twocolitem{IS4}{'$\backslash$034'}
\twocolitem{FS}{'$\backslash$034'}
\twocolitem{IS3}{'$\backslash$035'}
\twocolitem{GS}{'$\backslash$035'}
\twocolitem{IS2}{'$\backslash$036'}
\twocolitem{RS}{'$\backslash$036'}
\twocolitem{IS1}{'$\backslash$037'}
\twocolitem{US}{'$\backslash$037'}
\twocolitem{space}{' '}
\twocolitem{exclamation-mark}{'!'}
\twocolitem{quotation-mark}{'"'}
\twocolitem{number-sign}{'\#'}
\twocolitem{dollar-sign}{'\$'}
\twocolitem{percent-sign}{'\%'}
\twocolitem{ampersand}{'\&'}
\twocolitem{apostrophe}{'$\backslash$''}
\twocolitem{left-parenthesis}{'('}
\twocolitem{right-parenthesis}{')'}
\twocolitem{asterisk}{'*'}
\twocolitem{plus-sign}{'+'}
\twocolitem{comma}{','}
\twocolitem{hyphen}{'-'}
\twocolitem{hyphen-minus}{'-'}
\twocolitem{period}{'.'}
\twocolitem{full-stop}{'.'}
\twocolitem{slash}{'/'}
\twocolitem{solidus}{'/'}
\twocolitem{zero}{'0'}
\twocolitem{one}{'1'}
\twocolitem{two}{'2'}
\twocolitem{three}{'3'}
\twocolitem{four}{'4'}
\twocolitem{five}{'5'}
\twocolitem{six}{'6'}
\twocolitem{seven}{'7'}
\twocolitem{eight}{'8'}
\twocolitem{nine}{'9'}
\twocolitem{colon}{':'}
\twocolitem{semicolon}{';'}
\twocolitem{less-than-sign}{'<'}
\twocolitem{equals-sign}{'='}
\twocolitem{greater-than-sign}{'>'}
\twocolitem{question-mark}{'?'}
\twocolitem{commercial-at}{'@'}
\twocolitem{left-square-bracket}{'$[$'}
\twocolitem{backslash}{'$\backslash$'}
\twocolitem{reverse-solidus}{'$\backslash$'}
\twocolitem{right-square-bracket}{'$]$'}
\twocolitem{circumflex}{'\caret'}
\twocolitem{circumflex-accent}{'\caret'}
\twocolitem{underscore}{'\_'}
\twocolitem{low-line}{'\_'}
\twocolitem{grave-accent}{'`'}
\twocolitem{left-brace}{'\{'}
\twocolitem{left-curly-bracket}{'\{'}
\twocolitem{vertical-line}{'$|$'}
\twocolitem{right-brace}{'\}'}
\twocolitem{right-curly-bracket}{'\}'}
\twocolitem{tilde}{'\destruct{}'}
\twocolitem{DEL}{'$\backslash$177'}
\end{twocollist}
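
As a rough illustration of what case-sensitive character names mean in practice, the table can be thought of as a plain lookup map. The following Python sketch copies only a handful of rows from the table; the names {\bf CHARACTER\_NAMES} and {\bf lookup\_character} are hypothetical and are not part of the library.

```python
# A few rows copied from the character-name table; the full table is longer.
# Octal escapes are written the same way as in the table.
CHARACTER_NAMES = {
    "NUL": "\0",
    "BEL": "\007",
    "alert": "\007",
    "HT": "\011",
    "tab": "\t",
    "LF": "\012",
    "newline": "\n",
    "space": " ",
    "hyphen": "-",
    "hyphen-minus": "-",
    "DEL": "\177",
}


def lookup_character(name):
    """Return the character for a name, or None if the name is unknown.

    The lookup is case sensitive on purpose: "tab" is known, "TAB" is not.
    """
    return CHARACTER_NAMES.get(name)


print(repr(lookup_character("tab")))                      # '\t'
print(lookup_character("TAB"))                            # None
print(lookup_character("HT") == lookup_character("tab"))  # True
```

Several names are aliases for the same character (for example {\bf HT} and {\bf tab}), which is why the last comparison holds.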