<html><head><title>CHICKEN User's Manual - Unit regex</title></head><body><p> </p><a name="unit-regex"></a><h1>Unit regex</h1><p>This library unit provides support for regular expressions. The regular expression package used is <tt>PCRE</tt> (<em>Perl Compatible Regular Expressions</em>), written by Philip Hazel. See <a href="http://www.pcre.org" class="external">http://www.pcre.org</a> for information about the particular regexp flavor and extensions provided by this library.</p><p>To test that PCRE support has been built into CHICKEN properly, try:</p><PRE>(require 'regex)
(feature? 'pcre) <B><FONT COLOR="#A020F0">=&gt;</FONT></B> #t</PRE><a name="grep"></a><h2>grep</h2><pre>[procedure] (grep REGEX LIST)</pre><p>Returns all items of <tt>LIST</tt> that match the regular expression <tt>REGEX</tt>. This procedure could be defined as follows:</p><PRE>(<B><FONT COLOR="#A020F0">define</FONT></B> (<B><FONT COLOR="#0000FF">grep</FONT></B> regex lst)
  (filter (<B><FONT COLOR="#A020F0">lambda</FONT></B> (x) (string-search regex x)) lst) )</PRE><a name="glob-regexp"></a><h2>glob-&gt;regexp</h2><pre>[procedure] (glob-&gt;regexp PATTERN)</pre><p>Converts the file-pattern <tt>PATTERN</tt> into a regular expression.</p><PRE>(glob-&gt;regexp <B><FONT COLOR="#BC8F8F">&quot;foo.*&quot;</FONT></B>)
<B><FONT COLOR="#A020F0">=&gt;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;foo\..*&quot;</FONT></B></PRE><p><tt>PATTERN</tt> should follow "glob" syntax. Allowed wildcards are:</p><pre>*
[C...]
[C1-C2]
[-C...]
?</pre><a name="glob"></a><h2>glob?</h2><pre>[procedure] (glob? STRING)</pre><p>Tests whether <tt>STRING</tt> contains any "glob" wildcards.</p><p>A string without any "glob" wildcards does not meet the criteria, even though it is technically a valid "glob" file-pattern.</p><a name="regexp"></a><h2>regexp</h2><pre>[procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])</pre><p>Returns a precompiled regular expression object for <tt>STRING</tt>.
The optional arguments <tt>IGNORECASE</tt>, <tt>IGNORESPACE</tt> and <tt>UTF8</tt> specify whether the regular expression should be matched with case or whitespace differences ignored, or whether the string should be treated as containing UTF-8 encoded characters, respectively.</p><a name="regexp"></a><h2>regexp*</h2><pre>[procedure] (regexp* STRING [OPTIONS [TABLES]])</pre><p>Returns a precompiled regular expression object for <tt>STRING</tt>. The optional argument <tt>OPTIONS</tt> must be a list of option symbols. The optional argument <tt>TABLES</tt> must be a character definitions table (not defined here).</p><p>Option Symbols:</p><dl><dt>caseless</dt><dd>Case-insensitive match</dd><dt>multiline</dt><dd>Equivalent to Perl's /m option</dd><dt>dotall</dt><dd>Equivalent to Perl's /s option</dd><dt>extended</dt><dd>Ignore whitespace</dd><dt>anchored</dt><dd>Anchor pattern match</dd><dt>dollar-endonly</dt><dd>`$' metacharacter in the pattern matches only at the end of the subject string</dd><dt>extra</dt><dd>Currently of very little use</dd><dt>notbol</dt><dd>First character of the string is not the beginning of a line</dd><dt>noteol</dt><dd>End of the string is not the end of a line</dd><dt>ungreedy</dt><dd>Inverts the "greediness" of the quantifiers so that they are not greedy by default</dd><dt>notempty</dt><dd>The empty string is not considered to be a valid match</dd><dt>utf8</dt><dd>UTF-8 encoded characters</dd><dt>no-auto-capture</dt><dd>Disables the use of numbered capturing parentheses</dd><dt>no-utf8-check</dt><dd>Skip valid UTF-8 sequence check</dd><dt>auto-callout</dt><dd>Automatically inserts callout items (not defined here)</dd><dt>partial</dt><dd>A partial match is accepted</dd><dt>firstline</dt><dd>An unanchored pattern is required to match before or at the first newline</dd><dt>dupnames</dt><dd>Names used to identify capturing subpatterns need not be unique</dd><dt>newline-cr</dt><dd>Newline definition is `\r'</dd><dt>newline-lf</dt><dd>Newline 
definition is `\n'</dd><dt>newline-crlf</dt><dd>Newline definition is `\r\n'</dd><dt>newline-anycrlf</dt><dd>Newline definition is any of `\r', `\n', or `\r\n'</dd><dt>newline-any</dt><dd>Newline definition is any Unicode newline sequence</dd><dt>bsr-anycrlf</dt><dd>`\R' escape sequence matches only CR, LF, or CRLF</dd><dt>bsr-unicode</dt><dd>`\R' escape sequence matches any Unicode newline sequence</dd><dt>dfa-shortest</dt><dd>Currently unused</dd><dt>dfa-restart</dt><dd>Currently unused</dd></dl><a name="regexp"></a><h2>regexp?</h2><pre>[procedure] (regexp? X)</pre><p>Returns <tt>#t</tt> if <tt>X</tt> is a precompiled regular expression, or <tt>#f</tt> otherwise.</p><a name="regexp-optimize"></a><h2>regexp-optimize</h2><pre>[procedure] (regexp-optimize RX)</pre><p>Performs the available optimizations on the precompiled regular expression <tt>RX</tt>. Returns <tt>#t</tt> if an optimization was performed, and <tt>#f</tt> otherwise.</p><a name="string-match"></a><h2>string-match</h2><a name="string-match-positions"></a><h2>string-match-positions</h2><pre>[procedure] (string-match REGEXP STRING [START])
[procedure] (string-match-positions REGEXP STRING [START])</pre><p>Matches the regular expression in <tt>REGEXP</tt> (a string or a precompiled regular expression) against <tt>STRING</tt> and returns either <tt>#f</tt> if the match failed, or a list of matching groups, where the first element is the complete match. If the optional argument <tt>START</tt> is supplied, it specifies the starting position in <tt>STRING</tt>. For each matching group the result list contains either: <tt>#f</tt> for a non-matching but optional group; a list of the start and end positions of the match in <tt>STRING</tt> (in the case of <tt>string-match-positions</tt>); or the matching substring (in the case of <tt>string-match</tt>). Note that the whole string must match. To search for a pattern inside a string, see below. 
Note also that <tt>string-match</tt> is implemented by calling <tt>string-search</tt> with the regular expression wrapped in <tt>^ ... $</tt>. If invoked with a precompiled regular expression argument (by using <tt>regexp</tt>), <tt>string-match</tt> is identical to <tt>string-search</tt>.</p><a name="string-search"></a><h2>string-search</h2><a name="string-search-positions"></a><h2>string-search-positions</h2><pre>[procedure] (string-search REGEXP STRING [START [RANGE]])
[procedure] (string-search-positions REGEXP STRING [START [RANGE]])</pre><p>Searches for the first match of the regular expression in <tt>REGEXP</tt> in <tt>STRING</tt>. The search can be limited to <tt>RANGE</tt> characters.</p><a name="string-split-fields"></a><h2>string-split-fields</h2><pre>[procedure] (string-split-fields REGEXP STRING [MODE [START]])</pre><p>Splits <tt>STRING</tt> into a list of fields according to <tt>MODE</tt>, where <tt>MODE</tt> can be the keyword <tt>#:infix</tt> (<tt>REGEXP</tt> matches field separator), the keyword <tt>#:suffix</tt> (<tt>REGEXP</tt> matches field terminator) or <tt>#t</tt> (<tt>REGEXP</tt> matches field), which is the default.</p><PRE>(<B><FONT COLOR="#A020F0">define</FONT></B> <B><FONT COLOR="#0000FF">s</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;this is a string 1, 2, 3,&quot;</FONT></B>)
(string-split-fields <B><FONT COLOR="#BC8F8F">&quot;[^ ]+&quot;</FONT></B> s)
  <B><FONT COLOR="#A020F0">=&gt;</FONT></B> (<B><FONT COLOR="#BC8F8F">&quot;this&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;is&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;a&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;string&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;1,&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;2,&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;3,&quot;</FONT></B>)
(string-split-fields <B><FONT COLOR="#BC8F8F">&quot; &quot;</FONT></B> s #:infix)
  <B><FONT COLOR="#A020F0">=&gt;</FONT></B> (<B><FONT COLOR="#BC8F8F">&quot;this&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;is&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;a&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;string&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;1,&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;2,&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;3,&quot;</FONT></B>)
(string-split-fields <B><FONT COLOR="#BC8F8F">&quot;,&quot;</FONT></B> s #:suffix)
  <B><FONT COLOR="#A020F0">=&gt;</FONT></B> (<B><FONT COLOR="#BC8F8F">&quot;this is a string 1&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot; 2&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot; 3&quot;</FONT></B>)</PRE><a name="string-substitute"></a><h2>string-substitute</h2><pre>[procedure] (string-substitute REGEXP SUBST STRING [MODE])</pre><p>Searches for substrings in <tt>STRING</tt> that match <tt>REGEXP</tt> and replaces them with the string <tt>SUBST</tt>. The substitution can contain references to subexpressions in <tt>REGEXP</tt> with the <tt>\NUM</tt> notation, where <tt>NUM</tt> refers to the NUMth parenthesized expression. The optional argument <tt>MODE</tt> defaults to 1 and specifies the number of the match to be substituted. Any non-numeric index specifies that all matches are to be substituted.</p><PRE>(string-substitute <B><FONT COLOR="#BC8F8F">&quot;([0-9]+) (eggs|chicks)&quot;</FONT></B>
                   <B><FONT COLOR="#BC8F8F">&quot;\\2 (\\1)&quot;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;99 eggs or 99 chicks&quot;</FONT></B> 2)
<B><FONT COLOR="#A020F0">=&gt;</FONT></B> <B><FONT COLOR="#BC8F8F">&quot;99 eggs or chicks (99)&quot;</FONT></B></PRE><p>Note that a regular expression that matches an empty string will signal an error.</p><a name="string-substitute"></a><h2>string-substitute*</h2><pre>[procedure] (string-substitute* STRING SMAP [MODE])</pre><p>Substitutes elements of <tt>STRING</tt> with <tt>string-substitute</tt> according to <tt>SMAP</tt>. <tt>SMAP</tt> should be an association list where each element of the list is a pair of the form <tt>(MATCH . 
REPLACEMENT)</tt>. Every occurrence of the regular expression <tt>MATCH</tt> in <tt>STRING</tt> will be replaced by the string <tt>REPLACEMENT</tt>.</p><PRE>(string-substitute* <B><FONT COLOR="#BC8F8F">&quot;&lt;h1&gt;Hello, world!&lt;/h1&gt;&quot;</FONT></B>
                    '((<B><FONT COLOR="#BC8F8F">&quot;&lt;[/A-Za-z0-9]+&gt;&quot;</FONT></B> . <B><FONT COLOR="#BC8F8F">&quot;&quot;</FONT></B>)))
<B><FONT COLOR="#A020F0">=&gt;</FONT></B>  <B><FONT COLOR="#BC8F8F">&quot;Hello, world!&quot;</FONT></B></PRE><a name="regexp-escape"></a><h2>regexp-escape</h2><pre>[procedure] (regexp-escape STRING)</pre><p>Escapes all special characters in <tt>STRING</tt> with <tt>\</tt>, so that the string can be embedded into a regular expression.</p><PRE>(regexp-escape <B><FONT COLOR="#BC8F8F">&quot;^[0-9]+:.*$&quot;</FONT></B>)
<B><FONT COLOR="#A020F0">=&gt;</FONT></B>  <B><FONT COLOR="#BC8F8F">&quot;\\^\\[0-9\\]\\+:\\.\\*\\$&quot;</FONT></B></PRE><a name="make-anchored-pattern"></a><h2>make-anchored-pattern</h2><pre>[procedure] (make-anchored-pattern REGEXP [WITHOUT-BOL [WITHOUT-EOL]])</pre><p>Makes an anchored pattern from <tt>REGEXP</tt> (a string or a precompiled regular expression) and returns the updated pattern. When <tt>WITHOUT-BOL</tt> is <tt>#t</tt>, the beginning-of-line anchor is not added. When <tt>WITHOUT-EOL</tt> is <tt>#t</tt>, the end-of-line anchor is not added.</p><p>The <tt>WITHOUT-BOL</tt> and <tt>WITHOUT-EOL</tt> arguments are ignored for a precompiled regular expression.</p><p>Previous: <a href="unit-match.html" class="internal">Unit match</a></p><p>Next: <a href="unit-srfi-18.html" class="internal">Unit srfi-18</a></p></body></html>