<HTML><HEAD> <TITLE>Shuimu Tsinghua BBS (水木清华站): Digest Area</TITLE></HEAD><BODY><CENTER><H1>Shuimu Tsinghua BBS (水木清华站): Digest Area</H1></CENTER>From: starw (化缘道人), Board: Linux <BR>Subject: Python Regular Expression HOWTO 4.4 <BR>Posted from: Shuimu Tsinghua BBS (水木清华站) (Tue Nov 21 23:48:29 2000) <BR> <BR>4.4 Other Assertions <BR> <BR>Another zero-width assertion is the lookahead assertion. Lookahead assertions <BR>are available in both positive and negative form, and look like this: <BR> <BR>(?=...) <BR> <BR>Positive lookahead assertion. This succeeds if the contained regular <BR>expression, represented here by ..., successfully matches at the current <BR>location, and fails otherwise. But once the contained expression has been <BR>tried, the matching engine doesn't advance at all; the rest of the pattern <BR>is tried right where the assertion started. <BR> <BR>(?!...) <BR> <BR>Negative lookahead assertion. This is the opposite of the positive assertion; <BR>it succeeds if the contained expression doesn't match at the current position <BR>in the string. <BR> <BR>An example will help make this concrete, and will demonstrate <BR>a case where a lookahead is useful. Consider a simple pattern to match a <BR>filename and split it apart into a base name and an extension, separated by <BR>a ".". For example, in "news.rc", "news" is the base name, and "rc" is the <BR>filename's extension. <BR> <BR>The pattern to match this is quite simple: .*[.].*$. (Notice that the "." <BR>needs to be treated specially because it's a metacharacter; I've put it <BR>inside a character class. Also notice the trailing $; this is added to ensure <BR>that all the rest of the string must be included in the extension.) This <BR>regular expression matches "foo.bar" and "autoexec.bat" and "sendmail.cf" <BR>and "printers.conf". <BR> <BR>Now, consider complicating the problem a bit; what if you want to match <BR>filenames where the extension is not "bat"? 
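<BR> <BR>As a quick check of the simple pattern before complicating it, here is a <BR>minimal Python sketch. The helper name matches_simple and the sample list <BR>(including the dot-less "README") are illustrative assumptions, not part of <BR>the original text; re.match is used so that matching starts at the beginning <BR>of the string. <BR> <BR>
<PRE>
```python
import re

# The simple pattern from the text: anything, a literal dot, then the rest.
SIMPLE = re.compile(r".*[.].*$")

def matches_simple(filename):
    """Return True if the simple pattern matches (hypothetical helper)."""
    return SIMPLE.match(filename) is not None

# Sample names; "README" is an added case with no dot at all.
names = ["foo.bar", "autoexec.bat", "sendmail.cf", "printers.conf", "README"]
results = {name: matches_simple(name) for name in names}

for name, ok in results.items():
    print(name, ok)
```
</PRE>
Every dotted name is accepted, including "autoexec.bat", which is why the <BR>pattern needs more work before it can exclude a specific extension. <BR> <BR>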
Some incorrect attempts: <BR> <BR>.*[.][^b].*$ <BR> <BR>First attempt: Exclude "bat" by requiring that the first character of the <BR>extension is not a "b". This is wrong, because it also rejects "foo.bar". <BR> <BR>.*[.]([^b]..|.[^a].|..[^t])$ <BR> <BR>The expression gets messier when you try to patch up the first solution by <BR>requiring that one of the following cases match: the first character of the <BR>extension isn't "b"; the second character isn't "a"; or the third character <BR>isn't "t". This accepts "foo.bar" and rejects "autoexec.bat", but it requires <BR>a three-letter extension, and doesn't accept "sendmail.cf". Another bug, so <BR>we'll complicate the pattern again in an effort to fix it. <BR> <BR>.*[.]([^b].?.?|.[^a]?.?|..?[^t]?)$ <BR> <BR>In the third attempt, the second and third letters are both made optional in <BR>order to allow matching extensions shorter than three characters, such as <BR>"sendmail.cf". <BR> <BR>The pattern's getting really complicated now, which makes it hard to read and <BR>understand. When you write a regular expression, ask yourself: if you <BR>encountered this expression in a program, how hard would it be to figure out <BR>what the expression was intended to do? Worse, this solution doesn't scale <BR>well; if the problem changes, and you want to exclude both "bat" and "exe" as <BR>extensions, the pattern would get still more complicated and confusing. <BR> <BR>A negative lookahead cuts through all this. Go back to the original pattern, <BR>and, before the .* which matches the extension, insert (?!bat$). This means: <BR>if the expression bat$ doesn't match at this point, try the rest of the <BR>pattern; if bat$ does match, the whole pattern will fail. (The trailing $ is <BR>required to ensure that something like "sample.batch", where the extension <BR>only starts with "bat", will be allowed.) <BR> <BR>After this modification, the whole pattern is .*[.](?!bat$).*$. 
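<BR> <BR>To see the difference in practice, the following sketch compares the <BR>third hand-written attempt with the lookahead pattern. The helper accepted <BR>and the list of test names are hypothetical, chosen only for illustration. <BR> <BR>
<PRE>
```python
import re

# Third hand-written attempt from the text, and the negative-lookahead version.
THIRD_ATTEMPT = re.compile(r".*[.]([^b].?.?|.[^a]?.?|..?[^t]?)$")
LOOKAHEAD = re.compile(r".*[.](?!bat$).*$")

def accepted(pattern, filename):
    """Return True if the compiled pattern matches (hypothetical helper)."""
    return pattern.match(filename) is not None

names = ["foo.bar", "sendmail.cf", "autoexec.bat", "sample.batch"]
third = {name: accepted(THIRD_ATTEMPT, name) for name in names}
look = {name: accepted(LOOKAHEAD, name) for name in names}

for name in names:
    print(name, third[name], look[name])
```
</PRE>
Both patterns reject "autoexec.bat", but only the lookahead version accepts <BR>"sample.batch"; the hand-written alternation allows at most three characters <BR>after the dot. <BR> <BR>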
Excluding <BR>another filename extension is now easy; simply add it as an alternative <BR>inside the assertion. The pattern .*[.](?!bat$|exe$).*$ excludes both "bat" <BR>and "exe". <BR> <BR> <BR>-- <BR> <BR> 铜铁投洪冶,蝼蚁上粉墙。 <BR> 阴阳无二义,天地我中央。 <BR> <BR> <BR>※ Source: Shuimu Tsinghua BBS (水木清华站) smth.org [FROM: 202.117.27.35] <BR></BODY></HTML>