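style='font-family:宋体'>” can match “</span><span lang=EN-US>\n</span><span
style='font-family:宋体'>”), can “</span><span lang=EN-US>^</span><span
style='font-family:宋体'>” be written in the middle of an expression? There it would necessarily have nothing to match. Should that case be treated as a syntax error?</span></p>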
<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>2. In multiline mode, does “</span><span lang=EN-US>^</span><span
style='font-family:宋体'>” match the start address of the input, or the character after “</span><span lang=EN-US>\n</span><span
style='font-family:宋体'>”? I lean toward matching the start address.</span></p>
<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>3. In multiline mode, can “</span><span lang=EN-US>$</span><span
style='font-family:宋体'>” match the end of the subject string, or does it match only “</span><span lang=EN-US>\n|\r\n</span><span
style='font-family:宋体'>”?</span></p>
<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>4. Should “</span><span lang=EN-US>^$</span><span
style='font-family:宋体'>” be treated as a syntax error? If not, does it match an empty string, or nothing at all?</span></p>
<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>Different regular expression libraries and tools (such as </span><span lang=EN-US>VC</span><span
style='font-family:宋体'> and </span><span lang=EN-US>UEdit)</span><span
style='font-family:宋体'> answer these questions differently.</span></p>
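<p class=MsoNormal><span lang=EN-US>To make questions 2 and 3 concrete, here is a minimal C++ sketch of one possible convention, in which ^ and $ are zero-width tests on a position. The function names are hypothetical and do not come from any particular library. Assumptions: in single-line mode $ holds only at the very end of the string (Perl and PCRE additionally let it match just before a final newline), and in multiline mode \r\n counts as a single line terminator.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers: does the zero-width anchor hold at position pos?
// Valid positions run from 0 to text.size() (the end of the string).

// '^': start of the string; in multiline mode also just after any '\n'.
bool caretMatches(const std::string& text, std::size_t pos, bool multiline) {
    assert(pos <= text.size());
    if (pos == 0) return true;
    return multiline && text[pos - 1] == '\n';
}

// '$': end of the string; in multiline mode also just before a line
// terminator, where "\r\n" counts as one terminator (no match between them).
bool dollarMatches(const std::string& text, std::size_t pos, bool multiline) {
    assert(pos <= text.size());
    if (pos == text.size()) return true;
    if (!multiline) return false;
    if (text[pos] == '\r') return pos + 1 < text.size() && text[pos + 1] == '\n';
    if (text[pos] == '\n') return pos == 0 || text[pos - 1] != '\r';
    return false;
}
```

<p class=MsoNormal><span lang=EN-US>Under this convention ^$ holds at a position exactly when both predicates hold there, which also answers question 4: it is not a syntax error, and it matches an empty string in single-line mode or an empty line in multiline mode.</span></p>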
<p class=MsoNormal><span lang=EN-US> </span></p>
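<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>Also</span><span lang=EN-US>,</span><span
style='font-family:宋体'> I have not yet found any material that describes regular expressions in detail (such as an </span><span lang=EN-US>RFC</span><span
style='font-family:宋体'> or a compiler-construction textbook). Could someone knowledgeable point me to one? Chinese is best, English is fine too; other languages won't do.</span></p>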
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>= = = = = = = = = = = = = = = = = = = =</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋体'> Best regards,</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋体'> </span><span
lang=EN-US>lanzhengpeng</span></p>
<p class=MsoNormal><span style='font-family:宋体'> </span><span
lang=EN-US>2004-06-05</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US>Cpp mailing list</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>> >> ======= 2004-06-02 11:18:58 </span><span
style='font-family:宋体'>you wrote: </span><span lang=EN-US>=======</span></p>
<p class=MsoNormal><span lang=EN-US>> >> </span></p>
<p class=MsoNormal><span lang=EN-US>> >> > </span><span
style='font-family:宋体'>On searching for Chinese characters in </span><span lang=EN-US>C++</span><span
style='font-family:宋体'>: Westward Journey (大话西游) ran into this recently too, because messages in its trade channel must contain “卖” (“sell”). To check this precisely,</span></p>
<p class=MsoNormal><span lang=EN-US>> >> ></span><span
style='font-family:宋体'> you first convert the </span><span lang=EN-US>char*</span><span
style='font-family:宋体'> or </span><span lang=EN-US>string</span><span
style='font-family:宋体'> to </span><span lang=EN-US>WCHAR</span><span
style='font-family:宋体'> or </span><span lang=EN-US>wstring</span><span
style='font-family:宋体'> with </span><span lang=EN-US>MultiByteToWideChar</span><span
style='font-family:宋体'>, and then search.</span></p>
<p class=MsoNormal><span lang=EN-US>> >> </span><span
style='font-family:宋体'>That only tells me whether it is there or not; what I actually need is the exact position.</span></p>
<p class=MsoNormal><span lang=EN-US>> ></span></p>
<p class=MsoNormal><span lang=EN-US>> ></span><span style='font-family:宋体'> You can get the exact position, though.</span></p>
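<p class=MsoNormal><span lang=EN-US>> </span><span
style='font-family:宋体'>I once wrote a small tool that extracted and rewrote the text strings in source code and collected them into one file; when the program needed localizing, you just edited that file. For example:</span></p>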
<p class=MsoNormal><span lang=EN-US>> LANGUAGE(0,"</span><span
style='font-family:宋体'>我曾经做过一个小工具</span><span lang=EN-US>");</span></p>
<p class=MsoNormal><span lang=EN-US>> </span></p>
<p class=MsoNormal><span lang=EN-US>> </span><span style='font-family:宋体'>A regular expression should easily extract the </span><span
lang=EN-US>0</span><span style='font-family:宋体'> and replace it with another number (the index of the string that follows it within that file). But once the text has been converted, how do I know the original position?</span></p>
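<p class=MsoNormal><span lang=EN-US>One way to keep the original position is to run the regular expression over the original char bytes instead of a converted copy, so every match offset is already a byte offset into the source. The sketch below uses std::regex and assumes the LANGUAGE(id,"text") form shown above; the pattern, the Hit struct, and renumber are hypothetical names, not part of the tool described in the thread. Caveat: in GBK, trail bytes can fall in the ASCII range, so a byte-level pattern could in principle match starting inside a double-byte character; this sketch ignores that.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record of one rewritten call.
struct Hit {
    std::size_t offset;  // byte offset of the old id in the ORIGINAL source
    std::string oldId;   // the id as it appeared before renumbering
};

// Replace each LANGUAGE(id, ...) id with a sequential number 0, 1, 2, ...
// and remember where every old id sat in the original byte string.
std::string renumber(const std::string& src, std::vector<Hit>& hits) {
    static const std::regex call(R"(LANGUAGE\((\d+),)");
    std::string out;
    std::size_t last = 0;  // end of the previous replaced id in src
    int next = 0;          // next sequential id to write
    for (std::sregex_iterator it(src.begin(), src.end(), call), end; it != end; ++it) {
        const std::smatch& m = *it;
        // position() is measured from the start of src, not of the last match.
        const std::size_t idPos = static_cast<std::size_t>(m.position(1));
        assert(idPos >= last);
        hits.push_back({idPos, m.str(1)});
        out.append(src, last, idPos - last);  // copy text up to the id
        out += std::to_string(next++);        // write the new id
        last = idPos + static_cast<std::size_t>(m.length(1));
    }
    out.append(src, last, std::string::npos);  // copy the tail
    return out;
}
```

<p class=MsoNormal><span lang=EN-US>For the input x; LANGUAGE(5,"a"); LANGUAGE(7,"b"); the ids become 0 and 1, while hits still records the old ids 5 and 7 together with their byte offsets 12 and 29 in the original text.</span></p>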
<p class=MsoNormal><span lang=EN-US> </span></p>
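<p class=MsoNormal><span style='font-family:宋体'>After finding the position in the </span><span lang=EN-US>wchar</span><span
style='font-family:宋体'> string, just convert the characters back to </span><span
lang=EN-US>char</span><span style='font-family:宋体'> one by one and you'll know, haha</span> <span
style='font-family:宋体'>:</span><span lang=EN-US>B</span></p>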
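<p class=MsoNormal><span lang=EN-US>The suggestion above amounts to walking the multibyte string and counting characters. Here is a minimal sketch, assuming GBK (code page 936) input in which a byte from 0x81 to 0xFE starts a two-byte character, and assuming every character becomes exactly one WCHAR after MultiByteToWideChar. GB18030 four-byte sequences and malformed input are not handled. The function name is hypothetical.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumes GBK: bytes 0x81..0xFE lead a two-byte character; any other byte
// is a single-byte character. Returns the byte offset in `gbk` of the
// character at index `charIndex` (the index found in the wide string).
std::size_t charIndexToByteOffset(const std::string& gbk, std::size_t charIndex) {
    std::size_t byte = 0;
    for (std::size_t ch = 0; ch < charIndex; ++ch) {
        assert(byte < gbk.size() && "charIndex is past the end of the string");
        const unsigned char b = static_cast<unsigned char>(gbk[byte]);
        byte += (b >= 0x81 && b <= 0xFE) ? 2 : 1;  // skip one whole character
    }
    return byte;
}
```

<p class=MsoNormal><span lang=EN-US>So if a search in the wide string finds “卖” at character index k, charIndexToByteOffset(gbk, k) gives its byte offset in the original char* buffer.</span></p>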
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US>Cpp mailing list</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋体'>I'm only a beginner,</span> <span
style='font-family:宋体'>but I'll add to the confusion anyway:</span></p>
<p class=MsoNormal><span style='font-family:宋体'>1</span><span lang=EN-US>.</span><span
style='font-family:宋体'> In single-line mode, </span><span lang=EN-US>^</span><span
style='font-family:宋体'> and </span><span lang=EN-US>$</span><span
style='font-family:宋体'> only denote positions </span><span lang=EN-US>(</span><span
style='font-family:宋体'>start and end),</span> <span
style='font-family:宋体'>and do not match any actual characters.</span></p>
<p class=MsoNormal><span style='font-family:宋体'>So </span><span lang=EN-US>$</span><span
style='font-family:宋体'> certainly does not match the character </span><span lang=EN-US>"\n"</span><span
style='font-family:宋体'> itself, and </span><span lang=EN-US>^</span><span
style='font-family:宋体'> written in the middle of a pattern can still match.</span> <span
style='font-family:宋体'>For example, with</span><span lang=EN-US> a?^b </span><span
style='font-family:宋体'>or </span><span lang=EN-US>a{0}^b</span><span
style='font-family:宋体'>,</span><span lang=EN-US> bcd</span><span
style='font-family:宋体'> matches.</span> <span style='font-family:宋体'>Of course, putting the start-of-string anchor anywhere else is not a good habit,</span> <span
style='font-family:宋体'>but perhaps it shouldn't be forbidden either?</span></p>
<p class=MsoNormal><span style='font-family:宋体'>2</span><span lang=EN-US>.</span><span
style='font-family:宋体'> In multiline mode, </span><span lang=EN-US>^</span><span
style='font-family:宋体'> matches at the position just after </span><span lang=EN-US>"\n"</span><span
style='font-family:宋体'>; for example, </span><span lang=EN-US>^\d</span><span
style='font-family:宋体'> matches </span><span lang=EN-US>abcd\n123</span><span
style='font-family:宋体'>.</span> <span style='font-family:宋体'>Likewise, </span><span
lang=EN-US>$</span><span style='font-family:宋体'> matches at the position just before </span><span lang=EN-US>"\n"</span><span
style='font-family:宋体'>; for example, </span><span lang=EN-US>is$</span><span
style='font-family:宋体'> matches </span><span lang=EN-US>his\nhere</span><span
style='font-family:宋体'>.</span></p>
<p class=MsoNormal><span style='font-family:宋体'>3</span><span lang=EN-US>.</span><span
style='font-family:宋体'> I don't quite understand the question,</span> <span style='font-family:宋体'>but I think it asks whether </span><span
lang=EN-US>is$</span><span style='font-family:宋体'> matches a string like </span><span lang=EN-US>hr\nhis</span><span
style='font-family:宋体'>?</span> <span style='font-family:宋体'>The answer is </span><span
lang=EN-US>yes.</span></p>
<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>In multiline mode, </span><span lang=EN-US>$</span><span
style='font-family:宋体'> matches at the end of each line (not only at the end of the string), and</span><span lang=EN-US> ^</span><span
style='font-family:宋体'> matches at the start of each line (not only at the start of the string).</span></p>
<p class=MsoNormal><span style='font-family:宋体'>4</span><span lang=EN-US>.</span><span
style='font-family:宋体'> It should not be treated as illegal.</span> <span style='font-family:宋体'>In single-line mode, </span><span
lang=EN-US>^$</span><span style='font-family:宋体'> matches </span><span lang=EN-US>""</span><span
style='font-family:宋体'>; in multiline mode it matches the empty line in </span><span lang=EN-US>"aaa\n\naaa"</span><span
style='font-family:宋体'>.</span></p>
<p class=MsoNormal><span style='font-family:宋体'>(I have only tested this with </span><span lang=EN-US>egrep</span><span
style='font-family:宋体'>.) </span><span lang=EN-US>^_^</span></p>
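<p class=MsoNormal><span lang=EN-US>The beginner's answers can be checked with a tiny backtracking matcher in which ^ and $ are assertions that consume no characters, so they may appear anywhere in a pattern. This is a hypothetical illustration, far simpler than a real engine. It supports only literals, '.', postfix '?', and the two anchors (no a{n}, no \d), and in multiline mode its $ recognizes only \n.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mini-matcher (illustration only): literals, '.', postfix '?',
// plus the anchors '^' and '$', which may appear anywhere in the pattern.
struct MiniMatcher {
    const std::string& text;
    bool multiline;

    // Zero-width tests: they inspect position i but consume nothing.
    bool atStart(std::size_t i) const {
        return i == 0 || (multiline && text[i - 1] == '\n');
    }
    bool atEnd(std::size_t i) const {
        return i == text.size() || (multiline && text[i] == '\n');
    }
    // Does pattern character c (a literal or '.') match text[i]?
    bool matchesChar(char c, std::size_t i) const {
        return i < text.size() && (c == '.' || c == text[i]);
    }
    // Does pattern p, from index j onward, match the text starting at i?
    bool matchHere(const std::string& p, std::size_t j, std::size_t i) const {
        assert(i <= text.size());
        if (j == p.size()) return true;
        const char c = p[j];
        // Anchors: test the position, then continue at the SAME position.
        if (c == '^') return atStart(i) && matchHere(p, j + 1, i);
        if (c == '$') return atEnd(i) && matchHere(p, j + 1, i);
        if (j + 1 < p.size() && p[j + 1] == '?')  // try one c, then zero
            return (matchesChar(c, i) && matchHere(p, j + 2, i + 1)) ||
                   matchHere(p, j + 2, i);
        return matchesChar(c, i) && matchHere(p, j + 1, i + 1);
    }
};

// Unanchored search: does the pattern match starting at any position?
bool miniSearch(const std::string& pattern, const std::string& text, bool multiline) {
    MiniMatcher m{text, multiline};
    for (std::size_t i = 0; i <= text.size(); ++i)
        if (m.matchHere(pattern, 0, i)) return true;
    return false;
}
```

<p class=MsoNormal><span lang=EN-US>For example, miniSearch("a?^b", "bcd", false) succeeds because a? matches nothing and ^ then holds at position 0, while miniSearch("^$", "aaa\n\naaa", true) succeeds at the empty line and fails with multiline set to false.</span></p>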
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋体'>As for books, there is one already:</span> <span
style='font-family:宋体'>drop by Kexin (可欣) and you can read it</span><span lang=EN-US>... </span><span
style='font-family:宋体'>“</span><span lang=EN-US>regular expression</span><span
style='font-family:宋体'>”</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US>Cpp mailing list</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><a name=我想说的><span style='font-family:宋体'>What I Want to Say</span></a></p>
<p class=MsoNormal style='text-indent:21.0pt'><span style='font-family:宋体'>I believe there are far more regular expression libraries in the world than the ones I know of, but few of their authors are willing to open-source them. The open-source ones vary in quality or emphasize different things. Most striking of all, not one of the libraries I know of was written by a Chinese programmer. I want to change that, focusing on features and on support for multibyte encodings (not only the large East Asian character sets). Speed comes second, because my grounding is not strong enough: I am not very familiar with regular expressions, and I have not fully understood finite automata either.</span></p>
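<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>If you want to say my regular expression syntax is unorthodox, then I'll say you don't look human, because I am the standard of what a human looks like.</span></p>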
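<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>Please don't spread the line that you needn't build a car starting from the wheels (that is, don't reinvent the wheel). I have plenty of grand arguments against it, but I don't care to write them down.</span></p>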
<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>My company's management objected to releasing this code. However, the library is not used in any company project, and I insisted on doing all of the coding at home (this article, admittedly, was written on a company machine), so the library is free of any other copyright disputes. You may distribute, modify, and use it without worry, provided you comply with the original author's notice.</span></p>
<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>This library surely still has defects I am not aware of: in functionality, in security, in usability, and so on. Whatever the case, I hope you will, in a constructive spirit, tell me about any defects you find and any ideas you have:</span></p>
<p class=MsoNormal><span lang=EN-US> <a
href="mailto:tearshark@eaglefly.com.cn">tearshark@eaglefly.com.cn</a></span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span><span
style='font-family:宋体'>Lan Zhengpeng (兰征鹏)</span></p>
<p class=MsoNormal><span lang=EN-US> 2004-6-16</span></p>
</div>
</body>
</html>