<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=gb2312">
<title>Why I Wrote This Regular Expression Parsing Library</title>
<style>
<!--
/* Font Definitions */
@font-face
{font-family:宋体;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"\@宋体";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
text-align:justify;
text-justify:inter-ideograph;
font-size:10.5pt;
font-family:"Times New Roman";}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{color:purple;
text-decoration:underline;}
/* Page Definitions */
@page Section1
{size:595.3pt 841.9pt;
margin:72.0pt 10.3pt 72.0pt 18.0pt;
layout-grid:15.6pt;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=ZH-CN link=blue vlink=purple style='text-justify-trim:punctuation'>
<div class=Section1 style='layout-grid:15.6pt'>
<p class=MsoNormal><span lang=EN-US><a href="#初衷"><span style='font-family:
宋体'>Original motivation</span></a></span></p>
<p class=MsoNormal><span lang=EN-US><a href="#我想说的"><span style='font-family:
宋体'>What I want to say</span></a></span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><a name=初衷><span lang=EN-US>Hello everyone!</span></a></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> The regular expression libraries I know
of are the one in boost, the GNU one, the one in the ATL that ships with VC7,
and greta, released by Microsoft. I have used the last three; greta is the one
I have used for the shortest time (only two days).</span></p>
<p class=MsoNormal><span lang=EN-US> Here are my impressions:</span></p>
<p class=MsoNormal><span lang=EN-US> The GNU regular expression library does
not support multibyte encodings at all. It does not even support UNICODE: it
crashes with an illegal operation during the parse stage. In today's world of
globalized software, that is not a good sign. Its strength is that the syntax
it supports is complete.</span></p>
<p class=MsoNormal><span lang=EN-US> The regular expression implementation in
ATL does not fully support multibyte encodings, but it supports UNICODE well.
Its code is very clearly written: it uses nothing advanced from the STL and
nothing particularly advanced from templates. (I think this is the right
direction for C++. After all, clever people are a minority and most people are
ordinary; something too highbrow to be widely appreciated will one day be
abandoned by most programmers, leaving a handful of experts to admire
themselves.) As a result, very small and easy changes are enough to make it
fully support multibyte encodings. Its drawbacks are that it does not support
the {n,m} syntax and does not support nested (recursive) syntax such as
"([^\\"]*(\\.)*[^\\"]*)*", where the final * is not supported.</span></p>
<p class=MsoNormal><span lang=EN-US> greta fully supports single-byte
encodings and UNICODE, its syntax is complete, and it is said to be fast in the
common case. However, because part of the implementation lives in a cpp file,
you cannot use single-byte and UNICODE encodings, or posix and perl syntax, at
the same time. The workaround is fairly simple: rename the cpp to inl, include
that inl from the .h, and make a few other small changes. The real problem is
that it has no multibyte implementation. Having looked at it carefully, it
seems this could be solved by writing my own multibyte iterator, because it
supports basic_string.</span></p>
<p class=MsoNormal><span lang=EN-US> The next question is: how does the STL
support multibyte encodings? I could not find anything about multibyte
encodings in SGI-STL or STLPort453. By default, basic_string is only provided
for char and wchar_t. And if I have to implement an iterator myself, I don't
know where to start.</span></p>
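<p class=MsoNormal><span lang=EN-US> As a starting point for the iterator idea
above, here is a minimal sketch, not code from greta or from any STL. It
assumes a GBK-style double-byte encoding in which a byte in 0x81..0xFE is a
lead byte followed by exactly one trail byte. The names is_lead_byte,
dbcs_iterator and pack_dbcs are hypothetical. The iterator yields one value per
character, with the lead byte packed into the high 8 bits, so the result can be
stored in a wchar_t string.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Assumption: a GBK-style double-byte encoding in which a byte in
// 0x81..0xFE is a lead byte followed by exactly one trail byte.
inline bool is_lead_byte(unsigned char c) { return c >= 0x81 && c <= 0xFE; }

// Hypothetical input iterator: walks a byte range one character at a time
// and yields each character packed into a single wchar_t.
class dbcs_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = wchar_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const wchar_t*;
    using reference = wchar_t;

    dbcs_iterator(const char* pos, const char* end) : pos_(pos), end_(end) {}

    wchar_t operator*() const {
        const unsigned char b0 = static_cast<unsigned char>(pos_[0]);
        if (width() == 1) return static_cast<wchar_t>(b0);
        const unsigned char b1 = static_cast<unsigned char>(pos_[1]);
        return static_cast<wchar_t>((b0 << 8) | b1);  // lead byte in the high 8 bits
    }
    dbcs_iterator& operator++() { pos_ += width(); return *this; }
    dbcs_iterator operator++(int) { dbcs_iterator old(*this); ++*this; return old; }
    bool operator==(const dbcs_iterator& other) const { return pos_ == other.pos_; }
    bool operator!=(const dbcs_iterator& other) const { return pos_ != other.pos_; }

private:
    // A lead byte with nothing after it is treated as a one-byte character.
    std::ptrdiff_t width() const {
        return (end_ - pos_ >= 2 && is_lead_byte(static_cast<unsigned char>(*pos_))) ? 2 : 1;
    }
    const char* pos_;
    const char* end_;
};

// Packs a multibyte string into one wchar_t per character.
// The packed values are NOT Unicode code points.
std::wstring pack_dbcs(const std::string& bytes) {
    const char* first = bytes.data();
    const char* last = first + bytes.size();
    return std::wstring(dbcs_iterator(first, last), dbcs_iterator(last, last));
}
```

<p class=MsoNormal><span lang=EN-US> Regex engines usually want bidirectional
iterators, and stepping backwards through double-byte text is ambiguous, so
this sketch only iterates forward and copies the result into a wide string. A
pattern would have to be packed the same way before being handed to a
wide-character engine.</span></p>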
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> My current requirements
are:</span><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> The regular expression engine needs to support syntax like this: “/</span><span
style='font-family:宋体'>汉字</span><span lang=EN-US>[</span><span
style='font-family:宋体'> </span><span lang=EN-US> ]+[^</span><span
style='font-family:宋体'> ,</span><span lang=EN-US> ,]+[</span><span
style='font-family:宋体'> </span><span lang=EN-US> ]*[</span><span
style='font-family:宋体'>,</span><span lang=EN-US>,][</span><span
style='font-family:宋体'> </span><span lang=EN-US> ]*[^</span><span
style='font-family:宋体'> ,</span><span lang=EN-US> ,]+</span><span
style='font-family:宋体'>”</span><span lang=EN-US> so that it matches </span><span style='font-family:宋体'>“/汉字 兰征鹏 ,正则表达式”</span><span lang=EN-US>.</span></p>
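<p class=MsoNormal><span lang=EN-US> For comparison, here is a sketch of the
same pattern on wide characters using std::wregex from the C++ standard library
(added in C++11, so not one of the libraries discussed in this post). It
assumes the bracketed classes are meant to hold both the half-width and the
full-width form of the space (U+0020, U+3000) and of the comma (U+002C,
U+FF0C). The function name matches_keyword_line is hypothetical, and the Han
characters are written as \u escapes so the source stays plain ASCII.</span></p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: a wide-character rendering of the pattern in the post.
// Assumption: each class holds the half-width and the full-width form of the
// space (U+0020, U+3000) or of the comma (U+002C, U+FF0C).
bool matches_keyword_line(const std::wstring& line) {
    static const std::wregex pattern(
        L"/\u6C49\u5B57"                  // "/" followed by the two Han characters U+6C49 U+5B57
        L"[ \u3000]+"                     // one or more spaces
        L"[^ \u3000,\uFF0C]+"             // first field: no spaces, no commas
        L"[ \u3000]*[,\uFF0C][ \u3000]*"  // a comma with optional spaces around it
        L"[^ \u3000,\uFF0C]+");           // second field
    // regex_match requires the whole line to match, not just a part of it.
    return std::regex_match(line, pattern);
}
```

<p class=MsoNormal><span lang=EN-US> Because every element of a wide string is
a whole character, a character class can never split a two-byte character. The
cost is that multibyte input has to be converted to wide characters before
matching.</span></p>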
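<p class=MsoNormal><span lang=EN-US> Even plain string searching with the STL
is problematic. For example, when searching an article for </span><span
style='font-family:宋体'>“正则”</span><span lang=EN-US>, it may well match the
middle four bytes of three consecutive Chinese characters. When that happens,
you don't know whether to laugh or cry.</span></p>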
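<p class=MsoNormal><span lang=EN-US> The false match described above can be
reproduced, and avoided, with a few lines. This is a minimal sketch that
assumes a GBK-style double-byte encoding (lead bytes 0x81..0xFE, each followed
by one trail byte); starts_two_byte_char and find_on_char_boundary are
hypothetical helpers, not STL functions. The idea is to walk the text one
character at a time and compare only at character boundaries.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: GBK-style double-byte text, where a byte in 0x81..0xFE starts
// a two-byte character if at least one more byte follows it.
inline bool starts_two_byte_char(const std::string& text, std::size_t i) {
    const unsigned char c = static_cast<unsigned char>(text[i]);
    return c >= 0x81 && c <= 0xFE && i + 1 < text.size();
}

// Hypothetical helper: behaves like std::string::find, except that it only
// reports matches that begin on a character boundary.
std::size_t find_on_char_boundary(const std::string& text, const std::string& needle) {
    std::size_t i = 0;
    while (i + needle.size() <= text.size()) {
        if (text.compare(i, needle.size(), needle) == 0) return i;
        i += starts_two_byte_char(text, i) ? 2 : 1;  // step over one whole character
    }
    return std::string::npos;
}
```

<p class=MsoNormal><span lang=EN-US> With the six bytes B0 A1 B0 A1 B0 A1
(three two-byte characters) as the text and the four bytes A1 B0 A1 B0 as the
needle, std::string::find reports a match at byte offset 1, in the middle of
the first character, while find_on_char_boundary reports no match.</span></p>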
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> If you have experience in this area or
know the STL well, please don't hesitate to share your advice.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> Best regards,</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋体'> </span><span
lang=EN-US>lanzhengpeng</span></p>
<p class=MsoNormal><span style='font-family:宋体'> </span><span
lang=EN-US>2004-06-02</span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US>Cpp mailing list</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> In C/C++, if you want a Perl-compatible
regexp library, one option is Boost and the other is the PCRE library. The
regex algorithm in Boost was recently improved, and its average efficiency is
10 times that of earlier versions, although it may be somewhat cumbersome to
use. PCRE is already very mature; Apache/Postfix/PHP/Python all use it. I
think it should be considered first. However, I have not compiled it on
Windows myself, so I am not entirely sure.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>See www.pcre.org</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Personally, I really like the regular
expression support in Ruby: it is powerful and quite fast. Because Ruby was
invented by a Japanese developer, it has no problems handling large East Asian
character sets. Interfacing Ruby with C/C++ is easy, but pulling in Ruby just
for this small feature seems like overkill. I am not familiar with Perl. Lua
has its own original pattern-matching syntax, and Lua was designed from the
start to be embedded in C/C++; it is much faster than Perl/Ruby/Python. Lua's
pattern-matching syntax is a bit odd. It seems sufficient for lanzhengpeng's
problem, but it is completely different from standard regex syntax.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>My personal feeling is that it would be
better to sit down and write an iterator, which should be quite easy. But I
haven't done this kind of thing for a long time, so I'm only speaking in
general terms.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋体'>孟岩</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>From: kyo</span></p>
<p class=MsoNormal><span lang=EN-US>Sent: June 2, 2004 11:19</span></p>
<p class=MsoNormal><span lang=EN-US>To: 'C++ Discuss Group'</span></p>
<p class=MsoNormal><span lang=EN-US>Subject: RE: [cpp] Regular expressions and
multibyte encodings</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>