htmlcleanup1.pl
From "harvest, a robot that downloads HTML web pages" · PL code · 38 lines
#!/usr/local/bin/perl
# things this script does:
# 1. removes empty lines
# 2. removes <B> </B> pairs with nothing but a space between them
# 3. changes a line consisting only of <B> foo</B> to <H2>foo</H2>

while(<>) {
    # on the first line of each new input file: keep the original as
    # FILE.1.bak and send all further output to a fresh FILE
    if($ARGV ne $oldargv) {
        rename($ARGV, $ARGV . '.1.bak');
        open(ARGVOUT, ">$ARGV") or die "cannot write $ARGV: $!";
        select(ARGVOUT);
        $oldargv = $ARGV;
    }

    chomp;  # strip the newline only (chop would eat a real character on an unterminated last line)

    next if /^$/;  # kill empty lines

    if($suck_em && /^\<\/em\>$/) {
        next;
    }
    $suck_em=0;

    s,\<[bB]\> \</[bB]\>,,g;  # kill empty annotation

    # if one sees a bolding on a line by itself, then turn it into a second level heading.
    s,^\<[bB]\> ([^<]*)\</[bB]\>$,<H2>$1</H2>,;

    if(s,^\<[bB]\> \<em i\>([^<]*)\</[bB]\>$,<H2>$1</H2>,) {
        # suck next line as well if it contains just </em>
        $suck_em=1;
    }

    print "$_\n";
}