<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Robots" content="INDEX,NOFOLLOW">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<TITLE>Safari | Python Developer's Handbook -&gt; Code Examples</TITLE>
<LINK REL="stylesheet" HREF="oreillyi/oreillyN.css">
</HEAD>
<BODY bgcolor="white" text="black" link="#990000" vlink="#990000" alink="#990000" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">

<TABLE width=100% bgcolor=white border=0 cellspacing=0 cellpadding=5><TR><TD>
<FONT>
				<h3>Code Examples</h3>
				<p>The following code examples demonstrate the concepts covered in this chapter.</p>

				
					<H4>HTML Parsing Tool (File: <TT Class="monofont">parsing.py</TT>)</H4>
					<P>This program uses the file <tt clASS="monofont">exchange.html</Tt> (Listing 9.1) as its input. It reads the file, replaces every occurrence of the domain name <tt class="monofont">"lessaworld"</tt> with <tt class="monofont">"bebemania"</tt>, turns every email address and Web page reference it finds into a hyperlink, and writes the result to <tt class="monofont">newexchange.html</tt>.</p>

					
						<H5>
Listing 9.1 File: <tt ClasS="monofont">exchange.html</tt>
						</h5>
						<pRe clASS="monofont">
&lt;HTML&gt;
&lt;HEAD&gt;
&lt;TITLE&gt;Exchange Rates Home Page&lt;/TITLE&gt;
&lt;/HEAD&gt;
&lt;BODY&gt;
&lt;p align=justify&gt;
&lt;b&gt;List of current files that we have available at this site:&lt;/b&gt;&lt;/p&gt;
&lt;br&gt;
http://www.lessaworld.com/exchange/real.txt &lt;br&gt;
http://www.lessaworld.com/exchange/pound.txt &lt;br&gt;
http://www.lessaworld.com/exchange/dollar.txt &lt;br&gt;&lt;br&gt;

Many people are currently working to keep these exchange rates updated.&lt;br&gt;
Andre (andre@bebemania.com.br) handles all the Brazilian Real operations,
 while Joao Pedro (jp@bebemania.com.br) takes care of pounds and
 dollars.&lt;br&gt;&lt;br&gt;

&lt;/BODY&gt;
&lt;/HTML&gt;
</Pre>
					
					<p>The following code implements the parsing program.</p>

					
						<H5>
Listing 9.2 File: <TT Class="monofont">parsing.py</TT>
						</H5>
						<Pre clASS="monofont">
 1:
 2: import re
 3:
 4: TextOriginal = open("exchange.html").read()
 5:
 6: TextIn = re.sub("lessaworld", "bebemania", TextOriginal)
 7: HTML_TITLE = "(untitled)"    # fallback if no &lt;title&gt; tag is found
 8: operation_result = re.search(r'&lt;title&gt;(.*?)&lt;/title&gt;', TextIn, re.IGNORECASE)
 9: if operation_result:
10:     HTML_TITLE = operation_result.group(1)
11:
12: link_pattern = re.compile(r'((?:ftp|http)://[\w-]+(?:\.[\w-]+)*(?:/[\w-]*)*(?:\.[\w-]*)*)')
13: links = re.findall(link_pattern, TextIn)
14: TextIn = re.sub(link_pattern, r'&lt;a href="\1"&gt;\1&lt;/a&gt;', TextIn)
15:
16: email_pattern = re.compile(r'([a-zA-Z][\w-]*@[\w-]+(?:\.[\w-]+)*)')
17: emails = re.findall(email_pattern, TextIn)
18: TextIn = re.sub(email_pattern, r'&lt;a href="mailto:\1"&gt;\1&lt;/a&gt;', TextIn)
19:
20: FileOut = open("newexchange.html", "w")
21: FileOut.write(TextIn)
22: FileOut.close()
23:
24: print('"%s" is done.' % HTML_TITLE)
</Pre>
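					<p>One detail of line 13 is worth knowing: <tt class="monofont">re.findall()</tt> returns a list of strings when the pattern has at most one capturing group, but a list of tuples, one element per group, when it has two or more. The sketch below uses made-up sample text for illustration and compares the link pattern with the scheme written as a capturing group, <tt class="monofont">(ftp|http)</tt>, and as a non-capturing one, <tt class="monofont">(?:ftp|http)</tt>:</p>

<pre class="monofont">
import re

# Made-up sample text with two URLs (not taken from exchange.html).
sample = "see http://www.example.com/a.txt and ftp://ftp.example.com/b.txt"

# Two capturing groups: the whole URL and the scheme.
two_groups = re.compile(r'((ftp|http)://[\w-]+(?:\.[\w-]+)*(?:/[\w-]*)*(?:\.[\w-]*)*)')
# One capturing group: the scheme alternation is non-capturing.
one_group = re.compile(r'((?:ftp|http)://[\w-]+(?:\.[\w-]+)*(?:/[\w-]*)*(?:\.[\w-]*)*)')

# Several groups: findall() yields one tuple per match.
print(two_groups.findall(sample))
# [('http://www.example.com/a.txt', 'http'), ('ftp://ftp.example.com/b.txt', 'ftp')]

# Exactly one group: findall() yields one string per match.
print(one_group.findall(sample))
# ['http://www.example.com/a.txt', 'ftp://ftp.example.com/b.txt']
</pre>

					<p>The substitution on line 14 is unaffected by this choice, because <tt class="monofont">\1</tt> refers to the outer group, which holds the whole URL in both versions.</p>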
					
					<p>Line 4: Opens and reads the original file.</p>
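					<p>Line 4 never closes the file explicitly. CPython closes it as soon as the temporary file object is garbage-collected, but other Python implementations may keep it open longer. A <tt class="monofont">with</tt> block closes the file deterministically. The sketch below is a minimal illustration: <tt class="monofont">read_text()</tt> is a hypothetical helper name, and the demo writes a throwaway file to a temporary directory so that it runs without the real <tt class="monofont">exchange.html</tt>:</p>

<pre class="monofont">
import os
import tempfile


def read_text(path):
    """Hypothetical helper: return the whole file at path as one string.

    The with statement closes the file when the block ends, even if
    read() raises an exception.
    """
    with open(path) as f:
        return f.read()


# Demo only: write a throwaway file into a temporary directory,
# read it back, and let the directory be deleted afterwards.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "exchange.html")
    with open(demo_path, "w") as f:
        f.write("Exchange Rates Home Page\n")
    text = read_text(demo_path)

print(repr(text))  # 'Exchange Rates Home Page\n'
</pre>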

					<p>Line 6: Replaces every occurrence of <tt class="monofont">"lessaworld"</tt> with <tt clasS="monofont">"bebemania"</tt>.
					</P>
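					<p>Since <tt class="monofont">"lessaworld"</tt> contains no regular-expression metacharacters, line 6 gives the same result as the string method <tt class="monofont">str.replace()</tt>. The difference matters once the search text contains characters such as a dot, which a regular expression treats as "any character". This small sketch uses one URL from Listing 9.1 and a made-up decoy string to show how <tt class="monofont">re.escape()</tt> keeps the match literal:</p>

<pre class="monofont">
import re

url = "http://www.lessaworld.com/exchange/real.txt"

# With a literal pattern, re.sub() and str.replace() agree.
by_regex = re.sub("lessaworld", "bebemania", url)
by_method = url.replace("lessaworld", "bebemania")
print(by_regex)  # http://www.bebemania.com/exchange/real.txt

# An unescaped "." matches any character, so this pattern also
# matches a made-up string that only looks similar.
decoy = "wwwXlessaworldYcom"
loose = re.sub("www.lessaworld.com", "www.bebemania.com", decoy)

# re.escape() turns every "." into a literal dot, so the decoy is left alone.
strict = re.sub(re.escape("www.lessaworld.com"), "www.bebemania.com", decoy)

print(loose)   # www.bebemania.com
print(strict)  # wwwXlessaworldYcom
</pre>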

					<p>Lines 8
