<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Robots" content="INDEX,NOFOLLOW">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<TITLE>Safari | Python Developer's Handbook -> Accessing URLs</TITLE>
<LINK REL="stylesheet" HREF="oreillyi/oreillyN.css">
</HEAD>
<BODY bgcolor="white" text="black" link="#990000" vlink="#990000" alink="#990000" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
<TABLE width=100% bgcolor=white border=0 cellspacing=0 cellpadding=5><TR><TD>
<FONT>
<h3>
Accessing URLs</h3>
<p>URL stands for <i>uniform resource locator.</I> URLs are strings, such as <A TArget="_blank" HREF="http://www.lessaworld.com/">http://www.lessaworld.com/</a>, that you type into your Web browser to go to a Web page.</p>
<p>Python provides the <tT CLAss="monofont">urllib</tt> and <tt class="monofont">urlparse</tt> modules as great tools to process URLs.</p>
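<p>As a minimal sketch of what processing a URL can look like, the following splits a URL into its components. It assumes a Python 3 interpreter, where the functions of the <tt class="monofont">urlparse</tt> module are found in <tt class="monofont">urllib.parse</tt>. The path, query string, and fragment in the sample URL are made up for illustration.</p>

```python
# Python 3 sketch: urlparse now lives in urllib.parse.
from urllib.parse import urlparse

# Hypothetical URL: only the host name comes from the text above.
parts = urlparse("http://www.lessaworld.com/index.html?lang=en#top")

print(parts.scheme)    # protocol, e.g. http
print(parts.netloc)    # network location (host name)
print(parts.path)      # path of the resource on the server
print(parts.query)     # text after the question mark
print(parts.fragment)  # text after the hash sign
```

<p>The result also behaves like a tuple, so the components can be unpacked or indexed when attribute access is not convenient.</p>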
<div claSs="note"><p ClasS="notetitle"><b>Tip</b></p><p>
<P>Many applications today that have to <a naME="idx1073745986"></A>
<A name="idx1073745987"></A>
<A NAme="idx1073745988"></a>
<a NAME="idx1073745989"></a>
<a naME="idx1073745990"></A>
<A name="idx1073745991"></a>
<a name="idx1073745992"></a>parse Web pages break whenever the page design changes. However, these problems should diminish as more structured formats (such as XML) are used to produce the pages.</p>
</p></div>
<br>
<br>
<H4>
The <tt ClasS="monofont">urllib</tt> Module</h4>
<p>The <Tt clASS="monofont">urllib</Tt> module is a high-level interface for retrieving data across the World Wide Web. It supports HTTP, FTP, and gopher connections, which it makes over sockets. The module defines functions for programs that act as clients of the Web, and it normally serves as a front end to lower-level modules, such as <tt cLASS="monofont">httplib, ftplib, gopherlib,</tt> and so on.</p>
<p>To <A NAMe="idx1073745993"></a>
<a nAME="idx1073745994"></A>
<a name="idx1073745995"></a>
<a name="idx1073745996"></a>retrieve a Web page, use the <tt class="monofont">urllib.urlopen(url [,data])</tT> function. This function returns a <tt ClasS="monofont">stream</tt> object that you can read just like a regular <tt ClasS="monofont">file</TT> object. If the optional <tt class="monofont">data</tt> argument is supplied for an HTTP URL, the request is sent as a POST instead of a GET. The following example opens a page and reads its first line:</P>
<pre>
>>> import urllib
>>> page = urllib.urlopen("http://www.bog.frb.fed.us")
>>> page.readline()
</pRE>
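<p>The session above needs a network connection and the Python 2 <tt class="monofont">urllib</tt> module. As a hedged sketch of the same file-like behavior on Python 3, where the function is available as <tt class="monofont">urllib.request.urlopen</tt>, the following opens a <tt class="monofont">data:</tt> URL. The URL and its two lines of text are invented for this example and simply stand in for a real Web page, so the code runs offline.</p>

```python
# Python 3 sketch: urlopen now lives in urllib.request.
from urllib.request import urlopen

# A data: URL carries its own content, so no network access is needed.
# The content here is made up for illustration.
page = urlopen("data:text/plain;charset=utf-8,first%20line%0Asecond%20line%0A")

first = page.readline()  # read one line, as with a regular file
rest = page.read()       # read whatever is left
page.close()

# In Python 3 the stream yields bytes, so decode before printing.
print(first.decode("utf-8"), end="")
print(rest.decode("utf-8"), end="")
```

<p>Replacing the <tt class="monofont">data:</tt> URL with an <tt class="monofont">http:</tt> address should give the same reading interface, provided that the host is reachable.</p>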
<P>This stream object has two additional <A name="idx1073745997"></A>attributes: <A NAme="idx1073745998"></a>
<a NAME="idx1073745999"></a>
<tt class="monofont">url</tt> and <a name="idx1073746000"></a>
<a namE="idx1073746001"></a>
<tT claSs="monofont">headers.</tt> The first is the URL that you opened, and the second is a dictionary-like object that contains the headers returned with the page, as illustrated in the next example.</p>
<Pre>
>>> page.url
'http://www.bog.frb.fed.us'
>>> for key, value in page.headers.items():