<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>11.3 urllib -- Open arbitrary resources by URL</title>
<META NAME="description" CONTENT="11.3 urllib -- Open arbitrary resources by URL">
<META NAME="keywords" CONTENT="lib">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="STYLESHEET" href="lib.css" tppabs="http://www.python.org/doc/current/lib/lib.css">
<LINK REL="next" href="module-httplib.html" tppabs="http://www.python.org/doc/current/lib/module-httplib.html">
<LINK REL="previous" href="module-cgi.html" tppabs="http://www.python.org/doc/current/lib/module-cgi.html">
<LINK REL="up" href="internet.html" tppabs="http://www.python.org/doc/current/lib/internet.html">
<LINK REL="next" href="urlopener-objs.html" tppabs="http://www.python.org/doc/current/lib/urlopener-objs.html">
</head>
<body>

<H1><A NAME="SECTION0013300000000000000000">
11.3 <tt class="module">urllib</tt> --
         Open arbitrary resources by URL</A>
</H1>

<P>
This module provides a high-level interface for fetching data across
the World-Wide Web.  In particular, the <tt class="function">urlopen()</tt> function
is similar to the built-in function <tt class="function">open()</tt>, but accepts
Uniform Resource Locators (URLs) instead of filenames.  Some
restrictions apply -- it can only open URLs for reading, and no seek
operations are available.

<P>
It defines the following public functions:

<P>
<dl><dt><b><a name='l2h-1993'><tt class='function'>urlopen</tt></a></b> (<var>url</var><big>[</big><var>, data</var><big>]</big>)
<dd>
Open a network object denoted by a URL for reading.  If the URL does
not have a scheme identifier, or if it has <span class="file">file:</span> as its scheme
identifier, this opens a local file; otherwise it opens a socket to a
server somewhere on the network.  If the connection cannot be made, or
if the server returns an error code, the <tt class="exception">IOError</tt> exception
is raised.  If all went well, a file-like object is returned.  This
supports the following methods: <tt class="method">read()</tt>, <tt class="method">readline()</tt>,
<tt class="method">readlines()</tt>, <tt class="method">fileno()</tt>, <tt class="method">close()</tt>,
<tt class="method">info()</tt> and <tt class="method">geturl()</tt>.

<P>
Except for the <tt class="method">info()</tt> and <tt class="method">geturl()</tt> methods,
these methods have the same interface as for
file objects -- see section <A href="bltin-file-objects.html#bltin-file-objects" tppabs="http://www.python.org/doc/current/lib/bltin-file-objects.html#bltin-file-objects">2.1.7</A> in this
manual.  (It is not a built-in file object, however, so it can't be
used at those few places where a true built-in file object is
required.)

<P>
The <tt class="method">info()</tt> method returns an instance of the class
<tt class="class">mimetools.Message</tt> containing meta-information associated
with the URL.  When the access scheme is HTTP, these headers are those
returned by the server at the head of the retrieved HTML page
(including Content-Length and Content-Type).  When the scheme is FTP,
a Content-Length header will be present if (as is now usual) the
server passed back a file length in response to the FTP retrieval
request.  When the URL refers to a local file, the returned headers will
include a Date representing the file's last-modified time, a Content-Length
giving the file size, and a Content-Type containing a guess at the file's
type.  See also the description of the
<tt class='module'><a href="module-mimetools.html" tppabs="http://www.python.org/doc/current/lib/module-mimetools.html">mimetools</a></tt> module.
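<P>
As a minimal sketch of the local-file case, the following opens a temporary
file through a <span class="file">file:</span> URL and reads back its body and two of its
headers.  The helper name <code>describe_local_file</code> and the sample file are
illustrative only, and the fallback import assumes a Python 3 interpreter,
where these functions live in <tt class="module">urllib.request</tt> rather than in
<tt class="module">urllib</tt>.

```python
import os
import tempfile

try:
    # Module layout described in this section.
    from urllib import urlopen, pathname2url
except ImportError:
    # Assumption: on Python 3 the same functions live in urllib.request.
    from urllib.request import urlopen, pathname2url


def describe_local_file(path):
    """Open path through a file: URL; return (body, length, content_type)."""
    url = "file:" + pathname2url(os.path.abspath(path))
    f = urlopen(url)
    try:
        body = f.read()
        headers = f.info()
        # Header lookup is case-insensitive; values are returned as strings.
        return body, headers["Content-Length"], headers["Content-Type"]
    finally:
        f.close()


# Write a small text file so the example needs no network access.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"hello, urllib\n")

body, length, content_type = describe_local_file(demo_path)
os.remove(demo_path)
```

<P>
The Content-Type value is only a guess derived from the file name, so it
should be treated as a hint rather than as a guarantee.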

<P>
The <tt class="method">geturl()</tt> method returns the real URL of the page.  In
some cases, the HTTP server redirects a client to another URL.  The
<tt class="function">urlopen()</tt> function handles this transparently, but in some
cases the caller needs to know which URL the client was redirected
to.  The <tt class="method">geturl()</tt> method can be used to get at this
redirected URL.

<P>
If the <var>url</var> uses the <span class="file">http:</span> scheme identifier, the optional
<var>data</var> argument may be given to specify a <code>POST</code> request
(normally the request type is <code>GET</code>).  The <var>data</var> argument
must be in standard <span class="file">application/x-www-form-urlencoded</span> format;
see the <tt class="function">urlencode()</tt> function below.
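<P>
The following sketch shows one way to prepare such a <var>data</var> argument.
The helper name <code>build_post_body</code>, the field names and the URL in the
final comment are hypothetical; the fallback import assumes a Python 3
interpreter, where <tt class="function">urlencode()</tt> lives in <tt class="module">urllib.parse</tt>.

```python
try:
    # Module layout described in this section.
    from urllib import urlencode
except ImportError:
    # Assumption: on Python 3 urlencode() lives in urllib.parse.
    from urllib.parse import urlencode


def build_post_body(fields):
    """Encode (name, value) pairs as application/x-www-form-urlencoded text."""
    return urlencode(fields)


# A list of pairs keeps the fields in a predictable order.
post_body = build_post_body([("name", "Guido van Rossum"), ("lang", "python")])

# Passing the encoded string as the second argument of urlopen() turns the
# request into a POST.  The URL is a placeholder, so the call is left
# commented out.  On Python 3 the body must be bytes, for example
# post_body.encode("ascii").
# reply = urlopen("http://www.example.com/cgi-bin/query", post_body)
```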

<P>
The <tt class="function">urlopen()</tt> function works transparently with proxies
which do not require authentication.  In a Unix or Windows
environment, set the <a class="envvar" name='l2h-2005'>$http_proxy</a>, <a class="envvar" name='l2h-2006'>$ftp_proxy</a> or
<a class="envvar" name='l2h-2007'>$gopher_proxy</a> environment variables to a URL that identifies
the proxy server before starting the Python interpreter.  For example
(the "<tt class="character">%</tt>" is the command prompt):

<P>
<dl><dd><pre class="verbatim">
% http_proxy="http://www.someproxy.com:3128"
% export http_proxy
% python
...
</pre></dl>
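<P>
To check which proxy settings the module would pick up, the mapping can be
inspected from Python.  This sketch relies on <tt class="function">getproxies()</tt>, which is
not described in this section and is used here on the assumption that it is
available in the interpreter at hand (<tt class="module">urllib.request</tt> on Python 3).
The variable is set inside the process only to keep the check self-contained.

```python
import os

try:
    # Module layout described in this section.
    from urllib import getproxies
except ImportError:
    # Assumption: on Python 3 getproxies() lives in urllib.request.
    from urllib.request import getproxies

# Normally the variable is exported in the shell before Python starts;
# it is set here only so that the check needs no outside setup.
os.environ["http_proxy"] = "http://www.someproxy.com:3128"

# The result maps scheme names such as "http" to proxy URLs.
proxies = getproxies()
http_proxy = proxies.get("http")
```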

<P>
In a Macintosh environment, <tt class="function">urlopen()</tt> will retrieve proxy
information from Internet Config.

<P>
Proxies which require authentication for use are not currently
supported; this is considered an implementation limitation.
</dl>

<P>
<dl><dt><b><a name='l2h-1994'><tt class='function'>urlretrieve</tt></a></b> (<var>url</var><big>[</big><var>, filename</var><big>[</big><var>, hook</var><big>[</big><var>, data</var><big>]</big><big>]</big><big>]</big>)
<dd>
Copy a network object denoted by a URL to a local file, if necessary.
If the URL points to a local file, or a valid cached copy of the
object exists, the object is not copied.  Return a tuple
<code>(<var>filename</var>, <var>headers</var>)</code> where <var>filename</var> is the
local file name under which the object can be found, and <var>headers</var>
is either <code>None</code> (for a local object) or whatever the
<tt class="method">info()</tt> method of the object returned by <tt class="function">urlopen()</tt>
returned (for a remote object, possibly cached).  Exceptions are the
same as for <tt class="function">urlopen()</tt>.

<P>
The second argument, if present, specifies the file location to copy
to (if absent, the location will be a tempfile with a generated name).
The third argument, if present, is a hook function that will be called
once on establishment of the network connection and once after each
block read thereafter.  The hook will be passed three arguments: a
count of blocks transferred so far, a block size in bytes, and the
total size of the file.  The hook's third argument may be <code>-1</code> on older
FTP servers which do not return a file size in response to a retrieval 
request.
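<P>
A small sketch of a hook function follows.  It copies a temporary file named
by a <span class="file">file:</span> URL to an explicit target and records every call made to the
hook.  The names <code>record_progress</code>, <code>source_path</code> and
<code>target_path</code> are illustrative, and the fallback import assumes a Python 3
interpreter, where <tt class="function">urlretrieve()</tt> lives in <tt class="module">urllib.request</tt>.

```python
import os
import tempfile

try:
    # Module layout described in this section.
    from urllib import urlretrieve, pathname2url
except ImportError:
    # Assumption: on Python 3 these functions live in urllib.request.
    from urllib.request import urlretrieve, pathname2url

progress = []


def record_progress(block_count, block_size, total_size):
    """Hook: remember each (blocks so far, block size, total size) report."""
    progress.append((block_count, block_size, total_size))


# Create a small source file so the example needs no network access.
fd, source_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"hello, urllib\n")
target_path = source_path + ".copy"

# Giving an explicit target file name requests a copy.
filename, headers = urlretrieve(
    "file:" + pathname2url(source_path), target_path, record_progress)

with open(target_path, "rb") as handle:
    copied = handle.read()

os.remove(source_path)
os.remove(target_path)
```

<P>
The first report carries a block count of zero, which makes it a convenient
place to initialise a progress display before any data has arrived.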

<P>
If the <var>url</var> uses the <span class="file">http:</span> scheme identifier, the optional
<var>data</var> argument may be given to specify a <code>POST</code> request
(normally the request type is <code>GET</code>).  The <var>data</var> argument
must be in standard <span class="file">application/x-www-form-urlencoded</span> format;
see the <tt class="function">urlencode()</tt> function below.
</dl>

<P>
<dl><dt><b><a name='l2h-1995'><tt class='function'>urlcleanup</tt></a></b> ()
<dd>
Clear the cache that may have been built up by previous calls to
<tt class="function">urlretrieve()</tt>.
</dl>

<P>
<dl><dt><b><a name='l2h-1996'><tt class='function'>quote</tt></a></b> (<var>string</var><big>[</big><var>, safe</var><big>]</big>)
<dd>
Replace special characters in <var>string</var> using the "<tt class="samp">%xx</tt>" escape.
Letters, digits, and the characters "<tt class="character">_,.-</tt>" are never quoted.
The optional <var>safe</var> parameter specifies additional characters
that should not be quoted -- its default value is <code>'/'</code>.

<P>
Example: <code>quote('/~connolly/')</code> yields <code>'/%7econnolly/'</code>.
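<P>
The effect of the <var>safe</var> parameter can be seen in the following sketch.
The sample path is made up, and the fallback import assumes a Python 3
interpreter, where <tt class="function">quote()</tt> lives in <tt class="module">urllib.parse</tt>.

```python
try:
    # Module layout described in this section.
    from urllib import quote
except ImportError:
    # Assumption: on Python 3 quote() lives in urllib.parse.
    from urllib.parse import quote

sample = "/docs/my file#1"

# With the default safe value the slashes are left alone, while the
# space and the "#" are replaced by %xx escapes.
default_quoted = quote(sample)

# With an empty safe string the slashes are escaped as well.
nothing_safe = quote(sample, "")
```

<P>
On a Python 3 interpreter these calls give <code>'/docs/my%20file%231'</code> and
<code>'%2Fdocs%2Fmy%20file%231'</code> respectively.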
</dl>

<P>
<dl><dt><b><a name='l2h-1997'><tt class='function'>quote_plus</tt></a></b> (<var>string</var><big>[</big><var>, safe</var><big>]</big>)
<dd>
Like <tt class="function">quote()</tt>, but also replaces spaces by plus signs, as
required for quoting HTML form values.  Plus signs in the original
string are escaped unless they are included in <var>safe</var>.
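<P>
A short sketch of the difference, using a made-up form value; the fallback
import assumes a Python 3 interpreter, where <tt class="function">quote_plus()</tt> lives in
<tt class="module">urllib.parse</tt>.

```python
try:
    # Module layout described in this section.
    from urllib import quote_plus
except ImportError:
    # Assumption: on Python 3 quote_plus() lives in urllib.parse.
    from urllib.parse import quote_plus

value = "a b+c"

# The space becomes "+", and the literal plus sign is escaped so that
# the two cannot be confused when the value is decoded.
escaped_plus = quote_plus(value)

# Listing "+" in safe leaves the original plus sign untouched.
kept_plus = quote_plus(value, "+")
```

<P>
Keeping the plus sign unescaped makes it indistinguishable from an encoded
space, so the second form is only appropriate when the receiver does not
decode plus signs as spaces.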
