wget.texi
wget (command line browser) source code
same cookie upon further requests.  Since cookies allow the server
owners to keep track of visitors and for sites to exchange this
information, some consider them a breach of privacy.  The default is to
use cookies; however, @emph{storing} cookies is not on by default.

@cindex loading cookies
@cindex cookies, loading
@item --load-cookies @var{file}
Load cookies from @var{file} before the first HTTP retrieval.
@var{file} is a textual file in the format originally used by Netscape's
@file{cookies.txt} file.

You will typically use this option when mirroring sites that require
that you be logged in to access some or all of their content.  The login
process typically works by the web server issuing an @sc{http} cookie
upon receiving and verifying your credentials.  The cookie is then
resent by the browser when accessing that part of the site, and so
proves your identity.

Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site.  This is achieved by
@samp{--load-cookies}---simply point Wget to the location of the
@file{cookies.txt} file, and it will send the same cookies your browser
would send in the same situation.  Different browsers keep textual
cookie files in different locations:

@table @asis
@item Netscape 4.x.
The cookies are in @file{~/.netscape/cookies.txt}.

@item Mozilla and Netscape 6.x.
Mozilla's cookie file is also named @file{cookies.txt}, located
somewhere under @file{~/.mozilla}, in the directory of your profile.
The full path usually ends up looking somewhat like
@file{~/.mozilla/default/@var{some-weird-string}/cookies.txt}.

@item Internet Explorer.
You can produce a cookie file Wget can use by using the File menu,
Import and Export, Export Cookies.
This has been tested with Internet Explorer 5; it is not guaranteed to
work with earlier versions.

@item Other browsers.
If you are using a different browser to create your cookies,
@samp{--load-cookies} will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.
@end table

If you cannot use @samp{--load-cookies}, there might still be an
alternative.  If your browser supports a ``cookie manager'', you can use
it to view the cookies used when accessing the site you're mirroring.
Write down the name and value of the cookie, and manually instruct Wget
to send those cookies, bypassing the ``official'' cookie support:

@example
wget --cookies=off --header "Cookie: @var{name}=@var{value}"
@end example

@cindex saving cookies
@cindex cookies, saving
@item --save-cookies @var{file}
Save cookies to @var{file} at the end of the session.  Cookies whose
expiry time is not specified, or those that have already expired, are
not saved.

@cindex Content-Length, ignore
@cindex ignore length
@item --ignore-length
Unfortunately, some @sc{http} servers (@sc{cgi} programs, to be more
precise) send out bogus @code{Content-Length} headers, which makes Wget
go wild, as it thinks not all the document was retrieved.
You can spot this syndrome if Wget retries getting the same document
again and again, each time claiming that the (otherwise normal)
connection has closed on the very same byte.

With this option, Wget will ignore the @code{Content-Length} header---as
if it never existed.

@cindex header, add
@item --header=@var{additional-header}
Define an @var{additional-header} to be passed to the @sc{http} servers.
Headers must contain a @samp{:} preceded by one or more non-blank
characters, and must not contain newlines.

You may define more than one additional header by specifying
@samp{--header} more than once.

@example
@group
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr'        \
       http://fly.srk.fer.hr/
@end group
@end example

Specifying an empty string as the header value will clear all previous
user-defined headers.

@cindex proxy user
@cindex proxy password
@cindex proxy authentication
@item --proxy-user=@var{user}
@itemx --proxy-passwd=@var{password}
Specify the username @var{user} and password @var{password} for
authentication on a proxy server.  Wget will encode them using the
@code{basic} authentication scheme.

Security considerations similar to those with @samp{--http-passwd}
pertain here as well.

@cindex http referer
@cindex referer, http
@item --referer=@var{url}
Include a `Referer: @var{url}' header in the HTTP request.  Useful for
retrieving documents with server-side processing that assume they are
always being retrieved by interactive web browsers and only come out
properly when Referer is set to one of the pages that point to them.

@cindex server response, save
@item -s
@itemx --save-headers
Save the headers sent by the @sc{http} server to the file, preceding the
actual contents, with an empty line as the separator.

@cindex user-agent
@item -U @var{agent-string}
@itemx --user-agent=@var{agent-string}
Identify as @var{agent-string} to the @sc{http} server.

The @sc{http} protocol allows the clients to identify themselves using a
@code{User-Agent} header field.
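On the wire this identification is just one more request header.  A
request from Wget looks roughly like the following sketch (the version
number is illustrative, and the host is the one from the example above):

```text
GET / HTTP/1.0
User-Agent: Wget/1.8.2
Host: fly.srk.fer.hr
```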
This enables distinguishing the @sc{www} software, usually for
statistical purposes or for tracing of protocol violations.  Wget
normally identifies as @samp{Wget/@var{version}}, @var{version} being
the current version number of Wget.

However, some sites have been known to impose the policy of tailoring
the output according to the @code{User-Agent}-supplied information.
While conceptually this is not such a bad idea, it has been abused by
servers denying information to clients other than @code{Mozilla} or
Microsoft @code{Internet Explorer}.  This option allows you to change
the @code{User-Agent} line issued by Wget.  Use of this option is
discouraged, unless you really know what you are doing.

@cindex POST
@item --post-data=@var{string}
@itemx --post-file=@var{file}
Use POST as the method for all HTTP requests and send the specified data
in the request body.  @code{--post-data} sends @var{string} as data,
whereas @code{--post-file} sends the contents of @var{file}.  Other than
that, they work in exactly the same way.

Please be aware that Wget needs to know the size of the POST data in
advance.  Therefore the argument to @code{--post-file} must be a regular
file; specifying a FIFO or something like @file{/dev/stdin} won't work.
It's not quite clear how to work around this limitation inherent in
HTTP/1.0.  Although HTTP/1.1 introduces @dfn{chunked} transfer, which
doesn't require knowing the request length in advance, a client can't
use chunked unless it knows it's talking to an HTTP/1.1 server.  And it
can't know that until it receives a response, which in turn requires the
request to have been completed -- a chicken-and-egg problem.

Note: if Wget is redirected after the POST request is completed, it will
not send the POST data to the redirected URL.  This is because URLs that
process POST often respond with a redirection to a regular page
(although that's technically disallowed), which does not desire or
accept POST.
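Sketched as a wire exchange, with illustrative paths, the sequence is:

```text
POST /auth.php HTTP/1.0         (body: user=foo&password=bar is sent)
302 Found, Location: /done.html (server redirects to a regular page)
GET /done.html HTTP/1.0         (the POST data is not sent again)
```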
It is not yet clear that this behavior is optimal; if it doesn't work
out, it will be changed.

This example shows how to log in to a server using POST and then proceed
to download the desired pages, presumably only accessible to authorized
users:

@example
@group
# @r{Log in to the server.  This can be done only once.}
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# @r{Now grab the page or pages we care about.}
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
@end group
@end example
@end table

@node FTP Options, Recursive Retrieval Options, HTTP Options, Invoking
@section FTP Options

@table @samp
@cindex .listing files, removing
@item -nr
@itemx --dont-remove-listing
Don't remove the temporary @file{.listing} files generated by @sc{ftp}
retrievals.  Normally, these files contain the raw directory listings
received from @sc{ftp} servers.  Not removing them can be useful for
debugging purposes, or when you want to be able to easily check on the
contents of remote server directories (e.g. to verify that a mirror
you're running is complete).

Note that even though Wget writes to a known filename for this file,
this is not a security hole in the scenario of a user making
@file{.listing} a symbolic link to @file{/etc/passwd} or something and
asking @code{root} to run Wget in his or her directory.  Depending on
the options used, either Wget will refuse to write to @file{.listing},
making the globbing/recursion/time-stamping operation fail, or the
symbolic link will be deleted and replaced with the actual
@file{.listing} file, or the listing will be written to a
@file{.listing.@var{number}} file.

Even though this situation isn't a problem, @code{root} should never
run Wget in a non-trusted user's directory.
A user could do something as simple as linking @file{index.html} to
@file{/etc/passwd} and asking @code{root} to run Wget with @samp{-N} or
@samp{-r} so the file will be overwritten.

@cindex globbing, toggle
@item -g on/off
@itemx --glob=on/off
Turn @sc{ftp} globbing on or off.  Globbing means you may use the
shell-like special characters (@dfn{wildcards}), like @samp{*},
@samp{?}, @samp{[} and @samp{]} to retrieve more than one file from the
same directory at once, like:

@example
wget ftp://gnjilux.srk.fer.hr/*.msg
@end example

By default, globbing will be turned on if the @sc{url} contains a
globbing character.  This option may be used to turn globbing on or off
permanently.

You may have to quote the @sc{url} to protect it from being expanded by
your shell.  Globbing makes Wget look for a directory listing, which is
system-specific.  This is why it currently works only with Unix @sc{ftp}
servers (and the ones emulating Unix @code{ls} output).

@cindex passive ftp
@item --passive-ftp
Use the @dfn{passive} @sc{ftp} retrieval scheme, in which the client
initiates the data connection.  This is sometimes required for @sc{ftp}
to work behind firewalls.

@cindex symbolic links, retrieving
@item --retr-symlinks
Usually, when retrieving @sc{ftp} directories recursively and a symbolic
link is encountered, the linked-to file is not downloaded.  Instead, a
matching symbolic link is created on the local filesystem.  The
pointed-to file will not be downloaded unless this recursive retrieval
would have encountered it separately and downloaded it anyway.

When @samp{--retr-symlinks} is specified, however, symbolic links are
traversed and the pointed-to files are retrieved.  At this time, this
option does not cause Wget to traverse symlinks to directories and
recurse through them, but in the future it should be enhanced to do
this.

Note that when retrieving a file (not a directory) because it was
specified on the command line, rather than because it was recursed to,
this option has no effect.
Symbolic links are always traversed in this case.
@end table

@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking
@section Recursive Retrieval Options

@table @samp
@item -r
@itemx --recursive
Turn on recursive retrieving.  @xref{Recursive Retrieval}, for more
details.

@item -l @var{depth}
@itemx --level=@var{depth}
Specify recursion maximum depth level @var{depth} (@pxref{Recursive
Retrieval}).  The default maximum depth is 5.

@cindex proxy filling
@cindex delete after retrieval
@cindex filling proxy cache
@item --delete-after
This option tells Wget to delete every single file it downloads,
@emph{after} having done so.  It is useful for pre-fetching popular
pages through a proxy, e.g.:

@example
wget -r -nd --delete-after http://whatever.com/~popular/page/
@end example

The @samp{-r} option is to retrieve recursively, and @samp{-nd} to not
create directories.

Note that @samp{--delete-after} deletes files on the local machine.  It
does not issue the @samp{DELE} command to remote FTP sites, for
instance.  Also note that when @samp{--delete-after} is specified,
@samp{--convert-links} is ignored, so @samp{.orig} files are simply not
created in the first place.

@cindex conversion of links
@cindex link conversion
@item -k
@itemx --convert-links
After the download is complete, convert the links in the document to
make them suitable for local viewing.  This affects not only the visible
hyperlinks, but any part of the document that links to external content,
such as embedded images, links to style sheets, hyperlinks to
non-@sc{html} content, etc.

Each link will be changed in one of two ways:

@itemize @bullet
@item
The links to files that have been downloaded by Wget will be changed to
refer to the file they point to as a relative link.

Example: if the downloaded file @file{/foo/doc.html} links to
@file{/bar/img.gif}, also downloaded, then the link in @file{doc.html}
will be modified to point to @samp{../bar/img.gif}.
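The rewritten path is simply the relative path from the directory
containing @file{doc.html} to the target.  A quick way to reproduce the
computation, using @code{realpath} from GNU coreutils as a stand-in for
Wget's internal logic (@samp{-m} lets it operate on paths that need not
exist locally):

```shell
# Relative path from the directory of /foo/doc.html to /bar/img.gif,
# i.e. the link -k writes into the converted page.  The paths are the
# ones from the example above.
realpath -m --relative-to=/foo /bar/img.gif
# -> ../bar/img.gif
```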
This kind of transformation works reliably for arbitrary combinations
of directories.

@item
The links to files that have not been downloaded by Wget will be changed
to include host name and absolute path of the location they point to.

Example: if the downloaded file @file{/foo/doc.html} links to
@file{/bar/img.gif} (or to @file{../bar/img.gif}), then the link in
@file{doc.html} will be modified to point to
@file{http://@var{hostname}/bar/img.gif}.
@end itemize

Because of this, local browsing works reliably: if a linked file was
downloaded, the link will refer to its local name; if it was not
downloaded, the link will refer to its full Internet address rather than
