wget.pod

A free tool for automatically downloading files from the network.
=over 4

=item B<-nd>

=item B<--no-directories>

Do not create a hierarchy of directories when retrieving recursively.
With this option turned on, all files will get saved to the current
directory, without clobbering (if a name shows up more than once, the
filenames will get extensions B<.n>).

=item B<-x>

=item B<--force-directories>

The opposite of B<-nd>---create a hierarchy of directories, even if
one would not have been created otherwise.  E.g. B<wget -x
http://fly.srk.fer.hr/robots.txt> will save the downloaded file to
F<fly.srk.fer.hr/robots.txt>.

=item B<-nH>

=item B<--no-host-directories>

Disable generation of host-prefixed directories.  By default, invoking
Wget with B<-r http://fly.srk.fer.hr/> will create a structure of
directories beginning with F<fly.srk.fer.hr/>.  This option disables
such behavior.

=item B<--protocol-directories>

Use the protocol name as a directory component of local file names.  For
example, with this option, B<wget -r http://>I<host> will save to
B<http/>I<host>B</...> rather than just to I<host>B</...>.

=item B<--cut-dirs=>I<number>

Ignore I<number> directory components.  This is useful for getting
fine-grained control over the directory where recursive retrieval will
be saved.

Take, for example, the directory at
B<ftp://ftp.xemacs.org/pub/xemacs/>.  If you retrieve it with
B<-r>, it will be saved locally under
F<ftp.xemacs.org/pub/xemacs/>.  While the B<-nH> option can
remove the F<ftp.xemacs.org/> part, you are still stuck with
F<pub/xemacs>.  This is where B<--cut-dirs> comes in handy; it
makes Wget not "see" I<number> remote directory components.  Here
are several examples of how the B<--cut-dirs> option works.

    No options        -> ftp.xemacs.org/pub/xemacs/
    -nH               -> pub/xemacs/
    -nH --cut-dirs=1  -> xemacs/
    -nH --cut-dirs=2  -> .

    --cut-dirs=1      -> ftp.xemacs.org/xemacs/
    ...

If you just want to get rid of the directory structure, this option is
similar to a combination of B<-nd> and B<-P>.
However, unlike
B<-nd>, B<--cut-dirs> does not lose subdirectories---for
instance, with B<-nH --cut-dirs=1>, a F<beta/> subdirectory will
be placed in F<xemacs/beta>, as one would expect.

=item B<-P> I<prefix>

=item B<--directory-prefix=>I<prefix>

Set directory prefix to I<prefix>.  The I<directory prefix> is the
directory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree.  The default is B<.> (the
current directory).

=back

=head2 HTTP Options

=over 4

=item B<-E>

=item B<--html-extension>

If a file of type B<application/xhtml+xml> or B<text/html> is
downloaded and the URL does not end with the regexp
B<\.[Hh][Tt][Mm][Ll]?>, this option will cause the suffix B<.html>
to be appended to the local filename.  This is useful, for instance,
when you're mirroring a remote site that uses B<.asp> pages, but you
want the mirrored pages to be viewable on your stock Apache server.
Another good use for this is when you're downloading CGI-generated
materials.  A URL like B<http://site.com/article.cgi?25> will be
saved as F<article.cgi?25.html>.

Note that filenames changed in this way will be re-downloaded every time
you re-mirror a site, because Wget can't tell that the local
F<I<X>.html> file corresponds to remote URL I<X> (since
it doesn't yet know that the URL produces output of type
B<text/html> or B<application/xhtml+xml>).  To prevent this
re-downloading, you must use B<-k> and B<-K> so that the original
version of the file will be saved as F<I<X>.orig>.

=item B<--http-user=>I<user>

=item B<--http-password=>I<password>

Specify the username I<user> and password I<password> on an
HTTP server.  According to the type of the challenge, Wget will
encode them using either the C<basic> (insecure),
the C<digest>, or the Windows C<NTLM> authentication scheme.

Another way to specify username and password is in the URL itself.
Either method reveals your password to anyone who
bothers to run C<ps>.
To prevent the passwords from being seen,
store them in F<.wgetrc> or F<.netrc>, and make sure to protect
those files from other users with C<chmod>.  If the passwords are
really important, do not leave them lying in those files either---edit
the files and delete them after Wget has started the download.

=item B<--no-cache>

Disable server-side cache.  In this case, Wget will send the remote
server an appropriate directive (B<Pragma: no-cache>) to get the
file from the remote service, rather than returning the cached version.
This is especially useful for retrieving and flushing out-of-date
documents on proxy servers.

Caching is allowed by default.

=item B<--no-cookies>

Disable the use of cookies.  Cookies are a mechanism for maintaining
server-side state.  The server sends the client a cookie using the
C<Set-Cookie> header, and the client responds with the same cookie
upon further requests.  Since cookies allow the server owners to keep
track of visitors and for sites to exchange this information, some
consider them a breach of privacy.  The default is to use cookies;
however, I<storing> cookies is not on by default.

=item B<--load-cookies> I<file>

Load cookies from I<file> before the first HTTP retrieval.
I<file> is a textual file in the format originally used by Netscape's
F<cookies.txt> file.

You will typically use this option when mirroring sites that require
that you be logged in to access some or all of their content.  The login
process typically works by the web server issuing an HTTP cookie
upon receiving and verifying your credentials.  The cookie is then
resent by the browser when accessing that part of the site, and so
proves your identity.

Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site.  This is achieved by
B<--load-cookies>---simply point Wget to the location of the
F<cookies.txt> file, and it will send the same cookies your browser
would send in the same situation.
Different browsers keep textual
cookie files in different locations:

=over 4

=item @asis<Netscape 4.x.>

The cookies are in F<~/.netscape/cookies.txt>.

=item @asis<Mozilla and Netscape 6.x.>

Mozilla's cookie file is also named F<cookies.txt>, located
somewhere under F<~/.mozilla>, in the directory of your profile.
The full path usually ends up looking somewhat like
F<~/.mozilla/default/I<some-weird-string>/cookies.txt>.

=item @asis<Internet Explorer.>

You can produce a cookie file Wget can use by using the File menu,
Import and Export, Export Cookies.  This has been tested with Internet
Explorer 5; it is not guaranteed to work with earlier versions.

=item @asis<Other browsers.>

If you are using a different browser to create your cookies,
B<--load-cookies> will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.

=back

If you cannot use B<--load-cookies>, there might still be an
alternative.  If your browser supports a "cookie manager", you can use
it to view the cookies used when accessing the site you're mirroring.
Write down the name and value of the cookie, and manually instruct Wget
to send those cookies, bypassing the "official" cookie support:

    wget --no-cookies --header "Cookie: <name>=<value>"

=item B<--save-cookies> I<file>

Save cookies to I<file> before exiting.  This will not save cookies
that have expired or that have no expiry time (so-called "session
cookies"), but also see B<--keep-session-cookies>.

=item B<--keep-session-cookies>

When specified, causes B<--save-cookies> to also save session
cookies.  Session cookies are normally not saved because they are
meant to be kept in memory and forgotten when you exit the browser.
Saving them is useful on sites that require you to log in or to visit
the home page before you can access some pages.
With this option,
multiple Wget runs are considered a single browser session as far as
the site is concerned.

Since the cookie file format does not normally carry session cookies,
Wget marks them with an expiry timestamp of 0.  Wget's
B<--load-cookies> recognizes those as session cookies, but it might
confuse other browsers.  Also note that cookies so loaded will be
treated as other session cookies, which means that if you want
B<--save-cookies> to preserve them again, you must use
B<--keep-session-cookies> again.

=item B<--ignore-length>

Unfortunately, some HTTP servers (CGI programs, to be more
precise) send out bogus C<Content-Length> headers, which makes Wget
go wild, as it thinks not all the document was retrieved.  You can spot
this syndrome if Wget retries getting the same document again and again,
each time claiming that the (otherwise normal) connection has closed on
the very same byte.

With this option, Wget will ignore the C<Content-Length> header---as
if it never existed.

=item B<--header=>I<header-line>

Send I<header-line> along with the rest of the headers in each
HTTP request.  The supplied header is sent as-is, which means it
must contain name and value separated by a colon, and must not contain
newlines.

You may define more than one additional header by specifying
B<--header> more than once.

    wget --header='Accept-Charset: iso-8859-2' \
         --header='Accept-Language: hr'        \
           http://fly.srk.fer.hr/

Specification of an empty string as the header value will clear all
previous user-defined headers.

As of Wget 1.10, this option can be used to override headers otherwise
generated automatically.
This example instructs Wget to connect to
localhost, but to specify B<foo.bar> in the C<Host> header:

    wget --header="Host: foo.bar" http://localhost/

In versions of Wget prior to 1.10 such use of B<--header> caused
sending of duplicate headers.

=item B<--max-redirect=>I<number>

Specifies the maximum number of redirections to follow for a resource.
The default is 20, which is usually far more than necessary.  However, on
those occasions where you want to allow more (or fewer), this is the
option to use.

=item B<--proxy-user=>I<user>

=item B<--proxy-password=>I<password>

Specify the username I<user> and password I<password> for
authentication on a proxy server.  Wget will encode them using the
C<basic> authentication scheme.

Security considerations similar to those with B<--http-password>
pertain here as well.

=item B<--referer=>I<url>

Include `Referer: I<url>' header in HTTP request.  Useful for
retrieving documents with server-side processing that assume they are
always being retrieved by interactive web browsers and only come out
properly when Referer is set to one of the pages that point to them.

=item B<--save-headers>

Save the headers sent by the HTTP server to the file, preceding the
actual contents, with an empty line as the separator.

=item B<-U> I<agent-string>

=item B<--user-agent=>I<agent-string>

Identify as I<agent-string> to the HTTP server.

The HTTP protocol allows the clients to identify themselves using a
C<User-Agent> header field.  This enables distinguishing the
WWW software, usually for statistical purposes or for tracing of
protocol violations.  Wget normally identifies as
B<Wget/>I<version>, I<version> being the current version
number of Wget.

However, some sites have been known to impose the policy of tailoring
the output according to the C<User-Agent>-supplied information.
While this is not such a bad idea in theory, it has been abused by
servers denying information to clients other than (historically)
Netscape or, more frequently, Microsoft Internet Explorer.
This
option allows you to change the C<User-Agent> line issued by Wget.
Use of this option is discouraged, unless you really know what you are
doing.

Specifying an empty user agent with B<--user-agent=""> instructs Wget
not to send the C<User-Agent> header in HTTP requests.

=item B<--post-data=>I<string>

=item B<--post-file=>I<file>

Use POST as the method for all HTTP requests and send the specified data
in the request body.  C<--post-data> sends I<string> as data,
whereas C<--post-file> sends the contents of I<file>.  Other than
that, they work in exactly the same way.

Please be aware that Wget needs to know the size of the POST data in
advance.  Therefore the argument to C<--post-file> must be a regular
file; specifying a FIFO or something like F</dev/stdin> won't work.
It's not quite clear how to work around this limitation inherent in
HTTP/1.0.  Although HTTP/1.1 introduces I<chunked> transfer that
doesn't require knowing the request length in advance, a client can't
use chunked unless it knows it's talking to an HTTP/1.1 server.  And it
can't know that until it receives a response, which in turn requires the
request to have been completed -- a chicken-and-egg problem.

Note: if Wget is redirected after the POST request is completed, it
will not send the POST data to the redirected URL.  This is because
URLs that process POST often respond with a redirection to a regular
page, which does not desire or accept POST.  It is not completely
clear that this behavior is optimal; if it doesn't work out, it might
be changed in the future.

This example shows how to log in to a server using POST and then proceed
to download the desired pages, presumably only accessible to authorized
users:
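A sketch of such a two-step session, combining B<--post-data> with the
cookie options described above; the host B<server.com>, the F<auth.php>
login script, and the C<user>/C<password> form field names are all
hypothetical placeholders for whatever the real site's login form uses:

```shell
# Log in to the hypothetical server once; --save-cookies records the
# session cookie issued on successful authentication.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# Subsequent runs present the saved cookies and so act as the same
# logged-in session; -p also fetches page requisites (images, CSS).
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
```

If the site issues only session cookies, add B<--keep-session-cookies>
to the first command so that B<--save-cookies> preserves them.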
