.PD 0
.IP "\fB\-\-directory\-prefix=\fR\fIprefix\fR" 4
.IX Item "--directory-prefix=prefix"
.PD
Set directory prefix to \fIprefix\fR. The \fIdirectory prefix\fR is the
directory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree. The default is \fB.\fR (the
current directory).
.Sh "\s-1HTTP\s0 Options"
.IX Subsection "HTTP Options"
.IP "\fB\-E\fR" 4
.IX Item "-E"
.PD 0
.IP "\fB\-\-html\-extension\fR" 4
.IX Item "--html-extension"
.PD
If a file of type \fBapplication/xhtml+xml\fR or \fBtext/html\fR is
downloaded and the \s-1URL\s0 does not end with the regexp
\&\fB\e.[Hh][Tt][Mm][Ll]?\fR, this option will cause the suffix \fB.html\fR
to be appended to the local filename. This is useful, for instance, when
you're mirroring a remote site that uses \fB.asp\fR pages, but you want
the mirrored pages to be viewable on your stock Apache server. Another
good use for this is when you're downloading CGI-generated materials. A
\s-1URL\s0 like \fBhttp://site.com/article.cgi?25\fR will be saved as
\&\fIarticle.cgi?25.html\fR.
.Sp
Note that filenames changed in this way will be re-downloaded every time
you re-mirror a site, because Wget can't tell that the local
\&\fI\fIX\fI.html\fR file corresponds to remote \s-1URL\s0 \fIX\fR (since
it doesn't yet know that the \s-1URL\s0 produces output of type
\&\fBtext/html\fR or \fBapplication/xhtml+xml\fR). To prevent this
re\-downloading, you must use \fB\-k\fR and \fB\-K\fR so that the original
version of the file will be saved as \fI\fIX\fI.orig\fR.
.IP "\fB\-\-http\-user=\fR\fIuser\fR" 4
.IX Item "--http-user=user"
.PD 0
.IP "\fB\-\-http\-password=\fR\fIpassword\fR" 4
.IX Item "--http-password=password"
.PD
Specify the username \fIuser\fR and password \fIpassword\fR on an
\&\s-1HTTP\s0 server.
According to the type of the challenge, Wget will
encode them using either the \f(CW\*(C`basic\*(C'\fR (insecure),
the \f(CW\*(C`digest\*(C'\fR, or the Windows \f(CW\*(C`NTLM\*(C'\fR authentication scheme.
.Sp
Another way to specify username and password is in the \s-1URL\s0 itself.
Either method reveals your password to anyone who
bothers to run \f(CW\*(C`ps\*(C'\fR. To prevent the passwords from being seen,
store them in \fI.wgetrc\fR or \fI.netrc\fR, and make sure to protect
those files from other users with \f(CW\*(C`chmod\*(C'\fR. If the passwords are
really important, do not leave them lying in those files either\-\-\-edit
the files and delete them after Wget has started the download.
.IP "\fB\-\-no\-cache\fR" 4
.IX Item "--no-cache"
Disable server-side cache. In this case, Wget will send the remote
server an appropriate directive (\fBPragma: no-cache\fR) to get the
file from the remote service, rather than returning the cached version.
This is especially useful for retrieving and flushing out-of-date
documents on proxy servers.
.Sp
Caching is allowed by default.
.IP "\fB\-\-no\-cookies\fR" 4
.IX Item "--no-cookies"
Disable the use of cookies. Cookies are a mechanism for maintaining
server-side state. The server sends the client a cookie using the
\&\f(CW\*(C`Set\-Cookie\*(C'\fR header, and the client responds with the same cookie
upon further requests. Since cookies allow the server owners to keep
track of visitors and for sites to exchange this information, some
consider them a breach of privacy. The default is to use cookies;
however, \fIstoring\fR cookies is not on by default.
.IP "\fB\-\-load\-cookies\fR \fIfile\fR" 4
.IX Item "--load-cookies file"
Load cookies from \fIfile\fR before the first \s-1HTTP\s0 retrieval.
\&\fIfile\fR is a textual file in the format originally used by Netscape's
\&\fIcookies.txt\fR file.
.Sp
You will typically use this option when mirroring sites that require
that you be logged in to access some or all of their content.
The login process typically works by the web server issuing an \s-1HTTP\s0 cookie
upon receiving and verifying your credentials. The cookie is then
resent by the browser when accessing that part of the site, and so
proves your identity.
.Sp
Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site. This is achieved by
\&\fB\-\-load\-cookies\fR\-\-\-simply point Wget to the location of the
\&\fIcookies.txt\fR file, and it will send the same cookies your browser
would send in the same situation. Different browsers keep textual
cookie files in different locations:
.RS 4
.IP "Netscape 4.x." 4
.IX Item "Netscape 4.x."
The cookies are in \fI~/.netscape/cookies.txt\fR.
.IP "Mozilla and Netscape 6.x." 4
.IX Item "Mozilla and Netscape 6.x."
Mozilla's cookie file is also named \fIcookies.txt\fR, located
somewhere under \fI~/.mozilla\fR, in the directory of your profile.
The full path usually ends up looking somewhat like
\&\fI~/.mozilla/default/\fIsome-weird-string\fI/cookies.txt\fR.
.IP "Internet Explorer." 4
.IX Item "Internet Explorer."
You can produce a cookie file Wget can use by using the File menu,
Import and Export, Export Cookies. This has been tested with Internet
Explorer 5; it is not guaranteed to work with earlier versions.
.IP "Other browsers." 4
.IX Item "Other browsers."
If you are using a different browser to create your cookies,
\&\fB\-\-load\-cookies\fR will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.
.RE
.RS 4
.Sp
If you cannot use \fB\-\-load\-cookies\fR, there might still be an
alternative.
If your browser supports a \*(L"cookie manager\*(R", you can use
it to view the cookies used when accessing the site you're mirroring.
Write down the name and value of the cookie, and manually instruct Wget
to send those cookies, bypassing the \*(L"official\*(R" cookie support:
.Sp
.Vb 1
\& wget --no-cookies --header "Cookie: <name>=<value>"
.Ve
.RE
.IP "\fB\-\-save\-cookies\fR \fIfile\fR" 4
.IX Item "--save-cookies file"
Save cookies to \fIfile\fR before exiting. This will not save cookies
that have expired or that have no expiry time (so\-called \*(L"session
cookies\*(R"), but also see \fB\-\-keep\-session\-cookies\fR.
.IP "\fB\-\-keep\-session\-cookies\fR" 4
.IX Item "--keep-session-cookies"
When specified, causes \fB\-\-save\-cookies\fR to also save session
cookies. Session cookies are normally not saved because they are
meant to be kept in memory and forgotten when you exit the browser.
Saving them is useful on sites that require you to log in or to visit
the home page before you can access some pages. With this option,
multiple Wget runs are considered a single browser session as far as
the site is concerned.
.Sp
Since the cookie file format does not normally carry session cookies,
Wget marks them with an expiry timestamp of 0. Wget's
\&\fB\-\-load\-cookies\fR recognizes those as session cookies, but it might
confuse other browsers. Also note that cookies so loaded will be
treated as other session cookies, which means that if you want
\&\fB\-\-save\-cookies\fR to preserve them again, you must use
\&\fB\-\-keep\-session\-cookies\fR again.
.IP "\fB\-\-ignore\-length\fR" 4
.IX Item "--ignore-length"
Unfortunately, some \s-1HTTP\s0 servers (\s-1CGI\s0 programs, to be more
precise) send out bogus \f(CW\*(C`Content\-Length\*(C'\fR headers, which makes Wget
go wild, as it thinks not all the document was retrieved.
You can spot
this syndrome if Wget retries getting the same document again and again,
each time claiming that the (otherwise normal) connection has closed on
the very same byte.
.Sp
With this option, Wget will ignore the \f(CW\*(C`Content\-Length\*(C'\fR header\-\-\-as
if it never existed.
.IP "\fB\-\-header=\fR\fIheader-line\fR" 4
.IX Item "--header=header-line"
Send \fIheader-line\fR along with the rest of the headers in each
\&\s-1HTTP\s0 request. The supplied header is sent as\-is, which means it
must contain name and value separated by a colon, and must not contain
newlines.
.Sp
You may define more than one additional header by specifying
\&\fB\-\-header\fR more than once.
.Sp
.Vb 3
\& wget --header='Accept-Charset: iso-8859-2' \e
\&      --header='Accept-Language: hr' \e
\&      http://fly.srk.fer.hr/
.Ve
.Sp
Specification of an empty string as the header value will clear all
previous user-defined headers.
.Sp
As of Wget 1.10, this option can be used to override headers otherwise
generated automatically. This example instructs Wget to connect to
localhost, but to specify \fBfoo.bar\fR in the \f(CW\*(C`Host\*(C'\fR header:
.Sp
.Vb 1
\& wget --header="Host: foo.bar" http://localhost/
.Ve
.Sp
In versions of Wget prior to 1.10 such use of \fB\-\-header\fR caused
sending of duplicate headers.
.IP "\fB\-\-max\-redirect=\fR\fInumber\fR" 4
.IX Item "--max-redirect=number"
Specifies the maximum number of redirections to follow for a resource.
The default is 20, which is usually far more than necessary. However, on
those occasions where you want to allow more (or fewer), this is the
option to use.
.IP "\fB\-\-proxy\-user=\fR\fIuser\fR" 4
.IX Item "--proxy-user=user"
.PD 0
.IP "\fB\-\-proxy\-password=\fR\fIpassword\fR" 4
.IX Item "--proxy-password=password"
.PD
Specify the username \fIuser\fR and password \fIpassword\fR for
authentication on a proxy server.
Wget will encode them using the
\&\f(CW\*(C`basic\*(C'\fR authentication scheme.
.Sp
Security considerations similar to those with \fB\-\-http\-password\fR
pertain here as well.
.IP "\fB\-\-referer=\fR\fIurl\fR" 4
.IX Item "--referer=url"
Include a `Referer: \fIurl\fR' header in the \s-1HTTP\s0 request. Useful for
retrieving documents with server-side processing that assume they are
always being retrieved by interactive web browsers and only come out
properly when Referer is set to one of the pages that point to them.
.IP "\fB\-\-save\-headers\fR" 4
.IX Item "--save-headers"
Save the headers sent by the \s-1HTTP\s0 server to the file, preceding the
actual contents, with an empty line as the separator.
.IP "\fB\-U\fR \fIagent-string\fR" 4
.IX Item "-U agent-string"
.PD 0
.IP "\fB\-\-user\-agent=\fR\fIagent-string\fR" 4
.IX Item "--user-agent=agent-string"
.PD
Identify as \fIagent-string\fR to the \s-1HTTP\s0 server.
.Sp
The \s-1HTTP\s0 protocol allows the clients to identify themselves using a
\&\f(CW\*(C`User\-Agent\*(C'\fR header field. This enables distinguishing the
\&\s-1WWW\s0 software, usually for statistical purposes or for tracing of
protocol violations. Wget normally identifies as
\&\fBWget/\fR\fIversion\fR, \fIversion\fR being the current version
number of Wget.
.Sp
However, some sites have been known to impose the policy of tailoring
the output according to the \f(CW\*(C`User\-Agent\*(C'\fR\-supplied information.
While this is not such a bad idea in theory, it has been abused by
servers denying information to clients other than (historically)
Netscape or, more frequently, Microsoft Internet Explorer.
This option allows you to change the \f(CW\*(C`User\-Agent\*(C'\fR line issued by Wget.
Use of this option is discouraged, unless you really know what you are
doing.
.Sp
Specifying an empty user agent with \fB\-\-user\-agent=""\fR instructs Wget
not to send the \f(CW\*(C`User\-Agent\*(C'\fR header in \s-1HTTP\s0 requests.
.IP "\fB\-\-post\-data=\fR\fIstring\fR" 4
.IX Item "--post-data=string"
.PD 0
.IP "\fB\-\-post\-file=\fR\fIfile\fR" 4
.IX Item "--post-file=file"
.PD
Use \s-1POST\s0 as the method for all \s-1HTTP\s0 requests and send the specified data
in the request body. \f(CW\*(C`\-\-post\-data\*(C'\fR sends \fIstring\fR as data,
whereas \f(CW\*(C`\-\-post\-file\*(C'\fR sends the contents of \fIfile\fR. Other than
that, they work in exactly the same way.
.Sp
Please be aware that Wget needs to know the size of the \s-1POST\s0 data in
advance. Therefore the argument to \f(CW\*(C`\-\-post\-file\*(C'\fR must be a regular
file; specifying a \s-1FIFO\s0 or something like \fI/dev/stdin\fR won't work.
It's not quite clear how to work around this limitation inherent in
\&\s-1HTTP/1\s0.0. Although \s-1HTTP/1\s0.1 introduces \fIchunked\fR transfer that
doesn't require knowing the request length in advance, a client can't
use chunked unless it knows it's talking to an \s-1HTTP/1\s0.1 server. And it
can't know that until it receives a response, which in turn requires the
request to have been completed \*(-- a chicken-and-egg problem.
.Sp
Note: if Wget is redirected after the \s-1POST\s0 request is completed, it
will not send the \s-1POST\s0 data to the redirected \s-1URL\s0. This is because
URLs that process \s-1POST\s0 often respond with a redirection to a regular
page, which does not desire or accept \s-1POST\s0.
It is not completely
clear that this behavior is optimal; if it doesn't work out, it might
be changed in the future.
.Sp
This example shows how to log in to a server using \s-1POST\s0 and then proceed to
download the desired pages, presumably only accessible to authorized
users:
.Sp
.Vb 4
\& # Log in to the server. This can be done only once.
\& wget --save-cookies cookies.txt \e
\&      --post-data 'user=foo&password=bar' \e
\&      http://server.com/auth.php
.Ve
.Sp
.Vb 3
\& # Now grab the page or pages we care about.
\& wget --load-cookies cookies.txt \e
\&      -p http://server.com/interesting/article.php
.Ve
.Sp
If the server is using session cookies to track user authentication,
the above will not work because \fB\-\-save\-cookies\fR will not save
them (and neither will browsers) and the \fIcookies.txt\fR file will
be empty. In that case use \fB\-\-keep\-session\-cookies\fR along with
\&\fB\-\-save\-cookies\fR to force saving of session cookies.
.IP "\fB\-\-content\-disposition\fR" 4
.IX Item "--content-disposition"
If this is set to on, experimental (not fully\-functional) support for
\&\f(CW\*(C`Content\-Disposition\*(C'\fR headers is enabled. This can currently result in
extra round-trips to the server for a \f(CW\*(C`HEAD\*(C'\fR request, and is known
to suffer from a few bugs, which is why it is not currently enabled by default.
.Sp
This option is useful for some file-downloading \s-1CGI\s0 programs that use
\&\f(CW\*(C`Content\-Disposition\*(C'\fR headers to describe what the name of a