📄 manual

harvest is a robot that downloads HTML web pages
  referrer to be used on the command line. It is especially useful to
  fool or trick stupid servers or CGI scripts that rely on that information
  being available or containing certain data.

        curl -e www.coolsite.com http://www.showme.com/

  NOTE: The referer field is defined in the HTTP spec to be a full URL.

USER AGENT

  An HTTP request has the option to include information about the browser
  that generated the request. Curl allows it to be specified on the command
  line. It is especially useful to fool or trick stupid servers or CGI
  scripts that only accept certain browsers.

  Example:

        curl -A 'Mozilla/3.0 (Win95; I)' http://www.nationsbank.com/

  Other common strings:
    'Mozilla/3.0 (Win95; I)'     Netscape Version 3 for Windows 95
    'Mozilla/3.04 (Win95; U)'    Netscape Version 3 for Windows 95
    'Mozilla/2.02 (OS/2; U)'     Netscape Version 2 for OS/2
    'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)'           NS for AIX
    'Mozilla/4.05 [en] (X11; U; Linux 2.0.32 i586)'      NS for Linux

  Note that Internet Explorer tries hard to be compatible in every way:
    'Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)'    MSIE for W95

  Mozilla is not the only possible User-Agent name:
    'Konqueror/1.0'             KDE File Manager desktop client
    'Lynx/2.7.1 libwww-FM/2.14' Lynx command line browser

COOKIES

  Cookies are generally used by web servers to keep state information at the
  client's side. The server sets cookies by sending a response line in the
  headers that looks like 'Set-Cookie: <data>' where the data part then
  typically contains a set of NAME=VALUE pairs (separated by semicolons ';'
  like "NAME1=VALUE1; NAME2=VALUE2;"). The server can also specify which
  path the cookie should be used for (by specifying "path=value"), when the
  cookie should expire ("expires=DATE"), which domain to use it for
  ("domain=NAME") and whether it should be used on secure connections only
  ("secure").
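
  The structure of a Set-Cookie value described above can be illustrated with
  a short Python sketch. It is not part of curl and does not show how curl
  parses cookies; it only splits one header value into the NAME=VALUE pair and
  its attributes using Python's standard http.cookies module. The function
  name parse_set_cookie and the sample header are assumptions made for this
  illustration.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Return {name: {"value", "path", "domain", "secure"}} for one header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    parsed = {}
    for name, morsel in jar.items():
        parsed[name] = {
            "value": morsel.value,
            "path": morsel["path"],              # empty string if not given
            "domain": morsel["domain"],          # empty string if not given
            "secure": bool(morsel["secure"]),    # True only if the flag is present
        }
    return parsed


# Hypothetical header value, written in the style the text describes.
example = parse_set_cookie("sessionid=boo123; path=/foo; domain=example.com; secure")
print(example)
```

  The NAME=VALUE pair is what gets sent back to the server; the remaining
  attributes only tell the client when the pair applies.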

  If you've received a page from a server that contains a header like:

        Set-Cookie: sessionid=boo123; path="/foo";

  it means the server wants that first pair passed on when we get anything in
  a path beginning with "/foo".

  Example, get a page that wants my name passed in a cookie:

        curl -b "name=Daniel" www.sillypage.com

  Curl also has the ability to use previously received cookies in following
  sessions. If you get cookies from a server and store them in a file in a
  manner similar to:

        curl --dump-header headers www.example.com

  ... you can then, in a second connection to that (or another) site, use the
  cookies from the 'headers' file like:

        curl -b headers www.example.com

  While saving headers to a file is a working way to store cookies, it is
  error-prone and not the preferred way to do this. Instead, make curl
  save the incoming cookies using the well-known netscape cookie format like
  this:

        curl -c cookies.txt www.example.com

  Note that by specifying -b you enable the "cookie awareness" and with -L
  you can make curl follow a location: (which is often used in combination
  with cookies). So if a site sends cookies and a location, you can
  use a nonexistent file to trigger the cookie awareness like:

        curl -L -b empty.txt www.example.com

  The file to read cookies from must be formatted using plain HTTP headers OR
  as netscape's cookie file. Curl will determine what kind it is based on the
  file contents.  In the above command, curl will parse the header and store
  the cookies received from www.example.com.  curl will send to the server the
  stored cookies which match the request as it follows the location.  The
  file "empty.txt" may be a nonexistent file.
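
  For readers who have not seen a netscape cookie file, the following Python
  sketch writes a one-cookie file and reads it back with the standard
  http.cookiejar.MozillaCookieJar class. It is an illustration only and says
  nothing about curl's own reader. The seven tab-separated fields follow the
  layout that Python's loader expects; the sample cookie and the helper name
  read_cookie_file are assumptions made for this example.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

# Hypothetical file contents: seven tab-separated fields per cookie line
# (domain, subdomain flag, path, secure flag, expiry as Unix time, name, value).
COOKIE_FILE_TEXT = (
    "# Netscape HTTP Cookie File\n"
    "www.example.com\tFALSE\t/foo\tFALSE\t2147483647\tsessionid\tboo123\n"
)


def read_cookie_file(text):
    """Write text to a temporary file and return [(domain, path, name, value)]."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text)
        path = handle.name
    try:
        jar = MozillaCookieJar(path)
        # Keep session cookies and expired cookies so the example is stable.
        jar.load(ignore_discard=True, ignore_expires=True)
        return [(c.domain, c.path, c.name, c.value) for c in jar]
    finally:
        os.remove(path)


cookies = read_cookie_file(COOKIE_FILE_TEXT)
print(cookies)
```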

  To both read and write cookies from a netscape cookie file, you can
  set both -b and -c to use the same file:

        curl -b cookies.txt -c cookies.txt www.example.com

PROGRESS METER

  The progress meter exists to show a user that something actually is
  happening. The different fields in the output have the following meaning:

  % Total    % Received % Xferd  Average Speed          Time             Curr.
                                 Dload  Upload Total    Current  Left    Speed
  0  151M    0 38608    0     0   9406      0  4:41:43  0:00:04  4:41:39  9287

  From left to right:
   %             - percentage completed of the whole transfer
   Total         - total size of the whole expected transfer
   %             - percentage completed of the download
   Received      - currently downloaded amount of bytes
   %             - percentage completed of the upload
   Xferd         - currently uploaded amount of bytes
   Average Speed
   Dload         - the average transfer speed of the download
   Average Speed
   Upload        - the average transfer speed of the upload
   Time Total    - expected time to complete the operation
   Time Current  - time passed since the transfer was started
   Time Left     - expected time left to completion
   Curr.Speed    - the average transfer speed over the last 5 seconds (the
                   first 5 seconds of a transfer are based on less time, of
                   course)

  The -# option will display a totally different progress bar that doesn't
  need much explanation!

SPEED LIMIT

  Curl allows the user to set the transfer speed conditions that must be met
  to let the transfer keep going. By using the switches -y and -Y you
  can make curl abort transfers if the transfer speed is below the specified
  lowest limit for a specified time.
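
  The abort rule can be pictured with a small Python sketch. This is a
  simplified model and not curl's actual implementation: it assumes the speed
  is sampled once per second and that the transfer is aborted after the given
  number of consecutive slow seconds. The function name seconds_until_abort
  and the sample speeds are made up for the illustration.

```python
def seconds_until_abort(bytes_per_second, speed_limit, speed_time):
    """Return the 1-based second at which the transfer would be aborted, or None.

    bytes_per_second: observed transfer speed for each elapsed second.
    speed_limit:      lowest acceptable speed in bytes per second (like -Y).
    speed_time:       how many consecutive slow seconds are tolerated (like -y).
    """
    slow_run = 0
    for second, speed in enumerate(bytes_per_second, start=1):
        if speed < speed_limit:
            slow_run += 1
            if slow_run >= speed_time:
                return second
        else:
            slow_run = 0  # speed recovered, so the clock starts over
    return None


# Hypothetical per-second speeds for an eight-second transfer.
samples = [5000, 2000, 2500, 4000, 1000, 1200, 900, 3500]
abort_after_3 = seconds_until_abort(samples, speed_limit=3000, speed_time=3)
abort_after_4 = seconds_until_abort(samples, speed_limit=3000, speed_time=4)
print(abort_after_3, abort_after_4)
```

  In this model a single fast second resets the count, so short dips below the
  limit do not end the transfer.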

  To have curl abort the download if the speed is slower than 3000 bytes per
  second for 1 minute, run:

        curl -Y 3000 -y 60 www.far-away-site.com

  This can very well be used in combination with the overall time limit, so
  that the above operation must be completed in whole within 30 minutes:

        curl -m 1800 -Y 3000 -y 60 www.far-away-site.com

  Forcing curl not to transfer data faster than a given rate is also possible,
  which might be useful if you're using a limited bandwidth connection and you
  don't want your transfer to use all of it (sometimes referred to as
  "bandwidth throttle").

  Make curl transfer data no faster than 10 kilobytes per second:

        curl --limit-rate 10K www.far-away-site.com

    or

        curl --limit-rate 10240 www.far-away-site.com

  Or prevent curl from uploading data faster than 1 megabyte per second:

        curl -T upload --limit-rate 1M ftp://uploadshereplease.com

  When using the --limit-rate option, the transfer rate is regulated on a
  per-second basis, which will cause the total transfer speed to become lower
  than the given number. Sometimes, of course, substantially lower, if your
  transfer stalls for periods.

CONFIG FILE

  Curl automatically tries to read the .curlrc file (or _curlrc file on win32
  systems) from the user's home dir on startup.

  The config file can be made up of normal command line switches, but you
  can also specify the long options without the dashes to make it more
  readable. You can separate the options and the parameter with spaces, or
  with = or :. Comments can be used within the file. If the first character
  on a line is a '#', the rest of the line is treated as a comment.

  If you want the parameter to contain spaces, you must enclose the entire
  parameter within double quotes ("). Within those quotes, you specify a
  quote as \".

  NOTE: You must specify options and their arguments on the same line.
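
  The line format described above can be sketched as a small Python parser.
  This is only a reading of the rules in this section, not curl's real config
  parser, and it may differ from curl in corner cases. It additionally assumes
  that leading white space is skipped and trailing white space is kept. The
  names parse_config_line and _unquote are hypothetical.

```python
import re

# Option name, then an optional separator (spaces, "=" or ":") and a parameter.
_LINE = re.compile(r'([^\s=:]+)(?:(?:\s*[=:]\s*|\s+)(.*))?$')


def _unquote(text):
    # Return the contents of a double-quoted parameter; \" becomes a plain quote.
    out = []
    i = 1  # skip the opening quote
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == '"':
            out.append('"')
            i += 2
            continue
        if ch == '"':
            break  # closing quote ends the parameter
        out.append(ch)
        i += 1
    return "".join(out)


def parse_config_line(line):
    """Return (option, parameter) for one line, or None for comments and blanks."""
    line = line.rstrip("\n").lstrip()
    if not line or line.startswith("#"):
        return None
    match = _LINE.match(line)
    if match is None:
        return None
    option, parameter = match.group(1), match.group(2)
    if parameter is not None and parameter.startswith('"'):
        parameter = _unquote(parameter)
    return option, parameter


for sample in ["# a comment", "-m 1800", 'user-agent = "my \\"special\\" agent"']:
    print(repr(sample), "->", parse_config_line(sample))
```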

  Example, set default time out and proxy in a config file:

        # We want a 30 minute timeout:
        -m 1800
        # ... and we use a proxy for all accesses:
        proxy = proxy.our.domain.com:8080

  White spaces ARE significant at the end of lines, but all white spaces
  leading up to the first characters of each line are ignored.

  Prevent curl from reading the default file by using -q as the first command
  line parameter, like:

        curl -q www.thatsite.com

  Force curl to get and display a local help page in case it is invoked
  without URL by making a config file similar to:

        # default url to get
        url = "http://help.with.curl.com/curlhelp.html"

  You can specify another config file to be read by using the -K/--config
  flag. If you set the config file name to "-" it'll read the config from
  stdin, which can be handy if you want to hide options from being visible in
  process tables etc:

        echo "user = user:passwd" | curl -K - http://that.secret.site.com

EXTRA HEADERS

  When using curl in your own very special programs, you may end up needing
  to pass on your own custom headers when getting a web page. You can do
  this by using the -H flag.

  Example, send the header "X-you-and-me: yes" to the server when getting a
  page:

        curl -H "X-you-and-me: yes" www.love.com

  This can also be useful in case you want curl to send a different text in a
  header than it normally does. The -H header you specify then replaces the
  header curl would normally send. If you replace an internal header with an
  empty one, you prevent that header from being sent. To prevent the Host:
  header from being used:

        curl -H "Host:" www.server.com

FTP and PATH NAMES

  Do note that when getting files with the ftp:// URL, the given path is
  relative to the directory you enter.
  To get the file 'README' from your home
  directory at your ftp site, do:

        curl ftp://user:passwd@my.site.com/README

  But if you want the README file from the root directory of that very same
  site, you need to specify the absolute file name:

        curl ftp://user:passwd@my.site.com//README

  (I.e. with an extra slash in front of the file name.)

FTP and firewalls

  The FTP protocol requires one of the involved parties to open a second
  connection as soon as data is about to get transferred. There are two ways
  to do this.

  The default way for curl is to issue the PASV command which causes the
  server to open another port and await another connection performed by the
  client. This is good if the client is behind a firewall that doesn't allow
  incoming connections.

        curl ftp.download.com

  If the server, for example, is behind a firewall that doesn't allow
  connections on other ports than 21 (or if it just doesn't support the PASV
  command), the other way to do it is to use the PORT command and instruct the
  server to connect to the client on the given (as parameters to the PORT
  command) IP number and port.

  The -P flag to curl supports a few different options. Your machine may have
  several IP addresses and/or network interfaces and curl allows you to select
  which of them to use. The default address can also be used:

        curl -P - ftp.download.com

  Download with PORT but use the IP address of our 'le0' interface (this does
  not work on windows):

        curl -P le0 ftp.download.com

  Download with PORT but use 192.168.0.10 as our IP address to use:

        curl -P 192.168.0.10 ftp.download.com

NETWORK INTERFACE

  Get a web page from a server using a specified port for the interface:

        curl --interface eth0:1 http://www.netscape.com/

  or

        curl --interface 192.168.1.10 http://www.netscape.com/

HTTPS

  Secure HTTP requires SSL libraries to be installed and used when curl is
  built. If that is done, curl is capable of retrieving and posting documents
  using the HTTPS protocol.

  Example:

        curl https://www.secure-site.com

  Curl is also capable of using your personal certificates to get/post files
