  from sites that require valid certificates. The only drawback is that the
  certificate needs to be in PEM format. PEM is a standard and open format for
  storing certificates, but it is not used by the most commonly used
  browsers (Netscape and MSIE both use the so-called PKCS#12 format). If you
  want curl to use the certificates you use with your (favourite) browser, you
  may need to download/compile a converter that can convert your browser's
  formatted certificates to PEM formatted ones. This kind of converter is
  included in recent versions of OpenSSL, and for older versions Dr Stephen
  N. Henson has written a patch for SSLeay that adds this functionality. You
  can get his patch (that requires an SSLeay installation) from his site at:
  http://www.drh-consultancy.demon.co.uk/

  Example on how to automatically retrieve a document using a certificate with
  a personal password:

        curl -E /path/to/cert.pem:password https://secure.site.com/

  If you neglect to specify the password on the command line, you will be
  prompted for the correct password before any data can be received.

  Many older SSL servers have problems with SSLv3 or TLS, which newer versions
  of OpenSSL etc. use, so it is sometimes useful to specify which SSL version
  curl should use. Use -3, -2 or -1 to specify the exact SSL version to use
  (for SSLv3, SSLv2 or TLSv1 respectively):

        curl -2 https://secure.site.com/

  Otherwise, curl will first attempt to use v3 and then v2.

  To use OpenSSL to convert your favourite browser's certificate into a PEM
  formatted one that curl can use, do something like this (assuming Netscape,
  but IE is likely to work similarly):

    Start by hitting the 'security' menu button in Netscape.

    Select 'certificates->yours' and then pick a certificate in the list.

    Press the 'export' button.

    Enter your PIN code for the certs.

    Select a proper place to save it.

    Run the 'openssl' application to convert the certificate.
    If you cd to the openssl installation, you can do it like:

     # ./apps/openssl pkcs12 -in [file you saved] -clcerts -out [PEMfile]

RESUMING FILE TRANSFERS

 To continue a file transfer where it was previously aborted, curl supports
 resume on http(s) downloads as well as ftp uploads and downloads.

 Continue downloading a document:

        curl -C - -o file ftp://ftp.server.com/path/file

 Continue uploading a document(*1):

        curl -C - -T file ftp://ftp.server.com/path/file

 Continue downloading a document from a web server(*2):

        curl -C - -o file http://www.server.com/

 (*1) = This requires that the ftp server supports the non-standard command
        SIZE. If it doesn't, curl will say so.

 (*2) = This requires that the web server supports at least HTTP/1.1. If it
        doesn't, curl will say so.

TIME CONDITIONS

 HTTP allows a client to specify a time condition for the document it
 requests. It is either If-Modified-Since or If-Unmodified-Since. Curl allows
 you to specify them with the -z/--time-cond flag.

 For example, you can easily make a download that is only performed if the
 remote file is newer than a local copy. It would look like:

        curl -z local.html http://remote.server.com/remote.html

 Or you can download a file only if the local file is newer than the remote
 one. Do this by prepending the date string with a '-', as in:

        curl -z -local.html http://remote.server.com/remote.html

 You can specify a "free text" date as the condition. Tell curl to only
 download the file if it was updated since yesterday:

        curl -z yesterday http://remote.server.com/remote.html

 Curl accepts a wide range of date formats. You can always reverse the date
 check by prepending the date with a dash '-'.

DICT

  For fun try

        curl dict://dict.org/m:curl
        curl dict://dict.org/d:heisenbug:jargon
        curl dict://dict.org/d:daniel:web1913

  Aliases for 'm' are 'match' and 'find', and aliases for 'd' are 'define'
  and 'lookup'.
  For example,

        curl dict://dict.org/find:curl

  Commands that break the URL description of the RFC (but not the DICT
  protocol) are

        curl dict://dict.org/show:db
        curl dict://dict.org/show:strat

  Authentication is still missing (but this is not required by the RFC).

LDAP

  If you have installed the OpenLDAP library, curl can take advantage of it
  and offer ldap:// support.

  LDAP is a complex thing and writing an LDAP query is not an easy task. I
  advise you to dig up the syntax description for that elsewhere. Two places
  that might suit you are:

  Netscape's "Netscape Directory SDK 3.0 for C Programmer's Guide Chapter 10:
  Working with LDAP URLs":
  http://developer.netscape.com/docs/manuals/dirsdk/csdk30/url.htm

  RFC 2255, "The LDAP URL Format" http://www.rfc-editor.org/rfc/rfc2255.txt

  To show you an example, this is how I can get all people from my local LDAP
  server that have a certain sub-domain in their email address:

        curl -B "ldap://ldap.frontec.se/o=frontec??sub?mail=*sth.frontec.se"

  If I want the same info in HTML format, I can get it by not using the -B
  (enforce ASCII) flag.

ENVIRONMENT VARIABLES

  Curl reads and understands the following environment variables:

        http_proxy, HTTPS_PROXY, FTP_PROXY, GOPHER_PROXY

  They should be set for protocol-specific proxies. A general proxy should be
  set with

        ALL_PROXY

  A comma-separated list of host names that shouldn't go through any proxy is
  set in (only an asterisk, '*' matches all hosts)

        NO_PROXY

  If a tail substring of the domain-path for a host matches one of these
  strings, transactions with that node will not be proxied.

  The usage of the -x/--proxy flag overrides the environment variables.

NETRC

  Unix introduced the .netrc concept a long time ago. It is a way for a user
  to specify name and password for commonly visited ftp sites in a file so
  that you don't have to type them in each time you visit those sites.
  You realize this is a big security risk if someone else gets hold of your
  passwords, so most unix programs won't read this file unless it is
  only readable by yourself (curl doesn't care though).

  Curl supports .netrc files if told so (using the -n/--netrc and
  --netrc-optional options). This is not restricted to only ftp,
  but curl can use it for all protocols where authentication is used.

  A very simple .netrc file could look something like:

        machine curl.haxx.se login iamdaniel password mysecret

CUSTOM OUTPUT

  To better allow script programmers to get to know about the progress of
  curl, the -w/--write-out option was introduced. Using this, you can specify
  what information from the previous transfer you want to extract.

  To display the number of bytes downloaded together with some text and an
  ending newline:

        curl -w 'We downloaded %{size_download} bytes\n' www.download.com

KERBEROS4 FTP TRANSFER

  Curl supports kerberos4 for FTP transfers. You need the kerberos package
  installed and used at curl build time for it to be used.

  First, get the krb-ticket the normal way, like with the kauth tool. Then use
  curl in a way similar to:

        curl --krb4 private ftp://krb4site.com -u username:fakepwd

  There's no use for a password on the -u switch, but a blank one will make
  curl ask for one and you already entered the real password to kauth.

TELNET

  The curl telnet support is basic and very easy to use. Curl passes all data
  passed to it on stdin to the remote server. Connect to a remote telnet
  server using a command line similar to:

        curl telnet://remote.server.com

  And enter the data to pass to the server on stdin. The result will be sent
  to stdout or to the file you specify with -o.

  You might want the -N/--no-buffer option to switch off the buffered output
  for slow connections or similar.

  Pass options to the telnet protocol negotiation by using the -t option.
  To tell the server we use a vt100 terminal, try something like:

        curl -tTTYPE=vt100 telnet://remote.server.com

  Other interesting options for -t include:

   - XDISPLOC=<X display> Sets the X display location.

   - NEW_ENV=<var,val> Sets an environment variable.

  NOTE: the telnet protocol does not specify any way to login with a specified
  user and password so curl can't do that automatically. To do that, you need
  to track when the login prompt is received and send the username and
  password accordingly.

PERSISTENT CONNECTIONS

  Specifying multiple files on a single command line will make curl transfer
  all of them, one after the other in the specified order.

  libcurl will attempt to use persistent connections for the transfers so that
  the second transfer to the same host can use the same connection that was
  already initiated and was left open in the previous transfer. This greatly
  decreases connection time for all but the first transfer and it makes far
  better use of the network.

  Note that curl cannot use persistent connections for transfers that are made
  in subsequent curl invocations. Try to put as many URLs as possible on the
  same command line if they are using the same host, as that'll make the
  transfers faster. If you use an http proxy for file transfers, practically
  all transfers will be persistent.

  Persistent connections were introduced in curl 7.7.

MAILING LISTS

  For your convenience, we have several open mailing lists to discuss curl,
  its development and things relevant to this. Get all info at
  http://curl.haxx.se/mail/. The lists available are:

  curl-users

    Users of the command line tool. How to use it, what doesn't work, new
    features, related tools, questions, news, installations, compilations,
    running, porting etc.

  curl-library

    Developers using or developing libcurl. Bugs, extensions, improvements.

  curl-announce

    Low-traffic. Only announcements of new public versions.

  curl-and-PHP

    Using the curl functions in PHP.
    Everything curl with a PHP angle. Or PHP with a curl angle.

  curl-commits

    Receives notifications on all CVS commits done to the curl source module.
    This can become quite a large number of mails during intense development,
    be aware. This is for us who like email...

  curl-www-commits

    Receives notifications on all CVS commits done to the curl www module
    (basically the web site). This can become quite a large number of mails
    during intense changing, be aware. This is for us who like email...

  Please direct curl questions, feature requests and trouble reports to one of
  these mailing lists instead of mailing any individual.
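
  As a footnote to the NETRC section: since curl itself does not check the
  permissions of the .netrc file, it may be worth creating the file so that
  only you can read it. The following is a minimal sketch, not part of curl.
  The helper name write_netrc and its argument order are made up for this
  illustration, and it overwrites whatever file you point it at.

```shell
#!/bin/sh
# write_netrc FILE MACHINE LOGIN PASSWORD
# Hypothetical helper (not shipped with curl): writes a one-entry .netrc
# style file that is readable and writable by its owner only.
# WARNING: any existing content of FILE is replaced.
write_netrc() {
    netrc_file=$1
    # Create (or empty) the file and lock it down before the secret goes in.
    : > "$netrc_file" || return 1
    chmod 600 "$netrc_file" || return 1
    # Same single-line layout as the example in the NETRC section.
    printf 'machine %s login %s password %s\n' "$2" "$3" "$4" > "$netrc_file"
}
```

  Calling it as: write_netrc "$HOME/.netrc" curl.haxx.se iamdaniel mysecret
  would produce the file shown in the NETRC section, which curl then uses
  when given the -n/--netrc option.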
