  Yes, SOCKS5 is supported.

3. Usage problems

  3.1 curl: (1) SSL is disabled, https: not supported

  If you get this output when trying to get anything from a https:// server,
  it means that the configure script couldn't find all libs and include files
  it requires for SSL to work. If the configure script fails to find them,
  curl is simply built without SSL support.

  To get https:// support into a curl that was previously built but that
  reports that https:// is not supported, you should dig through the
  documentation and logs and find out why the configure script doesn't find
  the SSL libs and/or include files.

  Also, check out the other paragraph in this FAQ labeled "configure doesn't
  find OpenSSL even when it is installed".

  3.2 How do I tell curl to resume a transfer?

  Curl supports resumed transfers both ways on both FTP and HTTP.

  Try the -C option.

  3.3 Why doesn't my posting using -F work?

  You can't pick -F or -d freely. The web server that will receive your post
  assumes one of the formats. If the form you're trying to "fake" sets the
  type to 'multipart/form-data', then and only then you must use the -F type.
  In all the most common cases, you should use -d which then causes a posting
  with the type 'application/x-www-form-urlencoded'.

  This is described in some detail in the MANUAL and TheArtOfHttpScripting
  documents, and if you don't understand it the first time, read it again
  before you post questions about this to the mailing list. Also, try reading
  through the mailing list archives for old postings and questions regarding
  this.

  3.4 How do I tell curl to run custom FTP commands?

  You can tell curl to perform optional commands before and/or after a file
  transfer. Study the -Q/--quote option.

  Since curl is used for file transfers, you don't use curl to just perform
  FTP commands without transferring anything. Therefore you must always specify
  a URL to transfer to/from even when doing custom FTP commands.
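
  To make the difference between the two posting formats from 3.3 concrete,
  here is a minimal Python sketch that builds both kinds of request body by
  hand. It is an illustration only, not curl's own code: the function names,
  the field values and the fixed boundary string are all assumptions made up
  for this example.

```python
from urllib.parse import urlencode


def urlencoded_body(fields):
    """Build a body of the kind a -d style post carries."""
    body = urlencode(fields).encode("ascii")
    return body, "application/x-www-form-urlencoded"


def multipart_body(fields, boundary="example-boundary"):
    """Build a body of the kind a -F style post carries.

    The boundary is a fixed, made-up string; real clients pick one that
    cannot occur inside the data.
    """
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")          # blank line separates part headers from data
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


fields = {"name": "Daniel", "comment": "a & b"}  # hypothetical form fields
for builder in (urlencoded_body, multipart_body):
    body, content_type = builder(fields)
    print(content_type)
    print(body)
```

  The two bodies are not interchangeable, which is why a server expecting one
  format generally cannot parse the other.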

  3.5 How can I disable the Pragma: nocache header?

  You can change all internally generated headers by adding a replacement with
  the -H/--header option. By adding a header with empty contents you safely
  disable that one. Use -H "Pragma:" to disable that specific header.

  3.6 Does curl support ASP, XML, XHTML or HTML version Y?

  To curl, all contents are alike. It doesn't matter how the page was
  generated. It may be ASP, PHP, Perl, shell-script, SSI or plain
  HTML-files. There's no difference to curl and it doesn't even know what kind
  of language generated the page.

  See also item 3.14 regarding javascript.

  3.7 Can I use curl to delete/rename a file through FTP?

  Yes. You specify custom FTP commands with -Q/--quote.

  One example would be to delete a file after you have downloaded it:

     curl -O ftp://download.com/coolfile -Q '-DELE coolfile'

  3.8 How do I tell curl to follow HTTP redirects?

  Curl does not follow so-called redirects by default. The Location: header
  that informs the client about this is only interpreted if you're using the
  -L/--location option. As in:

     curl -L http://redirector.com

  Not all redirects are HTTP ones, see 4.14

  3.9 How do I use curl in my favorite programming language?

  There exist many language interfaces/bindings for curl that integrate it
  better with various languages. If you are fluent in a script language, you
  may very well opt to use such an interface instead of using the command line
  tool.

  Find out more about which languages support curl directly, and how to
  install and use them, in the libcurl section of the curl web site:
  http://curl.haxx.se/libcurl/

  In February 2003, there are interfaces available for the following
  languages: Basic, C, C++, Cocoa, Dylan, Euphoria, Java, Lua, Object-Pascal,
  Pascal, Perl, PHP, PostgreSQL, Python, Rexx, Ruby, Scheme and Tcl. By the
  time you read this, additional ones may have appeared!

  3.10 What about SOAP, WebDAV, XML-RPC or similar protocols over HTTP?

  Curl adheres to the HTTP spec, which basically means you can play with *any*
  protocol that is built on top of HTTP. Protocols such as SOAP, WebDAV and
  XML-RPC are all such ones. You can use -X to set custom requests and -H to
  set custom headers (or replace internally generated ones).

  Using libcurl is of course just as fine and you'd just use the proper
  library options to do the same.

  3.11 How do I POST with a different Content-Type?

  You can always replace the internally generated headers with -H/--header.
  To make a simple HTTP POST with text/xml as content-type, do something like:

        curl -d "datatopost" -H "Content-Type: text/xml" [URL]

  3.12 Why do FTP specific features over HTTP proxy fail?

  Because when you use a HTTP proxy, the protocol spoken on the network will
  be HTTP, even if you specify a FTP URL. This effectively means that you
  normally can't use FTP specific features such as FTP upload and FTP quote
  etc.

  There is one exception to this rule, and that is if you can "tunnel through"
  the given HTTP proxy. Proxy tunneling is enabled with a special option (-p)
  and is generally not available as proxy admins usually disable tunneling to
  other ports than 443 (which is used for HTTPS access through proxies).

  3.13 Why do my single/double quotes fail?

  To specify a command line option that includes spaces, you might need to
  put the entire option within quotes. Like in:

   curl -d " with spaces " url.com

  or perhaps

   curl -d ' with spaces ' url.com

  Exactly what kind of quotes to use and how to use them is entirely up to the
  shell or command line interpreter that you are using. For most unix shells,
  you can more or less pick either single (') or double (") quotes. For
  Windows/DOS prompts I believe you're forced to use double (") quotes.

  Please study the documentation for your particular environment. Examples in
  the curl docs will use a mix of both these ones as shown above. You must
  adjust them to work in your environment.
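
  As a rough illustration of why the quotes in 3.13 matter, the Python sketch
  below uses the standard shlex module to split command lines the way a
  POSIX-style shell would. This models unix shells only, it says nothing about
  Windows/DOS prompts, and the helper name shell_words is an assumption made
  up for this example.

```python
import shlex


def shell_words(command_line):
    """Split a command line into arguments using POSIX shell quoting rules."""
    return shlex.split(command_line)


examples = (
    'curl -d " with spaces " url.com',   # double quotes keep one argument
    "curl -d ' with spaces ' url.com",   # single quotes do the same here
    "curl -d with spaces url.com",       # unquoted: the data is split up
)
for command_line in examples:
    print(shell_words(command_line))
```

  With quotes, the data reaches the program as a single argument, spaces
  included. Without them, the shell splits it into separate words before curl
  ever sees it.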

  Remember that curl works and runs on more operating systems than most single
  individuals have ever tried.

  3.14 Does curl support javascript or pac (automated proxy config)?

  Many web pages do magic stuff using embedded javascript. Curl and libcurl
  have no built-in support for that, so it will be treated just like any other
  contents.

  .pac files are a netscape invention and are sometimes used by organizations
  to allow them to differentiate which proxies to use. The .pac content is
  just a javascript program that gets invoked by the browser and that returns
  the name of the proxy to connect to. Since curl doesn't support javascript,
  it can't support .pac proxy configuration either.

  Some work-arounds usually suggested to overcome this javascript dependency:

  - Depending on the javascript complexity, write up a script that
    translates it to another language and execute that.

  - Read the javascript code and rewrite the same logic in another language.

  - Implement a javascript interpreter; people have successfully used the
    Mozilla javascript engine in the past.

  - Ask your admins to stop this, for a static proxy setup or similar.

  3.15 Can I do recursive fetches with curl?

  No. curl itself has no code that performs recursive operations, such as
  those performed by wget and similar tools.

  There exist wrapper scripts with that functionality (for example the
  curlmirror perl script), and you can write programs based on libcurl to do
  it, but the command line tool curl itself cannot.

  3.16 What certificates do I need when I use SSL?

  There are three different kinds of "certificates" to keep track of when we
  talk about using SSL-based protocols (HTTPS or FTPS) using curl or libcurl.

  - Client certificate. The server you communicate with may require that you
    provide this in order to prove that you actually are who you claim to be.
    If the server doesn't require this, you don't need a client certificate.

  - Server certificate.
    The server you communicate with has a server certificate. You can and
    should verify this certificate to make sure that you are truly talking
    to the real server and not a server impersonating it.

  - Certificate Authority certificate ("CA cert"). You often have several CA
    certs in a CA cert bundle that can be used to verify a server certificate
    that was signed by one of the authorities in the bundle. curl comes with a
    default CA cert bundle. You can override the default.

    The server certificate verification process is made by using a Certificate
    Authority certificate ("CA cert") that was used to sign the server
    certificate. Server certificate verification is enabled by default in curl
    and libcurl and is often the reason for problems as explained in FAQ entry
    4.12 and the SSLCERTS document
    (http://curl.haxx.se/docs/sslcerts.html). Server certificates that are
    "self-signed" or otherwise signed by a CA that you do not have a CA cert
    for, cannot be verified. If the verification during a connect fails, you
    are refused access. You then need to explicitly disable the verification
    to connect to the server.

  3.17 How do I list the root dir of an FTP server?

  There are two ways. The way defined in the RFC is to use an encoded slash
  in the first path part. List the "/tmp" dir like this:

     curl ftp://ftp.sunet.se/%2ftmp/

  or the not-quite-kosher-but-more-readable way, by simply starting the path
  section of the URL with a slash:

     curl ftp://ftp.sunet.se//tmp/

  3.18 Can I use curl to send a POST/PUT and not wait for a response?

  No.

  But you could easily write your own program using libcurl to do such stunts.

4. Running Problems

  4.1 Problems connecting to SSL servers.

  It took a very long time before we could sort out why curl had problems
  connecting to certain SSL servers when using SSLeay or OpenSSL v0.9+.
  The error sometimes showed up similar to:

  16570:error:1407D071:SSL routines:SSL2_READ:bad mac decode:s2_pkt.c:233:

  It turned out to be because many older SSL servers don't deal with SSLv3
  requests properly. To correct this problem, tell curl to select SSLv2 from
  the command line (-2/--sslv2).

  There have also been examples where the remote server didn't like the SSLv2
  request and instead you had to force curl to use SSLv3 with -3/--sslv3.

  4.2 Why do I get problems when I use & or % in the URL?

  In general unix shells, the & letter is treated specially and when used, it
  runs the specified command in the background. To safely send the & as a part
  of a URL, you should quote the entire URL by using single (') or double (")
  quotes around it.

  An example that would invoke a remote CGI that uses &-letters could be:

     curl 'http://www.altavista.com/cgi-bin/query?text=yes&q=curl'

  In Windows, the standard DOS shell treats the %-letter specially and you
  need to use TWO %-letters for each single one you want to use in the URL.

  Also note that if you want the literal %-letter to be part of the data you
  pass in a POST using -d/--data you must encode it as '%25' (which then also
  needs the %-letter doubled on Windows machines).

  4.3 How can I use {, }, [ or ] to specify multiple URLs?

  Those letters have a special meaning to the shell, so to use them in a URL
  passed to curl you must quote them.

  An example that downloads two URLs (sequentially) would do:

    curl '{curl,www}.haxx.se'

  To be able to use those letters as actual parts of the URL (without using
  them for the curl URL "globbing" system), use the -g/--globoff option:

    curl -g 'www.site.com/weirdname[].html'

  4.4 Why do I get downloaded data even though the web page doesn't exist?

  Curl asks remote servers for the page you specify. If the page doesn't exist
  at the server, the HTTP protocol defines how the server should respond and
  that means that headers and a "page" will be returned.
  That's simply how HTTP works.

  By using the --fail option you can tell curl explicitly to not get any data
  if the HTTP return code doesn't say success.

  4.5 Why do I get return code XXX from a HTTP server?

  RFC2616 clearly explains the return codes. This is a short transcript. Go
  read the RFC for exact details:

    4.5.1 "400 Bad Request"

    The request could not be understood by the server due to malformed
    syntax. The client SHOULD NOT repeat the request without modifications.

    4.5.2 "401 Unauthorized"

    The request requires user authentication.

    4.5.3 "403 Forbidden"

    The server understood the request, but is refusing to fulfill it.
    Authorization will not help and the request SHOULD NOT be repeated.

    4.5.4 "404 Not Found"

    The server has not found anything matching the Request-URI. No indication
    is given of whether the condition is temporary or permanent.

    4.5.5 "405 Method Not Allowed"

    The method specified in the Request-Line is not allowed for the resource
    identified by the Request-URI. The response MUST include an Allow header
    containing a list of valid methods for the requested resource.

    4.5.6 "301 Moved Permanently"

    If you get this return code and an HTML output similar to this:

       <H1>Moved Permanently</H1> The document has moved <A
       HREF="http://same_url_now_with_a_trailing_slash/">here</A>.

    it might be because you request a directory URL but without the trailing
    slash. Try the same operation again _with_ the trailing slash, or use the
    -L/--location option to follow the redirection.

  4.6 Can you tell me what error code 142 means?

  All curl error codes are described at the end of the man page, in the
  section called "EXIT CODES".

  Error codes that are larger than the highest documented error code mean
  that curl has exited due to a crash. This is a serious error, and we
  appreciate a detailed bug report from you that describes how we could go
  ahead and repeat this!
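
  For the HTTP return codes listed in 4.5 (not the curl exit codes of 4.6),
  the short Python sketch below looks up the standard reason phrase with the
  http.HTTPStatus enum from the standard library. The helper names describe
  and would_fail are assumptions made up for this example, and would_fail is
  only a rough approximation of a --fail style check (treating 400 and above
  as failure), not curl's actual logic.

```python
from http import HTTPStatus


def describe(code):
    """Return 'code phrase' for a registered HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not know about.
        return f"{code} (not a registered HTTP status code)"
    return f"{status.value} {status.phrase}"


def would_fail(code):
    """Assumption: treat client and server errors (400 and up) as failure."""
    return code >= 400


for code in (301, 400, 401, 403, 404, 405):
    print(describe(code), "failure" if would_fail(code) else "not a failure")
```

  Note that a 301 is not an error in this sense; it just tells the client to
  ask again somewhere else.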

  4.7 How do I keep user names and passwords secret in Curl command lines?

  This problem has two sides:

  The first part is to avoid having clear-text passwords in the command line
  so that they don't appear in 'ps' outputs and similar. That is easily
  avoided by using the "-K" option to tell curl to read parameters from a file
  or stdin to which you can pass the secret info. curl itself will also
  attempt to "hide" the given password by blanking out the option, but this
  doesn't work on all platforms.

  Keeping the passwords in your account secret from the rest of the world is
  not a task that curl addresses. You could of course encrypt them somehow to
  at least hide them from being read by human eyes, but that is not what
  anyone would call security.

  Also note that regular HTTP (using Basic authentication) and FTP passwords
  are sent in clear across the network. All it takes for anyone to fetch them
  is to listen on the network. Eavesdropping is very easy. Use more secure
  authentication methods (like Digest, Negotiate or even NTLM) or consider the
  SSL-based alternatives HTTPS and FTPS.

  4.8 I found a bug!

  It is not a bug if the behavior is documented. Read the docs first.
  Especially check out the KNOWN_BUGS file, it may be a documented bug!

  If it is a problem with a binary you've downloaded or a package for your
  particular platform, try contacting the person who built the package/archive
