libcurl-the-guide

    "chunked" upload, even though the size of the data to upload might be
    known. By default, libcurl usually switches over to chunked upload
    automatically if the upload data size is unknown.

  HTTP Version

    There's only one aspect left in the HTTP requests that we haven't yet
    mentioned how to modify: the version field. All HTTP requests include the
    version number to tell the server which version we support. libcurl speaks
    HTTP 1.1 by default. Some very old servers don't like getting 1.1 requests
    and when dealing with stubborn old things like that, you can tell libcurl
    to use 1.0 instead by doing something like this:

      curl_easy_setopt(easyhandle, CURLOPT_HTTP_VERSION,
                       CURL_HTTP_VERSION_1_0);

  FTP Custom Commands

    Not all protocols are HTTP-like, and thus the above may not help you when
    you want to make, for example, your FTP transfers behave differently.

    Sending custom commands to an FTP server means that you need to send the
    commands exactly as the FTP server expects them (RFC 959 is a good guide
    here), and you can only use commands that work on the control connection
    alone. All kinds of commands that require data interchange, and thus need
    a data connection, must be left to libcurl's own judgement. Also be aware
    that libcurl will do its very best to change directory to the target
    directory before doing any transfer, so if you change directory (with CWD
    or similar) you might confuse libcurl and then it might not attempt to
    transfer the file in the correct remote directory.

    A little example that deletes a given file before an operation:

      struct curl_slist *headers = NULL; /* start with an empty list */

      headers = curl_slist_append(headers, "DELE file-to-remove");

      /* pass the list of custom commands to the handle */
      curl_easy_setopt(easyhandle, CURLOPT_QUOTE, headers);

      curl_easy_perform(easyhandle); /* transfer ftp data! */

      curl_slist_free_all(headers); /* free the header list */

    If you would instead want this operation (or chain of operations) to
    happen _after_ the data transfer took place, the option to
    curl_easy_setopt() would instead be called CURLOPT_POSTQUOTE and is used
    in the exact same way.

    The custom FTP commands will be issued to the server in the same order
    they are added to the list, and if a command gets an error code returned
    back from the server, no more commands will be issued and libcurl will
    bail out with an error code (CURLE_FTP_QUOTE_ERROR). Note that if you use
    CURLOPT_QUOTE to send commands before a transfer, no transfer will
    actually take place when a quote command has failed.

    If you set CURLOPT_HEADER to true, you will tell libcurl to get
    information about the target file and output "headers" about it. The
    headers will be in "HTTP-style", looking like they do in HTTP.

    The option to enable headers or to run custom FTP commands may be useful
    to combine with CURLOPT_NOBODY. If this option is set, no actual file
    content transfer will be performed.

  FTP Custom CUSTOMREQUEST

    If you want to list the contents of an FTP directory using your own
    defined FTP command, CURLOPT_CUSTOMREQUEST will do just that. "NLST" is
    the default one for listing directories but you're free to pass in your
    idea of a good alternative.


Cookies Without Chocolate Chips

 In the HTTP sense, a cookie is a name with an associated value. A server
 sends the name and value to the client, and expects it to get sent back on
 every subsequent request to the server that matches the particular
 conditions set. The conditions include that the domain name and path match
 and that the cookie hasn't become too old.

 In real-world cases, servers send new cookies to replace existing ones to
 update them. Servers use cookies to "track" users and to keep "sessions".
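
 As a rough illustration of the matching conditions described above, the
 following C sketch decides whether a stored cookie would accompany a given
 request. The struct cookie type and all function names are made up for this
 example and are not part of libcurl. The domain check is a simplified tail
 match that assumes lower-case host names, and real cookie handling applies
 more rules than the three shown here.

```c
#include <string.h>
#include <time.h>

/* Hypothetical record for one stored cookie; NOT a libcurl type. */
struct cookie {
  const char *name;
  const char *value;
  const char *domain;  /* e.g. "example.com", assumed lower-case */
  const char *path;    /* e.g. "/shop" */
  time_t expires;      /* 0 is used here to mean "never too old" */
};

/* Simplified tail match: host equals domain, or ends with "." + domain. */
static int domain_matches(const char *domain, const char *host)
{
  size_t dlen = strlen(domain);
  size_t hlen = strlen(host);

  if(dlen > hlen)
    return 0;
  if(strcmp(host + (hlen - dlen), domain) != 0)
    return 0;
  return hlen == dlen || host[hlen - dlen - 1] == '.';
}

/* Prefix match on the path that only accepts a match ending at a '/'
   boundary or at the end of the request path. */
static int path_matches(const char *cookie_path, const char *request_path)
{
  size_t clen = strlen(cookie_path);

  if(clen == 0)
    return 1; /* an empty cookie path matches everything in this sketch */
  if(strncmp(cookie_path, request_path, clen) != 0)
    return 0;
  return cookie_path[clen - 1] == '/' ||
         request_path[clen] == '\0' ||
         request_path[clen] == '/';
}

/* Returns 1 if the cookie should be sent with a request for host + path
   made at time 'now', 0 otherwise. */
int cookie_should_be_sent(const struct cookie *c, const char *host,
                          const char *path, time_t now)
{
  if(c->expires != 0 && c->expires <= now)
    return 0; /* the cookie has become too old */
  return domain_matches(c->domain, host) && path_matches(c->path, path);
}
```

 With a cookie stored for domain "example.com" and path "/shop", this sketch
 accepts a request to www.example.com for /shop/cart, and rejects requests to
 badexample.com or for /shopping.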

 Cookies are sent from server to clients with the header Set-Cookie: and
 they're sent from clients to servers with the Cookie: header.

 To just send whatever cookie you want to a server, you can use
 CURLOPT_COOKIE to set a cookie string like this:

    curl_easy_setopt(easyhandle, CURLOPT_COOKIE, "name1=var1; name2=var2;");

 In many cases, that is not enough. You might want to dynamically save
 whatever cookies the remote server passes to you, and make sure those
 cookies are then used accordingly on later requests.

 One way to do this is to save all headers you receive in a plain file and,
 when you make a request, tell libcurl to read the previous headers to figure
 out which cookies to use. Set the header file to read cookies from with
 CURLOPT_COOKIEFILE.

 The CURLOPT_COOKIEFILE option also automatically enables the cookie parser
 in libcurl. Until the cookie parser is enabled, libcurl will not parse or
 understand incoming cookies and they will just be ignored. However, when the
 parser is enabled the cookies will be understood, kept in memory and used
 properly in subsequent requests when the same handle is used. Many times
 this is enough, and you may not have to save the cookies to disk at all.
 Note that the file you specify to CURLOPT_COOKIEFILE doesn't have to exist
 to enable the parser, so a common way to just enable the parser and not read
 any cookies is to use a file name you know doesn't exist.

 If you would rather use existing cookies that you've previously received
 with your Netscape or Mozilla browsers, you can make libcurl use that cookie
 file as input. CURLOPT_COOKIEFILE is used for that too, as libcurl will
 automatically find out what kind of file it is and act accordingly.

 Perhaps the most advanced cookie operation libcurl offers is saving the
 entire internal cookie state back into a Netscape/Mozilla formatted cookie
 file. We call that the cookie-jar.
 When you set a file name with CURLOPT_COOKIEJAR, that file will be created
 and all received cookies will be stored in it when curl_easy_cleanup() is
 called. This enables cookies to get passed on properly between multiple
 handles without any information getting lost.


FTP Peculiarities We Need

 FTP transfers use a second TCP/IP connection for the data transfer. This is
 usually a fact you can forget and ignore but at times this fact will come
 back to haunt you. libcurl offers several different ways to customize how
 the second connection is being made.

 libcurl can either connect to the server a second time or tell the server to
 connect back to it. The first option is the default and it is also what
 works best for all the people behind firewalls, NATs or IP-masquerading
 setups. libcurl then tells the server to open up a new port and wait for a
 second connection. This is by default attempted with EPSV first, and if that
 doesn't work it tries PASV instead. (EPSV is an extension to the original
 FTP spec and does not exist nor work on all FTP servers.)

 You can prevent libcurl from first trying the EPSV command by setting
 CURLOPT_FTP_USE_EPSV to FALSE.

 In some cases, you will prefer to have the server connect back to you for
 the second connection. This might be when the server is perhaps behind a
 firewall or something and only allows connections on a single port. libcurl
 then informs the remote server which IP address and port number to connect
 to. This is done with the CURLOPT_FTPPORT option. If you set it to "-",
 libcurl will use your system's "default IP address". If you want to use a
 particular IP, you can set the full IP address, a host name to resolve to an
 IP address or even a local network interface name that libcurl will get the
 IP address from.


Headers Equal Fun

 Some protocols provide "headers", meta-data separated from the normal data.
 These headers are by default not included in the normal data stream, but you
 can make them appear in the data stream by setting CURLOPT_HEADER to TRUE.

 What might be even more useful is libcurl's ability to separate the headers
 from the data and thus make the callbacks differ. You can for example set a
 different pointer to pass to the ordinary write callback by setting
 CURLOPT_WRITEHEADER.

 Or, you can set an entirely separate function to receive the headers, by
 using CURLOPT_HEADERFUNCTION.

 The headers are passed to the callback function one by one, and you can
 depend on that fact. It makes it easier for you to add custom header parsers
 etc.

 "Headers" for FTP transfers equal all the FTP server responses. They aren't
 actually true headers, but in this case we pretend they are! ;-)


Post Transfer Information

 [ curl_easy_getinfo ]


Security Considerations

 libcurl is in itself not insecure. If used the right way, you can use
 libcurl to transfer data pretty safely.

 There are of course many things to consider that may loosen up this
 situation:

  Command Lines

    If you use a command line tool (such as curl) that uses libcurl, and you
    give options to the tool on the command line, those options can very
    likely get read by other users of your system when they use 'ps' or other
    tools to list currently running processes.

    To avoid this problem, never feed sensitive things to programs using
    command line options.

  .netrc

    .netrc is a pretty handy file/feature that allows you to log in quickly
    and automatically to frequently visited sites. The file contains
    passwords in clear text and is a real security risk. In some cases, your
    .netrc is also stored in a home directory that is NFS mounted or used on
    another network based file system, so the clear text password will fly
    through your network every time anyone reads that file!

    To avoid this problem, don't use .netrc files and never store passwords in
    plain text anywhere.

  Clear Text Passwords

    Many of the protocols libcurl supports send name and password unencrypted
    as clear text (HTTP Basic authentication, FTP, TELNET etc). It is very
    easy for anyone on your network, or a network near yours, to just fire up
    a network analyzer tool and eavesdrop on your passwords. Don't let the
    fact that HTTP uses base64 encoded passwords fool you. They may not look
    readable at first glance, but they are very easily "deciphered" by anyone
    within seconds.

    To avoid this problem, use protocols that don't let snoopers see your
    password: HTTPS, FTPS and FTP-kerberos are a few examples. HTTP Digest
    authentication allows this too, but isn't supported by libcurl as of this
    writing.

  Showing What You Do

    On a related issue, be aware that even in situations like when you have
    problems with libcurl and ask someone for help, everything you reveal in
    order to get the best possible help might also pose certain security
    related risks. Host names, user names, paths, operating system specifics
    etc (not to mention passwords of course) may in fact be used by intruders
    to gain additional information about a potential target.

    To avoid this problem, you must of course use your common sense. Often,
    you can just edit out the sensitive data or just search/replace your true
    information with faked data.


SSL, Certificates and Other Tricks

 [ seeding, passwords, keys, certificates, ENGINE, ca certs ]


Multiple Transfers Using the multi Interface

 The easy interface as described in detail in this document is a synchronous
 interface that transfers one file at a time and doesn't return until it's
 done.

 The multi interface, on the other hand, allows your program to transfer
 multiple files in both directions at the same time, without forcing you to
 use multiple threads.

 [fill in lots of more multi stuff here]


Future

 [ sharing between handles, mutexes, pipelining ]


-----
Footnotes:

[1] = libcurl 7.10.3 and later have the ability to switch over to chunked
      Transfer-Encoding in cases where HTTP uploads are done with data of an
      unknown size.

[2] = This happens on Windows machines when libcurl is built and used as a
      DLL. However, you can still do this on Windows if you link with a static
      library.

[3] = The curl-config tool is generated at build-time (on unix-like systems)
      and should be installed with the 'make install' or similar instruction
      that installs the library, header files, man pages etc.