📄 libcurl-tutorial.3
Example C++ code:
.nf
class AClass {
public:
  static size_t write_data(void *ptr, size_t size, size_t nmemb,
                           void *ourpointer)
  {
    /* do what you want with the data */
    return size * nmemb; /* report every byte as handled */
  }
};
.fi
.SH "Proxies"
What "proxy" means according to Merriam-Webster: "a person authorized to act
for another" but also "the agency, function, or office of a deputy who acts as
a substitute for another".

Proxies are exceedingly common these days. Companies often only offer
Internet access to employees through their HTTP proxies. Network clients or
user-agents ask the proxy for documents, the proxy does the actual request
and then it returns them.

libcurl has full support for HTTP proxies, so when a given URL is wanted,
libcurl will ask the proxy for it instead of trying to connect to the actual
host identified in the URL.

The fact that the proxy is an HTTP proxy puts certain restrictions on what can
actually happen. A requested URL that is not an HTTP URL will still be passed
to the HTTP proxy to deliver back to libcurl. This happens transparently, and
an application may not need to know. I say "may", because at times it is very
important to understand that all operations over an HTTP proxy use the HTTP
protocol. For example, you can't invoke your own custom FTP commands or even
get proper FTP directory listings.
.IP "Proxy Options"
To tell libcurl to use a proxy at a given port number:

 curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");

Some proxies require user authentication before allowing a request, and you
pass that information similar to this:

 curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");

If you want to, you can specify the host name only in the CURLOPT_PROXY
option, and set the port number separately with CURLOPT_PROXYPORT.
.IP "Environment Variables"
libcurl automatically checks and uses a set of environment variables to
know what proxies to use for certain protocols.
The names of the variables follow an ancient de facto standard and are built
up as "[protocol]_proxy" (note the lower casing). This means the variable
\&'http_proxy' is checked for the name of a proxy to use when the input URL is
HTTP. Following the same rule, the variable named 'ftp_proxy' is checked
for FTP URLs. Again, the proxies are always HTTP proxies; the different
names of the variables simply allow different HTTP proxies to be used.

The proxy environment variable contents should be in the format
\&"[protocol://][user:password@]machine[:port]", where the protocol:// part is
simply ignored if present (so http://proxy and bluerk://proxy will do the
same) and the optional port number specifies on which port the proxy operates
on the host. If not specified, the internal default port number will be used
and that is most likely *not* the one you would like it to be.

There are two special environment variables. The variable 'all_proxy' sets the
proxy for any URL in case the protocol-specific variable wasn't set, and
\&'no_proxy' defines a list of hosts that should not use a proxy even though a
variable may say so. If 'no_proxy' is a plain asterisk ("*") it matches all
hosts.
.IP "SSL and Proxies"
SSL is for secure point-to-point connections. This involves strong encryption
and similar things, which effectively makes it impossible for a proxy to
operate as a "man in between", which is the proxy's task as previously
discussed. Instead, the only way to have SSL work over an HTTP proxy is to ask
the proxy to tunnel through everything without being able to check or fiddle
with the traffic.

Opening an SSL connection over an HTTP proxy is therefore a matter of asking
the proxy for a straight connection to the target host on a specified port.
This is made with the HTTP request CONNECT.
("please mr proxy, connect me to that remote host").

Because of the nature of this operation, where the proxy has no idea what kind
of data is passed in and out through this tunnel, this breaks some of the
very few advantages that come from using a proxy, such as caching. Many
organizations prevent this kind of tunneling to destination port numbers other
than 443 (which is the default HTTPS port number).
.IP "Tunneling Through Proxy"
As explained above, tunneling is required for SSL to work and is often even
restricted to the operation intended for SSL: HTTPS.

This is however not the only time proxy-tunneling might offer benefits to
you or your application.

As tunneling opens a direct connection from your application to the remote
machine, it suddenly also re-introduces the ability to do non-HTTP
operations over an HTTP proxy. You can in fact use things such as FTP
upload or FTP custom commands this way.

Again, this is often prevented by the administrators of proxies and is
rarely allowed.

Tell libcurl to use proxy tunneling like this:

 curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, 1L);

In fact, there might even be times when you want to do plain HTTP
operations using a tunnel like this, as it then enables you to operate on
the remote server instead of asking the proxy to do so. libcurl will not
stand in the way of such innovative actions either!
.IP "Proxy Auto-Config"
Netscape first came up with this. It is basically a web page (usually using a
\&.pac extension) with a javascript that, when executed by the browser with the
requested URL as input, returns information to the browser on how to connect
to the URL. The returned information might be "DIRECT" (which means no proxy
should be used), "PROXY host:port" (to tell the browser where the proxy for
this particular URL is) or "SOCKS host:port" (to direct the browser to a SOCKS
proxy).

libcurl has no means to interpret or evaluate javascript and thus it doesn't
support this.
If you get yourself in a position where you face this nasty
invention, the following advice has been mentioned and used in the past:

- Depending on the javascript complexity, write up a script that translates it
to another language and execute that.

- Read the javascript code and rewrite the same logic in another language.

- Implement a javascript interpreter; people have successfully used the
Mozilla javascript engine in the past.

- Ask your admins to stop this, for a static proxy setup or similar.
.SH "Persistence Is The Way to Happiness"
Re-cycling the same easy handle several times when doing multiple requests is
the way to go.

After each single \fIcurl_easy_perform(3)\fP operation, libcurl will keep the
connection alive and open. A subsequent request using the same easy handle to
the same host might just be able to use the already open connection! This
reduces network impact a lot.

Even if the connection is dropped, all connections involving SSL to the same
host again will benefit from libcurl's session ID cache that drastically
reduces re-connection time.

FTP connections that are kept alive save a lot of time, as the
command-response round-trips are skipped, and also you don't risk getting
blocked without permission to login again like on many FTP servers only
allowing N persons to be logged in at the same time.

libcurl caches DNS name resolving results, to make lookups of a previously
looked up name a lot faster.

Other interesting details that improve performance for subsequent requests
may also be added in the future.

Each easy handle will attempt to keep the last few connections alive for a
while in case they are to be used again. You can set the size of this "cache"
with the CURLOPT_MAXCONNECTS option. Default is 5. There is very seldom any
point in changing this value, and if you think of changing this it is often
just a matter of thinking again.

When the connection cache gets filled, libcurl must close an existing
connection in order to get room for the new one.
To know which connection to
close, libcurl uses a "close policy" that you can affect with the
CURLOPT_CLOSEPOLICY option. There are only two policies implemented as of this
writing (libcurl 7.9.4) and they are:
.RS
.IP CURLCLOSEPOLICY_LEAST_RECENTLY_USED
simply closes the one that hasn't been used for the longest time. This is the
default behavior.
.IP CURLCLOSEPOLICY_OLDEST
closes the oldest connection, the one that was created the longest time ago.
.RE

There are, or at least were, plans to support a close policy that would call
a user-specified callback to let the user decide which connection to dump when
this is necessary, which is why CURLOPT_CLOSEFUNCTION still exists as an
option today. Nothing ever uses it though, and it will not be used within the
foreseeable future either.

To force your upcoming request to not use an already existing connection (it
will even close one first if there happens to be one alive to the same host
you're about to operate on), set CURLOPT_FRESH_CONNECT to 1. In a similar
spirit, you can also forbid the connection used by the upcoming request from
"lying" around and possibly getting re-used after the request by setting
CURLOPT_FORBID_REUSE to 1.
.SH "HTTP Headers Used by libcurl"
When you use libcurl to do HTTP requests, it'll pass along a series of headers
automatically. It might be good for you to know and understand these.
.IP "Host"
This header is required by HTTP 1.1 and even many 1.0 servers and should be
the name of the server we want to talk to. This includes the port number if
anything but default.
.IP "Pragma"
\&"no-cache".
Tells a possible proxy to not grab a copy from the cache but to
fetch a fresh one.
.IP "Accept"
\&"*/*".
.IP "Expect"
When doing multi-part formposts, libcurl will set this header to
\&"100-continue" to ask the server for an "OK" message before it proceeds with
sending the data part of the post.
.SH "Customizing Operations"
There is an ongoing development today where more and more protocols are built
upon HTTP for transport. This has obvious benefits as HTTP is a tested and
reliable protocol that is widely deployed and has excellent proxy-support.

When you use one of these protocols, and even when doing other kinds of
programming, you may need to change the traditional HTTP (or FTP or...)
manners. You may need to change words, headers or various data.

libcurl is your friend here too.
.IP CUSTOMREQUEST
If just changing the actual HTTP request keyword is what you want, like when
GET, HEAD or POST is not good enough for you, CURLOPT_CUSTOMREQUEST is there
for you. It is very simple to use:

 curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNREQUEST");

When using the custom request, you change the request keyword of the actual
request you are performing. Thus, by default you make a GET request but you
can also make a POST operation (as described before) and then replace the POST
keyword if you want to. You're the boss.
.IP "Modify Headers"
HTTP-like protocols pass a series of headers to the server when doing the
request, and you're free to pass any number of extra headers that you
think fit. Adding headers is this easy:

.nf
 struct curl_slist *headers=NULL; /* init to NULL is important */

 headers = curl_slist_append(headers, "Hey-server-hey: how are you?");
 headers = curl_slist_append(headers, "X-silly-content: yes");

 /* pass our list of custom made headers */
 curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);

 curl_easy_perform(easyhandle); /* transfer http */

 curl_slist_free_all(headers); /* free the header list */
.fi

\&...
and if you think some of the internally generated headers, such as
Accept: or Host: don't contain the data you want them to contain, you can
replace them by simply setting them too:

.nf
 headers = curl_slist_append(headers, "Accept: Agent-007");
 headers = curl_slist_append(headers, "Host: munged.host.line");
.fi
.IP "Delete Headers"
If you replace an existing header with one with no contents, you will prevent
the header from being sent. For instance, if you want to completely prevent the
\&"Accept:" header from being sent, you can disable it with code similar to
this:

 headers = curl_slist_append(headers, "Accept:");

Both replacing and canceling internal headers should be done with careful
consideration and you should be aware that you may violate the HTTP protocol
when doing so.
.IP "Enforcing chunked transfer-encoding"
By making sure a request uses the custom header "Transfer-Encoding: chunked"
when doing a non-GET HTTP operation, libcurl will switch over to "chunked"
upload, even though the size of the data to upload might be known. By default,
libcurl usually switches over to chunked upload automatically if the upload
data size is unknown.
.IP "HTTP Version"
There's only one aspect left in the HTTP requests that we haven't yet
mentioned how to modify: the version field. All HTTP requests include the
version number to tell the server which version we support. libcurl speaks
HTTP 1.1 by default. Some very old servers don't like getting 1.1-requests and
when dealing with stubborn old things like that, you can tell libcurl to use
1.0 instead by doing something like this: