.SH "libcurl with C++"
There's basically only one thing to keep in mind when using C++ instead of C
when interfacing libcurl:

    The callbacks CANNOT be non-static class member functions

Example C++ code:

.nf
class AClass {
    static size_t write_data(void *ptr, size_t size, size_t nmemb,
                             void *ourpointer)
    {
      /* do what you want with the data */
      return size * nmemb; /* report all bytes as handled */
    }
};
.fi

.SH "Proxies"
What "proxy" means according to Merriam-Webster: "a person authorized to act
for another" but also "the agency, function, or office of a deputy who acts as
a substitute for another".

Proxies are exceedingly common these days. Companies often only offer Internet
access to employees through their proxies. Network clients or user-agents ask
the proxy for documents, the proxy does the actual request and then it returns
them.

libcurl supports SOCKS and HTTP proxies. When a given URL is wanted, libcurl
will ask the proxy for it instead of trying to connect to the actual host
identified in the URL.

If you're using a SOCKS proxy, you may find that libcurl doesn't quite support
all operations through it.

For HTTP proxies: the fact that the proxy is an HTTP proxy puts certain
restrictions on what can actually happen. A requested URL that might not be an
HTTP URL will still be passed to the HTTP proxy to deliver back to
libcurl. This happens transparently, and an application may not need to
know. I say "may", because at times it is very important to understand that
all operations over an HTTP proxy use the HTTP protocol.
For example, you
can't invoke your own custom FTP commands or even proper FTP directory
listings.

.IP "Proxy Options"
To tell libcurl to use a proxy at a given port number:

 curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");

Some proxies require user authentication before allowing a request, and you
pass that information similar to this:

 curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");

If you want to, you can specify the host name only in the CURLOPT_PROXY
option, and set the port number separately with CURLOPT_PROXYPORT.

Tell libcurl what kind of proxy it is with CURLOPT_PROXYTYPE (if not, it will
default to assume an HTTP proxy):

 curl_easy_setopt(easyhandle, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4);

.IP "Environment Variables"
libcurl automatically checks and uses a set of environment variables to
know what proxies to use for certain protocols. The names of the variables
follow an ancient de facto standard and are built up as "[protocol]_proxy"
(note the lower casing). So the variable 'http_proxy' is checked for the name
of a proxy to use when the input URL is HTTP. Following the same rule, the
variable named 'ftp_proxy' is checked for FTP URLs. Again, the proxies are
always HTTP proxies; the different variable names simply allow different HTTP
proxies to be used.

The proxy environment variable contents should be in the format
\&"[protocol://][user:password@]machine[:port]", where the protocol:// part is
simply ignored if present (so http://proxy and bluerk://proxy will do the
same) and the optional port number specifies on which port the proxy operates
on the host. If not specified, the internal default port number will be used
and that is most likely *not* the one you would like it to be.

There are two special environment variables. 'all_proxy' is what sets the proxy
for any URL in case the protocol specific variable wasn't set, and
\&'no_proxy' defines a list of hosts that should not use a proxy even though a
variable may say so.
If 'no_proxy' is a plain asterisk ("*") it matches all
hosts.

To explicitly disable libcurl's checking for and using the proxy environment
variables, set the proxy name to "" - an empty string - with CURLOPT_PROXY.

.IP "SSL and Proxies"
SSL is for secure point-to-point connections. This involves strong encryption
and similar things, which effectively makes it impossible for a proxy to
operate as a "man in between", which is the proxy's task, as previously
discussed. Instead, the only way to have SSL work over an HTTP proxy is to ask
the proxy to tunnel everything through without being able to check or fiddle
with the traffic.

Opening an SSL connection over an HTTP proxy is therefore a matter of asking
the proxy for a straight connection to the target host on a specified port.
This is done with the HTTP request CONNECT ("please mr proxy, connect me to
that remote host").

Because of the nature of this operation, where the proxy has no idea what kind
of data is passed in and out through this tunnel, this breaks some of the
very few advantages that come from using a proxy, such as caching. Many
organizations prevent this kind of tunneling to other destination port numbers
than 443 (which is the default HTTPS port number).

.IP "Tunneling Through Proxy"
As explained above, tunneling is required for SSL to work and is often even
restricted to the operation intended for SSL: HTTPS.

This is however not the only time proxy-tunneling might offer benefits to
you or your application.

As tunneling opens a direct connection from your application to the remote
machine, it suddenly also re-introduces the ability to do non-HTTP
operations over an HTTP proxy.
You can in fact use things such as FTP
upload or FTP custom commands this way.

Again, this is often prevented by the administrators of proxies and is
rarely allowed.

Tell libcurl to use proxy tunneling like this:

 curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, 1L);

In fact, there might even be times when you want to do plain HTTP
operations using a tunnel like this, as it then enables you to operate on
the remote server instead of asking the proxy to do so. libcurl will not
stand in the way of such innovative actions either!

.IP "Proxy Auto-Config"
Netscape first came up with this. It is basically a web page (usually using a
\&.pac extension) with a JavaScript that, when executed by the browser with the
requested URL as input, returns information to the browser on how to connect
to the URL. The returned information might be "DIRECT" (which means no proxy
should be used), "PROXY host:port" (to tell the browser where the proxy for
this particular URL is) or "SOCKS host:port" (to direct the browser to a SOCKS
proxy).

libcurl has no means to interpret or evaluate JavaScript and thus it doesn't
support this. If you get yourself in a position where you face this nasty
invention, the following advice has been mentioned and used in the past:

- Depending on the JavaScript complexity, write up a script that translates it
to another language and execute that.

- Read the JavaScript code and rewrite the same logic in another language.

- Implement a JavaScript interpreter; people have successfully used the
Mozilla JavaScript engine in the past.

- Ask your admins to stop this, for a static proxy setup or similar.

.SH "Persistence Is The Way to Happiness"
Re-cycling the same easy handle several times when doing multiple requests is
the way to go.

After each single \fIcurl_easy_perform(3)\fP operation, libcurl will keep the
connection alive and open. A subsequent request using the same easy handle to
the same host might just be able to use the already open connection!
This
reduces network impact a lot.

Even if the connection is dropped, subsequent SSL connections to the same
host will benefit from libcurl's session ID cache, which drastically
reduces re-connection time.

FTP connections that are kept alive save a lot of time, as the
command-response round-trips are skipped, and you also don't risk being
refused a new login, as can happen on the many FTP servers that only allow N
persons to be logged in at the same time.

libcurl caches DNS name resolving results, to make lookups of a previously
looked up name a lot faster.

Other interesting details that improve performance for subsequent requests
may also be added in the future.

Each easy handle will attempt to keep the last few connections alive for a
while in case they are to be used again. You can set the size of this "cache"
with the CURLOPT_MAXCONNECTS option. Default is 5. There is very seldom any
point in changing this value, and if you think of changing this it is often
just a matter of thinking again.

To force your upcoming request to not use an already existing connection (it
will even close one first if there happens to be one alive to the same host
you're about to operate on), set CURLOPT_FRESH_CONNECT to 1. In a similar
spirit, you can also forbid the connection used by the upcoming request from
"lying" around and possibly being re-used after the request by setting
CURLOPT_FORBID_REUSE to 1.

.SH "HTTP Headers Used by libcurl"
When you use libcurl to do HTTP requests, it'll pass along a series of headers
automatically. It might be good for you to know and understand these. You
can replace or remove them by using the CURLOPT_HTTPHEADER option.

.IP "Host"
This header is required by HTTP 1.1 and even many 1.0 servers and should be
the name of the server we want to talk to. This includes the port number if
anything but default.

.IP "Pragma"
\&"no-cache".
Tells a possible proxy to not grab a copy from the cache but to
fetch a fresh one.

.IP "Accept"
\&"*/*".

.IP "Expect"
When doing POST requests, libcurl sets this header to \&"100-continue" to ask
the server for an "OK" message before it proceeds with sending the data part
of the post. If the POSTed data amount is deemed "small", libcurl will not use
this header.

.SH "Customizing Operations"
There is an ongoing development today where more and more protocols are built
upon HTTP for transport. This has obvious benefits as HTTP is a tested and
reliable protocol that is widely deployed and has excellent proxy-support.

When you use one of these protocols, and even when doing other kinds of
programming, you may need to change the traditional HTTP (or FTP or...)
manners. You may need to change words, headers or various data.

libcurl is your friend here too.

.IP CUSTOMREQUEST
If just changing the actual HTTP request keyword is what you want, like when
GET, HEAD or POST is not good enough for you, CURLOPT_CUSTOMREQUEST is there
for you. It is very simple to use:

 curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNREQUEST");

When using the custom request, you change the request keyword of the actual
request you are performing. Thus, by default you make a GET request but you can
also make a POST operation (as described before) and then replace the POST
keyword if you want to. You're the boss.

.IP "Modify Headers"
HTTP-like protocols pass a series of headers to the server when doing the
request, and you're free to pass any amount of extra headers that you
think fit.
Adding headers is this easy:

.nf
 struct curl_slist *headers=NULL; /* init to NULL is important */

 headers = curl_slist_append(headers, "Hey-server-hey: how are you?");
 headers = curl_slist_append(headers, "X-silly-content: yes");

 /* pass our list of custom made headers */
 curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);

 curl_easy_perform(easyhandle); /* transfer http */

 curl_slist_free_all(headers); /* free the header list */
.fi

\&... and if you think some of the internally generated headers, such as
Accept: or Host: don't contain the data you want them to contain, you can
replace them by simply setting them too:

.nf
 headers = curl_slist_append(headers, "Accept: Agent-007");
 headers = curl_slist_append(headers, "Host: munged.host.line");
.fi

.IP "Delete Headers"
If you replace an existing header with one with no contents, you will prevent
the header from being sent. For example, if you want to completely prevent the
\&"Accept:" header from being sent, you can disable it with code similar to
this:

 headers = curl_slist_append(headers, "Accept:");

Both replacing and canceling internal headers should be done with careful
consideration and you should be aware that you may violate the HTTP protocol
when doing so.

.IP "Enforcing chunked transfer-encoding"
By making sure a request uses the custom header "Transfer-Encoding: chunked"
when doing a non-GET HTTP operation, libcurl will switch over to "chunked"
upload, even though the size of the data to upload might be known. By default,
libcurl usually switches over to chunked upload automatically if the upload
data size is unknown.

.IP "HTTP Version"
All HTTP requests include the version number to tell the server which version
we support. libcurl speaks HTTP 1.1 by default. Some very old servers don't
like getting 1.1-requests and when dealing with stubborn old things like that,
you can tell libcurl to use 1.0 instead by doing something like this:

 curl_easy_setopt(easyhandle, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
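
The header rules described under "Modify Headers" and "Delete Headers" can be
illustrated with a small stand-alone helper. This is only a sketch: the
function classify_header and the header_action enum are hypothetical names
invented for this example and are not part of libcurl, and treating a
whitespace-only value as "no contents" is an assumption. It merely shows how
an application could check what a line will mean before appending it to the
list passed to CURLOPT_HTTPHEADER.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcurl: classify one line meant for
   a CURLOPT_HTTPHEADER list according to the rules described above. */
enum header_action {
  HEADER_INVALID, /* no colon or no name: not a "Name: value" line */
  HEADER_SET,     /* "Name: value": add a header or replace an internal one */
  HEADER_REMOVE   /* "Name:" with no contents: prevent the header being sent */
};

enum header_action classify_header(const char *line)
{
  const char *colon;
  const char *p;

  if(line == NULL)
    return HEADER_INVALID;

  colon = strchr(line, ':');
  if(colon == NULL || colon == line)
    return HEADER_INVALID; /* missing colon or empty header name */

  /* skip whitespace after the colon to see whether any contents follow
     (assumption: a whitespace-only value counts as "no contents") */
  for(p = colon + 1; *p && isspace((unsigned char)*p); p++)
    ;

  return *p ? HEADER_SET : HEADER_REMOVE;
}
```

With this sketch, classify_header("Accept: Agent-007") yields HEADER_SET,
while classify_header("Accept:") yields HEADER_REMOVE, matching the two
curl_slist_append() examples above.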