libcurl-the-guide
libcurl has full support for HTTP proxies, so when a given URL is wanted, libcurl will ask the proxy for it instead of trying to connect to the actual host identified in the URL.

The fact that the proxy is an HTTP proxy puts certain restrictions on what can actually happen. A requested URL that is not an HTTP URL will still be passed to the HTTP proxy, which delivers the result back to libcurl. This happens transparently, and an application may not need to know. I say "may", because at times it is very important to understand that all operations over an HTTP proxy use the HTTP protocol. For example, you can't invoke your own custom FTP commands or even get proper FTP directory listings.

Proxy Options

To tell libcurl to use a proxy at a given port number:

    curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");

Some proxies require user authentication before allowing a request, and you pass that information similar to this:

    curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");

If you want to, you can specify the host name only in the CURLOPT_PROXY option, and set the port number separately with CURLOPT_PROXYPORT.

Environment Variables

libcurl automatically checks and uses a set of environment variables to know what proxies to use for certain protocols. The names of the variables follow an ancient de facto standard and are built up as "[protocol]_proxy" (note the lower casing). This makes the variable 'http_proxy' the one checked for the name of a proxy to use when the input URL is HTTP. Following the same rule, the variable named 'ftp_proxy' is checked for FTP URLs. Again, the proxies are always HTTP proxies; the different variable names simply allow different HTTP proxies to be used.

The proxy environment variable contents should be in the format "[protocol://]machine[:port]".
The protocol:// part is simply ignored if present (so http://proxy and bluerk://proxy will do the same) and the optional port number specifies on which port the proxy operates on the host. If not specified, the internal default port number will be used and that is most likely *not* the one you would like it to be.

There are two special environment variables. 'all_proxy' sets the proxy for any URL in case the protocol-specific variable wasn't set, and 'no_proxy' defines a list of hosts that should not use a proxy even though a variable may say so. If 'no_proxy' is a plain asterisk ("*") it matches all hosts.

SSL and Proxies

SSL is for secure point-to-point connections. This involves strong encryption and similar things, which effectively makes it impossible for a proxy to operate as a "man in between", which is the proxy's task as previously discussed. Instead, the only way to have SSL work over an HTTP proxy is to ask the proxy to tunnel everything through without being able to check or fiddle with the traffic.

Opening an SSL connection over an HTTP proxy is therefore a matter of asking the proxy for a straight connection to the target host on a specified port. This is done with the HTTP request CONNECT ("please mr proxy, connect me to that remote host").

Because of the nature of this operation, where the proxy has no idea what kind of data is passed in and out through this tunnel, this breaks some of the very few advantages that come from using a proxy, such as caching. Many organizations prevent this kind of tunneling to destination port numbers other than 443 (which is the default HTTPS port number).

Tunneling Through Proxy

As explained above, tunneling is required for SSL to work and is often even restricted to the operation intended for SSL: HTTPS.

This is however not the only time proxy-tunneling might offer benefits to you or your application.
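The CONNECT request described under "SSL and Proxies" is short enough to write out. The sketch below formats one in the HTTP/1.1 style; build_connect_request() is a hypothetical helper made up for this illustration, not a libcurl function, and the exact request libcurl sends to a proxy may differ (for example in HTTP version or extra headers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the request a client sends to an HTTP proxy to open a tunnel
 * to host:port. Hypothetical helper for illustration only.
 *
 * Returns the number of bytes written (excluding the terminating zero),
 * or -1 if the buffer is too small.
 */
int build_connect_request(char *buf, size_t size,
                          const char *host, unsigned port)
{
    int n;

    assert(buf != NULL && host != NULL);

    n = snprintf(buf, size,
                 "CONNECT %s:%u HTTP/1.1\r\n" /* target the proxy connects to */
                 "Host: %s:%u\r\n"
                 "\r\n",                      /* empty line ends the request */
                 host, port, host, port);

    /* snprintf reports the length it wanted; treat truncation as failure */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Once the proxy answers with a 2xx response, everything written to the socket afterwards is passed through untouched, which is why the proxy can no longer inspect or cache the traffic.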
As tunneling opens a direct connection from your application to the remote machine, it suddenly also re-introduces the ability to do non-HTTP operations over an HTTP proxy. You can in fact use things such as FTP upload or FTP custom commands this way.

Again, this is often prevented by the administrators of proxies and is rarely allowed.

Tell libcurl to use proxy tunneling like this:

    curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, 1L);

In fact, there might even be times when you want to do plain HTTP operations using a tunnel like this, as it then enables you to operate on the remote server instead of asking the proxy to do so. libcurl will not stand in the way of such innovative actions either!

Proxy Auto-Config

Netscape first came up with this. It is basically a web page (usually using a .pac extension) with a javascript that, when executed by the browser with the requested URL as input, returns information to the browser on how to connect to the URL. The returned information might be "DIRECT" (which means no proxy should be used), "PROXY host:port" (to tell the browser where the proxy for this particular URL is) or "SOCKS host:port" (to direct the browser to a SOCKS proxy).

libcurl has no means to interpret or evaluate javascript and thus it doesn't support this. If you get yourself in a position where you face this nasty invention, the following advice has been mentioned and used in the past:

 - Depending on the javascript complexity, write up a script that translates it to another language and execute that.

 - Read the javascript code and rewrite the same logic in another language.

 - Implement a javascript interpreter; people have successfully used the Mozilla javascript engine in the past.

 - Ask your admins to stop this, for a static proxy setup or similar.

Persistence Is The Way to Happiness

Re-cycling the same easy handle several times when doing multiple requests is the way to go.
After each single curl_easy_perform() operation, libcurl will keep the connection alive and open. A subsequent request using the same easy handle to the same host might just be able to use the already open connection! This reduces network impact a lot.

Even if the connection is dropped, all new SSL connections to the same host will benefit from libcurl's session ID cache, which drastically reduces re-connection time.

FTP connections that are kept alive save a lot of time, as the command-response roundtrips are skipped, and you also don't risk being refused a new login, as can happen on the many FTP servers that only allow N persons to be logged in at the same time.

libcurl caches DNS name resolving results, to make lookups of a previously looked up name a lot faster.

Other interesting details that improve performance for subsequent requests may also be added in the future.

Each easy handle will attempt to keep the last few connections alive for a while in case they are to be used again. You can set the size of this "cache" with the CURLOPT_MAXCONNECTS option. Default is 5. There is very seldom any point in changing this value, and if you think of changing it, it is often just a matter of thinking again.

When the connection cache gets filled, libcurl must close an existing connection in order to get room for the new one. To know which connection to close, libcurl uses a "close policy" that you can affect with the CURLOPT_CLOSEPOLICY option. There are only two policies implemented as of this writing (libcurl 7.9.4):

 CURLCLOSEPOLICY_LEAST_RECENTLY_USED
     simply closes the connection that hasn't been used for the longest time. This is the default behavior.

 CURLCLOSEPOLICY_OLDEST
     closes the oldest connection, the one that was created the longest time ago.
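The difference between the two close policies is easiest to see on a small model. The sketch below is not libcurl code: struct toy_conn, enum toy_policy and toy_pick_victim() are hypothetical names invented here, and the timestamps are plain counters. It only illustrates the selection rule described above, assuming each cached connection remembers when it was created and when it was last used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one cached connection; not a libcurl type. */
struct toy_conn {
    int in_use;              /* non-zero if the slot holds a connection */
    unsigned long created;   /* logical time the connection was opened */
    unsigned long last_used; /* logical time of the most recent transfer */
};

enum toy_policy {
    TOY_LEAST_RECENTLY_USED, /* models CURLCLOSEPOLICY_LEAST_RECENTLY_USED */
    TOY_OLDEST               /* models CURLCLOSEPOLICY_OLDEST */
};

/*
 * Pick the slot to close when the cache is full.
 * Returns the index of the victim, or -1 if no slot is in use.
 */
int toy_pick_victim(const struct toy_conn *cache, size_t count,
                    enum toy_policy policy)
{
    int victim = -1;
    unsigned long best = 0;
    size_t i;

    assert(cache != NULL);

    for (i = 0; i < count; i++) {
        unsigned long stamp;

        if (!cache[i].in_use)
            continue;

        stamp = (policy == TOY_OLDEST) ? cache[i].created
                                       : cache[i].last_used;

        /* the smallest timestamp wins: oldest creation or longest idle */
        if (victim < 0 || stamp < best) {
            victim = (int)i;
            best = stamp;
        }
    }
    return victim;
}
```

With three connections created at times 10, 20 and 40 and last used at 50, 30 and 45, the least-recently-used policy picks the second one while the oldest policy picks the first: a connection that is old but busy survives under the default policy.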
There are, or at least were, plans to support a close policy that would call a user-specified callback to let the user decide which connection to dump when this is necessary, which is why the CURLOPT_CLOSEFUNCTION option still exists today. Nothing uses it though, and it will not be used within the foreseeable future either.

To force your upcoming request to not use an already existing connection (it will even close one first if there happens to be one alive to the same host you're about to operate on), set CURLOPT_FRESH_CONNECT to 1L. In a similar spirit, you can also forbid the connection used by the upcoming request from "lying" around and possibly getting re-used afterwards by setting CURLOPT_FORBID_REUSE to 1L.

HTTP Headers Used by libcurl

When you use libcurl to do HTTP requests, it'll pass along a series of headers automatically. It might be good for you to know and understand these.

 Host
     This header is required by HTTP 1.1 and even many 1.0 servers and should be the name of the server we want to talk to. This includes the port number if anything but default.

 Pragma
     "no-cache". Tells a possible proxy to not grab a copy from the cache but to fetch a fresh one.

 Accept
     "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*". Cloned from a browser once a hundred years ago.

 Expect
     When doing multi-part formposts, libcurl will set this header to "100-continue" to ask the server for an "OK" message before it proceeds with sending the data part of the post.

Customizing Operations

There is an ongoing development today where more and more protocols are built upon HTTP for transport. This has obvious benefits as HTTP is a tested and reliable protocol that is widely deployed and has excellent proxy support.

When you use one of these protocols, and even when doing other kinds of programming, you may need to change the traditional HTTP (or FTP or...) manners. You may need to change words, headers or various data.
libcurl is your friend here too.

CUSTOMREQUEST

If just changing the actual HTTP request keyword is what you want, like when GET, HEAD or POST is not good enough for you, CURLOPT_CUSTOMREQUEST is there for you. It is very simple to use:

    curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNREQUEST");

When using the custom request, you change the request keyword of the actual request you are performing. Thus, by default you make a GET request, but you can also make a POST operation (as described before) and then replace the POST keyword if you want to. You're the boss.

Modify Headers

HTTP-like protocols pass a series of headers to the server when doing the request, and you're free to pass any number of extra headers that you think fit. Adding headers is this easy:

    struct curl_slist *headers = NULL; /* init to NULL is important */

    headers = curl_slist_append(headers, "Hey-server-hey: how are you?");
    headers = curl_slist_append(headers, "X-silly-content: yes");

    /* pass our list of custom made headers */
    curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);

    curl_easy_perform(easyhandle); /* transfer http */

    curl_slist_free_all(headers); /* free the header list */

... and if you think some of the internally generated headers, such as Accept: or Host:, don't contain the data you want them to contain, you can replace them by simply setting them too:

    headers = curl_slist_append(headers, "Accept: Agent-007");
    headers = curl_slist_append(headers, "Host: munged.host.line");

Delete Headers

If you replace an existing header with one with no contents, you will prevent the header from being sent. For example, if you want to completely prevent the "Accept:" header from being sent, you can disable it with code similar to this:

    headers = curl_slist_append(headers, "Accept:");

Both replacing and cancelling internal headers should be done with careful consideration, and you should be aware that you may violate the HTTP protocol when doing so.
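The replace and delete rules above can be modelled in a few lines of plain C. This is a sketch of the behavior as described in this section, not libcurl's implementation: toy_resolve_header() and its helpers are hypothetical names, and the sketch assumes header names are matched without regard to case and that each header is a single "Name: value" line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the header name (the part before ':'), 0 if there is no colon. */
static size_t name_len(const char *line)
{
    const char *colon = strchr(line, ':');
    return colon ? (size_t)(colon - line) : 0;
}

/* Compare two header names of length n, ignoring case. */
static int same_name(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/*
 * Decide what is sent for one internally generated header line.
 * Returns the internal line if no custom header has the same name,
 * the custom line if one replaces it, or NULL if a custom header
 * with no contents ("Name:") cancels it.
 */
const char *toy_resolve_header(const char *internal,
                               const char *const *custom, size_t ncustom)
{
    size_t ilen, i;

    assert(internal != NULL);
    ilen = name_len(internal);

    for (i = 0; i < ncustom; i++) {
        size_t clen = name_len(custom[i]);
        const char *value;

        if (clen == 0 || clen != ilen ||
            !same_name(internal, custom[i], ilen))
            continue;

        /* skip the colon and any blanks to see whether a value follows */
        value = custom[i] + clen + 1;
        while (*value == ' ' || *value == '\t')
            value++;
        return *value ? custom[i] : NULL;
    }
    return internal;
}
```

Given the custom list "Accept:" and "Host: munged.host.line", the function returns NULL for an internal "Accept: */*" line, the custom line for an internal Host header, and leaves an internal "Pragma: no-cache" line untouched.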
Enforcing chunked transfer-encoding

By making sure a request uses the custom header "Transfer-Encoding: chunked" when doing a non-GET HTTP operation, libcurl will switch over to