libcurl-the-guide

 Since we are writing an application, we most likely want libcurl to get the upload data by asking us for it. To make it do that, we set the read callback and the custom pointer libcurl will pass to our read callback. The read callback should have a prototype similar to:

    size_t function(char *bufptr, size_t size, size_t nitems, void *userp);

 Here bufptr is the pointer to a buffer we fill in with data to upload, and size*nitems is the size of the buffer and therefore also the maximum amount of data we can return to libcurl in this call. The 'userp' pointer is the custom pointer we set to point to a struct of ours to pass private data between the application and the callback.

    curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function);

    curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata);

 Tell libcurl that we want to upload:

    curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, 1L);

 A few protocols won't behave properly when uploads are done without any prior knowledge of the expected file size. So, set the upload file size using CURLOPT_INFILESIZE whenever the file size is known, like this[1]:

    curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE, file_size);

 When you call curl_easy_perform() this time, it'll perform all the necessary operations, and when it has started the upload it'll call your supplied callback to get the data to upload. The callback should return as much data as possible in every invocation, as that is likely to make the upload perform as fast as possible. The callback should return the number of bytes it wrote in the buffer. Returning 0 will signal the end of the upload.

Passwords

 Many protocols use or even require that a user name and password are provided to be able to download or upload the data of your choice. libcurl offers several ways to specify them.

 Most protocols support specifying the name and password in the URL itself. libcurl will detect this and use them accordingly.
 This is written like this:

    protocol://user:password@example.com/path/

 If you need any odd letters in your user name or password, you should enter them URL encoded, as %XX where XX is a two-digit hexadecimal number.

 libcurl also provides options to set various passwords. The user name and password as shown embedded in the URL can instead be set with the CURLOPT_USERPWD option. The argument passed to libcurl should be a char * to a string in the format "user:password", like this:

    curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret");

 Another case where a name and password might be needed is for users who need to authenticate themselves to a proxy they use. libcurl offers another option for this, CURLOPT_PROXYUSERPWD. It is used much like the CURLOPT_USERPWD option:

    curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret");

 There's a long-standing unix "standard" way of storing ftp user names and passwords, namely in the $HOME/.netrc file. The file should be made private so that only the user may read it (see also the "Security Considerations" chapter), as it might contain the password in plain text. libcurl has the ability to use this file to figure out what set of user name and password to use for a particular host. As an extension to the normal functionality, libcurl also supports this file for non-FTP protocols such as HTTP. To make curl use this file, use the CURLOPT_NETRC option:

    curl_easy_setopt(easyhandle, CURLOPT_NETRC, 1L);

 And a very basic example of what such a .netrc file may look like:

    machine myhost.mydomain.com
    login userlogin
    password secretword

 All these examples have been cases where the password has been optional, or at least you could leave it out and have libcurl attempt to do its job without it. There are times when the password isn't optional, like when you're using an SSL private key for secure transfers.
 In this situation you can either pass a password to libcurl to use to unlock the private key, or you can let libcurl prompt the user for it. If you prefer to ask the user, you can provide your own callback function that will be called when libcurl wants the password. That way, you can control how the question will appear to the user.

 To pass the known private key password to libcurl:

    curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword");

 To make a password callback:

    int enter_passwd(void *ourp, const char *prompt, char *buffer, int len);
    curl_easy_setopt(easyhandle, CURLOPT_PASSWDFUNCTION, enter_passwd);

HTTP POSTing

 We get many questions regarding how to issue HTTP POSTs with libcurl the proper way. This chapter will thus include examples using both of the different versions of HTTP POST that libcurl supports.

 The first version is the simple POST, the most common version, that most HTML pages using the <form> tag use. We provide a pointer to the data and tell libcurl to post it all to the remote site:

    char *data="name=daniel&project=curl";
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, data);
    curl_easy_setopt(easyhandle, CURLOPT_URL, "http://posthere.com/");

    curl_easy_perform(easyhandle); /* post away! */

 Simple enough, huh? Since you set the POST options with CURLOPT_POSTFIELDS, this automatically switches the handle to use POST in the upcoming request.

 Ok, so what if you want to post binary data that also requires you to set the Content-Type: header of the post? Well, binary posts prevent libcurl from being able to do strlen() on the data to figure out the size, so we must tell libcurl the size of the post data. Setting headers in libcurl requests is done in a generic way, by building a list of our own headers and then passing that list to libcurl.

    struct curl_slist *headers=NULL;
    headers = curl_slist_append(headers, "Content-Type: text/xml");

    /* post binary data */
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, binaryptr);

    /* set the size of the postfields data */
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDSIZE, 23L);

    /* pass our list of custom made headers */
    curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);

    curl_easy_perform(easyhandle); /* post away! */

    curl_slist_free_all(headers); /* free the header list */

 While the simple examples above cover the majority of all cases where HTTP POST operations are required, they don't do multipart formposts. Multipart formposts were introduced as a better way to post (possibly large) binary data and were first documented in RFC 1867. They're called multipart because they're built from a chain of parts, each being a single unit. Each part has its own name and contents. You can in fact create and post a multipart formpost with the regular libcurl POST support described above, but that would require that you build the formpost yourself and provide it to libcurl. To make that easier, libcurl provides curl_formadd(). Using this function, you add parts to the form. When you're done adding parts, you post the whole form.

 The following example sets two simple text parts with plain textual contents, and then a file with binary contents, and uploads the whole thing.

    struct curl_httppost *post=NULL;
    struct curl_httppost *last=NULL;
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "name",
                 CURLFORM_COPYCONTENTS, "daniel", CURLFORM_END);
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "project",
                 CURLFORM_COPYCONTENTS, "curl", CURLFORM_END);
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "logotype-image",
                 CURLFORM_FILECONTENT, "curl.png", CURLFORM_END);

    /* Set the form info */
    curl_easy_setopt(easyhandle, CURLOPT_HTTPPOST, post);

    curl_easy_perform(easyhandle); /* post away! */

    /* free the post data again */
    curl_formfree(post);

 Multipart formposts are chains of parts using MIME-style separators and headers. This means that each one of these separate parts gets a few headers set that describe the individual content-type, size etc. To enable your application to handcraft this formpost even more, libcurl allows you to supply your own set of custom headers to such an individual form part. You can of course supply headers to as many parts as you like, but this little example shows how you set headers for one specific part when you add it to the post handle:

    struct curl_slist *headers=NULL;
    headers = curl_slist_append(headers, "Content-Type: text/xml");

    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "logotype-image",
                 CURLFORM_FILECONTENT, "curl.xml",
                 CURLFORM_CONTENTHEADER, headers,
                 CURLFORM_END);

    curl_easy_perform(easyhandle); /* post away! */

    curl_formfree(post); /* free post */
    curl_slist_free_all(headers); /* free custom header list */

 Since all options on an easyhandle are "sticky", meaning they remain the same until changed even if you do call curl_easy_perform(), you may need to tell curl to go back to a plain GET request if that is what you intend your next request to be.
 You force an easyhandle to go back to GET by using the CURLOPT_HTTPGET option:

    curl_easy_setopt(easyhandle, CURLOPT_HTTPGET, 1L);

 Just setting CURLOPT_POSTFIELDS to "" or NULL will *not* stop libcurl from doing a POST. It will just make it POST without any data to send!

Showing Progress

 For historical and traditional reasons, libcurl has a built-in progress meter that can be switched on, which makes it present a progress meter in your terminal.

 Switch on the progress meter by, oddly enough, setting CURLOPT_NOPROGRESS to 0 (FALSE). This option is set to 1 (TRUE) by default.

 For most applications however, the built-in progress meter is useless, and what is interesting instead is the ability to specify a progress callback. The function pointer you pass to libcurl will then be called at irregular intervals with information about the current transfer.

 Set the progress callback by using CURLOPT_PROGRESSFUNCTION, and pass a pointer to a function that matches this prototype:

    int progress_callback(void *clientp,
                          double dltotal,
                          double dlnow,
                          double ultotal,
                          double ulnow);

 If any of the input arguments is unknown, a 0 will be passed. The first argument, 'clientp', is the pointer you pass to libcurl with CURLOPT_PROGRESSDATA. libcurl won't touch it.

libcurl with C++

 There's basically only one thing to keep in mind when using C++ instead of C when interfacing with libcurl:

    "The Callbacks Must Be Plain C"

 So if you want a write callback set in libcurl, you should put it within 'extern "C"'. Similar to this:

    extern "C" {
      size_t write_data(void *ptr, size_t size, size_t nmemb,
                        void *ourpointer)
      {
        /* do what you want with the data */
        return size * nmemb; /* report all bytes as handled */
      }
    }

 This will of course effectively turn the callback code into C.
 There won't be any "this" pointer available, etc.

Proxies

 What "proxy" means according to Merriam-Webster: "a person authorized to act for another" but also "the agency, function, or office of a deputy who acts as a substitute for another".

 Proxies are exceedingly common these days. Companies often only offer internet access to employees through their HTTP proxies. Network clients or user-agents ask the proxy for documents, the proxy does the actual request and then it returns them.
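
 Returning to the upload read callback described earlier in this guide: the following is a minimal sketch of such a callback, assuming the data to upload sits in a memory buffer. The struct upload_state and its fields are hypothetical names invented for this illustration; only the callback prototype comes from the text above. It fills the buffer with as much data as fits, returns the number of bytes written, and returns 0 once nothing is left.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical application struct, handed to the callback as 'userp'. */
struct upload_state {
  const char *data;   /* next byte to upload */
  size_t remaining;   /* number of bytes not yet uploaded */
};

/* Read callback matching the prototype shown in the upload section. */
static size_t read_function(char *bufptr, size_t size, size_t nitems,
                            void *userp)
{
  struct upload_state *state = userp;
  size_t room = size * nitems;  /* most we may hand over in this call */
  size_t n = state->remaining < room ? state->remaining : room;

  memcpy(bufptr, state->data, n);  /* return as much as possible */
  state->data += n;
  state->remaining -= n;

  return n;  /* bytes written into bufptr; 0 signals end of upload */
}
```

 Such a callback would be registered with CURLOPT_READFUNCTION, with a pointer to the struct passed as the custom pointer, in the same way as read_function and &filedata in the upload section.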
