How do I fetch an HTML page's source with libcurl in C?

Developer https://www.devze.com 2023-02-21 20:59 Source: web
I've looked at the simple example on the libcurl website (http://curl.haxx.se/libcurl/c/simple.html), but I haven't figured out how to do what I need, which is to store the page's source in a string variable.

Any help would be greatly appreciated.


You most likely want to call curl_easy_setopt with the CURLOPT_WRITEFUNCTION option. This lets you supply a callback function with this prototype:

size_t function( void *ptr, size_t size, size_t nmemb, void *userdata);

curl calls this function whenever data arrives, possibly several times per transfer. You can then use the userdata argument (set with CURLOPT_WRITEDATA) to pass in a pointer to a std::string (in C++) or a char ** or small buffer struct (in C), which lets you save the data in memory. Alternatively, you can process the data as it comes in and avoid storing it altogether.
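
To illustrate the second option (handling the data as it arrives rather than storing it), here is a minimal sketch of a write callback in C that only tallies bytes and newlines. The names stream_stats and count_stream are made up for this example; the callback itself needs nothing from libcurl to compile, since it only has to match the prototype above.

```c
#include <stddef.h>

/* Hypothetical accumulator for this sketch; not part of libcurl. */
typedef struct
{
    size_t bytes;   /* total bytes seen so far */
    size_t lines;   /* number of '\n' characters seen so far */
} stream_stats;

/* Matches the CURLOPT_WRITEFUNCTION prototype. libcurl may call this
   several times per transfer, and the chunk is NOT NUL-terminated. */
size_t count_stream(void *ptr, size_t size, size_t nmemb, void *userdata)
{
    stream_stats *stats = (stream_stats *) userdata;
    const char *chunk = (const char *) ptr;
    size_t total = size * nmemb;
    size_t i;

    for(i = 0; i < total; ++i)
    {
        if(chunk[i] == '\n')
            ++stats->lines;
    }
    stats->bytes += total;

    /* Returning anything other than total tells libcurl to abort. */
    return total;
}
```

It would be registered like any other write callback: count_stream goes to CURLOPT_WRITEFUNCTION, and the address of a zero-initialized stream_stats goes to CURLOPT_WRITEDATA.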

Edit: Here is such a function that I wrote once upon a time for C++ (it uses std::string). This is my code, but if you need licensing info, it's public domain; do whatever you want with it. :-)

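#include <string>

size_t curl_to_string(void *ptr, size_t size, size_t nmemb, void *data)
{
    std::string *str = (std::string *) data;
    char *sptr = (char *) ptr;

    // The chunk is not NUL-terminated, so append it with an explicit length.
    // (Indexing the void pointer ptr directly would not compile.)
    str->append(sptr, size * nmemb);

    return size * nmemb;
}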

Probably should use C++-style casts, but I was a younger programmer then. :-) You would use the above like:

curl_easy_setopt(m_curl, CURLOPT_WRITEFUNCTION, curl_to_string);
curl_easy_setopt(m_curl, CURLOPT_WRITEDATA, &pagedata);

Where pagedata is a std::string. After the call to curl_easy_perform, if there were no errors, pagedata has your page.

Edit 2: A C version of this is slightly more complicated, mostly because we have to manage the memory ourselves so that it fits the incoming data (something that std::string and std::vector do for us in C++).

#include <stdlib.h>   /* realloc */
#include <string.h>   /* memcpy */

typedef struct
{
    size_t size;
    size_t allocated;
    char *data;
} c_vector;

size_t curl_to_string(void *ptr, size_t size, size_t nmemb, void *data)
{
    if(size * nmemb == 0)
        return 0;

    c_vector *vec = (c_vector *) data;

    // Resize the data array if needed
    if(vec->size + size * nmemb > vec->allocated)
    {
        char *new_data = realloc(vec->data, sizeof(char) * (vec->size + size * nmemb));
        if(!new_data)
            return 0;
        vec->data = new_data;
        vec->allocated = vec->size + size * nmemb;
    }

    memcpy(vec->data + vec->size, ptr, size * nmemb);
    vec->size += size * nmemb;

    return size * nmemb;
}

It's basically C++'s std::vector, boiled down to C. (Things like this are why I'm a C++ programmer by default over C...)

You'd call that with:

c_vector vec = {0};

curl_easy_setopt(m_curl, CURLOPT_WRITEFUNCTION, curl_to_string);
curl_easy_setopt(m_curl, CURLOPT_WRITEDATA, &vec);

Beware of bugs in the above code; I have only proved it correct, not tried it.
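
Since the question asks for a string variable, one gap worth noting is that the buffer filled by the C callback above is not NUL-terminated, so it cannot be passed directly to functions like printf or strlen. The following is a sketch of one way to handle that, using a hypothetical text_buf type and append_text callback (both invented for this example): it reserves one spare byte for the terminator and doubles the capacity as it grows, to avoid a realloc on every chunk.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer for this sketch; not part of libcurl. */
typedef struct
{
    size_t len;     /* bytes stored, not counting the terminator */
    size_t cap;     /* bytes allocated */
    char *data;     /* NUL-terminated whenever it is non-NULL */
} text_buf;

/* Write callback with the CURLOPT_WRITEFUNCTION prototype. */
size_t append_text(void *ptr, size_t size, size_t nmemb, void *userdata)
{
    text_buf *buf = (text_buf *) userdata;
    size_t total = size * nmemb;
    size_t needed = buf->len + total + 1;   /* +1 for the '\0' */

    if(needed > buf->cap)
    {
        /* Grow geometrically so repeated small chunks stay cheap. */
        size_t new_cap = buf->cap ? buf->cap : 256;
        char *new_data;

        while(new_cap < needed)
            new_cap *= 2;

        new_data = realloc(buf->data, new_cap);
        if(!new_data)
            return 0;   /* a short count makes libcurl abort the transfer */
        buf->data = new_data;
        buf->cap = new_cap;
    }

    memcpy(buf->data + buf->len, ptr, total);
    buf->len += total;
    buf->data[buf->len] = '\0';

    return total;
}
```

With libcurl, append_text would go to CURLOPT_WRITEFUNCTION and the address of a zero-initialized text_buf to CURLOPT_WRITEDATA. After a successful curl_easy_perform, buf.data can be used as an ordinary C string (it stays NULL if the callback never ran) and should be released with free. A page containing embedded NUL bytes would look truncated to string functions, so buf.len remains the reliable length.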


Coincidentally, the curl site hosts an example for exactly this question:

http://curl.haxx.se/libcurl/c/getinmemory.html


You need to set CURLOPT_WRITEFUNCTION and CURLOPT_WRITEDATA with curl_easy_setopt in order to process the received data, if I remember correctly.
