
How can I use libcurl to parse through a webpage line-by-line?

Developer https://www.devze.com 2023-03-04 17:53 Source: Internet
Ok, so I'm building this program in C for a Linux system. I need to be able to retrieve the content of a URL, and then read it line-by-line so I can do my own custom parsing on it.

Now, what's very important to me is speed, meaning I'd really like to do this without saving the entire thing to a file, then reading the file (since, for example, there may be content on the first line of the file that means I don't need to read the rest of it).

Also very important is that it is thread-safe. I tried using the code here: http://curl.haxx.se/libcurl/c/fopen.html but it uses global variables that make it impossible to safely multithread.

Any ideas?


Examples are just that: examples. If one of them doesn't do quite what you need, fix it so that it does.

I would guess that you're better off starting with another example, perhaps this getinmemory.c:

http://curl.haxx.se/libcurl/c/getinmemory.html

libcurl delivers data "chunk by chunk" and not line by line, so your application needs to figure out when you have enough data and you can then tell libcurl to stop transferring.
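
As a rough sketch of that idea (not the code from either libcurl example), a write callback can collect the chunks into lines itself and stop the transfer once it has seen enough. The names line_state, line_fn, on_chunk, seen and count_until_stop are made up for illustration, and so is the "stop when a line reads STOP" rule. All state lives in a struct passed through the callback's user pointer, so nothing is global.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical line handler: return nonzero to stop the transfer. */
typedef int (*line_fn)(const char *line, size_t len, void *ctx);

/* Hypothetical per-transfer state: one instance per easy handle, no globals. */
struct line_state {
    char   *buf;      /* bytes of the line collected so far */
    size_t  len;      /* bytes used in buf */
    size_t  cap;      /* bytes allocated for buf */
    line_fn on_line;  /* called once per complete line */
    void   *ctx;      /* passed through to on_line */
};

/* Append one byte and keep buf NUL-terminated; returns -1 if out of memory. */
int append_byte(struct line_state *st, char c)
{
    if (st->len + 2 > st->cap) {            /* room for c plus the NUL */
        size_t ncap = st->cap ? st->cap * 2 : 256;
        char *nbuf = realloc(st->buf, ncap);
        if (!nbuf)
            return -1;
        st->buf = nbuf;
        st->cap = ncap;
    }
    st->buf[st->len++] = c;
    st->buf[st->len] = '\0';
    return 0;
}

/* Has the signature libcurl expects for a CURLOPT_WRITEFUNCTION callback. */
size_t on_chunk(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct line_state *st = userdata;
    size_t total = size * nmemb;

    for (size_t i = 0; i < total; i++) {
        if (ptr[i] != '\n') {
            if (append_byte(st, ptr[i]) != 0)
                return 0;                   /* differs from total: abort */
            continue;
        }
        const char *line = st->len ? st->buf : "";
        int stop = st->on_line(line, st->len, st->ctx);
        st->len = 0;                        /* start collecting the next line */
        if (stop)
            return 0;                       /* differs from total: abort */
    }
    return total;                           /* whole chunk consumed */
}

/* Example handler (hypothetical): count lines, stop at a line reading "STOP". */
struct seen { int lines; char first[64]; };

int count_until_stop(const char *line, size_t len, void *ctx)
{
    struct seen *s = ctx;
    (void)len;
    if (++s->lines == 1) {
        strncpy(s->first, line, sizeof s->first - 1);
        s->first[sizeof s->first - 1] = '\0';
    }
    return strcmp(line, "STOP") == 0;
}
```

To hook this up, each thread would create its own easy handle and its own struct line_state, then call curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_chunk) and curl_easy_setopt(curl, CURLOPT_WRITEDATA, &st) before curl_easy_perform(curl). When the callback returns a count different from the one it was given, libcurl aborts the transfer and curl_easy_perform returns CURLE_WRITE_ERROR, so that return code is expected when you stop early. A last line with no trailing newline is left in st.buf after the transfer ends, and lines ending in "\r\n" keep their "\r" in this sketch.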


If you just want to retrieve the data for a page, it's fairly easy to use the socket API directly. There are also quite a few libraries around that make it a bit easier still. Unfortunately, you haven't said what system you want this for, so it's hard to recommend which library you probably want (Windows demands a bit of special code to start up and shut down Winsock that isn't necessary, and won't compile or link, on almost any other system).
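
For what the socket route might look like on a POSIX system such as Linux, here is a minimal sketch. It assumes plain HTTP on port 80 (no HTTPS, no redirects) and sends an HTTP/1.0 request so the server closes the connection when it is done. The function names build_get_request and http_get_open are invented for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for getaddrinfo under strict C modes */
#endif
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a minimal HTTP/1.0 GET request.
 * Returns its length, or -1 if it does not fit in out. */
int build_get_request(char *out, size_t cap, const char *host, const char *path)
{
    int n = snprintf(out, cap,
                     "GET %s HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Connect to host on port 80 and send the request.
 * Returns a connected socket descriptor, or -1 on failure. */
int http_get_open(const char *host, const char *path)
{
    struct addrinfo hints, *res, *ai;
    char req[1024];
    int fd = -1;
    int len = build_get_request(req, sizeof req, host, path);

    if (len < 0)
        return -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */
    if (getaddrinfo(host, "80", &hints, &res) != 0)
        return -1;

    /* Try each returned address until one accepts the connection. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    /* send() may accept fewer bytes than asked, so loop until all are sent. */
    for (int off = 0; off < len; ) {
        ssize_t w = send(fd, req + off, (size_t)(len - off), 0);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        off += (int)w;
    }
    return fd;
}
```

The returned descriptor can be wrapped with fdopen(fd, "r") and read with fgets, which gives the response one line at a time (the status line and headers come first, then a blank line, then the body). Calling fclose as soon as you have seen enough drops the connection without reading the rest. Nothing here uses global state, so each thread can open and read its own connection.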
