Fastest way to parallel process/download a set of webpages in PHP

I have a set of Justin.tv/livestream URLs from which I grab both the stream status and a thumbnail image. Most likely there will not be more than ~50 such URLs at any given point.

What I have tried:

1) Naive serial download/processing, which is obviously terrible.

2) cURL multi, but it still seems kind of slow unless I am doing it horribly wrong: sometimes one page just takes a while to load and bottlenecks everything.

Both of the above seem limited and "bad" in principle, because I am loading another page just to get at the content I need. Since I am loading so many pages simultaneously, it seems likely that there will randomly be a slow URL that bottlenecks everything.
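For reference, here is a minimal curl_multi sketch with aggressive per-handle timeouts; the fetchAll() helper name and the timeout values are my own, not from the original post. The point is that CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT cap how long any single slow URL can hold up the rest of the batch:

```php
<?php
// Sketch: fetch many URLs in parallel with curl_multi, capping how long
// any one request may take so a slow page cannot stall the whole batch.
function fetchAll(array $urls, int $timeoutSeconds = 5): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CONNECTTIMEOUT => 2,               // give up quickly on dead hosts
            CURLOPT_TIMEOUT        => $timeoutSeconds, // hard cap per request
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers; curl_multi_select() sleeps until there is activity.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh);
        }
    } while ($active && $status === CURLM_OK);

    // Collect results; failed or timed-out handles yield null.
    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_errno($ch) === CURLE_OK ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

With ~50 handles the batch still takes as long as the slowest request, but the timeout guarantees that is never more than a few seconds.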

I have considered having a PHP script running in the background that continuously updates a database table with the stream status and thumbnail image; the page itself then simply queries the database, which should be much, much faster. Would this be the most effective solution, though, or is there something better?
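A rough sketch of that background worker, under my own assumptions: a stream_status table with url/is_live/thumbnail/updated_at columns, a MySQL DSN, and placeholder parseStatus()/parseThumbnail() helpers standing in for whatever scraping the post actually does. Run it from cron (or in a loop with sleep()) and have the page do a single SELECT:

```php
<?php
// Sketch of the background updater. Assumes the fetchAll() helper above and
// a table like:
//   CREATE TABLE stream_status (
//       url        VARCHAR(255) PRIMARY KEY,
//       is_live    TINYINT NOT NULL,
//       thumbnail  VARCHAR(255),
//       updated_at DATETIME NOT NULL
//   );
$urls = [/* the ~50 stream page URLs */];

$pdo = new PDO('mysql:host=localhost;dbname=streams', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$stmt = $pdo->prepare(
    'INSERT INTO stream_status (url, is_live, thumbnail, updated_at)
     VALUES (:url, :is_live, :thumbnail, NOW())
     ON DUPLICATE KEY UPDATE
         is_live    = VALUES(is_live),
         thumbnail  = VALUES(thumbnail),
         updated_at = VALUES(updated_at)'
);

// parseStatus() and parseThumbnail() are hypothetical: substitute whatever
// scraping or JSON parsing extracts the status and thumbnail from a page.
foreach (fetchAll($urls) as $url => $html) {
    if ($html === null) {
        continue; // fetch failed or timed out; keep the last known state
    }
    $stmt->execute([
        ':url'       => $url,
        ':is_live'   => parseStatus($html) ? 1 : 0,
        ':thumbnail' => parseThumbnail($html),
    ]);
}
```

The page-facing query is then just `SELECT url, is_live, thumbnail FROM stream_status`, which is cheap no matter how slow the upstream fetches are, and the updated_at column lets you show how stale each entry is.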

I'm most concerned about the possible overhead involved in such a continuously running script, as I do want updates to be as "live" as possible. I would think the load itself is nothing to worry about, since each page is not very large, so I'd imagine HTTP handshake time dominates transmission time.

Any suggestions regarding this?


justin.tv has an API: http://www.justin.tv/p/api. You may want to look into that instead of trying to screen-scrape.
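If the API is an option, something along these lines should work; note that the endpoint and field names below are my recollection of the old Justin.tv REST API and should be verified against the docs linked above before use. Its stream list endpoint was useful here because it only returned channels that were currently live, and included thumbnail URLs:

```php
<?php
// Sketch only: endpoint, parameters, and response fields are assumptions
// based on the old Justin.tv REST API; check http://www.justin.tv/p/api.
$channels = ['channel_one', 'channel_two']; // hypothetical channel names
$url = 'http://api.justin.tv/api/stream/list.json?channel='
     . urlencode(implode(',', $channels));

$json = file_get_contents($url);
$streams = json_decode($json, true) ?: [];

// Only live channels appear in the response, so "not in the list" means offline.
foreach ($streams as $stream) {
    echo $stream['channel']['login'], ' is live, thumbnail: ',
         $stream['channel']['image_url_medium'], PHP_EOL;
}
```

One request for all channels at once would also sidestep the slow-URL bottleneck entirely.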

