I have a set of Justin.tv/livestream URLs from which I grab both the stream status and a thumbnail image. There will most likely never be more than ~50 such URLs at any given time.
What I have tried:

1) A naive serial download/process loop, which is obviously terrible.

2) cURL multi, which still seems slow unless I am doing it horribly wrong: sometimes one page just takes a while to load and bottlenecks everything.
Both of the above seem limited and "bad" in principle, because I am loading an entire page just to extract a small piece of content. And since I am loading so many pages simultaneously, there will almost always be one randomly slow URL to bottleneck everything.
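For reference, my cURL multi attempt looks roughly like the sketch below. It adds a hard per-request timeout and harvests transfers as they finish, so one slow page can only cost at most `$timeout` seconds rather than holding up the whole batch. (This is a minimal sketch, assuming PHP 8, where curl handles are objects usable with `spl_object_id()`.)

```php
<?php
// Fetch a set of URLs in parallel with a hard per-request timeout.
// Returns url-key => body string, or null for a failed/timed-out fetch.
function fetch_all(array $urls, int $timeout = 5): array
{
    $mh = curl_multi_init();
    $handles = []; // spl_object_id => original array key

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CONNECTTIMEOUT => 2,        // give up on dead hosts quickly
            CURLOPT_TIMEOUT        => $timeout, // hard cap per request
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[spl_object_id($ch)] = $key;
    }

    $results = [];
    do {
        $status = curl_multi_exec($mh, $running);

        // Harvest finished transfers as they complete, instead of
        // waiting for the slowest one before processing anything.
        while ($info = curl_multi_info_read($mh)) {
            $ch  = $info['handle'];
            $key = $handles[spl_object_id($ch)];
            $results[$key] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null; // timed out or failed
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }

        if ($running) {
            curl_multi_select($mh, 1.0); // sleep until there is activity
        }
    } while ($running && $status === CURLM_OK);

    curl_multi_close($mh);
    return $results;
}
```

With ~50 URLs this still pays one round of HTTP handshakes, but a dead or slow host no longer delays the others.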
I have considered having a PHP script running in the background that continuously updates a database table with the stream status and thumbnail image; when loading the page, I would simply query the database, which should be much, much faster. Would this be the most effective solution, though, or is there something better?
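The shape of that idea would be something like the sketch below: a background job (cron, or a looping CLI script) writes fetched statuses into a cache table, and the page does only a cheap read. SQLite stands in here to keep the example self-contained; in production this would be MySQL/Postgres, and whatever produces `$statuses` (the actual HTTP fetching) is left out.

```php
<?php
// Open (and create, if needed) the cache table.
function open_db(string $dsn = 'sqlite::memory:'): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS stream_cache (
        url        TEXT PRIMARY KEY,
        is_live    INTEGER NOT NULL,
        thumb_url  TEXT,
        updated_at INTEGER NOT NULL
    )');
    return $db;
}

// Background job: upsert the latest status for each stream.
// $statuses: url => ['live' => bool, 'thumb' => string|null]
function refresh_cache(PDO $db, array $statuses): void
{
    $stmt = $db->prepare(
        'INSERT INTO stream_cache (url, is_live, thumb_url, updated_at)
         VALUES (:url, :live, :thumb, :now)
         ON CONFLICT(url) DO UPDATE SET
             is_live    = excluded.is_live,
             thumb_url  = excluded.thumb_url,
             updated_at = excluded.updated_at'
    );
    foreach ($statuses as $url => $s) {
        $stmt->execute([
            ':url'   => $url,
            ':live'  => (int) $s['live'],
            ':thumb' => $s['thumb'],
            ':now'   => time(),
        ]);
    }
}

// Page load: one cheap local query, no outbound HTTP at all.
function read_cache(PDO $db): array
{
    return $db->query('SELECT url, is_live, thumb_url, updated_at FROM stream_cache')
              ->fetchAll(PDO::FETCH_ASSOC);
}
```

The `updated_at` column also lets the page show how stale each entry is, which matters if "live" status needs to be within a few seconds of reality.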
I'm most concerned about the possible overhead of such a continuously running script, since I want updates to be as "live" as possible. I would think the load itself is nothing to worry about: each page is small, so the HTTP handshake time should dominate the transmission time.
Any suggestions regarding this?
justin.tv has an API (http://www.justin.tv/p/api); you may want to look into that instead of trying to screen-scrape.
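With a JSON stream-list endpoint, both status and thumbnail come back in one small response per channel, so there is no full-page parse at all. A hedged sketch of the consuming side is below; the payload shape and field names (`name`, `channel.image_url_medium`) are placeholders for illustration, not verified against the current API docs, so check the link above for the real ones.

```php
<?php
// Parse a JSON stream-list response into url-friendly status records.
// An empty array means the channel is offline; each entry describes
// one live stream. Field names here are assumptions -- adjust to
// whatever the API actually returns.
function parse_stream_list(string $json): array
{
    $streams = json_decode($json, true) ?: [];
    $out = [];
    foreach ($streams as $s) {
        $out[$s['name']] = [
            'live'  => true,
            'thumb' => $s['channel']['image_url_medium'] ?? null,
        ];
    }
    return $out;
}
```

Combined with a parallel fetcher, 50 small JSON responses should be far cheaper than 50 full HTML pages.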