Python - Web scraping pages using Comet and HTTP streaming

Developer https://www.devze.com 2023-03-30 03:20 Source: web
I have to extract data from fxstreet. I pulled the page's HTML with Firebug, and it looks like the page uses Comet and HTTP streaming.

I would like to fill a dictionary with the data every second without refreshing the page. I did it with urllib.urlopen, but that forces me to issue a new query every second.

Does anyone know a proper way to pull data from a Comet / HTTP-streaming endpoint? Thanks.


You'll probably want to use gevent, Tornado, or Twisted to write an asynchronous HTTP client that consumes the stream. Quite a few projects have been built around the Twitter Streaming API that you can look to for inspiration:

  • https://github.com/fiorix/twisted-twitter-stream - twisted
  • https://github.com/dustin/twitty-twister - twisted
  • https://github.com/atl/twitstream - asyncore, pycurl or tornado
  • https://github.com/godavemon/TwitTornado - tornado
  • https://github.com/thruflo/close.consumer - gevent
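Whichever framework you pick, the core idea is the same: keep one HTTP connection open and process data incrementally as it arrives, rather than re-fetching the page. A minimal sketch using only the standard library is below; the stream URL and the line-delimited framing are assumptions, since fxstreet's actual wire format isn't shown in the question.

```python
import io

def consume_stream(response, handler):
    """Read an open HTTP streaming response line by line and hand each
    non-empty line to `handler` as soon as it arrives, instead of
    waiting for the whole body to download."""
    for raw_line in response:   # file-like HTTP responses yield lines when iterated
        line = raw_line.strip()
        if line:                # streaming APIs often emit blank keep-alive lines
            handler(line)

# With a real connection it would look roughly like (not run here,
# the URL is a placeholder):
#   import urllib.request
#   with urllib.request.urlopen("https://example.com/stream") as resp:
#       consume_stream(resp, lambda chunk: print(chunk))
```

The async clients listed above do essentially this, but without blocking the whole program while waiting for the next line.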


I am not an expert on this, but as I understand it, Comet works by holding a request open and responding only when the timeout is near or something has changed on the server. So you can issue a Comet request and assume nothing has changed until it returns something.

So, basically, you make Comet requests and store whatever they return in a table. A second, separate request (which can be sent every second) then checks the table for newly added data and returns it if found.

Is this what you expected?
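The loop described above can be sketched as follows. `fetch` here is a hypothetical callable standing in for the actual HTTP request (it is not part of any library); in a real client it would be a blocking GET with a long timeout.

```python
def long_poll(fetch, handle, max_rounds=None):
    """Basic Comet-style long-polling loop: issue a request that the
    server holds open until it has news (or times out), process any
    payload, then immediately reconnect rather than sleeping."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        payload = fetch()        # blocks until the server responds or times out
        if payload is not None:  # None = timed out with no change; just reconnect
            handle(payload)
        rounds += 1
```

Because the server itself delays the response until there is news, this delivers updates as fast as the polling approach in the question without issuing a request per second.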


Just pull the data and re-issue the query immediately, in the same instant, not after one second. Comet simply means the server won't respond until it has new data available.

