Saving select updating data points from an external webpage to a text file

Developer https://www.devze.com 2023-04-01 01:27 Source: Internet
I am trying to take regularly updated weather data from a website that isn't mine and put a chunk of it into a plain text file every 30 minutes. The text file should not contain any HTML tags or other markup, but it could be delimited by commas, periods, or tabs. The website generating the data puts it in a table with no class or id. What I need is the text from one tag and from each of the individual tags within it. The tag is on the same line number every time, regardless of the updated data.

This seems like a bit of a silly challenge, as the method for getting the data doesn't seem ideal. I'm open to suggestions for different methods of getting an updated (hourly to twice-daily-ish) temperature/dewpoint/time/etc. data point and putting it in a text file.

With regard to automating it every 30 minutes or so, I have an automation program that can download webpages at any time interval.

I hope I was specific enough with this rather weird (to me at least) challenge. I'm not even sure where to start. I have lots of experience with HTML and basic knowledge of Python, JavaScript, PHP, and SQL, but I am open to taking code in, or learning the syntax of, other languages.


For Python:

  • To run the task every N minutes, create a UNIX cron job (or the Windows equivalent, a scheduled task) that runs your .py script regularly

  • Download the weather page in the .py script using the urllib2 module (urllib.request in Python 3)

  • Parse the HTML using the BeautifulSoup or lxml libraries

  • Select the relevant bits of HTML using XPath selectors or CSS selectors (both available through lxml)

  • Process the data and write it to a text file

The actual implementation is left as an exercise for the reader :)
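
The steps above can be sketched as follows. This is only a minimal illustration under several assumptions: it targets Python 3, it swaps in the standard library's html.parser for BeautifulSoup or lxml so that it runs without extra installs, it does not handle nested tables, and the names RowTextParser, extract_rows, append_row, and fetch_and_append, the example URL, and the sample table are all made up for the example.

```python
import csv
from html.parser import HTMLParser
from urllib.request import urlopen


class RowTextParser(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the open cell (if any) with its whitespace collapsed.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return all table rows in the page as lists of cell text."""
    parser = RowTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def append_row(path, cells):
    """Append one comma-delimited line to the text file."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(cells)


def fetch_and_append(url, path, row_index):
    """Download the page and append the chosen row to the text file."""
    with urlopen(url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    append_row(path, extract_rows(html)[row_index])


# Offline demonstration on a made-up table (not real weather data).
SAMPLE = """<table>
<tr><th>Time</th><th>Temp</th><th>Dewpoint</th></tr>
<tr><td>01:00</td><td>12.5</td><td>9.0</td></tr>
</table>"""
print(extract_rows(SAMPLE)[1])
```

In a real script, the demonstration at the bottom would be replaced by a call such as fetch_and_append("http://example.com/weather.html", "data.txt", 1), where the URL and row index are placeholders for the actual page and the row you want. A crontab entry along the lines of `*/30 * * * * python3 /path/to/script.py` (path hypothetical) would then run it every 30 minutes.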


This is called screen-scraping. It is often frowned upon, and if you just want weather data there are several APIs that may, depending on your specific needs, be a better solution.

Other than that, we need more specifics, such as the HTML source of the page, to help you out with this.


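Maybe you can run something like this from a cron job:

<?php
$url = 'http://example.com/weather.html'; // placeholder: the page to scrape
$file = file_get_contents($url);          // download the page
if ($file === false) {
    exit("Could not download $url\n");
}
$onlyText = strip_tags($file);            // drop all HTML tags, keep the text
file_put_contents('data.txt', $onlyText); // overwrite data.txt with the text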
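
Stripping tags from the whole page keeps all of its text, not just the one row. Since the question says the row always sits on the same line of the page source, a Python variant could strip tags from that single line only. This is a rough sketch that assumes the whole row really is on one line; the function name cells_from_line and the sample page are invented for illustration, and a regular expression is a crude stand-in for a real HTML parser.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")  # crude match for any HTML tag


def cells_from_line(page_text, line_number):
    """Return the text pieces found on one 1-based line of the page source."""
    line = page_text.splitlines()[line_number - 1]
    # Split on tags, decode entities such as &deg;, and drop empty pieces.
    pieces = (html.unescape(p).strip() for p in TAG.split(line))
    return [p for p in pieces if p]


# Made-up page source; the row of interest is on line 3.
sample = "<html>\n<table>\n<tr><td>01:00</td><td>12.5&deg;</td></tr>\n</table>"
print(",".join(cells_from_line(sample, 3)))
```

The same function could be pointed at a page saved by the automation program by reading that file into a string first.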