Programmatically monitor a webpage

Developer (https://www.devze.com), 2023-01-09 01:41, source: the web

Every project on drupal.org has its own page:

http://drupal.org/project/marinelli

When a new release is made, it gets added to that project's release page:

http://drupal.org/node/185969/release

I'm trying to monitor that page for new releases, but of course I don't want to keep checking it manually. I need to do it programmatically with PHP.

  • Do I have to scrape the page? Is this page scrapable?

  • I see an RSS feed, but I'm not sure how it works or whether it can help me with monitoring.

  • Does drupal.org offer a cleaner solution, like an API? Or is there a way to monitor the repository directly?

  • Other solutions welcome


There is a core module, "Update Status", that checks whether updates are available for your installed modules. You can either use it directly, if that fits your needs, or read its source to see how the module requests the data.


Instead of trying to scrape the page, as you said, a better solution could be to use its RSS feed. In your case, for example: http://drupal.org/node/185969/release/feed

The advantage is that RSS is a well-defined format: you are less likely to pick up unnecessary information than when digging through HTML soup.


To extract data from that XML feed, you can use SimpleXML to work with the XML data by hand, or a library like SimplePie that understands RSS/Atom.

Then, in your case, you have to keep track of the last update: each time you fetch the RSS feed, check whether it contains an entry more recent than the newest one you saw the previous time.


In the XML for your Marinelli module, you'll see that each entry contains a <pubDate> tag holding its publication date. For example:

<pubDate>Tue, 25 Aug 2009 07:28:26 +0000</pubDate>

If the most recent entry today is from 2009-08-25 and tomorrow there is an entry from 2010-07-27... well, it means the module has been updated ;-)
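As a rough illustration of that <pubDate> comparison, here is a minimal PHP sketch using SimpleXML. The function names (latestPubDate, hasNewRelease, checkFeed) and the idea of storing the last-seen timestamp in a small state file are my own assumptions, not anything drupal.org prescribes; it also assumes the feed is plain RSS 2.0 with <channel><item> entries.

```php
<?php
// Sketch only: function names and the state-file approach are hypothetical.

/**
 * Return the newest <pubDate> in an RSS 2.0 document as a Unix timestamp,
 * or null if the feed cannot be parsed or contains no dated items.
 */
function latestPubDate(string $rssXml): ?int
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $feed = simplexml_load_string($rssXml);
    libxml_use_internal_errors($previous);
    if ($feed === false || !isset($feed->channel->item)) {
        return null;
    }

    $latest = null;
    foreach ($feed->channel->item as $item) {
        // strtotime() understands the RFC 822 dates used by RSS.
        $timestamp = strtotime((string) $item->pubDate);
        if ($timestamp !== false && ($latest === null || $timestamp > $latest)) {
            $latest = $timestamp;
        }
    }
    return $latest;
}

/** True when the feed contains an item newer than the last one seen. */
function hasNewRelease(string $rssXml, int $lastSeen): bool
{
    $latest = latestPubDate($rssXml);
    return $latest !== null && $latest > $lastSeen;
}

/**
 * Fetch the feed, compare it with the timestamp saved in $stateFile,
 * and remember the newest date for the next run.
 */
function checkFeed(string $url, string $stateFile): bool
{
    $lastSeen = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
    $rssXml = file_get_contents($url);
    if ($rssXml === false || !hasNewRelease($rssXml, $lastSeen)) {
        return false;
    }
    file_put_contents($stateFile, (string) latestPubDate($rssXml));
    return true;
}
```

You would call something like checkFeed('http://drupal.org/node/185969/release/feed', '/path/to/state.txt') from a cron job and send yourself a mail when it returns true. The path is a placeholder, and on the very first run every existing entry counts as "new".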


What about the site's own feeds? http://drupal.org/node/185969/release/feed Simply subscribe to it in any RSS reader (Google Reader, for example).

What do you mean by needing to check it programmatically? Is there a backend that downloads and installs the updates without user interaction?


You can get the releases for a project at http://updates.drupal.org/release-history/$project_name/$api_version, see for instance http://updates.drupal.org/release-history/marinelli/6.x
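A minimal PHP sketch of how that URL could be used follows. The URL pattern is the one given above; the element names (<releases>, <release>, <version>) are my assumption about what the XML looks like, so check them against a real response first. The function names are made up for illustration.

```php
<?php
// Sketch only: the element names below are assumptions about the
// release-history XML; the function names are hypothetical.

/** Build the release-history URL for a project and core API version. */
function releaseHistoryUrl(string $project, string $apiVersion): string
{
    return 'http://updates.drupal.org/release-history/'
        . rawurlencode($project) . '/' . rawurlencode($apiVersion);
}

/** Return the version strings listed in the XML, in document order. */
function listReleaseVersions(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $project = simplexml_load_string($xml);
    libxml_use_internal_errors($previous);
    if ($project === false || !isset($project->releases->release)) {
        return [];
    }

    $versions = [];
    foreach ($project->releases->release as $release) {
        $versions[] = (string) $release->version;
    }
    return $versions;
}

/** Versions present now that were not in the previously stored list. */
function newVersions(array $current, array $known): array
{
    return array_values(array_diff($current, $known));
}
```

Fetching would then be something like file_get_contents(releaseHistoryUrl('marinelli', '6.x')). Store the list from each run and pass it as $known the next time. Comparing version strings instead of dates avoids depending on how timestamps are formatted.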
