Getting data without scraping

Developer https://www.devze.com 2023-02-06 09:16 Source: web

I've got a directory where people submit data. It's stored and pending while it's moderated to make sure it's o.k.

Once approved, I'd like a couple of other sites that I control, and a few that I don't (on different servers), to be able to grab that data. This would run on a cron job or something similar, so there wouldn't be any human interaction. No further moderation would happen on those sites; they rely entirely on that first moderation.

How do I go about doing this securely?

I've thought about grabbing it as RSS, parsing and storing. I've thought about doing SOAP requests, grabbing XML files, etc.

What would YOU do?


A logical means of securely distributing the data would be to use (S)FTP, ideally with a firewall that only permits access from the permitted machines, by IP, etc.

To enable this, once you have the file on the "source" machine, you could simply:

  1. Move the file into a local FTP folder. (You'll quite possibly have to FTP it in, even though it's on the same machine, depending on user rights, etc.) As a tip, FTP it into a temp directory in the FTP folder and then move (rename, in FTP parlance) it into a "for collection" folder once the transfer has completed. By doing this, you'll ensure that no partial files are collected.

  2. Periodically check (via cron) the "for collection" folder from the various permitted machines.

  3. Grab the file(s) if there are any new files awaiting collection.

There are a variety of PHP functions to assist with this, including ftp_ssl_connect, which opens an FTP connection over SSL (FTPS, which is not the same protocol as SFTP).
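For anyone not tied to PHP, the same three steps can be sketched with Python's standard ftplib. This is only a rough outline under some assumptions: the folder names `tmp` and `for_collection`, the helper names, and the `already_seen` set used to track collected files are all made up for illustration, and the server is assumed to speak FTPS.

```python
import io
from ftplib import FTP_TLS  # FTPS client from the standard library

TMP_DIR = "tmp"               # hypothetical staging folder on the FTP server
READY_DIR = "for_collection"  # hypothetical pickup folder on the FTP server


def connect(host, user, password):
    """Open an FTPS session and encrypt the data channel as well."""
    ftp = FTP_TLS(host)
    ftp.login(user, password)
    ftp.prot_p()
    return ftp


def publish(ftp, name, payload):
    """Step 1: upload into the staging folder, then rename into the pickup folder."""
    ftp.storbinary(f"STOR {TMP_DIR}/{name}", io.BytesIO(payload))
    # The rename only happens after the upload has finished, so collectors
    # never see a half-written file in READY_DIR.
    ftp.rename(f"{TMP_DIR}/{name}", f"{READY_DIR}/{name}")


def collect(ftp, already_seen):
    """Steps 2 and 3: fetch every file in the pickup folder not collected before."""
    fetched = {}
    for path in ftp.nlst(READY_DIR):
        # Some servers return bare names, others include the folder prefix.
        name = path.rsplit("/", 1)[-1]
        if name in already_seen:
            continue
        buffer = io.BytesIO()
        ftp.retrbinary(f"RETR {READY_DIR}/{name}", buffer.write)
        fetched[name] = buffer.getvalue()
        already_seen.add(name)
    return fetched
```

The source machine would call `publish` once a submission is approved, and each permitted machine would call `collect` from its cron job, persisting `already_seen` between runs in whatever way suits it.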

However, all that aside, it might be a lot less hassle to use something like rsync over ssh.
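As a rough idea of what the rsync route could look like from a cron-driven Python script, here is a minimal sketch. The remote path, local path, and key file are placeholders, and it assumes rsync and ssh are installed and that key-based login is already set up on the collecting machine.

```python
import subprocess


def build_rsync_command(remote_source, local_dir, identity_file=None):
    """Build an rsync-over-ssh command line as an argument list.

    -a preserves file attributes and recurses, -z compresses in transit,
    and -e selects ssh as the transport.
    """
    # Note: an identity_file path containing spaces would need quoting here.
    ssh = "ssh" if identity_file is None else f"ssh -i {identity_file}"
    return ["rsync", "-az", "-e", ssh, remote_source, local_dir]


def pull(remote_source, local_dir, identity_file=None):
    """Run the transfer; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(
        build_rsync_command(remote_source, local_dir, identity_file),
        check=True,
    )
```

A subscriber's cron job might then call something like `pull("user@source.example:/srv/approved/", "/var/data/approved/")`, where both paths are hypothetical. rsync writes each file under a temporary name and renames it on completion, so the partial-file concern is handled on the receiving side too.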


Why not have the storage site shoot a request over to the "subscribing" sites indicating new information is available (push notification)?

I.e., just make a page request to "newinfo.php?newinfo=true" or whatever on each of the sites. Then each of those sites can do whatever it likes, knowing there's more information available.
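Since the question asks about doing this securely, one possible way to keep strangers from triggering that page is to sign the request with a secret shared between the storage site and each subscriber. The sketch below is an assumption layered on top of the suggestion, not part of it: the `ts` and `sig` parameters, the shared secret, and the five-minute freshness window are all invented for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen


def sign(secret, timestamp):
    """HMAC-SHA256 over a fixed label plus the timestamp, as a hex string."""
    message = f"newinfo:{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def build_notify_url(base_url, secret, timestamp=None):
    """Storage site: build the signed 'new info available' URL."""
    ts = str(int(time.time() if timestamp is None else timestamp))
    query = urlencode({"newinfo": "true", "ts": ts, "sig": sign(secret, ts)})
    return f"{base_url}?{query}"


def is_valid_notification(url, secret, now=None, max_age=300):
    """Subscriber: accept only fresh requests carrying a correct signature."""
    params = parse_qs(urlsplit(url).query)
    ts = params.get("ts", [""])[0]
    sig = params.get("sig", [""])[0]
    try:
        sent_at = int(ts)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > max_age:
        return False  # too old (or too far in the future): possible replay
    return hmac.compare_digest(sig.encode(), sign(secret, ts).encode())


def notify(subscriber_urls, secret):
    """Storage site: ping every subscriber once new data has been approved."""
    for base_url in subscriber_urls:
        urlopen(build_notify_url(base_url, secret), timeout=10).close()
```

The notification itself carries no data; a subscriber that accepts it would then go and fetch the approved records over whichever channel it already trusts.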
