When accessing http://www.example.net, a CSV file is downloaded with the most current data regarding that site. I want to have my site, http://www.example.com, access http://www.example.net on an hour-by-hour basis in order to get updated information.
I want to then use the updated information stored in the CSV file to compare changes from data in previous CSV files. I obviously have no idea what the best plan of attack would be so any help would be appreciated. I am just looking for a general outline of how I should proceed, but the more information the better.
By the way, I'm using a LAMP bundle, so PHP and MySQL solutions are preferred.
I think the easiest way to handle this would be to have a cron job (or a scheduled task if you are on Windows) run every hour and download the CSV with curl or file_get_contents. Once you have downloaded the CSV, you can import the new data into your MySQL database.
The CSV should have some kind of timestamp on every row so you can easily separate new and old data.
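As a rough illustration of separating new rows from old ones by timestamp, here is a minimal shell sketch. It assumes things the question does not state: that the first line is a header, that the first column holds an ISO 8601 timestamp, and that no field contains an embedded comma. The function name new_rows_since is made up for this example.

```shell
#!/bin/bash
# Print the rows of a CSV whose timestamp is later than a given cutoff.
# Assumptions (not from the original post): the first line is a header,
# column 1 is an ISO 8601 timestamp such as 2024-01-31T13:00:00, and
# no field contains an embedded comma or newline.
new_rows_since() {
    local csv_file=$1 cutoff=$2
    # ISO 8601 timestamps in one format and time zone sort as plain strings,
    # so a string comparison is enough. Appending "" forces awk to compare
    # the values as strings rather than as numbers.
    awk -F',' -v cutoff="$cutoff" 'NR > 1 && ($1 "") > (cutoff "")' "$csv_file"
}
```

Calling new_rows_since file.txt "2024-01-31T12:30:00" would print only the rows stamped after that moment, which could then be loaded into the database.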
Also, handling XML would be better than plain CSV.
A better way to set that up would be to create a web service on http://www.example.com and update it in real time from http://www.example.net. But that requires you to have access to both websites.
Depending on the OS you're using, you're looking at a scheduled task (Windows) or a cron job (*nix) to kick up a service/app that would pull the new CSV and compare it to an older copy.
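For the "compare it to an older copy" step, one possible approach on a *nix box is to sort both files and let comm list the rows that exist in only one of them. This is only a sketch: the function name csv_changes is invented here, and it treats each row as an opaque line, so a row whose value changed shows up as one removed row plus one added row.

```shell
#!/bin/bash
# Report rows that appear in only one of two CSV snapshots.
# Rows only in the new file are prefixed with "+ ", rows only in the
# old file with "- ". Row order in the input files does not matter.
csv_changes() {
    local old_file=$1 new_file=$2
    local old_sorted new_sorted
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # comm needs sorted input; LC_ALL=C keeps sort and comm in agreement
    # about the ordering.
    LC_ALL=C sort "$old_file" > "$old_sorted"
    LC_ALL=C sort "$new_file" > "$new_sorted"
    # -13 hides columns 1 and 3, leaving lines unique to the new file.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'
    # -23 hides columns 2 and 3, leaving lines unique to the old file.
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'
    rm -f "$old_sorted" "$new_sorted"
}
```

Something like csv_changes file_previous.txt file.txt would then give a compact list of added and removed rows.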
You'll definitely want to go the route of a cron job. I'm not exactly sure what you want to do with the differences. However, if you just want an email, here is one potential (and simplified) option:
wget -O file.txt http://uri.com/file.txt && diff file.txt file_previous.txt | mail -s "Differences" your@email.com && mv file.txt file_previous.txt
Try this command by itself from your command line (I'm guessing you are using a *nix box) to see if you can get it working. From there, I would save it as a shell script in the directory where you want to keep your CSV files.
cd /path/to/directory
vi process_csv.sh
And add the following:
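#!/bin/bash
# Work in the directory that holds the CSV files; stop if it is missing.
cd /path/to/directory || exit 1
# Download the latest copy over any stale file.txt; stop if the download fails.
wget -O file.txt http://uri.com/file.txt || exit 1
# Mail the differences between the new copy and the previous one.
diff file.txt file_previous.txt | mail -s "Differences" your@email.com
# Keep the new copy as the baseline for the next run.
mv file.txt file_previous.txt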
Save and close the file. Make the new shell script executable:
chmod +x process_csv.sh
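Note that the script above sends a mail every time it runs, even when nothing changed. If that gets noisy, a variant along these lines might help. It is only a sketch: the function name changes_since_last_run is made up, and it compares previous against new, so lines starting with ">" are the new data.

```shell
#!/bin/bash
# Print the diff and return 0 only when the new file differs from the
# previous copy. Returns non-zero when the files are identical, when
# there is no previous copy yet, or when diff itself fails.
changes_since_last_run() {
    local new_file=$1 previous_file=$2
    local status=0
    # First run: nothing to compare against.
    [ -f "$previous_file" ] || return 1
    # diff exits 0 for identical files, 1 for differences, 2 on trouble.
    diff "$previous_file" "$new_file" || status=$?
    [ "$status" -eq 1 ]
}

# Possible wiring inside process_csv.sh (same placeholder names as above):
# if report=$(changes_since_last_run file.txt file_previous.txt); then
#     printf '%s\n' "$report" | mail -s "Differences" your@email.com
# fi
# mv file.txt file_previous.txt
```

The function only decides whether there is anything to report; the mailing and the final mv stay in the script, as in the commented wiring.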
From there, start investigating the cronjob route. It could be as easy as checking to see if you can edit your crontab file:
crontab -e
With luck, you'll be able to enter your cron job and save/close the file. It will look something like the following, which runs the script at one minute past every hour:
01 * * * * /path/to/directory/process_csv.sh
I hope you find this helpful.