I'm really stuck on this one.
Users enter some data on my website that I need to process later in a series of batch jobs. I have always worked with online transactions and know them well, but I don't know what the current trends and leading technologies are for processing data in batches. Should I use cron jobs? Is it okay to use Perl for the batch scripts I need? Is there a standard approach for what I need?
Best to all, Demian
Cron jobs are for running a job or process repeatedly at a set interval. You can use Perl or any other server-side language you want. The cron job just runs whatever you tell it to.
Here is an example of someone setting up a crontab for a Perl script: http://www.linuxquestions.org/questions/linux-software-2/adding-a-perl-script-to-cron.daily-cron.d-to-setup-a-cron-job-592762/
You can google for documentation yourself, but from the sound of it the post I linked to describes a situation similar to yours.
Of course, I am assuming your server lets you run cron jobs and has Perl installed.
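To make "the cron job just runs whatever you tell it" concrete, here is a minimal sketch of a script cron could run. It is written in Python purely for illustration; the same shape works in Perl. The file paths, the 02:00 schedule, and the names `run_batch` and `main` are all assumptions, not something from the linked post.

```python
#!/usr/bin/env python3
# Hypothetical crontab entry (added with `crontab -e`) that would run this
# script every day at 02:00 and append its output to a log file:
#   0 2 * * * /usr/bin/python3 /path/to/batch_job.py >> /path/to/batch_job.log 2>&1
from datetime import datetime


def run_batch(records):
    """Process each record in turn and return how many were processed."""
    processed = 0
    for record in records:
        # Placeholder for the real work (validation, aggregation, export...).
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} processed {record!r}")
        processed += 1
    return processed


def main():
    # Hypothetical input; a real job would load the pending data from storage.
    count = run_batch(["order-1", "order-2"])
    print(f"done, {count} record(s) processed")


if __name__ == "__main__":
    main()
```

Cron only starts the process on schedule. Everything else (finding the data, logging, error handling) is the script's responsibility.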
Which language you choose is kind of irrelevant (pick the one you're most familiar with), but Perl is excellent at this; I have used it for exactly that purpose, among others.
You can definitely use a cron daemon on Unix/Linux to schedule the jobs, if that is what you have available. There are other schedulers for different operating systems, both free/included and commercial (e.g. Autosys), depending on what you need to run in batch and how. But cron jobs are usually Good Enough and easy to work with.
There are many points to be made about joining a web app to a batch processor, but since you didn't provide enough details to build on, I will merely point out the first choice you will stumble upon when you start - you need to communicate the data somehow.
Easiest is some sort of database backend (pick your poison based on your needs/reqs/budget - from Berkeley DB/SQLite on the simplistic end, to MySQL/PostgreSQL on the free side, to Sybase/Oracle for real world stuff).
Otherwise, you can use files for some simple data processing, but be prepared to fine-tune file permissions - the files created by the web app usually have a different user ID from your batch user (for security reasons).
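As a rough illustration of the database option, here is a minimal sketch that uses SQLite as the handoff point: the web side inserts rows and the batch side later picks up whatever has not been processed yet. The table name `submissions`, its columns, and the function names are assumptions made for this example. It is in Python for brevity, and the same idea carries over to Perl with DBI.

```python
import sqlite3


def init_db(conn):
    """Create the hypothetical handoff table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS submissions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload TEXT NOT NULL,
               processed INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()


def save_submission(conn, payload):
    """Web side: store what the user entered and return its row id."""
    cur = conn.execute("INSERT INTO submissions (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def process_pending(conn, handler):
    """Batch side: run handler on every unprocessed row, then mark it done."""
    rows = conn.execute(
        "SELECT id, payload FROM submissions WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        handler(payload)
        # Commit per row so a crash does not lose work that already succeeded.
        conn.execute("UPDATE submissions SET processed = 1 WHERE id = ?", (row_id,))
        conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a real setup would use a file or server
    init_db(conn)
    save_submission(conn, "hello")
    save_submission(conn, "world")
    process_pending(conn, lambda payload: print("processing", payload))
```

Marking rows as processed, rather than deleting them, keeps the batch job safe to re-run and leaves a record of what was handled.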
You could have a set of daemons which poll your queue for new tasks. Or you could use a message queuing product or something like Gearman. Or you could run cron jobs which check for new work to do from time to time.
You could keep a task queue in your database; this might not be efficient if you have large numbers of tasks and processes, so you might want to investigate the many message queuing products.
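A minimal sketch of the "daemon polling a task queue kept in the database" option might look like the following. The `tasks` table, its `status` and `worker` columns, and the function names are assumptions for illustration. SQLite stands in for whatever database you actually use, and a dedicated message queue would replace most of this code.

```python
import sqlite3
import time


def create_tasks_table(conn):
    """Create the hypothetical queue table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               worker TEXT
           )"""
    )
    conn.commit()


def claim_next_task(conn, worker_id):
    """Claim the oldest pending task.

    Returns (id, payload), or None if the queue is empty or another
    worker claimed the task first.
    """
    row = conn.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    task_id, payload = row
    # Re-checking the status in the WHERE clause means only one worker can win.
    cur = conn.execute(
        "UPDATE tasks SET status = 'running', worker = ? "
        "WHERE id = ? AND status = 'pending'",
        (worker_id, task_id),
    )
    conn.commit()
    return (task_id, payload) if cur.rowcount == 1 else None


def poll(conn, worker_id, handler, interval=5.0, max_idle_polls=None):
    """Poll for tasks; stop after max_idle_polls empty polls in a row (None = never)."""
    idle = 0
    done = 0
    while max_idle_polls is None or idle < max_idle_polls:
        task = claim_next_task(conn, worker_id)
        if task is None:
            idle += 1
            time.sleep(interval)  # nothing to do, wait before asking again
            continue
        idle = 0
        task_id, payload = task
        handler(payload)
        conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
        conn.commit()
        done += 1
    return done
```

Every idle worker still queries the database once per interval. That polling load is the inefficiency that grows with the number of tasks and processes, and it is what a message queuing product avoids.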
You could run one or more agent tasks (daemons, etc.) per server, or just one overall. They could gather data from whichever servers they need to, either synchronously or asynchronously.
You could write the results to another database, or send them via email, or in a file. You could write all of this in any language or a mixture of several.
The options are practically endless :)
In the Java world that's usually a job for a message queue system. But I think a cron job is a good choice if you are in the *nix world.