I'm a Perl programmer with some nice scripts that fetch HTTP pages (from a text-file list of URLs) with cURL and save them to a folder.
However, the number of pages to get is in the tens of millions. Sometimes the script fails on number 170,000 and I have to start it again manually. It reads each URL, checks whether the page is already downloaded, and skips it if so. But with a few hundred thousand pages done, it still takes a few hours just to skip back to where it left off. Obviously, this is not going to pan out in the end.
So, I'm thinking a solution is to build a Visual Basic program that opens the command prompt, collects console output, and restarts the script if needed at the last missed number.
I've never made a VB program, but I hear it's cake. Could I get a layman's explanation of how to do this (open prompts, send commands, capture output, restart prompts)? Or is there a better way to solve my problem?
Change how you are doing things. Maintain the queue of pages to check outside of the script. When you check one, mark it as viewed and record the date that you checked it.
When you restart your script, reset the queue to just the pages that haven't been checked within your time window.
A database might come in handy here.
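For example, here is a minimal sketch of that approach using DBI with SQLite. The database file, table, and column names are illustrative, and the actual fetch is elided:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Connect to a SQLite file that holds the queue (filename is illustrative).
my $dbh = DBI->connect( "dbi:SQLite:dbname=queue.db", "", "",
    { RaiseError => 1, AutoCommit => 1 } );

# One row per URL, with a timestamp recorded when the page is fetched.
$dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS pages (
    url        TEXT PRIMARY KEY,
    fetched_at INTEGER            -- epoch seconds; NULL means never fetched
)
SQL

# On restart, select only the pages not checked within the window,
# instead of re-reading the whole list and skipping one by one.
my $window = 7 * 24 * 60 * 60;    # one week; adjust to taste
my $urls   = $dbh->selectcol_arrayref(
    "SELECT url FROM pages WHERE fetched_at IS NULL OR fetched_at < ?",
    undef, time() - $window,
);

my $mark = $dbh->prepare("UPDATE pages SET fetched_at = ? WHERE url = ?");
for my $url (@$urls) {
    # ... fetch and save the page here ...
    $mark->execute( time(), $url );    # record the date you checked it
}
```

A restart then costs one SELECT instead of hours of skipping.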
Fix the problem and you don't have to build a lot of junk around the problem.
You say that sometimes you can't create a directory. That should be an easy problem to catch. However, that doesn't mean that you can ignore it in your script. Not all errors are recoverable, but at least you can log the problem so you can investigate. How are you creating directories?
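If you're using File::Path, a minimal sketch of catching that failure looks like this (the directory path is just an example):

```perl
use strict;
use warnings;
use File::Path qw(make_path);

my $dir = 'pages/000170';    # illustrative path

# With the error option set, make_path reports per-directory failures
# through $errors instead of dying, so you can log and keep going.
make_path( $dir, { error => \my $errors } );
if (@$errors) {
    for my $e (@$errors) {
        my ( $path, $msg ) = %$e;
        warn "Could not create '$path': $msg\n";   # log it so you can investigate
    }
}
```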
My suggestion would be to forget VB, and cURL as well, and use either the LWP or WWW::Mechanize Perl modules to fetch your pages. You can then handle errors gracefully in your script without needing to resort to VB.
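A minimal sketch with LWP::UserAgent, assuming your list lives in urls.txt and a hypothetical filename_for() that you'd replace with however you map URLs to files now:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);

# Hypothetical URL-to-filename mapping; substitute your own scheme.
sub filename_for { 'pages/' . md5_hex( $_[0] ) . '.html' }

mkdir 'pages' unless -d 'pages';
my $ua = LWP::UserAgent->new( timeout => 30 );

open my $fh, '<', 'urls.txt' or die "Can't open urls.txt: $!";
while ( my $url = <$fh> ) {
    chomp $url;
    my $file = filename_for($url);
    next if -e $file;    # already downloaded, skip cheaply

    # Save the body straight to disk; on failure, log and move on
    # instead of dying at page 170,000.
    my $response = $ua->get( $url, ':content_file' => $file );
    warn "Failed $url: ", $response->status_line, "\n"
        unless $response->is_success;
}
```

Because failures are logged rather than fatal, the script keeps running, and a rerun only re-attempts the URLs whose files are missing.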