Pipeline For Downloading and Processing Files In Unix/Linux Environment With Perl

Developer https://www.devze.com 2022-12-27 15:50 Source: Internet
I have a list of file URLs that I want to download:

http://somedomain.com/foo1.gz
http://somedomain.com/foo2.gz
http://somedomain.com/foo3.gz

What I want to do is the following for each file:

  1. Download foo1, foo2, ... in parallel with wget and nohup.
  2. As soon as each download completes, process that file with myscript.sh.

What I have is this:

#! /usr/bin/perl

@files = glob("foo*.gz");

foreach $file (@files) {
   my $downurls = "http://somedomain.com/".$file;
   system("nohup wget $file &");
   system("./myscript.sh $file >> output.txt");
}

The problem is that the pipeline above has no way of knowing when a file has finished downloading: wget is started in the background, so system() returns immediately. As a result, myscript.sh doesn't get executed properly.

What's the right way to achieve this?


Why do this in Perl? Use bash instead. Below is just a sample.

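#!/bin/bash

for file in foo1 foo2 foo3
do
    # Download into the current directory (wget's default).
    wget "http://somedomain.com/$file.gz"

    # Process the file only if the download produced it.
    if [ -f "$file.gz" ]
    then
        ./myscript.sh "$file.gz" >> output.txt
    fi
done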


Try combining the commands using &&, so that the 2nd one runs only after the 1st one completes successfully.

# $downurls is the full URL built in the loop; $file is the local file name.
system("(nohup wget $downurls && ./myscript.sh $file >> output.txt) &");
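The same `&&` chaining can be sketched directly in the shell. This is a minimal illustration, not the answerer's code: `run_pipeline` is a hypothetical helper name, and the fetch and process steps are passed in as commands so that the wget call and myscript.sh from the question can be plugged in. Each file gets its own background subshell, and `wait` blocks until all of them have finished.

```shell
#!/bin/bash

# run_pipeline FETCH_CMD PROCESS_CMD NAME...
# (hypothetical helper) Starts one background job per NAME. Inside each job,
# PROCESS_CMD runs only if FETCH_CMD exited with status 0.
run_pipeline() {
    local fetch=$1 process=$2
    shift 2
    local name
    for name in "$@"; do
        ( "$fetch" "$name" && "$process" "$name" ) &
    done
    wait   # block until every background job has finished
}
```

With the names from the question, one might define `fetch() { wget -q "http://somedomain.com/$1"; }` and `process() { ./myscript.sh "$1" >> output.txt; }`, then call `run_pipeline fetch process foo1.gz foo2.gz foo3.gz`. Because the check is on wget's exit status, a failed or interrupted download never reaches myscript.sh.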


If you want parallel processing, you can do it yourself with forking, or use an existing module to handle it for you. Try Parallel::ForkManager. You can see a bit more on its usage in "How can I manage a fork pool in Perl?", but the CPAN page for the module has the really useful information. You probably want something like this:

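use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_PROCESSES = 8; # 8 parallel processes max
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

# The files are not on disk yet, so list their names instead of globbing.
my @files = qw(foo1.gz foo2.gz foo3.gz);

foreach my $file (@files) {
  # Forks; the parent gets the child's pid and moves on to the next file:
  my $pid = $pm->start and next;

  # Child: download, then process only if wget succeeded (exit status 0).
  my $downurl = "http://somedomain.com/" . $file;
  if (system("wget", $downurl) == 0) {
    system("./myscript.sh $file >> output.txt");
  }

  $pm->finish; # Terminates the child process
}

$pm->wait_all_children; # Block until every child has finished
print "All done!\n";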
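For comparison, a bounded pool of workers can also be approximated in the shell with `xargs -P`, which is a different technique from Parallel::ForkManager but serves the same purpose of capping the number of simultaneous jobs. The sketch below assumes an `xargs` that supports `-P` (GNU and BSD versions do; it is not part of POSIX). `run_limited` and the worker script are hypothetical names: the worker is expected to be an executable that takes one file name, downloads it, and processes it.

```shell
#!/bin/bash

# run_limited MAX_JOBS WORKER NAME...
# (hypothetical helper) Runs "WORKER NAME" once per NAME, with at most
# MAX_JOBS invocations in flight at any time. WORKER must be an executable
# file, because xargs cannot call shell functions.
run_limited() {
    local max_jobs=$1 worker=$2
    shift 2
    [ "$#" -gt 0 ] || return 0   # nothing to do
    # One name per line in; xargs passes one name to each worker invocation.
    printf '%s\n' "$@" | xargs -n 1 -P "$max_jobs" "$worker"
}
```

A call such as `run_limited 8 ./fetch_and_process.sh foo1.gz foo2.gz foo3.gz` would mirror the limit of 8 used above, where `fetch_and_process.sh` is an assumed wrapper that runs wget and then myscript.sh for its single argument. `xargs` returns only after every worker has exited, so no explicit `wait` is needed.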