I wrote a quick script to download files using the LWP::Simple library and its getstore() function. It works rather well, but occasionally a downloaded file is incomplete. I do not know what causes this, but when I download the same file manually afterwards with wget on the command line, the file is OK.
I would guess the corrupted files are caused by a connection drop or something similar. Although I run my script on a dedicated line in a datacenter, the connection might drop somewhere between my server and the remote server.
This is my code:
sub download {
    my $status = getstore($_[0], $_[1]);
    if (is_success($status)) { return 1; } else { return 0; }
}
What are the possible solutions for this problem? How can I check whether the transfer went well and the file is complete and not corrupted?
Thank you for your valuable replies.
The is_success() sub returns true for any 2XX HTTP code, so if you are, for example, getting "206 Partial Content", that will count as success.
You can just check whether the status is exactly 200 or not, and act accordingly.
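A minimal sketch of that check, keeping the same interface as the download() sub in the question. The helper name status_is_complete is made up for illustration. This only rejects non-200 responses; it may not catch a connection that drops partway through a transfer that started with 200.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore);

# True only for a plain "200 OK"; other 2XX codes such as
# "206 Partial Content" are treated as failures here.
# (status_is_complete is a hypothetical helper name.)
sub status_is_complete {
    my ($status) = @_;
    return $status == 200 ? 1 : 0;
}

# download($url, $file) returns 1 if the server answered 200, else 0.
sub download {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);   # getstore returns the HTTP status code
    return status_is_complete($status);
}
```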
One way to do it:
use LWP::UserAgent;
use HTTP::Request::Common;
my $ua = LWP::UserAgent->new;
$ua->timeout(3);
my $res = $ua->request(HEAD $url); # HEAD request: fetch only the headers of the file
my $length_full = $res->content_length; # size the server announces in Content-Length
...
$res = $ua->request(GET $url);
my $length_got = length($res->content); # number of bytes actually received
if (defined $length_full && $length_got != $length_full) { print "File has not been downloaded completely!\n"; }
...
The $status values you can get are listed in the LWP::Simple documentation. If the server returns an error status every time you get a partial or corrupted download, just checking the return value would be enough.
Otherwise, you would need a more sophisticated strategy. If there are MD5 or SHA checksums for the files, you can check those after download. If not, you need to inspect the headers to find out how much the server was planning to send, and compare that with how much you received.
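Both checks can be run against the file on disk once getstore() has finished. The following is only a sketch under assumptions: it assumes the site publishes a SHA-256 hex digest (for MD5, Digest::MD5 would be used instead), and the sub names file_sha256, checksum_matches and size_matches are invented for this example.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hex SHA-256 digest of a file on disk (croaks if the file cannot be read).
sub file_sha256 {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');   # 'b' reads the file in binary mode
    return $sha->hexdigest;
}

# True if the file's digest equals the published one (case-insensitive).
sub checksum_matches {
    my ($path, $expected_hex) = @_;
    return lc(file_sha256($path)) eq lc($expected_hex) ? 1 : 0;
}

# True if the file has exactly the number of bytes the server announced,
# e.g. the Content-Length value taken from the response headers.
sub size_matches {
    my ($path, $expected_bytes) = @_;
    my $actual = (stat $path)[7];   # undef if the file does not exist
    return (defined $actual && $actual == $expected_bytes) ? 1 : 0;
}
```

If either check fails, the simplest reaction is to delete the file and download it again, perhaps with a limit on the number of retries.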