
How can I map URLs to filenames with Perl?

Developer https://www.devze.com 2022-12-18 10:51 Source: Internet
In a simple webapp I need to map URLs to filenames or filepaths.

This app has a requirement that it can depend only on modules in the core Perl distribution (5.6.0 and later). The problem is that filename length on most filesystems is limited to 255 characters. Another limit is about 32k subdirectories in a single folder.

My solution:

my $filename = $url;

if (length($filename) > $MAXPATHLEN) { # if filename longer than 255
    my $part1 = substr($filename, 0, $MAXPATHLEN - 13);        # first 242 chars
    my $part2 = crypt(0, substr($filename, $MAXPATHLEN - 13)); # 13 chars hash
    $filename = $part1.$part2;
}
$filename =~ s!/!_!g; # escape directory separator

Is it reliable? How can it be improved?


crypt on most platforms will ignore anything after the first 8 characters of input. Given your requirements, I would suggest Digest::MD5.
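
If Digest::MD5 is available (it ships with Perl from 5.7.3 on), a minimal sketch of the idea might look like this. The helper name md5_filename and the $MAX_NAME limit are placeholders of mine, not part of the question's code. The digest is taken over the whole URL so that the suffix depends on every character, and URLs are assumed to be plain byte strings.

```perl
use strict;
use Digest::MD5 qw(md5_hex);

my $MAX_NAME = 255;    # assumed per-name limit, as in the question

# Hypothetical helper: keep short names as-is, otherwise replace the
# tail with a 32-character hex MD5 digest of the complete URL.
sub md5_filename {
    my ($url) = @_;
    (my $name = $url) =~ s!/!_!g;    # escape directory separator
    return $name if length($name) <= $MAX_NAME;
    return substr($name, 0, $MAX_NAME - 32) . md5_hex($url);
}
```

Hex output is used here instead of base64 because the base64 alphabet contains "/", which would have to be escaped again.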

Update: Given the new 5.6.0 requirement, look up a hashing algorithm and implement it to get a number, then base64 encode it (manually, since MIME::Base64 also isn't core until 5.7.3). A quick way to do so would be to just copy the md5_base64 subroutine from Digest::Perl::MD5 on CPAN (and the other subroutines and constants there that it calls/uses).
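
As a rough illustration of "implement a hash yourself", here is a sketch that uses no modules at all. Note the substitutions: it uses 32-bit FNV-1a rather than MD5, and prints hex rather than base64, purely to keep the example short. The names fnv1a_hex, url_to_filename and $MAXLEN are my own. A 32-bit hash is much weaker than MD5, so collisions among long URLs sharing the same prefix are more likely; treat this as a starting point only.

```perl
use strict;

# 32-bit FNV-1a over the bytes of $str, returned as 8 hex digits.
# The state is kept as two 16-bit halves so every intermediate value
# stays far below 2**31, even on a perl built with 32-bit integers.
sub fnv1a_hex {
    my ($str) = @_;
    my ($hi, $lo) = (0x811c, 0x9dc5);    # offset basis 0x811c9dc5
    foreach my $byte (unpack 'C*', $str) {
        $lo ^= $byte;
        # multiply by the FNV prime 0x01000193, modulo 2**32
        my $t = $lo * 0x0193;
        $hi = (($t >> 16) + $hi * 0x0193 + $lo * 0x0100) & 0xffff;
        $lo = $t & 0xffff;
    }
    return sprintf '%04x%04x', $hi, $lo;
}

my $MAXLEN = 255;    # assumed per-name limit, as in the question

# Hypothetical helper: truncate long names and append '-' plus the hash
# of the complete URL (1 + 8 characters), so the result is 255 long.
sub url_to_filename {
    my ($url) = @_;
    (my $name = $url) =~ s!/!_!g;        # escape directory separator
    if (length($name) > $MAXLEN) {
        $name = substr($name, 0, $MAXLEN - 9) . '-' . fnv1a_hex($url);
    }
    return $name;
}
```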


For simplicity I'd try breaking the URL into its (logical) constituent parts so you end up with a nice neat directory structure that maps to the URL:

/
/http
/https
/http/com
/http/com/google
/http/com/stackoverflow
/http/com/stackoverflow/questions
/http/com/stackoverflow/questions/2173839

This would probably make good sense if you're processing a large variety of different domains & websites but I haven't seen your sample data so I can't tell.
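
A minimal sketch of that layout, assuming the scheme comes first, then the host labels in reverse order, then the path segments. The name url_to_dirs is hypothetical, and the sketch deliberately ignores ports, userinfo and query strings, so URLs that differ only in those parts would map to the same path.

```perl
use strict;

# Hypothetical helper: turn a URL into a relative directory path of the
# form scheme/tld/domain/.../path/segments. Returns nothing on no match.
sub url_to_dirs {
    my ($url) = @_;
    my ($scheme, $host, $path) = $url =~ m!^(\w+)://([^/?#]+)([^?#]*)!
        or return;
    my @parts = (lc $scheme, reverse(split /\./, lc $host));
    # drop empty segments and '.' / '..' so the result cannot escape the root
    push @parts, grep { length($_) && $_ ne '.' && $_ ne '..' }
                 split m!/!, $path;
    return join '/', @parts;
}

print url_to_dirs('http://stackoverflow.com/questions/2173839'), "\n";
# prints http/com/stackoverflow/questions/2173839
```

Each path component is still subject to the per-name length limit, so very long segments would need shortening separately.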

If you're likely to run into collisions with this (or any) style of URL mapping then try treating the file system as a hash structure. You could consider the root directory as a hash (with anywhere from 32k to 255^255 buckets, depending on the system) and place files directly in there. How you deal with collisions will depend on the volume of data & likelihood of occurrence.
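
One possible variant of the "file system as a hash" idea, sketched under my own assumptions: instead of putting everything in the root, use the leading hex digits of a digest as bucket directories so that no directory gets near the 32k limit. The name bucket_path and the two-level layout are illustrative choices, and Digest::MD5 is only core from 5.7.3, so on 5.6.0 a pure-Perl hash would have to be substituted.

```perl
use strict;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a URL to bucket/bucket/digest, giving
# 256 first-level and 256 second-level directories.
sub bucket_path {
    my ($url) = @_;
    my $digest = md5_hex($url);               # 32 hex characters
    return join '/', substr($digest, 0, 2),   # first-level bucket
                     substr($digest, 2, 2),   # second-level bucket
                     $digest;                 # file name
}
```

Because the file name no longer contains the URL, storing the original URL inside the file (or next to it) is one way to detect a collision on read.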
