In a simple webapp I need to map URLs to filenames or filepaths.
The app is required to depend only on modules in the core Perl distribution (5.6.0 and later). The problem is that filename length on most filesystems is limited to 255 characters. Another limit is roughly 32k subdirectories in a single directory.
My solution:
my $MAXPATHLEN = 255;  # filename length limit on most filesystems

my $filename = $url;
if (length($filename) > $MAXPATHLEN) {                         # filename longer than 255
    my $part1 = substr($filename, 0, $MAXPATHLEN - 13);        # first 242 chars
    my $part2 = crypt(0, substr($filename, $MAXPATHLEN - 13)); # 13-char crypt() hash of the rest
    $filename = $part1 . $part2;
}
$filename =~ s!/!_!g;  # escape directory separator
Is it reliable? How can it be improved?
crypt on most platforms will ignore anything after the first 8 characters of input. Given your requirements, I would suggest Digest::MD5.
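A minimal sketch of that suggestion, assuming Digest::MD5 is acceptable (as the update below notes, it only entered core in 5.7.3):

use Digest::MD5 qw(md5_base64);
my $hash = md5_base64($url);  # always 22 characters, digests the entire input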
Update: Given the new 5.6.0 requirement, look up a hashing algorithm and implement it to get a number, then base64 encode it (manually, since MIME::Base64 also isn't core until 5.7.3.) A quick way to do so would be to just copy the md5_base64 subroutine from Digest::Perl::MD5 on CPAN (and the other subroutines and constants there that it calls/uses).
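Putting that together with the original snippet, a hedged sketch; md5_base64 here stands for the subroutine copied from Digest::Perl::MD5, and hashing the whole URL rather than just the tail keeps the name unique even when long URLs share a prefix:

my $filename = $url;
if (length($filename) > $MAXPATHLEN) {
    my $hash = md5_base64($filename);   # 22-char digest of the full URL
    $hash =~ tr{+/}{-_};                # base64 can emit '+' and '/'; make them filename-safe
    $filename = substr($filename, 0, $MAXPATHLEN - length($hash)) . $hash;
}
$filename =~ s!/!_!g;                   # escape directory separator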
For simplicity I'd try breaking the URL into its (logical) constituent parts so you end up with a nice, neat directory structure that maps to the URL (a sketch of such a mapping follows the listing):
/
/http
/https
/http/com
/http/com/google
/http/com/stackoverflow
/http/com/stackoverflow/questions
/http/com/stackoverflow/questions/2173839
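A minimal sketch of such a mapping using only core regex facilities (port numbers, query strings, and unusual schemes are glossed over; the function name is illustrative):

sub url_to_path {
    my ($url) = @_;
    my ($scheme, $host, $path) = $url =~ m{^(\w+)://([^/?#]+)([^?#]*)}
        or return;
    my @labels = reverse split /\./, lc $host;       # com, stackoverflow, ...
    my @parts  = grep { length } split m{/}, $path;  # questions, 2173839
    return join '/', '', $scheme, @labels, @parts;
}

print url_to_path('http://stackoverflow.com/questions/2173839');
# /http/com/stackoverflow/questions/2173839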
This would probably make good sense if you're processing a large variety of different domains & websites, but I haven't seen your sample data so I can't tell.
If you're likely to run into collisions with this (or any) style of URL mapping then try treating the file system as a hash structure. You could consider the root directory as a hash table whose bucket count is bounded by the filesystem's subdirectory limit (roughly 32k on some systems, far more on others) and place files directly in there. How you deal with collisions will depend on the volume of data & likelihood of occurrence.
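A hypothetical sketch of that bucket idea, using the first two hex digits of an MD5 digest as the bucket directory (256 buckets, comfortably below the ~32k subdirectory limit); md5_hex stands for a digest routine built as described in the earlier answer:

use File::Path ();  # mkpath is core in 5.6.0

sub bucket_path {
    my ($root, $url) = @_;
    my $digest = md5_hex($url);             # e.g. 'd41d8cd98f00b204...'
    my $bucket = substr($digest, 0, 2);     # '00' .. 'ff'
    File::Path::mkpath("$root/$bucket");    # create the bucket if needed
    return "$root/$bucket/$digest";
}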