开发者

How do I access random element in a Perl DBM hash?

开发者 https://www.devze.com 2022-12-18 02:48 出处:网络
I have a Perl DBM hash containing a list of URLs that I want to pick randomly from to load balance spidering sites.As a result I want to pick a key at random, or select the n开发者_运维知识库th elemen

I have a Perl DBM hash containing a list of URLs that I want to pick randomly from to load balance spidering sites. As a result I want to pick a key at random, or select the n开发者_运维知识库th element (so I can randomise n).

I'm aware this goes against the concept of a hash, but is this possible?

NOTE: missed a valuable point that hash size will be too large to load all the keys to randomly select.


I don't think any of the DBM packages have an API for retrieving a random key, or for retrieving keys by index number. You can look up a particular key, or you can read through all the keys in whatever order the database chooses to return them in (which may change if the database is modified, and may or may not be "random" enough for whatever you want to do).

You could read through all the keys and pick one, but that would require reading the entire database each time (or at least a sizable chunk of it), and that's probably too slow.

I think you'll need to rearrange your data structure.

  1. You could use a real SQL database (like SQLite), so you could look up rows both by a sequential row number and by URL. This would be the most flexible.

  2. You could use a sequential integer as the key for your DBM file. That would make picking a random one easy, but you could no longer look up entries by URL.

  3. You could use two DBM files: the one you have now and a second keyed by sequential integer with the URL as value. (Actually, since URLs don't look like integers, you could store both sets of records in the same DBM file, but that would complicate any code that uses each.) This would use twice the disk space, and would make inserting/removing entries a bit more complicated. You'd probably be better off with approach #1, unless you can't install SQLite for some reason.


Picking a random element from an array is simpler so you can use keys(%foo) to get the array of keys and pick randomly from that.

I believe this will return a random element $x from an array:

$x = $array[rand @array];

If you want to shuffle an array, consider List::Util::shuffle. See http://search.cpan.org/perldoc/List::Util#shuffle_LIST


Of course, it is possible. First, get a list of the keys. Then, randomize the list, using shuffle from List::Util.

Then, loop over the keys.

If there are too many keys (so keeping them all in a list and shuffling is not possible), just remember that you are using tied hashes. Just use each to iterate over key value pairs.

The order will be deterministic but AFAIK, it will not be alphabetical or order of insertion. That, by itself, might be able to get you what you want.


You could use DBM::Deep instead of a traditional DB file to keep your data.

tie %hash, "DBM::Deep", {
    file => "foo.db",
    locking => 1,
    autoflush => 1
};

# $hash{keys} = [ ... ]
# $hash{urls} = { ... } <- same as your current DB file.

my $like_old = $hash{urls}; # a ref to a hash you can use like your old hashref.
my $count = @{$hash{keys}};

With that you can pull out random values as needed.

0

精彩评论

暂无评论...
验证码 换一张
取 消