How can I check if a Perl array contains a particular value?

Developer https://www.devze.com 2022-12-30 23:36 Source: web
I am trying to figure out a way of checking for the existence of a value in an array without iterating through the array.

I am reading a file for a parameter. I have a long list of parameters I do not want to deal with. I placed these unwanted parameters in an array @badparams.

I want to read a new parameter and if it does not exist in @badparams, process it. If it does exist in @badparams, go to the next read.
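A minimal sketch of the loop described above, using the hash lookup suggested in the answers below. It assumes one parameter per line; the contents of `@badparams` and the sample input are hypothetical, and an in-memory filehandle stands in for the real parameter file:

```perl
use strict;
use warnings;

# Hypothetical list of parameters to skip.
my @badparams = qw(debug verbose tmpdir);

# Build the lookup table once, before reading any input.
my %is_bad = map { $_ => 1 } @badparams;

# Hypothetical input, one parameter per line; an in-memory
# filehandle stands in for the real parameter file.
my $input = "name\ndebug\nport\nverbose\n";
open my $fh, '<', \$input or die "Cannot open input: $!";

my @processed;
while (my $param = <$fh>) {
    chomp $param;
    next if exists $is_bad{$param};   # unwanted: go to the next read
    push @processed, $param;          # placeholder for real processing
}
close $fh;

print "$_\n" for @processed;   # prints "name" and "port"
```

Each read costs one hash lookup instead of a scan of `@badparams`.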


Best general-purpose approach, especially for short arrays (1,000 items or fewer) and for coders who are unsure which optimizations best suit their needs:

# \Q...\E escapes regex metacharacters, so $value is matched literally
if ( grep( /^\Q$value\E$/, @array ) ) {
  print "found it";
}

It has been pointed out that grep scans every element even when the first one matches. That is true; however, grep is still fast enough in most cases. For short arrays (fewer than 1,000 items), almost any approach is fast. For very long arrays (around 1,000,000 items), grep remains acceptably quick whether the item is first, in the middle, or last.

Optimizations for longer arrays:

If your array is sorted, use a "binary search".
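A sketch of a hand-written binary search, assuming the array was sorted with Perl's default string sort; `binary_contains` is a hypothetical helper name, not a core function. For a numerically sorted array, sort with `sort { $a <=> $b }` and compare with `<=>` instead of `cmp`:

```perl
use strict;
use warnings;

# Returns 1 if $target is in @$sorted, else 0.
# Assumes @$sorted is in ascending order under string comparison (cmp).
sub binary_contains {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, $#$sorted);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $cmp = $sorted->[$mid] cmp $target;
        return 1 if $cmp == 0;
        if ($cmp < 0) { $lo = $mid + 1 }   # target is in the upper half
        else          { $hi = $mid - 1 }   # target is in the lower half
    }
    return 0;
}

my @sorted = sort qw(pear apple fig banana);   # apple banana fig pear
print binary_contains(\@sorted, 'fig')  ? "found\n" : "missing\n";   # found
print binary_contains(\@sorted, 'kiwi') ? "found\n" : "missing\n";   # missing
```

Each lookup halves the search range, so it takes about log2(n) comparisons, but only if the array stays sorted.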

If the same array is searched many times, copy it into a hash once and then check the hash. If memory is a concern, move each item from the array into the hash instead; this is more memory-efficient but destroys the original array.
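Both variants from the paragraph above, as a sketch with hypothetical parameter names. The second one empties its source array with `shift`; how much memory that actually saves depends on the perl build, so the code only shows the mechanics:

```perl
use strict;
use warnings;

my @badparams = qw(debug verbose tmpdir);

# Variant 1: copy into a hash; @badparams is left intact.
my %is_bad = map { $_ => 1 } @badparams;

# Variant 2: move items into the hash one at a time.
# This empties @moved, so use it only if the array is no longer needed.
my @moved = @badparams;
my %moved_lookup;
while (@moved) {
    my $item = shift @moved;
    $moved_lookup{$item} = 1;
}

print "verbose is bad\n" if exists $is_bad{verbose};
print "port is ok\n" unless exists $moved_lookup{port};
print 'items left in @moved: ', scalar(@moved), "\n";   # 0
```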

If the same values are searched for repeatedly, build a cache lazily: for each lookup, first check whether the result is already stored in a persistent hash. If it is not, search the array and store the result in that hash, so the next lookup for the same value skips the search.
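A sketch of that lazy cache; `in_array_cached` and `%seen_cache` are hypothetical names. Both positive and negative results are cached, and the cache has to be cleared whenever `@array` changes, or it will return stale answers:

```perl
use strict;
use warnings;

my @array = qw(alpha beta gamma delta);

# Persistent cache: value => 1 (found) or 0 (not found).
my %seen_cache;

sub in_array_cached {
    my ($value) = @_;
    # Return the stored answer if this value was searched before.
    return $seen_cache{$value} if exists $seen_cache{$value};

    # Otherwise do the linear search once and remember the result.
    my $found = 0;
    for my $item (@array) {
        if ($item eq $value) { $found = 1; last }
    }
    $seen_cache{$value} = $found;
    return $found;
}

print in_array_cached('gamma') ? "found\n" : "not found\n";   # searches the array
print in_array_cached('gamma') ? "found\n" : "not found\n";   # answered from the cache
```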

Note: these optimizations only pay off with long arrays. Don't over-optimize.


Simply turn the array into a hash:

my %params = map { $_ => 1 } @badparams;

if(exists($params{$someparam})) { ... }

You can also add more (unique) params to the list:

$params{$newparam} = 1;

And later get a list of (unique) params back:

@badparams = keys %params;


You can use the smartmatch operator introduced in Perl 5.10:

For a literal value:

if ( "value" ~~ @array ) { ... }

For a value held in a scalar:

if ( $val ~~ @array ) { ... }

For an inline (anonymous) array:

if ( $var ~~ ['bar', 'value', 'foo'] ) { ... }

Since Perl 5.18, smartmatch is flagged as experimental, so to silence the warnings, enable the experimental pragma by adding the following to your script or module:

use experimental 'smartmatch';

Alternatively, if you want to avoid smartmatch, then as Aaron said, use:

if ( grep( /^\Q$value\E$/, @array ) ) {
  #TODO:
}


This blog post discusses the best answers to this question.

As a short summary, if you can install CPAN modules then the most readable solutions are:

any(@ingredients) eq 'flour';

or

@ingredients->contains('flour');

However, a more common idiom is:

any { $_ eq 'flour' } @ingredients

But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the ~~ "smart match" operator: it is broken. And don't use grep() or the hash solution: they iterate through the whole list.

any() will stop as soon as it finds your value.
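The `any { ... } LIST` idiom is available in the core List::Util module (version 1.33 and later) as well as in List::MoreUtils. A small sketch showing that it stops at the first match; the `$checked` counter exists only for illustration:

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() is in core List::Util 1.33+

my @ingredients = qw(sugar flour eggs butter);

# any() returns true as soon as the block returns true for one element.
if (any { $_ eq 'flour' } @ingredients) {
    print "found flour\n";
}

# Count how many elements the block actually inspects.
my $checked = 0;
my $has_sugar = any { $checked++; $_ eq 'sugar' } @ingredients;
print "checked $checked element(s)\n";   # checked 1 element(s)
```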

Check out the blog post for more details.


Method 1: grep (be careful: $value is interpreted as a regex unless you escape it with \Q...\E).

Avoid grep if resource usage matters, since it always scans the whole array.

if ( grep( /^\Q$value\E$/, @badparams ) ) {
  print "found";
}

Method 2: Linear Search

for (@badparams) {
    if ($_ eq $value) {
       print "found";
       last;
    }
}
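The linear search above can be wrapped in a reusable function that returns as soon as it finds a match. `contains` is a hypothetical name, the array is passed by reference to avoid copying it, and the comparison uses string equality (`eq`) as in Method 2:

```perl
use strict;
use warnings;

# Returns 1 as soon as $value is found in @$list (string equality), else 0.
sub contains {
    my ($value, $list) = @_;
    for my $item (@$list) {
        return 1 if $item eq $value;
    }
    return 0;
}

my @badparams = qw(debug verbose tmpdir);
print contains('verbose', \@badparams) ? "found\n" : "not found\n";   # found
```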

Method 3: Use a hash

my %hash = map {$_ => 1} @badparams;
print "found" if (exists $hash{$value});

Method 4: smartmatch

(added in Perl 5.10, marked as experimental in Perl 5.18).

use experimental 'smartmatch';  # needed on Perl 5.18 and later
print "found" if ($value ~~ @badparams);

Method 5: Use the module List::MoreUtils

use strict;
use warnings;
use List::MoreUtils qw(any);
my @badparams = (1,2,3);
my $value = 1;
print "found" if any { $_ == $value } @badparams;


@eakssjo's benchmark is broken: it compares creating a hash on every iteration against creating a regex on every iteration. Fixed version (I've also added List::Util::first and List::MoreUtils::any):

use List::Util qw(first);
use List::MoreUtils qw(any);
use Benchmark;

my @list = ( 1..10_000 );
my $hit = 5_000;
my $hit_regex = qr/^$hit$/; # precompute regex
my %params;
$params{$_} = 1 for @list;  # precompute hash
timethese(
    100_000, {
        'any' => sub {
            die unless ( any { $hit_regex } @list );
        },
        'first' => sub {
            die unless ( first { $hit_regex } @list );
        },
        'grep' => sub {
            die unless ( grep { $hit_regex } @list );
        },
        'hash' => sub {
            die unless ( $params{$hit} );
        },
    });

Results (for 100_000 iterations, ten times more than in @eakssjo's answer):

Benchmark: timing 100000 iterations of any, first, grep, hash...
       any:  0 wallclock secs ( 0.67 usr +  0.00 sys =  0.67 CPU) @ 149253.73/s (n=100000)
     first:  1 wallclock secs ( 0.63 usr +  0.01 sys =  0.64 CPU) @ 156250.00/s (n=100000)
      grep: 42 wallclock secs (41.95 usr +  0.08 sys = 42.03 CPU) @ 2379.25/s (n=100000)
      hash:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 10000000.00/s (n=100000)
            (warning: too few iterations for a reliable count)
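In the benchmark above, blocks such as `{ $hit_regex }` return the `qr//` object itself, which is always true: `any` and `first` stop at the first element, and `grep` keeps every element without running the regex. A sketch of the same comparison where each block actually matches `$_`; no timings are given because they depend on the machine and perl build, and the iteration count of 200 is an arbitrary choice:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(first any);

my @list      = (1 .. 10_000);
my $hit       = 5_000;
my $hit_regex = qr/^$hit$/;             # compiled once
my %params    = map { $_ => 1 } @list;  # built once, outside the timed code

my %checks = (
    # Each block tests $_ against the regex instead of returning the qr// object.
    any   => sub { any { $_ =~ $hit_regex } @list },
    first => sub { defined(first { $_ =~ $hit_regex } @list) },
    grep  => sub { scalar(grep { $_ =~ $hit_regex } @list) },
    hash  => sub { exists $params{$hit} },
);

# Sanity check: every variant must actually find the value.
for my $name (sort keys %checks) {
    die "$name failed to find $hit\n" unless $checks{$name}->();
}

cmpthese(200, \%checks);
```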


Even though it's convenient to use, it seems like the convert-to-hash solution costs quite a lot of performance, which was an issue for me.

#!/usr/bin/perl
use Benchmark;
my @list;
for (1..10_000) {
    push @list, $_;
}

timethese(10000, {
  'grep'    => sub {
            if ( grep(/^5000$/o, @list) ) {
                # code
            }
        },
  'hash'    => sub {
            my %params = map { $_ => 1 } @list;
            if ( exists($params{5000}) ) {
                # code
            }
        },
});

Output of benchmark test:

Benchmark: timing 10000 iterations of grep, hash...
          grep:  8 wallclock secs ( 7.95 usr +  0.00 sys =  7.95 CPU) @ 1257.86/s (n=10000)
          hash: 50 wallclock secs (49.68 usr +  0.01 sys = 49.69 CPU) @ 201.25/s (n=10000)


Assuming @files is an existing array:

my @new_values = grep( /^2[\d].[\d][A-Za-z]?/, @files );

print join("\n", @new_values);

print "\n";

Here /^2[\d].[\d][A-Za-z]?/ selects values starting with 2; you can put any regular expression in its place.


You certainly want a hash here. Place the bad parameters as keys in the hash, then decide whether a particular parameter exists in the hash.

our %bad_params = map { $_ => 1 } qw(badparam1 badparam2 badparam3);

if ($bad_params{$new_param}) {
  print "That is a bad parameter\n";
}

If you really want to do it with an array, look at List::Util or List::MoreUtils.


If you need to know how many times each element occurs in the array, not just whether it exists, you can use:

my %bad_param_lookup;
$bad_param_lookup{$_}++ for @bad_params;

Then, for every $i in @bad_params, $bad_param_lookup{$i} contains the number of times $i occurs in @bad_params.


There are two ways you can do this. You can throw the values into a hash as a lookup table, as the other posts suggest (I'll just add another idiom):

my %bad_param_lookup;
@bad_param_lookup{ @bad_params } = ( 1 ) x @bad_params;

But if the data is mostly word characters with few regex metacharacters, you can dump it into a regex alternation:

use English qw<$LIST_SEPARATOR>;

my $regex_str = do {
    local $LIST_SEPARATOR = '|';
    "(?:@bad_params)";
};

# $front_delim and $back_delim are whatever characters come before and after the value.
my $regex = qr/$front_delim$regex_str$back_delim/;

This solution would have to be tuned for the types of "bad values" you're looking for. And again, it might be totally inappropriate for certain types of strings, so caveat emptor.
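One way to make the alternation safe for values that do contain metacharacters is to escape each value with `quotemeta` before joining. A sketch, assuming hypothetical values and whole-string anchors (`\A`, `\z`) in place of `$front_delim` and `$back_delim`:

```perl
use strict;
use warnings;

# Hypothetical values, some containing regex metacharacters.
my @bad_params = ('debug', 'a.b', 'x+y');

# quotemeta escapes metacharacters so each value matches literally;
# \A and \z anchor the match to the whole string.
my $alternation = join '|', map { quotemeta($_) } @bad_params;
my $bad_re      = qr/\A(?:$alternation)\z/;

for my $param ('debug', 'aXb', 'a.b', 'x+y', 'xy') {
    printf "%-5s %s\n", $param, $param =~ $bad_re ? 'bad' : 'ok';
}
```

Without `quotemeta`, the `.` in `a.b` would also match `aXb`, and `x+y` would match `xy` but not the literal string `x+y`.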


my @badparams = (1,2,5,7,'a','zzz');

my $badparams = join('|',@badparams);   # '|' or any other character not present in params

foreach my $par (4,5,6,7,'a','z','zzz')
{
    if ($badparams =~ /\b$par\b/)
    {
        print "$par is present\n";
    }
    else
    {
        print "$par is not present\n";
    }
}

You may want to check numeric values for consistent leading spaces.
