In Perl, why are tied arrays so slow?_问答_开发者

In my testing, I have noticed that iterating over tied arrays are at best half as fast as using the internal accessor methods (FETCH and FETCHSIZE). The following benchmark shows the issue:

{package Array;
    sub new {
        my $class = shift;
        tie my @array, $class, [@_];
        \@array
    }
    sub TIEARRAY {
        my ($class, $self) = @_;
        bless $self => $class;
    }
    sub FETCH     {$_[0][$_[1]]}
    sub FETCHSIZE {scalar @{$_[0]}}
}

use List::Util 'sum';
use Benchmark 'cmpthese';

for my $mag (map 10**$_ => 1 .. 5) {

    my $array = Array->new(1 .. $mag);
    my $tied  = tied(@$array);
    my $sum   = sum @$array;

    print "$mag: \n";
    cmpthese -2 => {
        tied => sub {
            my $x = 0;
            $x += $_ for @$array;
            $x == $sum or die $x
        },
        method => sub {
            my $x = 0;
            $x += $tied->FETCH($_) for 0 .. $tied->FETCHSIZE - 1;
            $x == $sum or die $x
        },
        method_while => sub {
            my $x = 0;
            my $i = 0; $x += $tied->FETCH($i++) while $i < $tied->FETCHSIZE;
            $x == $sum or die $x
        },
        method_while2 => sub {
            my $x = 0;
            my $i = 0;
            $x += tied(@$array)->FETCH($i++) 
                while $i < tied(@$array)->FETCHSIZE;
            $x == $sum or die $x
        },
        method_while3 => sub {
            my $x = 0;
            my $i = 0;
            while ($i <开发者_开发百科; tied(@$array)->FETCHSIZE) {
                local *_ = \(tied(@$array)->FETCH($i++));
                $x += $_
            }
            $x == $sum or die $x
        },
    };
    print "\n";
}

With an array size of 1000, the benchmark returns:

1000: 
                Rate   tied method_while3 method_while2 method_while   method
tied           439/s     --          -40%          -51%         -61%     -79%
method_while3  728/s    66%            --          -19%         -35%     -65%
method_while2  900/s   105%           24%            --         -19%     -57%
method_while  1114/s   154%           53%           24%           --     -47%
method        2088/s   375%          187%          132%          87%       --

I have omitted the other runs because the size of the array does not produce a meaningful change in the relative speeds.

method is of course the fastest, because it does not check the size of the array on each iteration, however method_while and method_while2 seem to be operating on the tied array in the same manner as the for loop, yet even the slower method_while2 is twice as fast as the tied array.

Even adding localization of $_ and aliased assignment to method_while2 in method_while3 results in 66% faster execution than the tied array.

What extra work is happening in the for loop that is not happening in method_while3? Is there any way to improve the speed of the tied array?

Every time you use the tied array, it has to look up the tied object, then look up the methods, then invoke them. With your other versions, you are doing some or all of that lookup either at compile time or once before the loop instead of on every access.

(The comparison of speeds between method and the other method_* versions is a good example of this, by the way: you're seeing the expense of doing that FETCHSIZE, even having the tied object already looked up. Now apply that cost to every single operation that touches the array.)

In the benchmark, you do

... for @$array;

What if you had done

++$_ for @$array;

It would work. Tie magic wraps the value returned by FETCH into an lvalue when the value is returned in lvalue context. You can see this using Devel::Peek.

use Devel::Peek;
Dump( $array->[2] );

SV = PVLV(0x14b2234) at 0x187b374
  REFCNT = 1
  FLAGS = (TEMP,GMG,SMG,RMG)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0x14d6274
    MG_VIRTUAL = &PL_vtbl_packelem
    MG_TYPE = PERL_MAGIC_tiedelem(p)
    MG_FLAGS = 0x02
      REFCOUNTED
    MG_OBJ = 0x14a7e5c
    SV = IV(0x14a7e58) at 0x14a7e5c
      REFCNT = 2
      FLAGS = (ROK)
      RV = 0x187b324
      SV = PVAV(0x187c37c) at 0x187b324
        REFCNT = 1
        FLAGS = (OBJECT)
        STASH = 0x14a842c       "Array"
        ARRAY = 0x0
        FILL = -1
        MAX = -1
        ARYLEN = 0x0
        FLAGS = (REAL)
    MG_LEN = 2
  TYPE = t
  TARGOFF = 0
  TARGLEN = 0
  TARG = 0x187b374

Wrapping the value returned by FETCH into an magical SV and processing that magic accounts for at least some of the difference.

tied_rvalue => sub {
    my $x = 0;
    $x += $_ for @$array;
    $x == $sum or die $x
},
tied_lvalue => sub {
    my $x = 0;
    $x += $array->[$_] for 0 .. $tied->FETCHSIZE - 1;
    $x == $sum or die $x
},

100:
                 Rate tied_rvalue method_while3 tied_lvalue method_while2 method_while method
tied_rvalue    3333/s          --          -33%        -36%          -50%         -58%   -77%
method_while3  4998/s         50%            --         -4%          -25%         -36%   -66%
tied_lvalue    5184/s         56%            4%          --          -23%         -34%   -65%
method_while2  6699/s        101%           34%         29%            --         -15%   -55%
method_while   7856/s        136%           57%         52%           17%           --   -47%
method        14747/s        342%          195%        184%          120%          88%     --

It's worth noting that local is very slow which accounts for some of the performance loss in method_while3 vs the other method benchmarks. It is also doing a block entry and exit which has costs.

Even though method_while3 is programmatically equivalent to statement for x..y, perl can optimize the for loop better than you can. As a rule of thumb, the more code you do inside perl the faster your code will be.