A fast way to extract similar data in array

Developer https://www.devze.com 2023-03-08 16:08 Source: web

I don't exactly know how to state my problem below in a question so please bear with me.

The problem:

I have a multi-dimension array that looks like this:

$raw_list[0] = ['123','foo','foo1','300'];
$raw_list[1] = ['456','foo2','foo3','4'];
$raw_list[2] = ['123','foo4','foo5','67'];
$raw_list[3] = ['456','foo6','foo7','34'];

This usually gets very large (can possibly get to over a thousand indexes?)

What I want to do with it is to separate all records with the same 0th element value in $raw_list[nth][0] and operate on each group such that...

$raw_list[0] = ['123','foo','foo1','300'];
$raw_list[2] = ['123','foo4','foo5','67'];

Then I operate on this group to get various statistical info. For example, the sum of element values '300' and '67' and so on.

Current solution:

At the moment this is what my code looks like.

my @anum_group = ();
@die_raw_list = sort {$a->[0] <=> $b->[0]} @die_raw_list;

my $anum_reference = $die_raw_list[0][0];

for my $row (0..$#die_raw_list) 
{
    if ($die_raw_list[$row][0] == $anum_reference)
    {
        push @anum_group, $die_raw_list[$row];
    }
    else
    {
        # Profile ANUM group
        # ... operation to get statistical info on group here


        # Initialize next ANUM group
        $anum_reference = $die_raw_list[$row][0];
        @anum_group = ();
        push @anum_group, $die_raw_list[$row];
    }
}

# Profile last ANUM group
#  ... operation to get statistical info on group here

Final thoughts and question:

I realized that on very large data this tends to be very slow and I want to speed things up.

I'm new to Perl and don't know how best to solve this problem.


A thousand indexes is not that many... What makes you think your code is slow? And what part is slow?

If the first element is that important, you could re-arrange your data structure to index it that way in the first place:

my %raw_list = ('123' => [['foo', 'foo1', '300'],
                          ['foo4', 'foo5', '67']],
                '456' => [['foo2', 'foo3', '4'],
                          ['foo6', 'foo7', '34']]);

You could build it dynamically something like this:

my %raw_list;
my $elt0 = '123';
my @rec = ('foo', 'foo1', '300');
push @{$raw_list{$elt0}}, \@rec;

And process it like this:

foreach my $elt0 (keys %raw_list) {
    my $records = $raw_list{$elt0};
    foreach my $rec (@$records) {
        # Now $elt0 is (e.g.) '123'
        # and $rec->[0] is 'foo', $rec->[1] is 'foo1', $rec->[2] is '300'
    }
}

To be really clean, you would want to encapsulate all of this in an object...
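
As a rough sketch of what that encapsulation might look like: the class name RecordGroups and its methods add, keys_seen and records are hypothetical names chosen for illustration, not part of the answer above. The rows are the sample data from the question.

```perl
use strict;
use warnings;

# Hypothetical class; the name and methods are assumptions for illustration.
package RecordGroups;

sub new {
    my ($class) = @_;
    return bless { groups => {} }, $class;
}

# Store one record under its first element.
sub add {
    my ($self, $elt0, @rec) = @_;
    push @{ $self->{groups}{$elt0} }, [@rec];
    return $self;
}

# Sorted list of the first-element values seen so far.
sub keys_seen {
    my ($self) = @_;
    my @keys = sort keys %{ $self->{groups} };
    return @keys;
}

# All records stored under one key (empty list if the key is unknown).
sub records {
    my ($self, $elt0) = @_;
    return @{ $self->{groups}{$elt0} || [] };
}

package main;

my $groups = RecordGroups->new;
$groups->add(@$_) for (
    [ '123', 'foo',  'foo1', '300' ],
    [ '456', 'foo2', 'foo3', '4'   ],
    [ '123', 'foo4', 'foo5', '67'  ],
    [ '456', 'foo6', 'foo7', '34'  ],
);

# Sum the last field of every record in each group.
for my $elt0 ($groups->keys_seen) {
    my $total = 0;
    $total += $_->[2] for $groups->records($elt0);
    print "$elt0: $total\n";
}
```

The caller never touches the underlying hash, so the storage layout can change later without affecting the code that computes the statistics.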


If I understand you correctly, you want to take the records that share the same first field (123 in your example), sort them by the other fields, and then compare certain values within them.

This can all be accomplished by sorting by the different values:

my @sorted = sort { 
    $a->[0] <=> $b->[0] || # <=> for numerical
    $a->[1] cmp $b->[1] || # cmp for non-numerical
    $a->[2] cmp $b->[2]    # ...and so on for any further fields
} @die_raw_list;

Then you can simply loop your way through your data, picking out the values you need.
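
One possible way to do that loop, as a minimal sketch: because equal first fields are adjacent after the sort, a running total can be kept per key. The @totals array and the choice to sum the fourth field are assumptions based on the example in the question.

```perl
use strict;
use warnings;

# Sample rows from the question.
my @die_raw_list = (
    [ '123', 'foo',  'foo1', '300' ],
    [ '456', 'foo2', 'foo3', '4'   ],
    [ '123', 'foo4', 'foo5', '67'  ],
    [ '456', 'foo6', 'foo7', '34'  ],
);

my @sorted = sort {
    $a->[0] <=> $b->[0] ||
    $a->[1] cmp $b->[1]
} @die_raw_list;

# @totals holds [ key, sum ] pairs in sorted key order (illustrative name).
my @totals;
for my $row (@sorted) {
    if (@totals && $totals[-1][0] == $row->[0]) {
        # Same key as the previous row: extend the running sum.
        $totals[-1][1] += $row->[3];
    }
    else {
        # Key changed: start a new group.
        push @totals, [ $row->[0], $row->[3] ];
    }
}

print "$_->[0]: $_->[1]\n" for @totals;
```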

If you only want some of the values, you can do a partial selection with something simple like:

my @partial;
for my $ref (@die_raw_list) {
    push @partial, $ref if $ref->[0] == 123;
}
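
The same selection can probably be written more compactly with Perl's built-in grep. This is a small sketch using the sample rows from the question as assumed input:

```perl
use strict;
use warnings;

# Sample rows from the question.
my @die_raw_list = (
    [ '123', 'foo',  'foo1', '300' ],
    [ '456', 'foo2', 'foo3', '4'   ],
    [ '123', 'foo4', 'foo5', '67'  ],
    [ '456', 'foo6', 'foo7', '34'  ],
);

# Keep only the rows whose first field is numerically equal to 123.
my @partial = grep { $_->[0] == 123 } @die_raw_list;

print join(',', @$_), "\n" for @partial;
```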


You can put your data into a hash indexed by the first element and then quickly go through each element of the hash:

#test data
my $foo = [[1,2,3],[1,5,6],[2,8,9]];

#group elements 1..n by first element
my %bar;
push @{ $bar{ $_->[0] } }, [ @{$_}[ 1 .. $#{$_} ] ] for @$foo;

#lame dump
foreach (keys %bar) {
    print "key: $_\n";
    foreach (@{$bar{$_}}) {
        foreach (@{$_}) {
            print "$_ ";
        }
        print "\n";
    }
    print "\n";
}

Of course, this solution might only make sense if you need to process each group, and want to do them separately, and might need to make multiple passes.
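
Once the groups are in a hash, per-group statistics could be computed along these lines. This is a sketch only: it uses the core List::Util module, and the %stats hash and the choice of the last field as the value of interest are assumptions for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Same test data as above.
my $foo = [[1,2,3],[1,5,6],[2,8,9]];

# Group elements 1..n by first element.
my %bar;
push @{ $bar{ $_->[0] } }, [ @{$_}[ 1 .. $#{$_} ] ] for @$foo;

# Collect count, sum, min and max of the last field in each group.
my %stats;
for my $key (sort { $a <=> $b } keys %bar) {
    my @last = map { $_->[-1] } @{ $bar{$key} };
    $stats{$key} = {
        count => scalar @last,
        sum   => sum(@last),
        min   => min(@last),
        max   => max(@last),
    };
    printf "key %s: count=%d sum=%d min=%d max=%d\n",
        $key, @{ $stats{$key} }{qw(count sum min max)};
}
```

Because each group is already isolated, further passes (a mean, a variance, and so on) only need to revisit the rows of that one group rather than the whole list.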


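# Collect the distinct first-element values
my %keys;
$keys{$_->[0]} = 1 for @raw_list;
foreach my $k (keys %keys)
{
    # Select every row whose first element matches this key
    my @a = grep { $_->[0] == $k } @raw_list;
    # do something with @a
}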