GROUP BY using Perl

devze.com https://www.devze.com 2023-01-29 07:26 Source: web
I have a ton of SQL logs that I want to extract data from. This task takes a very long time because I am grouping by several columns. Thus, I decided to extract the logs along with the columns I would typically group on, without doing a GROUP BY on the SQL side. Instead, I want to use Perl to do the grouping. The solution I'm thinking of is to create an n-dimensional hash in Perl to group by the different columns. Are there any command line utilities or Perl functions that will allow me to do the same?


  1. As Ether said in the comment, let the tool that was actually engineered and optimized for the job do the job. A database server running a properly optimized query is VERY unlikely to be any slower than what you can achieve yourself outside the DB.

    Among other things, you will waste resources on transmitting more data over the network and will need more memory.

    As one optimization, try using a temp table, though without the full schema, query, and DB engine I wouldn't venture to give any specific optimization advice.
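    As a rough sketch of the temp-table idea (the real schema and engine are unknown, so this uses an in-memory SQLite database and made-up `host`/`status`/`bytes` columns purely for illustration): stage only the columns you group on into a temp table, then let the database do the aggregation.

    ```perl
    use strict;
    use warnings;
    use DBI;

    # Stand-in database; in practice you would connect to your real server.
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                           { RaiseError => 1 });

    $dbh->do('CREATE TABLE logs (host TEXT, status TEXT, bytes INTEGER)');
    my $ins = $dbh->prepare('INSERT INTO logs VALUES (?, ?, ?)');
    $ins->execute(@$_) for (['a', '200', 100], ['a', '200', 50], ['b', '404', 10]);

    # Stage just the needed columns, then aggregate inside the database.
    $dbh->do('CREATE TEMP TABLE t AS SELECT host, status, bytes FROM logs');
    my $rows = $dbh->selectall_arrayref(
        'SELECT host, status, COUNT(*) AS n, SUM(bytes) AS total
           FROM t GROUP BY host, status ORDER BY host'
    );

    printf "%s %s count=%d total=%d\n", @$_ for @$rows;
    ```

    Whether staging into a temp table actually helps depends on the engine and the query plan; measure before committing to it.
    
    
    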

    The outside-of-DB approach can sometimes be better, for example when VERY few rows share duplicate "group by" keys, so there is almost no saving in transmitting grouped data; and when your Perl-side logic would have required storing every row in memory anyway, instead of iterating over rows and discarding them as you go.

  2. If you still want to try this in Perl, a good approach is to use a SINGLE-level hash and develop a cheap way to encode the values of your unique key columns into a single hash key (pack/unpack can be used in some circumstances, or split/join, or more situation-specific but better-performing ways). The only requirement is that the encoded value can be uniquely mapped back to the unique key column values.

    # Store: one hash entry per unique combination of key columns
    my %storage;
    foreach my $row (@$result_set) {
        my $hash_key = encode_hash_key($row);
        my $new_row = $row;
        if (exists $storage{$hash_key}) {
            # A row with the same keys was seen before: combine them
            $new_row = merge_rows($row, $storage{$hash_key});
        }
        $storage{$hash_key} = $new_row;
    }
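    For completeness, here is a hypothetical, self-contained version of that idea. The column names (`host`, `status`, `bytes`), the choice of key columns, and the merge rule (summing `bytes`) are assumptions for illustration; `encode_hash_key` joins the key columns with a separator unlikely to occur in the data, so the key stays reversible via split.

    ```perl
    use strict;
    use warnings;

    # Assumed group-by columns; rows are hashrefs keyed by column name.
    my @key_cols = qw(host status);

    sub encode_hash_key {
        my ($row) = @_;
        # ASCII unit separator: reversible with split /\x1F/
        return join "\x1F", @{$row}{@key_cols};
    }

    sub merge_rows {
        my ($row, $stored) = @_;
        # Assumed aggregation: sum the bytes column
        return { %$stored, bytes => $stored->{bytes} + $row->{bytes} };
    }

    my @result_set = (
        { host => 'a', status => 200, bytes => 100 },
        { host => 'a', status => 200, bytes => 50  },
        { host => 'b', status => 404, bytes => 10  },
    );

    my %storage;
    for my $row (@result_set) {
        my $k = encode_hash_key($row);
        $storage{$k} = exists $storage{$k} ? merge_rows($row, $storage{$k}) : $row;
    }
    ```

    After the loop, %storage holds one merged row per unique (host, status) pair, and each original key can be recovered with split /\x1F/ on the hash key.
    
    
    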
    
