开发者

Why do these different Perl sort subroutines give me different orders of elements?

开发者 https://www.devze.com 2022-12-11 23:41 出处:网络
I copied this example from perldoc -f sort. I added the @old array and the prints of the @new arrays. Why do I get three different @new arrays? Is there something wrong with my @old?

I copied this example from perldoc -f sort. I added the @old array and the prints of the @new arrays. Why do I get three different @new arrays? Is there something wrong with my @old?

@old = qw( =332 =43 =avxc =aed =jjj =3322 =aa44 =ssss );
say "\nold  :   @old\n";

# inefficiently sort by descending numeric compare using
# the first integer after the first = sign, or the
# whole record case-insensitively otherwise
@new = sort {
($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
||
uc($a) cmp uc($b)
} @old;
say "new_1:   @new"; # =3322 =332 =43 =aa44 =aed =avxc =jjj =ssss

# same thing, but much more efficiently;
# we'll build auxi开发者_如何学Pythonliary indices instead
# for speed
@nums = @caps = ();
for (@old) {
push @nums, /=(\d+)/;
push @caps, uc($_);
}
@new = @old[ sort {
$nums[$b] <=> $nums[$a]
||
$caps[$a] cmp $caps[$b]
} 0..$#old
];
say "new_2:   @new"; # =avxc =332 =43 =3322 =aa44 =aed =jjj =ssss

# same thing, but without any temps
@new = map { $_->[0] }
sort { $b->[1] <=> $a->[1]
||
$a->[2] cmp $b->[2]
} map { [$_, /=(\d+)/, uc($_)] } @old;
say "new_3:   @new\n"; # =3322 =332 =43 =avxc =aed =jjj =aa44 =ssss


Empty lists

The three lists are different because the regular expression

/=(\d+)/

may return an empty list (when it doesn't match) --- messing up your structures. I inserted the following lines into your code and got the indicated answers:

say "nums: @nums";  # nums: 332 43 3322
say "caps: @caps";  # caps: =332 =43 =AVXC =AED =JJJ =3322 =AA44 =SSSS

You see, any item that doesn't match the regexp is missing in @nums and messes up your list. This can be fixed by changing the definition of @nums to:

push @nums, /=(\d+)/? $1:undef;

(Edit: added clarification) The same problem happens in your last example:

For $_="=123", [$_, /=(\d+)/, uc($_)] = ["123",  123,  123].
For $_="abcd", [$_, /=(\d+)/, uc($_)] = ["abcd", "ABCD"].

The match disappears and the uppercased string moves into its place. This is not what you intended. One fix is to replace, as above, the regular expression by an expression that always produces exactly one scalar:

[$_, /=(\d+)/? $1:undef, uc($_)]

Another solution would be to swap the two last elements in the list:

@new = map { $_->[0] }                                                          
sort { $b->[2] <=> $a->[2]                                                      
||                                                                              
$a->[1] cmp $b->[1]                                                             
} map { [$_, uc($_), /=(\d+)/] } @old;                                          
say "new_3:   @new\n";

Now, the regular expression is at the end. If it produces no matches, the list is short (just two elements). Nevertheless, $a->[2] produces the desired result: undef. (You may or may not like this approach of letting the bug bite but have the correct result come out as a side-effect of perl reading beyond the end of the shorter lists).

With these fixes, all three lists produce the same result.

Warnings

Please run your program with the "-w" warnings switch on. You find that you compare quite a lot of undefined values and non-numerics. You should fix your program to do without this, e.g:

@new = map { $_->[0] }                                                          
sort {                                                                          
defined($b->[2]) && defined($a->[2]) && ($b->[2] <=> $a->[2])                   
||                                                                              
$a->[1] cmp $b->[1]                                                             
} map { [$_, uc($_), /=(\d+)/] } @old;                                          
say "new_4:   @new\n";


Those are three examples of sorting an array using a user defined comparator. Each example outputs the sorted array after it's run.

Since there are three examples, the sorted array is printed three times. Of course, you are not expected to use all three versions when you want to sort.

The last example uses the Schwartzian Transform.

I think the original author's intention was for all examples to produce the same sorted output, but due to the bugs explained by Yaakov Belch, they don't.

Note that your data do not match the expectations of the comparator you are using. The comparator your are using does expect all records to be of the form =\d.+ (i.e. an equal sign followed by a digit followed by arbitrary characters). If all records do not conform to this format, you need to be careful in the comparator.

Fixed first and third examples below. I don't think there is any need to use example 2: Either it pays to pre-compute or it doesn't. If it does, use the Schwartzian Transform. If it doesn't, use the regular sort that extracts keys for each comparison.

#!/usr/bin/perl

use 5.010;
use strict; use warnings;

my @old = qw( =332 =43 =avxc =aed =jjj =3322 =aa44 =ssss );

my @new_1 = sort {
    my $ad = $a =~ /=(\d+)/ ? $1 : undef;
    my $bd = $b =~ /=(\d+)/ ? $1 : undef;
    return  1 if defined $bd and not defined $ad;
    return -1 if not defined $bd and defined $ad;
    return $bd <=> $ad if defined $ad and defined $bd;
    return uc $a cmp uc $b;
} @old;

say "new_1:\t@new_1\n"; # =3322 =332 =43 =avxc =aed =jjj =aa44 =ssss

my @new_3 = map { $_->[0] }
    sort {
        return  1 if defined $b->[1] and not defined $a->[1];
        return -1 if not defined $b->[1] and defined $a->[1];
        return  $b->[1] <=> $a->[1] if defined $b->[1] and defined $a->[1];
        return  $a->[2] cmp $b->[2];
} map { [$_, /=(\d+)/ ? $1 : undef, uc($_)] } @old;

say "new_3:\t@new_3\n"; # =3322 =332 =43 =avxc =aed =jjj =aa44 =ssss

Output:

new_1:  =3322 =332 =43 =aa44 =aed =avxc =jjj =ssss

new_3:  =3322 =332 =43 =aa44 =aed =avxc =jjj =ssss
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号