Sorting by second word in perl

Developer https://www.devze.com 2023-02-14 18:46 Source: web
Hey guys, I have this file called phonebook

Steve Blenheim:239-923-7366:238-934-7865:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Betty Boop:245-836-8357:245-876-7656:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500
Igor Chevsky:385-375-8395:385-333-8976:3567 Populus Place, Caldwell, NJ 23875:6/18/68:23400
Norma Corder:397-857-2735:397-857-7651:74 Pine Street, Dearborn, MI 23874:3/28/45:245700

And I am trying to sort the text in reverse alphabetical order by the second word (the last name), but have not been able to find out how to do it. I am reading from the file by doing this:

  open (FILE, phonebook);
  @line = <FILE>;
  close(FILE);

Any ideas? I can sort the first field in alphabetical order and reverse, but can't seem to get the second one to sort properly. Thanks in advance.


I share tadmc's concern that the second whitespace-separated field isn't always going to be the surname, but to answer the question as it pertains to the second field: you can extract it with split, and you can sort on it like this:

The simple but slow version (easy to read, but it re-splits both lines every single time it compares them, which is inefficient):

@lines = sort { # Compare second fields
    (split " ", $a)[1]
    cmp
    (split " ", $b)[1]
} @lines;
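
The block above sorts in ascending order, while the question asks for reverse alphabetical order. A minimal sketch of the reversed variant, assuming the four sample records from the question are held in an array instead of being read from the file, swaps $a and $b in the comparison:

```perl
use strict;
use warnings;

# Sample records copied from the question's phonebook file.
my @lines = (
    "Steve Blenheim:239-923-7366:238-934-7865:95 Latham Lane, Easton, PA 83755:11/12/56:20300\n",
    "Betty Boop:245-836-8357:245-876-7656:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500\n",
    "Igor Chevsky:385-375-8395:385-333-8976:3567 Populus Place, Caldwell, NJ 23875:6/18/68:23400\n",
    "Norma Corder:397-857-2735:397-857-7651:74 Pine Street, Dearborn, MI 23874:3/28/45:245700\n",
);

# Putting $b on the left and $a on the right gives descending (Z to A) order.
my @reversed = sort {
    (split " ", $b)[1]
    cmp
    (split " ", $a)[1]
} @lines;

# Lines were not chomped, so each one still carries its newline.
print @reversed;
```

With this data the Corder record comes first and the Blenheim record last.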

The Schwartzian transform version (does the exact same thing as the previous one, only much faster):

@lines = map { # Get original line back
    $_->[0]
} sort { # Compare second fields
    $a->[1] cmp $b->[1]
} map { # Turn each line into [original line, second field]
    [ $_, (split " ", $_)[1] ]
} @lines;
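
Because the records are colon-separated, the second whitespace-separated field is really "Blenheim:239-923-7366:238-934-7865:95" rather than just the surname. A hedged variation of the transform, assuming the surname is the last word of the first colon-separated field (an assumption, not something the question guarantees), could look like this, here in the reverse order the question asks for:

```perl
use strict;
use warnings;

# Sample records copied from the question's phonebook file.
my @lines = (
    "Steve Blenheim:239-923-7366:238-934-7865:95 Latham Lane, Easton, PA 83755:11/12/56:20300\n",
    "Betty Boop:245-836-8357:245-876-7656:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500\n",
    "Igor Chevsky:385-375-8395:385-333-8976:3567 Populus Place, Caldwell, NJ 23875:6/18/68:23400\n",
    "Norma Corder:397-857-2735:397-857-7651:74 Pine Street, Dearborn, MI 23874:3/28/45:245700\n",
);

my @sorted =
    map  { $_->[0] }                     # get the original line back
    sort { $b->[1] cmp $a->[1] }         # descending by the cached surname
    map  {
        my ($name)  = split /:/, $_, 2;          # e.g. "Steve Blenheim"
        my $surname = (split " ", $name)[-1];    # assumption: last word is the surname
        [ $_, $surname ];                        # [original line, sort key]
    } @lines;

print @sorted;
```

Each line is split only once, when its key is built, instead of on every comparison.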


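If you don't mind using the shell, sort -r -k2 phonebook will sort your file in reverse order, using everything from the second whitespace-separated field to the end of the line as the key.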


Based on Miguel Prz's solution, I replaced cmp with <=>. This matters when the field being compared is numeric. With cmp the values are compared as strings, character by character: the first character is the most significant, then the second, and so on. If you have the numbers 607, 8 and 35, a descending cmp sort gives 8, 607, 35. To sort them as numbers, use the <=> operator, and the result will be 607, 35, 8.

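use strict;
use warnings;

open my $FILE, '<', 'phonebook' or die "Cannot open phonebook: $!";
my @lines = <$FILE>;
close $FILE;

# Numeric comparison, descending; only meaningful if the second field is a number
my @sorted = sort {
                my @a = split(/\s+/,$a);
                my @b = split(/\s+/,$b);
                $b[1] <=> $a[1] } @lines;

foreach my $item (@sorted) {
    print $item;    # lines were not chomped, so they already end in a newline
}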
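
The difference between the two operators can be checked in isolation. A small sketch using the three numbers from the explanation above, sorted in descending order both ways:

```perl
use strict;
use warnings;

my @numbers = (607, 8, 35);

# String comparison: "8" beats "607" because '8' gt '6'.
my @as_strings = sort { $b cmp $a } @numbers;

# Numeric comparison: values are compared by magnitude.
my @as_numbers = sort { $b <=> $a } @numbers;

print "cmp: @as_strings\n";   # cmp: 8 607 35
print "<=>: @as_numbers\n";   # <=>: 607 35 8
```

For the surname sort in the original question, cmp is still the operator to use, since names are not numbers.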


One way to do that is to read the file line by line and key each row on the name with its two words swapped. Something like this:

my %list;
open(FILE, '<', 'phonebook') or die "Cannot open phonebook: $!";
while(<FILE>){
    my @vals = split(/:/, $_);
    (my $key = $vals[0]) =~ s/(\S+)\s+(.+)/$2 $1/; # split first field, reverse word order
    $list{$key} = $_; # save row keyed on $key (a repeated name overwrites the earlier row)
}
close(FILE);

foreach my $key(sort {$b cmp $a} keys(%list)){
    print $list{$key};
}


I think it's interesting to write in a Modern Perl way (the solution is the same), and this is the complete script:

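use strict;
use warnings;

open my $FILE, '<', 'phonebook' or die "Cannot open phonebook: $!";
my @lines = <$FILE>;
close $FILE;

# String comparison on the second whitespace-separated field, descending
my @sorted = sort {
                my @a = split(/\s+/,$a);
                my @b = split(/\s+/,$b);
                $b[1] cmp $a[1] } @lines;

foreach my $item (@sorted) {
    print $item;    # lines were not chomped, so they already end in a newline
}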


I am surprised nobody has mentioned this, but if we are sorting a phonebook, we probably don't really want a pure ASCII sort.

Does Bob DeCarlo really belong before Ralph Dearborn? If you sort by using cmp Mr. DeCarlo winds up first in the results.
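
A quick sketch of that effect, using only the two names above and comparing the second word of each: plain cmp puts the uppercase "C" in DeCarlo ahead of the lowercase "a" in Dearborn, while comparing lowercased copies reverses the order.

```perl
use strict;
use warnings;

my @names = ("Ralph Dearborn", "Bob DeCarlo");

# Plain cmp on the surname: 'C' (uppercase) sorts before 'a' (lowercase).
my @by_cmp = sort {
    (split " ", $a)[1] cmp (split " ", $b)[1]
} @names;

# Case-insensitive comparison: lowercase both surnames before comparing.
my @by_lc = sort {
    lc((split " ", $a)[1]) cmp lc((split " ", $b)[1])
} @names;

print "cmp:    @by_cmp\n";   # cmp:    Bob DeCarlo Ralph Dearborn
print "lc cmp: @by_lc\n";    # lc cmp: Ralph Dearborn Bob DeCarlo
```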

Even if you normalize for case, you've still got issues. There are a host of complications with sorting and filing things. Different organizations have rules for handling these issues.

Since sort is an expensive operation, you'll want to make each comparison work as quickly as possible. The way to do this is to use the simplest code possible for all your comparisons. Since cmp won't give us the desired result by itself, we need to generate and cache a normalized sort term for each item in the phone book.

So, assuming you've already got your phone book data in an array:

sub extract_and_normalize {
     my ($line) = @_;

     # Do stuff here to embody your alphabetization rules.
     my $normed = lc $line;   # placeholder only: replace with your own rules

     return [ $normed, $line ];
}

# Generate your sort terms
my @processed = map extract_and_normalize($_), @lines;

# Sort by the normalized values
my @sorted = sort { $a->[0] cmp $b->[0] } @processed;

# Extract the lines from the sorted set.
@lines = map $_->[1], @sorted;

Or use the Schwartzian Transform, as hobbs suggests, to avoid making all the intermediate variables:

@lines = map $_->[1],
         sort { $a->[0] cmp $b->[0] }
         map extract_and_normalize($_), @lines;
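
To make the pipeline concrete, here is a runnable sketch with one possible body for extract_and_normalize. The rule it implements (key is the lowercased "surname firstname", with the surname assumed to be the last word of the name field) is a hypothetical stand-in for whatever alphabetization rules you actually need, and the phone numbers in the sample records are made-up placeholders.

```perl
use strict;
use warnings;

# Hypothetical rule set: build a lowercased "surname firstname" key.
sub extract_and_normalize {
    my ($line) = @_;

    my ($name)  = split /:/, $line, 2;     # name is the first colon-separated field
    my @words   = split " ", $name;
    my $surname = pop @words;              # assumption: last word is the surname
    my $normed  = lc( join " ", $surname, @words );

    return [ $normed, $line ];
}

# Invented sample records; only the names matter here.
my @lines = (
    "Ralph Dearborn:000-000-0001\n",
    "Bob DeCarlo:000-000-0002\n",
    "Betty Boop:000-000-0003\n",
);

@lines = map $_->[1],
         sort { $a->[0] cmp $b->[0] }
         map extract_and_normalize($_), @lines;

print @lines;
```

Under this rule Boop sorts first, then Dearborn, then DeCarlo. For reverse order, swap $a and $b in the sort block.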