开发者

how to find distance between elements of two arrays?

开发者 https://www.devze.com 2023-01-30 03:02 出处:网络
I\'m writing in perl, but it seems more like an algorithm question to me.Replies in other languages are welcome.

I'm writing in perl, but it seems more like an algorithm question to me. Replies in other languages are welcome.

I have two sorted arrays of integers, short and long. For each element in short, I want to find the closest element in long, and in my particular case i want to make a histogram of the distances.

Here's the algorithm I'm using:

sub makeDistHist {
    my ($hist, $short, $long, $max) = @_; # first 3 are array references

    my $lIndex = 0;
    foreach my $s (@$short) {
        my $distance = abs( $s - $long->[$lIndex] );
        while (abs( $s - $long->[$lIndex+1] ) < $distance) {
            $distance = abs( $s - $long->[$lIndex] );
            $lIndex++;
        }
        $distance = $max if $distance>$max; # make overflow bin
        $hist->[$distance]++;
    }  
}

This relies on short and long being sorted.

Here's a subroutine i wrote to test my algorithm. The first test succeeds, but the second fails:

sub test { # test makeDistHist 

    my @long = qw(100 200 210 300 350 400 401 402 403 404 405 406);
    my @short = qw(3 6 120 190 208 210 300 350);
    my @tarHist;
    $tarHist[97]++;
    $tarHist[94]++;
    $tarHist[20]++;
    $tarHist[10]++;
    $tarHist[2]++;
    $tarHist[0]+=3;

    my $max = 3030;
    my @gotHist;
    makeDistHist(\@gotHist, \@short, \@long, $max);

    use Test::More tests => 2;
    is_deeply(\@gotHist, \@tarHist, "did i get the correct distances for two different arrays?");

    @gotHist = ();
    @tarHist = ( @long+0 );
    makeDistHist(\@gotHist, \@long, \@long, $max);
    is_deeply(\@gotHist, \@tarHist, "did i get the correct distances between an array and itself?");  # nope!
    print Dumper(\@gotHist);
}

here's the dump:

$VAR1 = [
          7,
          5
        ];

(the problem persists if I compare long to a copy of it minus one element, so it's not that the algorithm requires short to be strictly shorter than long. also, if I change 401, 402... to 402, 404... gotHist becomes (7, undef, 5).)

Here's what I'd like from y'all: first and for开发者_Go百科emost, a working algorithm for this. Either fix what I've got or devise another from whole cloth. Secondly, I could use help in my debugging skills. How would you go about identifying the problem with the existing algorithm? If I could do that I wouldn't need to ask this question :)

Thanks!


You should break up the subroutine: Calculating the distances and building the histogram are two different things and much clarity is lost by trying to combine the two.

Start with the simplest solution first. I understand the potential optimization by using a sorted @long, but resort to that only if List::Util::min is slow.

You can use Statistics::Descriptive to generate the frequency distribution.

#!/usr/bin/perl

use strict; use warnings;
use List::Util qw( min );
use Statistics::Descriptive;

my $stat = Statistics::Descriptive::Full->new;

my @long = (100, 200, 210, 300, 350, 400, 401, 402, 403, 404, 405, 406);
my @short = (3, 6, 120, 190, 208, 210, 300, 350);

for my $x ( @short ) {
    $stat->add_data(find_dist($x, \@long));
}

my $freq = $stat->frequency_distribution_ref([0, 2, 10, 20, 94, 97]);
for my $bin ( sort { $a <=> $b } keys %$freq ) {
    print "$bin:\t$freq->{$bin}\n";
}

sub find_dist {
    my ($x, $v) = @_;
    return min map abs($x - $_), @$v;
}

Output:

[sinan@archardy so]$ ./t.pl
0:      3
2:      1
10:     1
20:     1
94:     1
97:     1

Of course, it is possible to do this without using any modules and using your assumption of a sorted @long:

#!/usr/bin/perl

use strict; use warnings;

my @long = (100, 200, 210, 300, 350, 400, 401, 402, 403, 404, 405, 406);
my @short = (3, 6, 120, 190, 208, 210, 300, 350);

my @bins = reverse (0, 2, 10, 20, 94, 97);
my %hist;

for my $x ( @short ) {
    add_hist(\%hist, \@bins, find_dist($x, \@long));
}

for my $bucket ( sort { $a <=> $b } keys %hist ) {
    print "$bucket:\t$hist{$bucket}\n";
}

sub find_dist {
    my ($x, $v) = @_;
    my $min = abs($x - $v->[0]);
    for my $i ( 1 .. $#$v ) {
        my $dist = abs($x - $v->[$i]);
        last if $dist >= $min;
        $min = $dist;
    }
    return $min;
}

sub add_hist {
    my ($hist, $bins, $x) = @_;
    for my $u ( @$bins ) {
        if ( $x >= $u ) {
            $hist{ $u } += 1;
            last;
        }
    }
    return;
}


Regarding the part about debugging, use an IDE that allows breakpoints. I don't have an example for perl, but for PHP and ASP.NET, there are Eclipse and Visual Studio (or the free version, Visual Web Developer), respectively.

0

精彩评论

暂无评论...
验证码 换一张
取 消