
Perl - How do I count and print occurrences of domains in email address array?

I have been struggling with this for a couple days now and cannot seem to figure it out. I have an array of email addresses that were created via push(@emails,$email) in a while loop.

I am attempting to create a list of the unique domains, with the occurrence count of each in the array, ordered by number of occurrences.

So, if the array @emails has:

john@yadoo.com ringo@geemail.net george@zoohoo.org paul@yadoo.com

I can print:

yadoo.com 2
geemail.net 1
zoohoo.org 1

I found this example, which works on emails in a file, but it is way over my head. Can someone help me with a more verbose code example that can be used with an array of email addresses?

perl -e 'while(<>){chomp;/^[^@]+@([^@]+)$/;$h{$1}++;}
foreach $k (sort { $h{$b} <=> $h{$a} } keys %h)  {print $h{$k}." ".$k."\n";}' infile

I also tried the following (closer to my level of understanding):

foreach my $domain (sort keys %$domains) {
  print "$domain"."=";
  print $domains->{$domain}."\n";
}

my %countdoms;
$countdoms{$_}++ for @domains;
print "$_ $countdoms{$_}\n" for keys %countdoms;

The best result I got from many different attempts was a total count (1812, which is accurate) with a number 2 next to it. Am I close, possibly?

Instead of giving you another answer, let me explain what your code example is doing:

while(<>){chomp;/^[^@]+@([^@]+)$/;$h{$1}++;}
foreach $k (sort { $h{$b} <=> $h{$a} } keys %h)  {print $h{$k}." ".$k."\n";}

The first line counts the domains of the email addresses read from the input files.

while(<>) iterates over the input files line by line. The input files are the file(s) passed as arguments or stdin if no arguments were passed. Each line is placed in $_.

chomp; simply removes the newline from the end of $_.

/^[^@]+@([^@]+)$/ is the regular expression that parses out the domain and is applied to $_. It checks for something that has no '@' in the first part, then a '@' and then no '@' in the last part. It remembers the last part, which will be stored in $1. ^ and $ stand for the beginning and the end of the string, respectively.

$h{$1}++; uses the domain (in $1) to increment the count in the hash %h. This works even if it's not present, because undef behaves here like 0.
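To see what the pattern accepts and rejects, here is a minimal sketch. The helper name domain_of and the two malformed sample strings are made up for illustration; only the regular expression comes from the one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper: return the domain part of an address,
# or undef when the address does not contain exactly one '@'.
sub domain_of {
    my ($email) = @_;
    # In list context a successful match returns the captured group;
    # a failed match returns an empty list, so $domain stays undef.
    my ($domain) = $email =~ /^[^@]+@([^@]+)$/;
    return $domain;
}

for my $email ('john@yadoo.com', 'no-at-sign', 'a@b@c') {
    my $domain = domain_of($email);
    print "$email => ", (defined $domain ? $domain : '(no match)'), "\n";
}
```

Only the first address yields a domain; the other two fail because they have zero or two '@' characters.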

In order to make this work for your list, you can just do

foreach(@emails) {/^[^@]+@([^@]+)$/;$h{$1}++;}
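Since you asked for something more verbose, the same loop could be spelled out roughly as below. This is only a sketch: count_domains and the extra 'not-an-address' entry are assumptions of mine, not part of your code. It also checks that the match succeeded before using $1, because after a failed match $1 does not describe the current address.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each domain occurs in a list
# of addresses and return the counts as a hash reference.
sub count_domains {
    my @emails = @_;
    my %count;
    for my $email (@emails) {
        # Only count when the pattern matched; otherwise $1 would not
        # belong to the current address.
        if ($email =~ /^[^@]+@([^@]+)$/) {
            $count{$1}++;    # a missing key starts as undef, which acts like 0
        }
    }
    return \%count;
}

my @emails = ('john@yadoo.com', 'ringo@geemail.net',
              'george@zoohoo.org', 'paul@yadoo.com', 'not-an-address');
my $count = count_domains(@emails);
print "yadoo.com: $count->{'yadoo.com'}\n";
```

The malformed entry is simply skipped, so it does not show up in the counts.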

The second line prints the domains from the hash %h.

sort { $h{$b} <=> $h{$a} } keys %h returns a list of domains sorted by descending occurrence, using the comparison function $h{$b} <=> $h{$a} to look up the counts. Note that it's $h{$b} <=> $h{$a}, not $h{$a} <=> $h{$b}; putting $b first is what makes the order descending.

The rest of line 2 prints out the result: each count followed by its domain, one pair per line.
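The sorting and printing step can be sketched on its own like this. The helper name domains_by_count and the alphabetical tie-break are my additions, and the sample hash just hard-codes the counts for the four addresses in the question. Unlike the one-liner, it prints the domain first and the count second, which is the layout you asked for.

```perl
use strict;
use warnings;

# Hypothetical helper: return the domains ordered by descending count,
# with ties broken alphabetically so the order is predictable.
sub domains_by_count {
    my ($count) = @_;    # hash reference: domain => occurrences
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

# Sample counts for the addresses in the question.
my %h = ('yadoo.com' => 2, 'geemail.net' => 1, 'zoohoo.org' => 1);

for my $domain (domains_by_count(\%h)) {
    print "$domain $h{$domain}\n";
}
```

Without the "or $a cmp $b" part, domains with equal counts come out in whatever order the hash happens to return them.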

If you have your email addresses populated in an array, this will get you a count for each domain. I'm sure someone can produce something prettier!

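my @emails = ('john@yadoo.com','ringo@geemail.net','george@zoohoo.org','paul@yadoo.com');

my %domainCount;

# Count each domain: everything after the '@' is captured in $1.
foreach (@emails) {
    if ($_ =~ /@(\w+.*)/){
        $domainCount{$1}++;
    }
}

# Print the domains, most frequent first.
for my $domain (sort { $domainCount{$b} <=> $domainCount{$a}} keys %domainCount ){
    print "$domain - $domainCount{$domain}\n";
}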

It's a bit crude because I am rusty on Perl but this should do the job:

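use strict;
my ($dom, %hsh);
my @arr = ('john@yadoo.com', 'ringo@geemail.net', 'george@zoohoo.org', 'paul@yadoo.com');
foreach (@arr) {
    # Capture everything after the last '@' and count it.
    ($dom) = ($_ =~ /.*\@(.*)$/);
    $hsh{$dom}++;
}
# Note: keys come back in no particular order here.
foreach (keys %hsh) {
    print "$_:$hsh{$_}\n";
}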

Another variation:

use strict;
use warnings;

my @array
    = qw<john@yadoo.com ringo@geemail.net george@zoohoo.org paul@yadoo.com>;
my %dom_count;
# Take the part after the last '@' of each address and count it.
$dom_count{ $_ }++ foreach map { ( split '@' )[-1] } @array;
# Build [domain, count] pairs, sort by count descending, then by name.
foreach my $pair (
    sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
    map  { [ $_ => $dom_count{ $_ } ] } keys %dom_count
    ) {
    print "@$pair\n";
}


