I have a text file that looks like this:
A A A G A A
A A A A A A
G A G A G G
A G G G G G
G A A A A A
T C T C C C
A A A G A A
C C C C C C
T G G G G G
T T T T T T
I want to count the number of occurrences of each letter in each row. There is a fair bit of documentation on doing this by column (field), but not by row. I have been thinking of something like:
for (i = 1; i <= NF; i++)
to loop through the columns in each row and then make a counter variable to add occurrences to. Is there a simpler way to do this?
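A minimal sketch of that loop-plus-counter idea, wrapped in a shell function. It assumes the only letters of interest are A, C, G and T: other values are still tallied but not printed. Printing from a fixed list of letters gives a stable column order, which a for (k in array) loop does not guarantee. The function name count_by_row is hypothetical.

```shell
# count_by_row (hypothetical name): for each input line, count how many
# fields equal A, C, G and T, and print the four counts in that order.
count_by_row() {
    awk '
    {
        # Reset the per-row counters (splitting an empty string empties the array).
        split("", count)
        # Loop over every field of the current row and tally it.
        for (i = 1; i <= NF; i++)
            count[$i]++
        # Print in a fixed order; adding 0 turns a missing letter into 0.
        printf("%d: A=%d C=%d G=%d T=%d\n", NR, count["A"]+0, count["C"]+0, count["G"]+0, count["T"]+0)
    }
    ' "$@"
}

# Usage: pass file names, or pipe rows in on standard input.
printf 'A A A G A A\nT C T C C C\n' | count_by_row
```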
I'm not much of an awk user, so here's a Perl version that prints the number of distinct letters on each row:
perl -ne 'my %c; $c{$_}++ for split; print scalar keys %c'
Output:
2122222121
If you prefer one result per line, add a newline:
perl -ne 'my %c; $c{$_}++ for split; print scalar(keys %c), "\n"'
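For comparison, here is a sketch of an awk counterpart to the distinct-count one-liner, again as a shell function whose name, distinct_per_row, is hypothetical. It empties its array with split("", seen) at the start of each row and counts a value only the first time it appears on that row.

```shell
# distinct_per_row (hypothetical name): print how many different values
# appear on each line, one number per line, like the Perl one-liner above.
distinct_per_row() {
    awk '
    {
        split("", seen)   # clear the set of values seen on this row
        n = 0
        for (i = 1; i <= NF; i++)
            if (!($i in seen)) {   # first time this value appears on the row
                seen[$i] = 1
                n++
            }
        print n
    }
    ' "$@"
}

printf 'A A A G A A\nA A A A A A\nT C T C C C\n' | distinct_per_row
```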
Edit
In response to a comment, perhaps this is closer to what you meant (a count for each letter on each row):
perl -ne 'my %c; $c{$_}++ for split; print "$_:$c{$_} " for keys %c; print "\n"'
Output (Perl hash order is unspecified, so the letters within a line may come out in a different order):
A:5 G:1
A:6
A:2 G:4
A:1 G:5
A:5 G:1
T:2 C:4
A:5 G:1
C:6
T:1 G:5
T:6
In awk, I don't think there is a simpler way to iterate over the fields in a line.
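awk '
{
    # Start each row with an empty count array.
    delete a
    # Tally every field on the row.
    for (i = 1; i <= NF; i++)
        a[$i]++
    # Print the row number, then each letter with its count.
    printf("%d -- ", NR)
    for (val in a)
        printf("%s:%d, ", val, a[val])
    print ""
}
'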
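Given your input, this outputs the following (for (val in a) has no defined order, so the letters within a row may be listed differently depending on the awk implementation):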
1 -- A:5, G:1,
2 -- A:6,
3 -- A:2, G:4,
4 -- A:1, G:5,
5 -- A:5, G:1,
6 -- C:4, T:2,
7 -- A:5, G:1,
8 -- C:6,
9 -- G:5, T:1,
10 -- T:6,
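If you only need counts for a known set of letters (assumed here to be A, C, G and T), a sketch that skips the field loop entirely: gsub returns the number of substitutions it made, so replacing each letter with itself ("&") yields a count without changing the line. Because the pattern matches characters rather than whole fields, this assumes every field is a single letter. The function name gsub_counts is hypothetical.

```shell
# gsub_counts (hypothetical name): count A, C, G and T on each line using
# the return value of gsub instead of looping over the fields.
gsub_counts() {
    awk '
    {
        # gsub returns how many matches it replaced; "&" puts each match back,
        # so $0 is left unchanged while we read off the count.
        a = gsub(/A/, "&")
        c = gsub(/C/, "&")
        g = gsub(/G/, "&")
        t = gsub(/T/, "&")
        printf("%d -- A:%d C:%d G:%d T:%d\n", NR, a, c, g, t)
    }
    ' "$@"
}

printf 'A A A G A A\nT G G G G G\n' | gsub_counts
```

The trade-off is that every letter must be listed explicitly, whereas the array-based loop picks up whatever values appear.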