How do I get a count of unique characters by row with awk?

Developer https://www.devze.com 2023-03-16 09:17 Source: Internet
I have a text file that looks like this:

A A A G A A
A A A A A A
G A G A G G
A G G G G G
G A A A A A
T C T C C C
A A A G A A
C C C C C C
T G G G G G
T T T T T T

I want to count the number of occurrences of each letter by row. There is a fair bit of documentation on doing this by field, but not by row. I have been thinking of something like for (i=1; i<=NF; i++) to loop through the columns in each row, with a counter variable to add occurrences to. Is there a simpler way to do this?


I'm not much of an awk user, so here's a Perl version that prints the number of distinct letters in each row:

perl -ne 'my %c; $c{$_}++ for split; print scalar keys %c'

Output (one digit per row, all on one line because no newline is printed):

2122222121

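If you prefer a newline after each count (note the parentheses, so that the newline is not parsed as part of the argument to keys):

perl -ne 'my %c; $c{$_}++ for split; print scalar(keys %c), "\n"'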

Edit

In reaction to the comment, perhaps this is more like what you meant:

perl -ne 'my %c; $c{$_}++ for split; print "$_:$c{$_} " for keys %c; print "\n"'

Output (Perl does not guarantee hash key order, so the order within a row may vary):

A:5 G:1 
A:6 
A:2 G:4 
A:1 G:5 
A:5 G:1 
T:2 C:4 
A:5 G:1 
C:6 
T:1 G:5 
T:6 


In awk, I don't think there is a simpler way than iterating over the fields in each line.

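awk '
  {
    delete a                      # reset the per-row counts
    for (i=1; i<=NF; i++)         # walk the fields of this row
      a[$i]++
    printf("%d -- ", NR)          # prefix each line with the row number
    for (val in a)                # iteration order is unspecified
      printf("%s:%d, ", val, a[val])
    print ""
  }
'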

Given your input, this outputs

1 -- A:5, G:1, 
2 -- A:6, 
3 -- A:2, G:4, 
4 -- A:1, G:5, 
5 -- A:5, G:1, 
6 -- C:4, T:2, 
7 -- A:5, G:1, 
8 -- C:6, 
9 -- G:5, T:1, 
10 -- T:6, 
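Since the title asks for a count of unique characters per row, and the order of `for (val in a)` is unspecified in awk, here is a minimal sketch of a variant that reports both the number of distinct letters and the per-letter counts in first-seen order. The function name `count_by_row` and the output layout are assumptions made for illustration, not part of the answers above. It uses `split("", count)` to empty the array, which should also work in awks that do not accept `delete` on a whole array.

```shell
# Hypothetical helper: reads whitespace-separated rows from stdin.
count_by_row() {
  awk '
  {
    split("", count)            # portable way to empty the array
    n = 0                       # number of distinct letters seen in this row
    for (i = 1; i <= NF; i++)
      if (count[$i]++ == 0)     # first time this letter appears in the row
        order[++n] = $i         # remember letters in first-seen order
    printf "%d -- %d distinct:", NR, n
    for (j = 1; j <= n; j++)
      printf " %s:%d", order[j], count[order[j]]
    print ""
  }'
}

# Example run on two rows taken from the question.
printf 'A A A G A A\nT C T C C C\n' | count_by_row
```

For those two rows this prints `1 -- 2 distinct: A:5 G:1` and `2 -- 2 distinct: T:2 C:4`. To process a file instead, redirect it into the function, for example `count_by_row < file`.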