
Flat file data analysis

Developer https://www.devze.com 2023-03-03 16:46 Source: Internet

I have a flat file that consists of the following structure:

A1 B1 C1 D1 E1 F1 G1  
A2 B2 C2 D2 E2 F2 G2  
A3 B3 C3 D3 E3 F3 G3

This file has around 1 million rows.

I would like to generate the following statistics:

  1. Number of rows in the file.
  2. Number of unique values in a particular column (e.g. B).
  3. Sort by column F and create a file containing the top n records by that column.

What would be the best way of doing this analysis? I'm currently using Mac OS X, so a Linux/Mac solution would be preferred.


Pretty easy to do in bash (your mac command line shell).

Something like:

# 1. row count
wc -l filename

# 2. unique count in col 1 (change -f to pick another column, e.g. -f 2 for B)
cut -d " " -f 1 filename | sort -u | wc -l

# 3. top n unique values in col 6, and their counts (n = 10 here; change as needed)
cut -d " " -f 6 filename | sort | uniq -c | sort -nr | head -n 10
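
Note that command 3 above counts how often each value in column 6 occurs; it does not keep whole rows. If what you actually want is the rows themselves ordered by column F, with the top n written to a file, something like the sketch below may be closer. It assumes the fields are separated by single spaces and that column 6 is numeric (drop the `n` in `-k6,6nr` for plain text ordering). The names `sample.txt` and `top_n.txt`, the sample data, and `n=2` are made up for illustration and are not from the question.

```shell
#!/usr/bin/env bash
# Illustrative names only: sample.txt, top_n.txt and n are not from the question.
workdir=$(mktemp -d)
file="$workdir/sample.txt"
out="$workdir/top_n.txt"
n=2

# Tiny stand-in for the real file; column 6 is assumed to be numeric.
printf '%s\n' \
  'A1 B1 C1 D1 E1 30 G1' \
  'A2 B2 C2 D2 E2 10 G2' \
  'A3 B1 C3 D3 E3 20 G3' > "$file"

# 1. Row count; reading from stdin keeps the file name out of the output.
rows=$(wc -l < "$file")

# 2. Number of distinct values in column 2 (B).
unique_b=$(cut -d ' ' -f 2 "$file" | sort -u | wc -l)

# 3. Whole rows sorted by column 6 (F), largest first; keep the top n in a file.
sort -t ' ' -k6,6nr "$file" | head -n "$n" > "$out"

# $(( )) strips the padding that BSD wc puts in front of the number.
echo "rows=$((rows)) unique_b=$((unique_b))"
cat "$out"
```

On a file with a million rows, `sort` does most of the work, so expect steps 2 and 3 to take noticeably longer than the row count.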