
csv file filtering

Developer https://www.devze.com 2023-03-14 11:46 Source: Internet

I have a .csv file with a header row like so;

headerA,headerB,headerC
bill,jones,p
mike,smith,f
sally,silly,p

I'd like to filter out any records with the f value in the headerC column.

Can I do that with sed or awk?


If the name of the third column in the header is not just f:

sed '/,f$/d' FILE

will do (it deletes every input line that ends with ,f).

If the third column's name is just f, I'd go with:

sed -n -e '1p;/,[^f]$/p' FILE

(Nothing is printed by default (-n); 1p prints the first line, and the pattern prints every line that ends with a comma followed by a single character other than f. Note: this will not work if the third column contains more than one character.)
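The single-character limitation can be avoided by restricting the delete to lines after the header. This is only a sketch: the temporary file and its contents are made up for illustration (a header whose third name is f, and one multi-character value), and it assumes the third column is the last one on each line.

```shell
# Illustrative data: the header itself ends with ",f" and one value is longer than one char.
sample=$(mktemp)
printf '%s\n' 'a,b,f' 'bill,jones,pass' 'mike,smith,f' 'sally,silly,p' > "$sample"

# From line 2 to the last line ($), delete lines that end with ",f".
# Line 1 is outside the address range, so the header is always kept.
result=$(sed '2,${/,f$/d;}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Because the pattern only looks for a trailing ,f, values such as pass are left alone.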

And an awk one:

awk -F, 'NR == 1 ; NR > 1 && $3 != "f"' FILE

(This always prints the first line: NR == 1 is true there, so the default action, print $0, runs. The second condition checks that we are past the first line and that the third field is not f, and then the default action prints the line.)

HTH


Well, if you know that headerC is always in the third column, the following sed command would work:

sed -r '/^[^,]+(,[^,]+){1},f$/d' < file.csv > filefiltered.csv

And the following awk command does the same:

awk 'BEGIN {FS=","} {if($3 != "f") print}' file.csv

If you don't know headerC is always in a particular column it gets a little more tricky. Does this work?
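For the case where the column position is not known in advance, one option is to look up the column number in the header row first. The following is a sketch under assumptions: the function name filter_by_header and the temporary file are hypothetical, and the fields are assumed to be plain comma-separated values with no quoted commas.

```shell
# Hypothetical helper: drop rows whose value in the named column equals VALUE.
# Usage: filter_by_header FILE COLUMN_NAME VALUE
filter_by_header() {
    awk -F, -v name="$2" -v val="$3" '
        NR == 1 {
            # Find the index of the column whose header matches name.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            print   # always keep the header row
            next
        }
        # Keep the row if the column was not found or its value differs.
        col == 0 || $col != val
    ' "$1"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'headerA,headerB,headerC' 'bill,jones,p' 'mike,smith,f' 'sally,silly,p' > "$sample"

filtered=$(filter_by_header "$sample" headerC f)
printf '%s\n' "$filtered"

rm -f "$sample"
```

If the named column does not exist, this sketch passes every row through unchanged rather than failing; whether that is the right behaviour depends on the use case.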


A bit unclear, is this what you are asking for?

$ awk -F, '{ if($3 == "f")print}' input
mike,smith,f

With the header, and formatted using column:

$ awk -F, '{ if (NR == 1)print}{if($3 == "f")print}' input | column -t -s,
headerA  headerB  headerC
mike     smith    f


grep works too; see this example, which prints the records that have f in the last column.

grep ",.*,.*f" << EOF
headerA,headerB,headerC
bill,josef,p
mike,smith,f
sally,silly,p
EOF

outputs:

mike,smith,f


There is no need for sed or awk; this can be done with simpler commands such as cut and grep piped together like this:

cut -d"," -f 3 FILE | grep -i f

I am assuming the delimiter is a comma and that column C is the third one; if it is not, change the values above appropriately. Note that this prints only the third field of the matching lines, not the whole records. I have used grep with the -i option so that it ignores case. If you want to match only lowercase f or only uppercase F, remove the -i option and change the pattern accordingly.
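To drop the f records while keeping whole lines, grep's -v option can be used on its own. A minimal sketch, assuming the column of interest is the last one on each line; the temporary file is only for illustration.

```shell
# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'headerA,headerB,headerC' 'bill,jones,p' 'mike,smith,f' 'sally,silly,p' > "$sample"

# -v inverts the match: print every line that does NOT end with ",f".
kept=$(grep -v ',f$' "$sample")
printf '%s\n' "$kept"

rm -f "$sample"
```

A header whose last name is exactly f would also be removed by this pattern, so it only suits files like the one in the question.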
