Parse a CSV file that contains commas in the fields with awk

I have to use awk to print out 4 different columns in a CSV file. The problem is that the values are in a $x,xxx.xx format. When I run the regular awk command

awk -F, '{print $1}' testfile.csv 

my output ends up looking like

307.00
$132.34
30.23

What am I doing wrong?

"$141,818.88","$52,8开发者_开发问答31,578.53","$52,788,069.53" this is roughly the input. The file I have to parse is 90,000 rows and about 40 columns This is how the input is laid out or at least the parts of it that I have to deal with. Sorry if I made you think this wasn't what I was talking about.

If the input is "$307.00","$132.34","$30.23", I want the output to be:

$307.00
$132.34
$30.23


Oddly enough, I had to tackle this problem some time ago and kept the code around. You almost had it, but you need to get a bit tricky with your field separator(s).

awk -F'","|^"|"$' '{print $2}' testfile.csv 

Input

# cat testfile.csv
"$141,818.88","$52,831,578.53","$52,788,069.53"
"$2,558.20","$482,619.11","$9,687,142.69"
"$786.48","$8,568,159.41","$159,180,818.00"

Output

# awk -F'","|^"|"$' '{print $2}' testfile.csv
$141,818.88
$2,558.20
$786.48

You'll note that the "first" field is actually $2: the leading quote matches the ^" separator, which leaves an empty $1 in front of the real data. A small price to pay for a short one-liner, if you ask me.
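
Since the question asks for four different columns, the same separator extends naturally. A quick sketch, where the column numbers are only placeholders for whichever columns you actually need (because of the empty $1, CSV columns 1, 3, 5 and 7 become awk fields $2, $4, $6 and $8):

awk -F'","|^"|"$' -v OFS='\t' '{print $2, $4, $6, $8}' testfile.csv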


I think what you're saying is that you want to split the input into CSV fields while not getting tripped up by the commas inside the double quotes. If so...

First, use "," as the field separator, like this:

awk -F'","' '{print $1}'

But then you'll still end up with a stray double-quote at the beginning of $1 (and at the end of the last field). Handle that by stripping quotes out with gsub, like this:

awk -F'","' '{x=$1; gsub("\"","",x); print x}'

Result:

echo '"abc,def","ghi,xyz"' | awk -F'","' '{x=$1; gsub("\"","",x); print x}'

abc,def
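
If you need more than just the first column, a small variation of the same idea strips the quotes from every field in one pass. The fields printed below ($1 and $3) are only an example, and testfile.csv is the sample file from above:

awk -F'","' '{ for (i = 1; i <= NF; i++) gsub(/"/, "", $i); print $1, $3 }' testfile.csv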


In order to let awk handle quoted fields that contain the field separator, you can use a small script I wrote called csvquote. It temporarily replaces the offending commas with nonprinting characters, and then you restore them at the end of your pipeline. Like this:

csvquote testfile.csv | awk -F, '{print $1}' | csvquote -u

This would also work with any other UNIX text processing program like cut:

csvquote testfile.csv | cut -d, -f1 | csvquote -u

You can get the csvquote code here: https://github.com/dbro/csvquote
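
If installing an extra tool isn't an option, the idea can be sketched directly in awk: walk each line character by character, replace commas that sit inside quotes with a nonprinting placeholder (octal 037 below), run the usual comma-delimited command, then translate the placeholder back. This is only a rough illustration of the technique, not csvquote's actual implementation, and it assumes fields never contain that placeholder or embedded newlines:

awk '{
    out = ""; inq = 0
    for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == "\"") { inq = !inq; continue }   # toggle the in-quotes state, drop the quote
        if (c == "," && inq) c = "\037"           # hide the embedded comma
        out = out c
    }
    print out
}' testfile.csv | awk -F, '{print $1}' | tr '\037' ','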


The data file:

$ cat data.txt
"$307.00","$132.34","$30.23"

The AWK script:

$ cat csv.awk
BEGIN { RS = "," }      # treat each comma-separated chunk as its own record
{ gsub("\"", "", $1);   # strip the surrounding double quotes
  print $1 }

The execution:

$ awk -f csv.awk data.txt
$307.00
$132.34
$30.23
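
One more option for completeness: if GNU awk 4.0 or later is available, its FPAT variable describes what a field looks like instead of what separates fields, which copes with the quoted commas directly. This is gawk-specific (plain POSIX awk has no FPAT):

gawk -v FPAT='([^,]+)|("[^"]+")' '{ gsub(/"/, "", $1); print $1 }' testfile.csv

With the testfile.csv sample above this prints $141,818.88, $2,558.20 and $786.48, the same as the field-separator trick in the first answer.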
