How to find rows that contain fewer than a certain number of a given character?

Developer https://www.devze.com 2023-03-11 05:10 Source: Internet
I am trying to write a shell/Perl command that will give me the numbers of the rows that have fewer than a certain number of fields. E.g., I have a comma-delimited text file, and I am trying to find the rows that have fewer than, say, 15 fields. So I guess the problem essentially boils down to returning rows that have fewer than 14 commas.

Can anyone help me with that?

Thanks!


You can do this easily in bash by calling awk. This sort of script is exactly what awk was designed to do.

awk -F, '{ if (NF < 15 ) print NR "," $0 }' fileToTest

-F, tells awk to split each line on the comma character, and NF (number of fields) holds how many fields each line was split into. Change the 15 as needed to validate your files.
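As a minimal sketch of the same check, the threshold can be passed in with -v rather than hard-coded. The helper name short_rows and the sample data below are illustrative assumptions, not part of the original answer:

```shell
# Hypothetical helper: print "lineNumber,line" for every row of file $2
# that has fewer than $1 comma-separated fields.
short_rows() {
    awk -F, -v min="$1" 'NF < min { print NR "," $0 }' "$2"
}

# Illustrative sample data: only line 2 has fewer than 3 fields.
sample=$(mktemp)
printf '%s\n' 'a,b,c' 'd,e' 'f,g,h' > "$sample"

short_rows 3 "$sample"    # prints: 2,d,e
```

Passing the limit as an awk variable keeps the program text fixed, so the same function can be reused for files with different expected widths.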

Don't forget that CSV files may have commas embedded inside a field if the field is surrounded by quotes, e.g.

 fld1, "text for, fld2", fld3, fld4,....

Solving that problem is significantly harder. Use a tab character to separate your fields (or some other character you can be sure will never appear in your data), and then sleep easy at night ;-)
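To make the pitfall concrete, here is a small sketch with made-up data: the same three-field record is counted once split on commas and once split on tabs. The variable names are illustrative only.

```shell
# Illustrative data only: one 3-field record, comma- and tab-delimited.
csv_line='fld1,"text for, fld2",fld3'
tsv_line=$(printf 'fld1\ttext for, fld2\tfld3')

# Splitting on commas treats the comma inside the quotes as a delimiter.
comma_count=$(printf '%s\n' "$csv_line" | awk -F, '{ print NF }')

# Splitting on tabs is unaffected by commas inside a field.
tab_count=$(printf '%s\n' "$tsv_line" | awk -F'\t' '{ print NF }')

echo "comma-split: $comma_count fields, tab-split: $tab_count fields"
# prints: comma-split: 4 fields, tab-split: 3 fields
```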

I hope this helps.


Cute version

perl -lne 'print if tr/,// < 14'

tr/x// is a Perl idiom for counting the number of occurrences of x in a string.

More flexible version

perl -F, -lane 'print if @F < 15'

-a enables "autosplit mode", -F sets the delimiter to a comma, and the code in the -e says to print if there are fewer than 15 fields. This is nice if you eventually want to do something else with the contents of the fields, since they're available in @F, already split on commas.

Proper CSV version

Doesn't make a nice one-liner, but you might consider using Text::xSV or Text::CSV_XS if your data is really CSV and not merely "comma separated" — the difference is that CSV can contain embedded commas, newlines, and other weird things by using quoted fields.
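For data where the only complication is commas inside double-quoted fields, a rough awk stand-in can count fields while skipping quoted commas. This is not Text::CSV_XS or Text::xSV, and it is only a sketch under stated assumptions: records never span multiple lines, and quotes are balanced on each line. The name csv_field_count is hypothetical.

```shell
# Hypothetical helper: print the number of fields on each line of stdin,
# ignoring commas that appear between double quotes.
# Assumes one record per line (no embedded newlines).
csv_field_count() {
    awk '{
        in_quotes = 0
        fields = 1
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"")
                in_quotes = !in_quotes    # an escaped "" toggles twice, so state is unchanged
            else if (c == "," && !in_quotes)
                fields++
        }
        print fields
    }'
}

printf '%s\n' 'fld1, "text for, fld2", fld3, fld4' | csv_field_count
# prints: 4
```

For anything beyond this narrow case, a real CSV parser remains the more reliable route.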


You also asked for Perl. This is not the only way, and it assumes the commas are always field delimiters:

perl -ne 'print "$.: $_" if 15 > split/,/' my-comma-file.txt