I have a huge (10+ GB) .csv file on a Linux server. The lines look something like this:
6;20000327;20000425;990099,0;20000327;LL;UBXO;7;-1;62;F;30;001;NO;NO;wgB;0;99;0002;5530;001;708;196;1;AA;N;N;100;53,81;0;0;0;1;1;;1;
6;20000327;20000425;990099,0;20000425;LL;OLD*;62;62;92;F;30;001;NO;NO;ueB;0;99;0002;XXXX;001;;;1;AA;N;N;;;0;0;1;0;0;;30;
I am looking for a fast script to do the following:
- change any occurrence of <number>,<number> to <number>.<number>
- delete the last semicolon of each line
The second one is giving me particular trouble, because the script should work regardless of whether the file has Linux or Windows line endings.
I tried to do it with sed but have failed so far.
[edit]
I finally used a mix of Dennis Williams' and SiegeX's solutions:
sed 's/;\([0-9]*\),\([0-9]*\);/;\1.\2;/g;s/;\(\r\?\)$/\1/' inputfile
(the part with s/;[[:blank:]]*$// didn't work on my file...)
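For anyone who wants to check the combined command before running it on the full file, here is a minimal sketch. The file names lf.csv and crlf.csv and the shortened record are made up for illustration, and GNU sed is assumed (\r and \? are GNU extensions).

```shell
# Hypothetical sample files: the same record with LF and with CRLF endings.
printf '6;990099,0;LL;53,81;0;;30;\n'   > lf.csv
printf '6;990099,0;LL;53,81;0;;30;\r\n' > crlf.csv

# The expression from the command above; \r and \? are GNU sed extensions.
fix='s/;\([0-9]*\),\([0-9]*\);/;\1.\2;/g;s/;\(\r\?\)$/\1/'

# od -c makes the line ending visible (\n versus \r \n).
sed "$fix" lf.csv   | od -c
sed "$fix" crlf.csv | od -c
```

The trailing semicolon should be gone in both dumps, and the second one should still end in \r \n. One caveat: each match consumes the semicolons on both sides, so if two decimal fields ever sit directly next to each other (;1,2;3,4;), only the first would be converted.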
sed 's/;\([0-9]*\),\([0-9]*\);/;\1.\2;/g;s/;[[:blank:]]*$//' ./infile
$ cat file
6;20000327;20000425;990099,0;20000327;LL;UBXO;7;-1;62;F;30;001;NO;NO;wgB;0;99;0002;5530;001;708;196;1;AA;N;N;100;53,81;0;0;0;1;1;;1;
6;20000327;20000425;990099,0;20000425;LL;OLD*;62;62;92;F;30;001;NO;NO;ueB;0;99;0002;XXXX;001;;;1;AA;N;N;;;0;0;1;0;0;;30;
$ perl -p -e 's/(\d+),(\d+)/$1.$2/g; s/;$//' file
6;20000327;20000425;990099.0;20000327;LL;UBXO;7;-1;62;F;30;001;NO;NO;wgB;0;99;0002;5530;001;708;196;1;AA;N;N;100;53.81;0;0;0;1;1;;1
6;20000327;20000425;990099.0;20000425;LL;OLD*;62;62;92;F;30;001;NO;NO;ueB;0;99;0002;XXXX;001;;;1;AA;N;N;;;0;0;1;0;0;;30
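Note: $ matches just before the trailing newline, so this works as-is on Linux (LF) files. On Linux, perl does not strip the carriage return from Windows (CRLF) lines, so for those use s/;(\r?)$/$1/ instead.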
Give this a try:
sed 's/,/./g;s/;\r\?$//' inputfile
To preserve the carriage return if it's there:
sed 's/,/./g;s/;\(\r\?\)$/\1/' inputfile
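To see the difference between the two variants, a small sketch that counts the carriage returns left in the output. The file name win.csv and its contents are made up for illustration, and GNU sed is assumed.

```shell
# Hypothetical one-line sample with a Windows (CRLF) ending.
printf 'a;1,5;b;\r\n' > win.csv

# Variant 1 drops the carriage return together with the semicolon.
sed 's/,/./g;s/;\r\?$//' win.csv | tr -cd '\r' | wc -c

# Variant 2 captures the carriage return and puts it back.
sed 's/,/./g;s/;\(\r\?\)$/\1/' win.csv | tr -cd '\r' | wc -c
```

The first count should be 0 and the second 1, so the first variant also converts the file to Linux line endings as a side effect.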
If you are handy with perl, you can use a perl one-liner to do these things. Here's an example of how you might do the number change:
perl -i -pe 's/(\d),(\d)/$1.$2/g' yourfile
Be very careful with the -i option, as it causes perl to modify the existing file in place.