I had a need to convert uTorrent-style ipfilter.dat into a bluetack-style ipfilter file, and wrote this shell script to achieve this:
#!/bin/bash
# read ipfilter.dat-formatted file line by line
# (example: 000.000.000.000-008.008.003.255,000,Badnet
# - ***here, input file's lines/fields are always the same length***)
# and convert into a bluetack.co.uk-formatted output
# (example: Badnet:0.0.0.0-8.8.3.255
# - fields moved around, leading zeros removed)
while read record
do
    start=`echo ${record:0:15} | awk -F '.' '{for(i=1;i<=NF;i++)$i=$i+0;}1' OFS='.'`
    end=`echo ${record:16:15} | awk -F '.' '{for(i=1;i<=NF;i++)$i=$i+0;}1' OFS='.'`
    echo ${record:36:7}:${start}-${end}
done < "$1"
However, on a 2000-line input file this script takes on average 10(!) seconds to complete - a mere 200 lines/sec.
I'm sure this same result can be achieved with sed, and sed-version is likely to be much faster.
Is there a sed-guru around to suggest a solution for this kind of fixed-positions replacements?
Feel free to suggest a solution in other languages as well - I would enjoy testing a Python or a C version, for example. A more efficient shell/bash version would be welcome as well.
You could try this.
sed -r 's/^0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)-0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+),...,(.*)$/\9:\1.\2.\3.\4-\5.\6.\7.\8/' inputfile
I didn't test the performance but I guess it could be faster than 200 lines/sec.
You will be sacrificing performance by using the shell's while read loop on a big file. In practice, tools such as awk and sed (and languages such as Perl, Python, or Ruby) are much better at iterating over big files and processing their lines than the shell's while read loop. Moreover, your script pipes into awk twice for every input line, so a 2000-line file starts 4000 awk processes. That is a lot of extra overhead.
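To illustrate the point about avoiding per-line subprocesses, here is a minimal Python sketch that does the whole conversion in one process. It assumes every line has the shape start-end,level,name as in the question's example. The function names strip_zeros and convert_line are made up for illustration.

```python
#!/usr/bin/env python3
"""Convert ipfilter.dat-style lines to bluetack-style lines in one process."""
import sys


def strip_zeros(ip):
    # int() drops leading zeros: "008" -> 8, "000" -> 0
    return ".".join(str(int(octet)) for octet in ip.split("."))


def convert_line(line):
    # Assumed line shape: start-end,level,name
    ip_range, _level, name = line.rstrip("\r\n").split(",", 2)
    start, end = ip_range.split("-")
    return f"{name}:{strip_zeros(start)}-{strip_zeros(end)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as infile:
        for line in infile:
            if line.strip():  # skip blank lines
                print(convert_line(line))
```

Saved under a name such as convert.py (hypothetical), it would be run as: python3 convert.py ipfilter.dat > bluetack.txt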
Ruby(1.9+)
$ cat file
000.000.000.000-008.008.003.255,000,Badnet
001.010.110.111-002.020.220.222,111,Badnet
$ ruby -F"," -ane 'puts "#{$F[-1].chomp}:" + $F[0].gsub(/\b0+(?=\d)/, "")' file
Badnet:0.0.0.0-8.8.3.255
Badnet:1.10.110.111-2.20.220.222
I really wanted to get this to work in a single sed command, but I wasn't able to figure it out. Surely this will still be faster than 200 lines/s though.
sed 's/\([.-]\)0\{1,2\}/\1/g' | sed 's/^0\{1,2\}//'
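For comparison, the zero-stripping can be folded into a single regular expression when lookaround is available. The sketch below uses Python's re module rather than sed, since POSIX sed has no lookahead. The helper name strip_leading_zeros is made up for illustration, and it is meant to be applied to the address range only, not to the whole line (it would also shorten the ,000, field).

```python
import re

# Zeros at the start of a number, as long as at least one digit follows them.
# (?<!\d) leaves "100" intact; (?=\d) keeps the last zero of "000".
LEADING_ZEROS = re.compile(r"(?<!\d)0+(?=\d)")


def strip_leading_zeros(text):
    return LEADING_ZEROS.sub("", text)


print(strip_leading_zeros("000.000.000.000-008.008.003.255"))
# prints 0.0.0.0-8.8.3.255
```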
#!/bin/tclsh
#Regsub TCL script to remove the leading zeros from the ip address.
#Author : Shoeb Masood , Bangalore
puts "Enter the ip address"
set ip [gets stdin]
set list_ip [split $ip .]
foreach index $list_ip {
# strip leading zeros but keep at least one digit, so "000" becomes "0"
regsub {^0+(?=\d)} $index {} index
lappend list_ip2 $index
}
set list_ip2 [join $list_ip2 "."]
puts $list_ip2