Can this be done faster (read file, substitute [sed], write new file)

开发者 https://www.devze.com 2023-01-15 15:09 Source: web
I use this piece of code in my bash script to read a file containing several hex strings, do some substitution, and then write the result to a new file. It takes about 30 minutes for about 300 MB.

I'm wondering if this can be done faster?

sed 's,[0-9A-Z]\{2\},\\\\x&,g' ${in_file} | while read line; do
 printf "%b" ${line} >> ${out_file}
 printf '\000\000' >> ${out_file}
done

Update:

I did some testing and got the following results. The winner is the last variant, which uses xxd:


sed 's,[0-9A-Z]\{2\},\\\\x&,g' ${in_file} | while read line; do
    printf "%b" ${line} >> ${out_file}
    printf '\000\000' >> ${out_file}
done

real 44m27.021s
user 29m17.640s
sys 15m1.070s


sed 's,[0-9A-Z]\{2\},\\\\x&,g' ${in_file} | while read line; do
    printf '%b\000\000' ${line} 
done >> ${out_file}

real 18m50.288s
user 8m46.400s
sys 10m10.170s


export LANG=C
sed 's/$/0000/' ${in_file} | xxd -r -ps >> ${out_file}

real 0m31.528s
user 0m1.850s
sys 0m29.450s



You need the xxd command, which ships with Vim.

export LANG=C
sed 's/$/0000/' ${in_file} | xxd -r -ps > ${out_file}


This is slow because of the loop in bash. If you can get sed/awk/perl/etc. to do the loop, it will be much faster. I can't see how you can do it in sed or awk, though. It's probably pretty easy in Perl, but I don't know enough Perl to answer that for you.

At the very least, you should be able to save a little time by refactoring what you have to:

sed 's,[0-9A-Z]\{2\},\\\\x&,g' ${in_file} | while read line; do
 printf '%b\000\000' ${line} 
done >> ${out_file}

At least this way, you're running printf once per iteration and opening/closing ${out_file} once only.


Switch to a full programming language? Here's a Ruby one-liner:

ruby -ne 'print "#{$_.chomp.gsub(/[0-9A-F]{2}/) { |s| s.to_i(16).chr }}\x00\x00"'


If you have Python, and assuming the data is simple (one hex byte per line):

$ cat file
99
AB

script:

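# Python 3; assumes each input line holds exactly one hex byte, e.g. "99"
with open("file") as src, open("outfile", "wb") as out:
    for line in src:
        # decode the byte, then append the two NUL terminator bytes
        out.write(bytes([int(line.strip(), 16)]) + b"\x00\x00")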
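
The script above handles one byte per line, while the question mentions whole hex strings. A minimal sketch for that case, assuming Python 3 and that every line contains only an even number of hex digits. The names hex_lines_to_binary and convert_file are made up for illustration and do not come from the thread:

```python
import io


def hex_lines_to_binary(src, dst, terminator=b"\x00\x00"):
    """Decode each line of hex digits from src and write the raw bytes
    to dst, followed by the terminator (two NUL bytes by default)."""
    for line in src:
        # bytes.fromhex accepts upper- and lower-case digits; strip() drops
        # the trailing newline. An empty line yields just the terminator.
        dst.write(bytes.fromhex(line.strip()))
        dst.write(terminator)


def convert_file(in_path, out_path):
    """Convert a whole file; the output is opened once and is buffered."""
    with open(in_path, "r", encoding="ascii") as src, \
            open(out_path, "wb") as dst:
        hex_lines_to_binary(src, dst)


# Small in-memory demo: two lines of hex, each followed by two NUL bytes.
demo = io.BytesIO()
hex_lines_to_binary(io.StringIO("4142\n43\n"), demo)
print(demo.getvalue())  # b'AB\x00\x00C\x00\x00'
```

Like the xxd pipeline, this keeps the per-line loop inside a single process and opens the output file only once, which is where the bash loop loses its time. A line that is not valid hex raises ValueError rather than being skipped.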