
Rule-based file parsing

Developer https://www.devze.com 2023-01-02 05:14 Source: Internet

I need to parse a file line by line according to a set of rules.

Here is the requirement.

The file can contain multiple lines with different data:

01200344545143554145556524341232131
1120034454514355414555652434123213101200344545143554145556524341232131
2120034454514

The rules look like this:

  • if byte[0,1] == "0" then extract this line to /tmp/record0.dat
  • if byte[0,1] == "1" then extract this line to /tmp/record1.dat
  • if byte[0,1] == "2" then extract this line to /tmp/record2.dat

I am looking for any language that can do this quickly on very large files (more than 2 GB).

I appreciate any help in advance.

Thanks


It doesn't appear in your list of tags, but I'd use:

sed -n -e '/^0/w /tmp/record0.dat' \
       -e '/^1/w /tmp/record1.dat' \
       -e '/^2/w /tmp/record2.dat' "$@"

You can also do it in the other languages, but for conciseness and probable correctness, in this case, sed is hard to beat.
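
To try the sed command on the three sample lines from the question, a minimal sketch follows. It writes into a scratch directory instead of /tmp/record*.dat; the names outdir and input.dat are assumptions made for this illustration, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Scratch directory so the demo does not touch /tmp/record*.dat.
outdir=$(mktemp -d)

# Sample input: the three lines from the question.
cat > "$outdir/input.dat" <<'EOF'
01200344545143554145556524341232131
1120034454514355414555652434123213101200344545143554145556524341232131
2120034454514
EOF

# One pass over the input; each line is written to the file whose
# rule matches its first character, and -n suppresses normal output.
sed -n -e "/^0/w $outdir/record0.dat" \
       -e "/^1/w $outdir/record1.dat" \
       -e "/^2/w $outdir/record2.dat" "$outdir/input.dat"

# Show how many lines each output file received.
wc -l "$outdir"/record?.dat
```

Because sed reads the input once and keeps the three output files open for the whole run, the same command should behave the same way on a multi-gigabyte file, although the actual speed depends on the system.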


This works regardless of the value of the first character, so it scales without having to add more rules (awk string positions start at 1, hence substr($0,1,1)):

awk '{c=substr($0,1,1); print $0 > ("/tmp/record" c ".dat")}' inputfile.dat
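
Some awk implementations limit how many output files can be open at once, which could matter if the first character takes many distinct values. The sketch below is one possible variation on the one-liner above, not part of the original answer: it keeps a single output file open at a time by closing the previous one whenever the destination changes. The names outdir, dir, out, and prev, and the sample lines, are assumptions for illustration.

```shell
#!/bin/sh
# Scratch directory and a small sample input for the demo.
outdir=$(mktemp -d)
printf '%s\n' 0aaa 1bbb 0ccc 2ddd > "$outdir/input.dat"

awk -v dir="$outdir" '
length($0) > 0 {                       # skip empty lines (no first character)
    out = dir "/record" substr($0, 1, 1) ".dat"
    if (out != prev) {                 # destination changed
        if (prev != "") close(prev)    # release the previous file
        prev = out
    }
    print $0 >> out                    # append, since the file may be reopened
}' "$outdir/input.dat"
```

The append operator is needed here because a file that has been closed and reopened with > would be truncated. As a consequence, leftover output files from an earlier run should be deleted first, and input whose first character changes on nearly every line will pay for the repeated close and reopen.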


awk -v FS= 'NF{print $0 > ("/tmp/record" $1 ".dat")}' file