
Use matching value of a RegExp to name the output file

Developer https://www.devze.com 2022-12-26 18:01 Source: Internet

I have this file "file.txt" which I want to split into many smaller ones. This is a piece of it:

0 id:2293 7:0.78235 12:0.69205 17:0.79421 21:0.77818 ..

4 id:2293 7:0.78235 8:0.97904 12:0.69205 17:0.31709 ..

1 id:2294 7:0.78235 8:0.90994 17:0.49058 21:0.59326 ..

Each line of the file has an id field, which looks like "id:1" for a line belonging to id 1. For each id in the file, I would like to create a file named id<id>.txt and put all lines that belong to this id in that file. My brute-force bash script solution reads as follows.

count=1
while [ $count -lt 19945 ]
do
    cat file.txt | grep "id:$count " >> ./sets/id$count.txt
    count=`expr $count + 1`
done

Now this is very inefficient, as I have to read through the file about 20,000 times. Is there a way to do the same operation with only one pass through the file? What I'm probably asking for is a way to use the value matched by a regular expression to name the associated output file.


$ cat file
0 id:2293 7:0.78235 12:0.69205 17:0.79421 21:0.77818 ..
4 id:2293 7:0.78235 8:0.97904 12:0.69205 17:0.31709 ..
1 id:2294 7:0.78235 8:0.90994 17:0.49058 21:0.59326 ..

$ awk -F"[: ]" '{print $0 > ("id_" $3 ".txt")}' file

$ more id_2293.txt
0 id:2293 7:0.78235 12:0.69205 17:0.79421 21:0.77818 ..
4 id:2293 7:0.78235 8:0.97904 12:0.69205 17:0.31709 ..

$ more id_2294.txt
1 id:2294 7:0.78235 8:0.90994 17:0.49058 21:0.59326 ..
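
With about 20,000 distinct ids, some awk implementations may hit a limit on the number of output files open at once. What follows is a minimal sketch of one workaround, assuming the same field layout as above: append with >> and close each file right after writing to it. The demo_split directory and the shortened sample lines are made up for illustration.

```shell
#!/bin/sh
# Illustrative setup: a scratch directory and a three-line sample file.
mkdir -p demo_split
rm -f demo_split/id_*.txt
cat > demo_split/file <<'EOF'
0 id:2293 7:0.78235 12:0.69205
4 id:2293 7:0.78235 8:0.97904
1 id:2294 7:0.78235 8:0.90994
EOF

# One pass over the input: splitting on ":" or " " makes $3 the numeric id.
# Appending and closing after every line keeps only one output file open.
awk -F'[: ]' '{
    out = "demo_split/id_" $3 ".txt"
    print >> out
    close(out)
}' demo_split/file
```

Because this variant appends, output files left over from an earlier run have to be removed first (the rm line above). Reopening a file for every line is also slower than keeping the files open, so it is only worth doing if the simpler command actually fails.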


You can build a solution similar to the one in "Creating multiple csv files from data within a csv file".


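Try this AWK script (it needs gawk, since the three-argument form of match() is a GNU extension):

#!/usr/bin/awk -f
{
    # a[1] holds the digits captured by the parenthesized group
    if (match($0, /id:([0-9]+)/, a))
        print $0 >> ("file" a[1] ".txt");
}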
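
If gawk is not available, the same idea can probably be expressed with the two-argument match(), which POSIX awk provides. It sets RSTART and RLENGTH, and substr() can then cut out the digits after "id:". This is a minimal sketch under that assumption; the demo_match directory, the input.txt name and the shortened sample lines are made up for illustration.

```shell
#!/bin/sh
# Illustrative setup: a scratch directory and a three-line sample file.
mkdir -p demo_match
rm -f demo_match/file*.txt
cat > demo_match/input.txt <<'EOF'
0 id:2293 7:0.78235 12:0.69205
4 id:2293 7:0.78235 8:0.97904
1 id:2294 7:0.78235 8:0.90994
EOF

awk '{
    # match() returns nonzero on success and sets RSTART and RLENGTH.
    if (match($0, /id:[0-9]+/)) {
        # Skip the 3 characters of "id:" to keep only the digits.
        id = substr($0, RSTART + 3, RLENGTH - 3)
        print > ("demo_match/file" id ".txt")
    }
}' demo_match/input.txt
```

Lines without an id field are skipped silently, as in the script above.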