Bash or Python or Awk to match and modify files

Developer https://www.devze.com 2022-12-18 07:20 Source: the web
I have a set of 10000 files c1.dat ... c10000.dat. Each of these files contains a line which starts with @ and contains a string with spaces specific to this file, like c37 7.379 6.23.

I have another set of 10000 files named determined_cXXX_send.dat (where XXX goes from 1 to 10000). Each of these files has only one line. Each line is of this type:

_1 1 3456.000000 -21 0 -98.112830 -20.326192

What I would like to do is, for each number XXX (from 1 to 10000), get the string like c37 7.379 6.23 from the cXXX.dat file and add it to the beginning of the file determined_cXXX_send.dat, so I get:

c37 7.379 6.23 _1 1 3456.000000 -21 0 -98.112830 -20.326192

I tried with both bash and python but got no good solution.

What would be the best approach?

thanks


In Python, you could do something like this:

# loop over all the files (the upper bound of range is exclusive)
for num in range(1, 10001):

    with open('c%u.dat' % num, mode='r') as cfile:

        # find the specific line
        for line in cfile:
            if line.startswith('@'):

                # open the determined file and add the string
                # (mode='a' appends, so it lands at the end of the file)
                with open('determined_c%u_send.dat' % num, mode='a') as dfile:
                    dfile.write(line[1:].strip())

It's untested, but it should work

EDIT: I realized you wanted to add the line at the beginning of the determined_cXXX_send.dat, not at the end.

So, based on Dennis Williamson's answer, I can also propose the following bash code:

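for i in {1..10000}
do
    mv "determined_c${i}_send.dat" "temp.out"
    # take the @ line from the matching c file; drop the @ and the newline
    grep '^@' "c${i}.dat" | tr -d '@\n' > "determined_c${i}_send.dat"
    # add a separating space, then the original content on the same line
    printf ' ' >> "determined_c${i}_send.dat"
    cat temp.out >> "determined_c${i}_send.dat"
done
rm temp.out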
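
For completeness, here is a rough Python sketch of the prepend variant, since the Python loop above only appends. The helper names extract_tag and prepend_tag are made up for this illustration, and it assumes each determined file is small enough to read into memory and that the wanted string is everything after the @ on the first line starting with @. Not tested on the real data set.

```python
import os


def extract_tag(c_path):
    """Return the text after '@' on the first line starting with '@', or None."""
    with open(c_path) as cfile:
        for line in cfile:
            if line.startswith('@'):
                return line[1:].strip()
    return None


def prepend_tag(c_path, d_path):
    """Rewrite d_path so that it starts with the tag found in c_path."""
    tag = extract_tag(c_path)
    if tag is None:
        return False  # no '@' line: leave the determined file untouched
    with open(d_path) as dfile:
        content = dfile.read()
    # Rewrite the whole file: tag, one space, then the original content.
    with open(d_path, 'w') as dfile:
        dfile.write(tag + ' ' + content)
    return True


if __name__ == '__main__':
    for num in range(1, 10001):
        c_path = 'c%u.dat' % num
        d_path = 'determined_c%u_send.dat' % num
        # Skip numbers for which one of the two files is missing.
        if os.path.exists(c_path) and os.path.exists(d_path):
            prepend_tag(c_path, d_path)
```

Run it from the directory that holds the .dat files. Because the file is rewritten in place, an interruption between the read and the write could lose a line, so trying it on a copy first seems prudent.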


A language basically made for processing text: Perl!


If each of the two types of files only has one line:

for i in {1..10000}
do
    paste "c${i}.dat" "determined_c${i}_send.dat" > "c${i}.out" &&
    mv "c${i}.out" "determined_c${i}_send.dat"
done

Edit:

for i in {1..10000}
do
    # grab the line that starts with @ from the matching c file
    line=$(grep '^@' "c${i}.dat")
    # strip the leading @
    line="${line#@}"
    read -r data < "determined_c${i}_send.dat"
    echo "$line $data" > "c${i}.out" &&
    mv "c${i}.out" "determined_c${i}_send.dat"
done


Doing this in Python should be pretty trivial. It's probably possible in awk, but sounds a bit too complicated to be fun. It surely is possible in bash, but programming in bash is for masochists.

I'd go with Python, of the given options, although Perl and Ruby are good options too if you know them.
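
To make "pretty trivial" a bit more concrete, here is one possible sketch. Instead of counting from 1 to 10000 it looks at whichever cXXX.dat files actually exist, and it writes each result to a temporary file before renaming it over the original. The names tag_of and process_directory are invented for this example, and it assumes the wanted string is everything after the @ on the first line that starts with @.

```python
import os
import re
import tempfile
from pathlib import Path

# Matches c1.dat, c377.dat, ... and captures the number.
C_NAME = re.compile(r'^c(\d+)\.dat$')


def tag_of(c_path):
    """Return the text after '@' on the first '@' line of c_path, or None."""
    with c_path.open() as f:
        for line in f:
            if line.startswith('@'):
                return line[1:].strip()
    return None


def process_directory(directory='.'):
    """Prepend each tag to its determined_cXXX_send.dat; return updated names."""
    directory = Path(directory)
    updated = []
    for c_path in sorted(directory.glob('c*.dat')):
        match = C_NAME.match(c_path.name)
        if not match:
            continue  # some other file that merely starts with 'c'
        d_path = directory / ('determined_c%s_send.dat' % match.group(1))
        tag = tag_of(c_path)
        if tag is None or not d_path.exists():
            continue  # nothing to prepend, or no file to prepend it to
        # Write the new content next to the target, then rename over it,
        # so a crash never leaves a half-written determined file.
        fd, tmp_name = tempfile.mkstemp(dir=str(directory))
        with os.fdopen(fd, 'w') as tmp:
            tmp.write(tag + ' ' + d_path.read_text())
        os.replace(tmp_name, str(d_path))
        updated.append(d_path.name)
    return updated


if __name__ == '__main__':
    print(len(process_directory('.')), 'files updated')
```

One thing to be aware of: mkstemp creates the temporary file readable and writable only by its owner, so the rewritten files may end up with different permissions than before.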


If "c37 7.379 6.23" is constant, then there's no need to grab this string from the cXXX.dat files. But I am guessing this string is dynamic and comes after the @, so you can try this:

#!/bin/bash
shopt -s nullglob
for file in c{1..10000}.dat
do
    if [ -e "$file" ];then
        tag=${file%.dat}
        while read -r line
        do
            case "$line" in
                @*)
                    mystring=${line##@};;
            esac
        done < "$file"
        if [ -e "determined_${tag}_send.dat" ]; then
            while read -r line
            do
                echo "$mystring $line"
            done < "determined_${tag}_send.dat" > temp
            mv temp "determined_${tag}_send.dat"
        fi
    fi
done

output

$ cat c1.dat
@ c37 7.379 6.23

$ cat determined_c1_send.dat
_1 1 3456.000000 -21 0 -98.112830 -20.326192

$ ./shell.sh
$ cat determined_c1_send.dat
 c37 7.379 6.23 _1 1 3456.000000 -21 0 -98.112830 -20.326192