开发者

Bash or python for changing spacing in files

开发者 https://www.devze.com 2022-12-24 09:05 出处:网络
I have a set of 10000 files. In all of them, the second line, looks like: AAA 3.429 3.84 so there is just one space (requirement) between AAA and the two other columns. The rest of lines on each fi

I have a set of 10000 files. In all of them, the second line, looks like:

AAA 3.429 3.84

so there is just one space (requirement) between AAA and the two other columns. The rest of lines on each file are completely different and correspond to 10 columns of numbers.

Randomly, in around 20% of the files, and due to some errors, one gets

BBB  3.429 3.84

so now there are two spaces between the first and second column.

This is a big error so I need to fix it, changing from 2 to 1 space in the files where the error takes place.

The first approach I though开发者_如何转开发t of was to write a bash script that for each file reads the 3 values of the second line and then prints them with just one space, doing it for all the files.

I wonder what do oyu think about this approach and if you could suggest something better, bashm python or someother approach.

Thanks


Performing line-based changes to text files is often simplest to do in sed.

sed -e '2s/  */ /g' infile.txt

will replace any runs of multiple spaces with a single space. This may be changing more than you want, though.

sed -e '2s/^\([^ ]*\)  /\1 /' infile.txt

should just replace instances of two spaces after the first block of space-free text with a single space (though I have not tested this).

(edit: inserted 2 before s in each instance to tie the edit to the second line, specifically.)


Use sed.

for file in *
do
  sed -i '' '2s/  / /' "$file"
done

The -i '' flag means to edit in-place without a backup.

Or use ed!

for file in *
do
  printf "2s/  / /\nwq\n" |ed -s "$file"
done


if the error always can occur at 2nd line,

for file in file*
do
    awk 'NR==2{$1=$1}1' file >temp
    mv temp "$file"    
done

or sed

sed -i.bak '2s/  */ /' file* # do 2nd line

Or just pure bash scripting

i=1
while read -r line
do
  if [ "$i" -eq 2 ];then
    echo $line
  else
    echo "$line"
  fi
  ((i++))
done <"file"


Since it seems every column is separated by one space, another approach not yet mentioned is to use tr to squeeze all multi spaces into single spaces:
tr -s " " < infile > outfile


I am going to be different and go with AWK:

awk '{print $1,$2,$3}' file.txt > file1.txt

This will handle any number of spaces between fields, and replace them with one space

To handle a specific line you can add line addresses:

awk 'NR==2{print $1,$2,$3} NR!=2{print $0}' file.txt > file1.txt

i.e. rewrite line 2, but leave unchanged the other lines.

A line address can be a regular expression as well:

awk '/regexp/{print $1,$2,$3} !/regexp/{print}' file.txt > file1.txt


This answer assumes you don't want to mess with any except the second line.

#!/usr/bin/env python
import sys, os
for fname in sys.argv[1:]:
    with open(fname, "r") as fin:
        line1 = fin.readline()
        line2 = fin.readline()
        fixedLine2 = " ".join(line2.split()) + '\n'
        if fixedLine2 == line2:
            continue
        with open(fname + ".fixed", "w") as fout:
            fout.write(line1)
            fout.write(line2)
            for line in fin:
                fout.write(line)
    # Enable these lines if you want the old files replaced with the new ones.
    #os.remove(fname)
    #os.rename(fname + ".fixed", fname)


I don't quite understand, but yes, sed is an option. I don't think any POSIX compliant version of sed has an in file option (-i), so a fully POSIX compliant solution would be.

sed -e 's/^BBB  /BBB /' <file> > <newfile>


Use sed:

sed -e 's/[[:space:]][[:space:]]/ /g' yourfile.txt >> newfile.txt

This will replace any two adjacent spaces with one. The use of [[:space:]] just makes it a little bit clearer


sed -i -e '2s/  / /g' input.txt

-i: edit files in place

0

精彩评论

暂无评论...
验证码 换一张
取 消