I have a set of 10000 files. In all of them, the second line, looks like:
AAA 3.429 3.84
so there is just one space (requirement) between AAA and the two other columns. The rest of lines on each file are completely different and correspond to 10 columns of numbers.
Randomly, in around 20% of the files, and due to some errors, one gets
BBB 3.429 3.84
so now there are two spaces between the first and second column.
This is a big error so I need to fix it, changing from 2 to 1 space in the files where the error takes place.
The first approach I though开发者_如何转开发t of was to write a bash script that for each file reads the 3 values of the second line and then prints them with just one space, doing it for all the files.
I wonder what do oyu think about this approach and if you could suggest something better, bashm python or someother approach.
Thanks
Performing line-based changes to text files is often simplest to do in sed
.
sed -e '2s/ */ /g' infile.txt
will replace any runs of multiple spaces with a single space. This may be changing more than you want, though.
sed -e '2s/^\([^ ]*\) /\1 /' infile.txt
should just replace instances of two spaces after the first block of space-free text with a single space (though I have not tested this).
(edit: inserted 2
before s
in each instance to tie the edit to the second line, specifically.)
Use sed.
for file in *
do
sed -i '' '2s/ / /' "$file"
done
The -i ''
flag means to edit in-place without a backup.
Or use ed!
for file in *
do
printf "2s/ / /\nwq\n" |ed -s "$file"
done
if the error always can occur at 2nd line,
for file in file*
do
awk 'NR==2{$1=$1}1' file >temp
mv temp "$file"
done
or sed
sed -i.bak '2s/ */ /' file* # do 2nd line
Or just pure bash scripting
i=1
while read -r line
do
if [ "$i" -eq 2 ];then
echo $line
else
echo "$line"
fi
((i++))
done <"file"
Since it seems every column is separated by one space, another approach not yet mentioned is to use tr to squeeze all multi spaces into single spaces:
tr -s " " < infile > outfile
I am going to be different and go with AWK:
awk '{print $1,$2,$3}' file.txt > file1.txt
This will handle any number of spaces between fields, and replace them with one space
To handle a specific line you can add line addresses:
awk 'NR==2{print $1,$2,$3} NR!=2{print $0}' file.txt > file1.txt
i.e. rewrite line 2, but leave unchanged the other lines.
A line address can be a regular expression as well:
awk '/regexp/{print $1,$2,$3} !/regexp/{print}' file.txt > file1.txt
This answer assumes you don't want to mess with any except the second line.
#!/usr/bin/env python
import sys, os
for fname in sys.argv[1:]:
with open(fname, "r") as fin:
line1 = fin.readline()
line2 = fin.readline()
fixedLine2 = " ".join(line2.split()) + '\n'
if fixedLine2 == line2:
continue
with open(fname + ".fixed", "w") as fout:
fout.write(line1)
fout.write(line2)
for line in fin:
fout.write(line)
# Enable these lines if you want the old files replaced with the new ones.
#os.remove(fname)
#os.rename(fname + ".fixed", fname)
I don't quite understand, but yes, sed
is an option. I don't think any POSIX compliant version of sed
has an in file option (-i), so a fully POSIX compliant solution would be.
sed -e 's/^BBB /BBB /' <file> > <newfile>
Use sed:
sed -e 's/[[:space:]][[:space:]]/ /g' yourfile.txt >> newfile.txt
This will replace any two adjacent spaces with one. The use of [[:space:]] just makes it a little bit clearer
sed -i -e '2s/ / /g' input.txt
-i: edit files in place
精彩评论