I have a blacklist.txt file that contains keywords I want to remove using sed.
Here's what the blacklist.txt file contains:
winston@linux ] $ cat blacklist.txt
obscure
keywords
here
...
And here's what I have so far, but it currently doesn't work.
blacklist=$(cat blacklist.txt);
output="filtered_file.txt"
for i in $blacklist;
do
cat $input | sed 's/$i//g' >> $output
done
If you want to remove lines that contain words in the blacklist:
grep -v -f blacklist.txt inputfile > filtered_file.txt
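One thing to watch with the grep approach: each blacklist entry is treated as a regular expression and matches anywhere in a line, so "here" also removes lines containing "adhere". A minimal sketch of the difference, assuming a grep that supports -w (GNU and BSD grep do); the sample data and the grep_blacklist_demo directory are made up for illustration:

```shell
# Hypothetical sample data in a scratch directory
demo=./grep_blacklist_demo
mkdir -p "$demo"
printf '%s\n' obscure keywords here > "$demo/blacklist.txt"
printf '%s\n' 'nothing to see' 'adhere to the rules' 'come over here' > "$demo/inputfile"

# Plain -f: patterns match as substrings, so "adhere ..." is dropped as well
grep -v -f "$demo/blacklist.txt" "$demo/inputfile" > "$demo/substring.txt"

# -F treats each entry as a fixed string, -w requires a whole-word match
grep -v -F -w -f "$demo/blacklist.txt" "$demo/inputfile" > "$demo/wholeword.txt"

cat "$demo/substring.txt"
cat "$demo/wholeword.txt"
```

With this data, the first command should keep only "nothing to see", while the second also keeps "adhere to the rules".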
If you want to remove just the words themselves:
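awk 'FNR==NR{
    blacklist[$0]   # first file: remember each blacklisted word
    next
}
{
    for(i=1;i<=NF;i++){
        if ($i in blacklist){
            $i=""   # blank out a field that exactly matches a blacklisted word
        }
    }
}1' blacklist.txt inputfile > filtered_file.txt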
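Note that blanking a field with $i="" leaves the separators around it in place, so removed words show up as runs of spaces. If that matters, one possible variant (a sketch, not the only way to do it) rebuilds each line from the fields that are kept. The sample data and the awk_blacklist_demo directory are hypothetical:

```shell
# Hypothetical sample data in a scratch directory
demo=./awk_blacklist_demo
mkdir -p "$demo"
printf '%s\n' obscure keywords here > "$demo/blacklist.txt"
printf '%s\n' 'some obscure keywords live here' 'a clean line' > "$demo/inputfile"

awk 'FNR==NR { blacklist[$0]; next }   # first file: load the blacklist
{
    out = ""
    for (i = 1; i <= NF; i++) {
        if ($i in blacklist) continue   # skip blacklisted fields entirely
        if (out == "") out = $i         # first kept field: no leading space
        else out = out " " $i           # later kept fields: single space
    }
    print out
}' "$demo/blacklist.txt" "$demo/inputfile" > "$demo/filtered_file.txt"

cat "$demo/filtered_file.txt"
```

Like the original, this only matches whitespace-separated fields that equal a blacklist entry exactly, so "here," with a trailing comma would not be removed.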
You want to use sed twice: once on the blacklist to create a sed program that deletes every word in the blacklist, and then a second time to apply that generated sed script to your real data.
First,
$ sed -e 's@^@s/@' -e 's@$@//g@' < blacklist.txt > script.sed
If blacklist.txt looks like
word1
word2
....
wordN
then script.sed will look like
s/word1//g
s/word2//g
...
s/wordN//g
You might find the use of @ characters above a bit confusing. The normal way of writing a sed substitute command is s/old/new/. This is quite awkward if either old or new contains a forward slash, so sed allows you to use any character you want (other than a backslash or newline) as the delimiter, placed immediately after the s. This means that you can write s@foo/bar@plugh/plover@ instead of s/foo\/bar/plugh\/plover/. I think you'll agree that the former is much easier to read.
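To see that the two spellings are equivalent, here is a quick check on a made-up input string:

```shell
# Same substitution written with escaped slashes and with @ as the delimiter
escaped=$(printf '%s\n' 'see foo/bar for details' | sed 's/foo\/bar/plugh\/plover/')
alternate=$(printf '%s\n' 'see foo/bar for details' | sed 's@foo/bar@plugh/plover@')

printf '%s\n' "$escaped" "$alternate"
```

Both lines printed should read "see plugh/plover for details".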
Once you have script.sed generated, run
$ sed -f script.sed < file > censored-file
You can of course use the new-fangled (i.e., less than 20 years old) -i option to do in-place editing.
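Putting the two steps together, a minimal end-to-end sketch might look like this. The sample data and the sed_blacklist_demo directory are made up, and it assumes the keywords contain no slashes or regular-expression metacharacters (those would need escaping before being turned into s commands):

```shell
# Hypothetical sample data in a scratch directory
demo=./sed_blacklist_demo
mkdir -p "$demo"
printf '%s\n' obscure keywords here > "$demo/blacklist.txt"
printf '%s\n' 'some obscure keywords live here' 'a clean line' > "$demo/file"

# Step 1: wrap every blacklist line as s/<word>//g
# (the closing @ on the second expression terminates that s command)
sed -e 's@^@s/@' -e 's@$@//g@' < "$demo/blacklist.txt" > "$demo/script.sed"

# Step 2: apply the generated script to the real data
sed -f "$demo/script.sed" < "$demo/file" > "$demo/censored-file"

cat "$demo/script.sed"
cat "$demo/censored-file"
```

Because the generated commands match substrings, "here" is also stripped out of words such as "there"; the spaces around removed words are left in place.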