Using a dictionary file with sed

Developer https://www.devze.com 2022-12-22 19:03 Source: Internet
I have a blacklist.txt file that contains keywords I want to remove using sed.

Here's what the blacklist.txt file contains:

winston@linux ] $ cat blacklist.txt   
obscure
keywords
here
...

And here's what I have so far, but it currently doesn't work.

  blacklist=$(cat blacklist.txt);
  output="filtered_file.txt"

  for i in $blacklist;
    do
      cat $input | sed 's/$i//g' >> $output
    done


If you want to remove every line that contains a word from the blacklist:

grep -v -f blacklist.txt inputfile > filtered_file.txt
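A small sketch of the grep approach on made-up data (the file contents and the temporary directory are assumptions for illustration). It also adds -F, which is not in the command above, so that blacklist entries are matched as fixed strings rather than as regular expressions:

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' obscure keywords here > "$tmpdir/blacklist.txt"
printf '%s\n' 'an obscure line' 'a clean line' 'nothing to see' > "$tmpdir/inputfile"

# -v inverts the match, -f reads one pattern per line from a file,
# -F treats each pattern as a fixed string instead of a regex.
filtered=$(grep -v -F -f "$tmpdir/blacklist.txt" "$tmpdir/inputfile")
printf '%s\n' "$filtered"

rm -r "$tmpdir"
```

Only the first sample line contains a blacklisted word, so it is the only one dropped. Note that matching is by substring: an entry such as here would also drop a line containing "there" unless you add -w.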

If you want to remove just the words themselves:

awk 'FNR==NR{
 blacklist[$0]
 next
}
{
 for(i=1;i<=NF;i++){
   if ($i in blacklist){
     $i=""
   }
 }
}1' blacklist.txt inputfile > filtered_file.txt
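To see what the awk version does and does not remove, here is the same program run on made-up input (the sample lines are assumptions). It compares whole whitespace-separated fields, so a word with punctuation attached is left alone:

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' obscure keywords here > "$tmpdir/blacklist.txt"
printf '%s\n' 'some obscure text' 'keywords, with punctuation' > "$tmpdir/inputfile"

# First file: remember every blacklist line as an array key.
# Second file: blank out any field that exactly equals a key, then print.
result=$(awk 'FNR==NR { blacklist[$0]; next }
{ for (i = 1; i <= NF; i++) if ($i in blacklist) $i = "" }
1' "$tmpdir/blacklist.txt" "$tmpdir/inputfile")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

The first line loses "obscure" and keeps the two spaces that surrounded it. The second line is unchanged because the field is "keywords," with a comma, which is not an exact blacklist entry.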


You want to use sed twice: once on the blacklist, to generate a sed script containing one substitute command per blacklisted word, and then a second time to apply that generated script to your real data.

First,

$ sed -e 's@^@s/@' -e 's@$@//g@' < blacklist.txt > script.sed

If blacklist.txt looks like

word1
word2
....
wordN

then script.sed will look like

s/word1//g
s/word2//g
...
s/wordN//g
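A quick way to check the generated script is to run the generator on a tiny blacklist and print the result. This is only a sketch: the three sample words and the temporary directory are made up. Each s command needs its closing @ delimiter, including the second one:

```shell
# Hypothetical three-word blacklist in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' word1 word2 word3 > "$tmpdir/blacklist.txt"

# Prefix every line with "s/" and append "//g" to every line,
# turning each word into a substitute command that deletes it.
sed -e 's@^@s/@' -e 's@$@//g@' < "$tmpdir/blacklist.txt" > "$tmpdir/script.sed"

generated=$(cat "$tmpdir/script.sed")
printf '%s\n' "$generated"

rm -r "$tmpdir"
```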

You might find the use of @ characters above a bit confusing. The normal way of writing a sed substitute command is s/old/new/. This is quite awkward if either old or new contains a forward slash. So sed allows you to use almost any character you want as the delimiter, as long as it comes immediately after the s of the substitute command. This means that you can write s@foo/bar@plugh/plover@ instead of s/foo\/bar/plugh\/plover/. I think you'll agree that the former is much easier to read.
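The two spellings can be compared directly. In this sketch the input string foo/bar is just the example from the paragraph above, fed in with printf:

```shell
# Default delimiter: every literal slash has to be backslash-escaped.
with_slashes=$(printf '%s\n' 'foo/bar' | sed 's/foo\/bar/plugh\/plover/')

# @ as the delimiter: slashes in the pattern and replacement need no escaping.
with_at=$(printf '%s\n' 'foo/bar' | sed 's@foo/bar@plugh/plover@')

printf '%s\n' "$with_slashes" "$with_at"
```

Both commands produce the same text, so the choice of delimiter is purely about readability.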

Once you have script.sed generated, run

$ sed -f script.sed < file > censored-file
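Putting both steps together on made-up data gives something like the following sketch. The file contents and the temporary directory are assumptions, and it presumes the blacklist entries contain no slashes or regex metacharacters, which would have to be escaped before being pasted into s/.../ commands:

```shell
# Hypothetical blacklist and input in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' obscure keywords here > "$tmpdir/blacklist.txt"
printf '%s\n' 'some obscure text' 'put keywords here please' > "$tmpdir/file"

# Step 1: turn the blacklist into a sed script, one s/word//g per line.
sed -e 's@^@s/@' -e 's@$@//g@' < "$tmpdir/blacklist.txt" > "$tmpdir/script.sed"

# Step 2: apply the generated script to the real data.
censored=$(sed -f "$tmpdir/script.sed" < "$tmpdir/file")
printf '%s\n' "$censored"

rm -r "$tmpdir"
```

The words are deleted but the spaces around them stay, so you may want a further pass to squeeze whitespace. The substitutions also match inside longer words, so here would be removed from "there" as well.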

You can of course use the new-fangled (i.e., less than 20 years old) -i option to do in-place editing.
