MS Word Doc: Automating find/replace using Shell Scripts

Developer https://www.devze.com 2023-04-10 03:52 Source: Internet
I have a number of word documents that I'd like to remove some elements from. What I would like to do is as follows:

  1. Copy and paste the entire contents of the Word file (may not be necessary) and move it into a text file, OR convert .doc to .txt
  2. Using regex: replace \[.*\] with "" AND replace \(.*\) with ""
  3. Save the result to a text file with the same name as the original word document.

Thoughts and direction appreciated. I don't know how to do any of these things programmatically, so for now I'm doing it all manually.

If it matters, I'm using Ubuntu 11.04


Since you're open to using plain text, here are some improvements to your approach:

  1. Use antiword to automate the conversion from doc to txt
  2. Use sed to do in-place regex modification: sed -i -e's/bad/good/' file.txt
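
The two steps above could be combined into a small batch conversion along these lines. This is only a sketch: it assumes antiword is installed and prints the document text to standard output, and the convert_docs function and the DOC2TXT variable are hypothetical names introduced here for illustration, not part of any tool.

```shell
#!/bin/sh
# Sketch: convert every .doc in a directory to a .txt file with the same name.
# DOC2TXT is a hypothetical override so a different converter can be swapped in;
# by default it runs antiword, which writes the extracted text to stdout.
DOC2TXT="${DOC2TXT:-antiword}"

convert_docs() {
    dir="${1:-.}"                          # directory to scan, default: current
    for doc in "$dir"/*.doc; do
        [ -e "$doc" ] || continue          # glob matched nothing, skip
        txt="${doc%.doc}.txt"              # report.doc -> report.txt
        "$DOC2TXT" "$doc" > "$txt" || echo "failed: $doc" >&2
    done
}
```

Calling convert_docs with a directory path would leave the original .doc files untouched and write the text next to them, after which sed -i can be run on the resulting .txt files.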

Update (in response to comment):

The regexes are fine, but I didn't understand the objective completely:

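  • if you want to replace occurrences of [foo] & (foo) with "" use the following (note that in sed's default basic regex syntax, literal parentheses are written unescaped, while \( and \) delimit a capture group):

    sed -i -e's/\[.*\]/""/g' file.txt; sed -i -e's/(.*)/""/g' file.txt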

  • if you want to replace occurrences of [foo] & (foo) with "foo" each use:

    sed -i -e's/\[\(.*\)\]/"\1"/g' file.txt; sed -i -e's/(\(.*\))/"\1"/g' file.txt
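
One caveat with the patterns above is that .* is greedy: on a line containing several bracketed groups, it matches from the first opening bracket to the last closing one, taking the text in between with it. A possible variant, sketched below, uses negated bracket expressions so each group is matched on its own. It assumes the goal is to delete the groups entirely (an empty replacement, as the question seems to intend), and strip_brackets is a hypothetical helper name.

```shell
# Sketch: delete each [..] and (..) group from stdin individually.
# strip_brackets is a hypothetical helper name used only for illustration.
strip_brackets() {
    # [^]]* matches any run of characters other than ']', so a match stops
    # at the nearest closing bracket; the bare ']' after it is a literal.
    # In basic regex syntax, unescaped ( and ) are literal parentheses.
    sed -e 's/\[[^]]*]//g' -e 's/([^)]*)//g'
}

printf 'keep [drop] this (and this) too [again]\n' | strip_brackets
```

The same two expressions can be passed to sed -i to edit a file in place. Whitespace surrounding the removed groups is left as-is, and nested groups such as [a [b] c] are not handled by this sketch.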
