I'm using one-off perl -pi -e commands to do simple search-and-replace from within a bash script. Most of my regexes work fine, until I get to these:
perl -pi -e 's#\<\?mso-application.*\<Table.*Rows="1"\>#\<Table\>#s' 1.xml
perl -pi -e 's#\</Table.*#\</Table\>#s' 1.xml
Please don't mind the # marks instead of slashes; I didn't want to escape even more characters. These regexes are supposed to essentially delete chunks of an XML file exported from Excel, but they aren't working. This seems to be because I'm using logic that applies to strings and trying to apply it to a file (though I admit I have only a basic understanding of perl's in-place editing).
Is there an alternative way to do this (whether in perl, awk, or sed) that can be issued from within a shell script?
I would recommend that you give up the notion of editing XML files on the command line using regexes and use a proper XML parser instead.
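If you do take that route, here is a minimal sketch using XML::Twig, assuming (from my reading of your regexes, not anything you stated) that the goal is to keep only the first Table element of the Excel export and drop its attributes:

#!/usr/bin/perl
# Minimal sketch: extract the first <Table> element from the Excel
# export with a real parser instead of trimming the surrounding
# markup with regexes. Prints the element to STDOUT.
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parsefile('1.xml');                  # the file name from the question

my $table = $twig->first_elt('Table')
    or die "no Table element found\n";

$table->del_atts;                           # drop attributes such as Rows="1"
$table->print;                              # emits <Table>...</Table>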
You have perl set up in line-processing mode, but chances are the patterns you are trying to match span multiple lines. You will need to have your one-liners read in the entire file and then run the regexes against it.
From the command line, add the -0777 flag to make perl read the entire file (and make sure you have the /s regex flag to make . match newlines, which you do). So:
perl -pi -0777 -e 's#\<\?mso-application.*\<Table.*Rows="1"\>#\<Table\>#s' 1.xml
perl -pi -0777 -e 's#\</Table.*#\</Table\>#s' 1.xml
A couple of things:
- Avoid using regexes to manipulate XML files; there are better tools for the job. Consider the XML::Simple or XML::Twig modules to do the same work.
- Seeing that you have multiple search-and-replace operations, replace the one-liners with a proper Perl script and call that from your Bash script instead (a sketch follows this list).