How can I find and replace text in XML using Perl?

Developer https://www.devze.com 2022-12-16 08:19 Source: web
My XML file looks something like this:

<doc>
    <RU1>
       <conf> 
              <prop name="a" val="http://a.org/a.html> 
       </conf>    
    </RU1>
    <RAU1>
     <conf> 
              <prop name="a" val="http://a.org/a.html> 
       </conf>
    </RAU1>
    <RU2>
     <conf> 
              <prop name="a" val="http://a.org/a.html> 
       </conf>
    </RU2>
</doc>

Using Perl, I want to replace "a.org" with "b.com" in the val attribute of every prop element that sits under a parent tag whose name starts with RU. How do I get the changed document as an XML file?


Assuming that your XML is well formed (it isn't), you can use any of a number of CPAN modules for the job. Most of them involve parsing the document, finding the part you want with an XPath query, and printing the document out again.

Here's an example with XML::Twig. I had to fix up the XML to get it to parse.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => {
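        'conf/prop' => sub {
            my ($t, $prop) = @_;
            # Only touch props whose grandparent tag starts with "RU" (skips RAU1).
            my $section = $prop->parent->parent;
            return unless $section && $section->tag =~ /^RU/;
            # Escape the dot so it matches a literal "." and use the requested b.com.
            (my $val = $prop->att('val')) =~ s/a\.org/b.com/;
            $prop->set_att(val => $val);
        },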
    },
    pretty_print => "indented"
);
$twig->parse(join "", <DATA>);

$twig->print;


__END__
<foo>
<RU1>
   <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RU1>
<RAU1>
   <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RAU1>
<RU2>
 <conf> 
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RU2>
</foo>
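
The question also asks how to get the result as an XML file rather than on stdout. A minimal sketch, assuming the input lives in a file: XML::Twig's parsefile and print_to_file methods cover both ends. The sub name rewrite_file and the file names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Read $in, rewrite a.org to b.com in conf/prop/@val under RU* elements,
# and write the result to $out. (rewrite_file is a hypothetical helper name.)
sub rewrite_file {
    my ($in, $out) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            'conf/prop' => sub {
                my ($t, $prop) = @_;
                my $section = $prop->parent->parent;
                return unless $section && $section->tag =~ /^RU/;
                (my $val = $prop->att('val')) =~ s/a\.org/b.com/;
                $prop->set_att(val => $val);
            },
        },
        pretty_print => 'indented',
    );
    $twig->parsefile($in);        # parse from a file instead of <DATA>
    $twig->print_to_file($out);   # write the modified tree to a file
    return $out;
}

# Usage: perl rewrite.pl in.xml out.xml
rewrite_file(@ARGV) if @ARGV == 2;
```

Writing to a separate output file keeps the original around in case the substitution catches more than you meant it to.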


Grab an XML parser off the CPAN and use it. They're there for a reason.

Once you've done that, a fairly simple XPath expression gets you the nodes you want, and a quick text substitution on those specific attributes does the rest.
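
As a rough sketch of that approach, here is one way it could look with XML::LibXML (one CPAN parser among several; the answer above doesn't name one). The sub name replace_host and the sample document are assumptions for illustration, and the input has to be well formed first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;

# Replace a.org with b.com in prop/@val, but only beneath elements whose
# name starts with "RU". Returns the modified document as a string.
# (replace_host is a hypothetical helper name.)
sub replace_host {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);

    # starts-with() is plain XPath 1.0; RAU1 does not start with "RU".
    for my $attr ($doc->findnodes(q{//*[starts-with(name(), 'RU')]//prop/@val})) {
        (my $val = $attr->getValue) =~ s/a\.org/b.com/g;
        $attr->setValue($val);
    }
    return $doc->toString;
}

my $xml = <<'XML';
<doc>
  <RU1><conf><prop name="a" val="http://a.org/a.html"/></conf></RU1>
  <RAU1><conf><prop name="a" val="http://a.org/a.html"/></conf></RAU1>
</doc>
XML

print replace_host($xml);
```

If you'd rather have a file than a string, the document object also has a toFile method that takes a file name.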


Using the following stylesheet

<?xml version="1.0"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="//*[starts-with(local-name(), 'RU')]//prop/@val">
    <!-- Rebuild the attribute; a bare value-of would emit element text instead. -->
    <xsl:attribute name="val">
      <xsl:call-template name="replace-aorg" />
    </xsl:attribute>
  </xsl:template>

  <xsl:template name="replace-aorg">
    <xsl:param name="text" select="." />
    <xsl:choose>
      <xsl:when test="contains($text, 'a.org')">
        <xsl:value-of select="substring-before($text, 'a.org')"/>
        <xsl:text>b.com</xsl:text>
        <xsl:call-template name="replace-aorg">
          <xsl:with-param name="text" select="substring-after($text, 'a.org')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

and adjusting your XML document to

<doc>
<RU1>
   <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RU1>
<RAU1>
 <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RAU1>
<RU2>
 <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RU2>
</doc>

Output:

$ xsltproc style.xsl doc.xml
<?xml version="1.0"?>
<doc>
<RU1>
   <conf>
          <prop name="a" val="http://b.com/a.html"/>
   </conf>
</RU1>
<RAU1>
 <conf>
          <prop name="a" val="http://a.org/a.html"/>
   </conf>
</RAU1>
<RU2>
 <conf>
          <prop name="a" val="http://b.com/a.html"/>
   </conf>
</RU2>
</doc>

So from Perl, that would be something like

system("xsltproc", "style.xsl", "doc.xml") == 0
  or warn "$0: xsltproc exited " . ($? >> 8);
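
That call prints the transformed document to stdout. To end up with an XML file, one option is xsltproc's -o switch, which sends the output to a named file. A small sketch, where the sub names and the out.xml file name are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the xsltproc argument list; -o directs the result to $out
# instead of stdout. (xsltproc_command is a hypothetical helper name.)
sub xsltproc_command {
    my ($style, $in, $out) = @_;
    return ('xsltproc', '-o', $out, $style, $in);
}

# Run the transformation and die with the exit status if it fails.
sub transform_to_file {
    my ($style, $in, $out) = @_;
    system(xsltproc_command($style, $in, $out)) == 0
        or die "$0: xsltproc exited " . ($? >> 8) . "\n";
    return $out;
}

# Usage: perl transform.pl style.xsl doc.xml out.xml
transform_to_file(@ARGV) if @ARGV == 3;
```

The list form of system skips the shell, so file names with spaces or other odd characters need no quoting.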