How to select unique XML nodes using Ruby?

Developer https://www.devze.com 2023-02-11 08:56 Source: the web

I have the following XML, and I am trying to get the unique product nodes, using the name child node to decide uniqueness.

Original XML:

<products>
  <product>
    <name>White Socks</name>
    <price>2.00</price>
  </product>
  <product>
    <name>White Socks</name>
    <price>2.00</price>
  </product>
  <product>
    <name>Blue Socks</name>
    <price>3.00</price>
  </product>
</products>

What I'm trying to get:

<products>
  <product>
    <name>White Socks</name>
    <price>2.00</price>
  </product>
  <product>
    <name>Blue Socks</name>
    <price>3.00</price>
  </product>
</products>

I've tried various things that aren't worth listing here. The closest I got was using XPath, but that returned only the names, as shown below. That is not what I want: I need the full XML as above, not just the node values.

White Socks
Blue Socks

I'm using Ruby and trying to iterate over the nodes like so:

@doc.xpath("//product").each do |node|
  # ... work with each product node here ...
end

Obviously the above currently gets ALL product nodes, whereas I want only the unique product nodes (using the child node "name" as the unique identifier).


This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kProdByName" match="product"
  use="name"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "product
    [not(generate-id()
        =
         generate-id(key('kProdByName',name)[1])
         )
    ]"/>
</xsl:stylesheet>

when applied to the provided XML document (corrected to be well-formed):

<products>
    <product>
        <name>White Socks</name>
        <price>2.00</price>
    </product>
    <product>
        <name>White Socks</name>
        <price>2.00</price>
    </product>
    <product>
        <name>Blue Socks</name>
        <price>3.00</price>
    </product>
</products>

produces the wanted, correct result:

<products>
  <product>
    <name>White Socks</name>
    <price>2.00</price>
  </product>
  <product>
    <name>Blue Socks</name>
    <price>3.00</price>
  </product>
</products>

Do note:

  1. The identity rule copies every node "as-is".

  2. The Muenchian method for grouping is used.

  3. There is a single overriding template that excludes any product element that is not the first in its group.
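The same "keep only the first product in each group" idea can also be sketched directly in Ruby, without XSLT. This is a minimal sketch, not the method from the answer above: the `@doc.xpath` call in the question suggests Nokogiri, but to stay dependency-free this version assumes REXML (bundled with Ruby), and the names `SAMPLE_XML`, `dedupe_products` and `product_names` are hypothetical helpers introduced only for illustration.

```ruby
require 'rexml/document'
require 'set'

# Sample input, mirroring the well-formed document shown above.
SAMPLE_XML = <<~XML
  <products>
    <product>
      <name>White Socks</name>
      <price>2.00</price>
    </product>
    <product>
      <name>White Socks</name>
      <price>2.00</price>
    </product>
    <product>
      <name>Blue Socks</name>
      <price>3.00</price>
    </product>
  </products>
XML

# Hypothetical helper: removes every product whose name was already seen,
# so only the first product of each name stays in the document.
def dedupe_products(xml)
  doc = REXML::Document.new(xml)
  seen = Set.new
  # XPath.match returns an array snapshot, so removing nodes while
  # looping does not disturb the iteration.
  REXML::XPath.match(doc, '/products/product').each do |product|
    name = product.elements['name'].text
    # Set#add? returns nil when the name is already present (a duplicate).
    product.remove unless seen.add?(name)
  end
  doc
end

# Hypothetical helper: lists the remaining product names in document order.
def product_names(doc)
  REXML::XPath.match(doc, '/products/product/name').map(&:text)
end
```

Calling `dedupe_products(SAMPLE_XML)` returns the whole document with the duplicate removed, so the surviving product elements keep their price children. The set lookup makes this a single pass over the products, which plays roughly the role that the key plays in the stylesheet.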


XPath one-liner (note that this is O(N^2), so it will be very slow when there are many product elements):

 /*/product[not(name = following-sibling::product/name)]
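To see where the quadratic cost comes from, the predicate can be spelled out in plain Ruby. This is only an illustrative sketch of what the expression does, assuming each product has exactly one name child. It uses REXML for parsing, and `PRODUCTS_XML` and `kept_positions` are hypothetical names. Note that the expression keeps a product only when no later sibling has the same name, so it is the last product of each name that survives, not the first.

```ruby
require 'rexml/document'

# Sample input with one duplicate name, as in the question.
PRODUCTS_XML = <<~XML
  <products>
    <product><name>White Socks</name><price>2.00</price></product>
    <product><name>White Socks</name><price>2.00</price></product>
    <product><name>Blue Socks</name><price>3.00</price></product>
  </products>
XML

# Hypothetical helper: returns the 0-based positions of the products that
# the predicate not(name = following-sibling::product/name) would keep.
def kept_positions(xml)
  doc = REXML::Document.new(xml)
  names = REXML::XPath.match(doc, '/products/product/name').map(&:text)
  names.each_index.select do |i|
    # For every product, scan all products after it: this nested scan
    # is the source of the O(N^2) behaviour.
    names.drop(i + 1).none? { |later| later == names[i] }
  end
end
```

For the sample document, `kept_positions(PRODUCTS_XML)` yields the second and third products. The first "White Socks" is dropped because a later sibling has the same name.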


With XSLT you can use Muenchian grouping to eliminate duplicates as follows:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:key name="prod-by-name" match="product" use="name"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="product[not(generate-id() = generate-id(key('prod-by-name', name)[1]))]"/>

</xsl:stylesheet>