splitting an XML file into multiple files using h1 and id

开发者 https://www.devze.com 2023-02-11 22:21 Source: Internet
I'm an XSLT noob. I'm transforming an XML file into HTML. The resulting files will take the form of .inc files to be used as Server Side Includes. For now, I need to split the XML file at the h1 node and write it to multiple .inc files (containing everything between each h1 node) using the h1 id for the filename. The h1 id takes the form of a 'scriptLabel'. Right now, the document splits out ok - BUT simply writes the h1 itself and ignores the content after. What am I doing wrong?

Here's sample XML:

`<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document SYSTEM "RRfront150610.dtd">
<document>
  <section charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
  scriptlabel="Gov-chairman-intro">
    <h1 charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
    scriptlabel="Gov-chairman-intro">chairman&#8217;s
    introduction</h1>
    <p charstyle="No Style" pagenum="56"
    parastyle="Gov&#8211;Head-B-CI" scriptlabel="">
      <strong charstyle="No Style" pagenum="56"
      parastyle="Gov&#8211;Head-B-CI" scriptlabel="">Lorem ipsum
      dolor sit amet, consectetur adipiscing elit. Morbi et leo
      purus. Maecenas at metus massa. Donec rutrum tortor ac enim
      tincidunt ut posuere purus aliquam.</strong>
    </p>
    <p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI"
    scriptlabel="">Lorem ipsum dolor sit amet, consectetur
    adipiscing elit. Morbi et leo purus. Maecenas at metus massa.
    Donec rutrum tortor ac enim tincidunt ut posuere purus
    aliquam.</p>
  </section>
</document>`

Here's the XSLT to perform the split:

`<xsl:template match="/">
  <xsl:apply-templates />
</xsl:template>
<xsl:template match="document">
  <xsl:apply-templates />
</xsl:template>
<xsl:template match="h1">
  <xsl:variable name="filename"
  select="concat(@scriptlabel,'.inc')" />
  <xsl:value-of select="$filename" />
  <xsl:result-document href="{$filename}">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:result-document>
</xsl:template>`


In short, this stylesheet:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="section">
        <xsl:for-each-group select="node()" group-starting-with="h1">
            <xsl:result-document href="{@scriptlabel}.inc">
                <xsl:copy-of select="current-group()"/>
            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>

serializes Gov-chairman-intro.inc as:

<h1 charstyle="No Style" 
    pagenum="56" 
    parastyle="Gov-Head-A" 
    scriptlabel="Gov-chairman-intro"
 >chairman’s     introduction</h1>
<p charstyle="No Style" 
   pagenum="56" 
   parastyle="Gov–Head-B-CI" 
   scriptlabel="">
    <strong charstyle="No Style" 
                 pagenum="56" 
                 parastyle="Gov–Head-B-CI" 
                 scriptlabel=""
          >Lorem ipsum       dolor sit amet, consectetur adipiscing elit. Morbi et leo       purus. Maecenas at metus massa. Donec rutrum tortor ac enim       tincidunt ut posuere purus aliquam.</strong>
</p>
<p charstyle="No Style" 
   pagenum="56" 
   parastyle="Gov-Body-CI" 
   scriptlabel=""
 >Lorem ipsum dolor sit amet, consectetur     adipiscing elit. Morbi et leo purus. Maecenas at metus massa.     Donec rutrum tortor ac enim tincidunt ut posuere purus     aliquam.</p>

Note: This groups the children of section so that each group starts with an h1, then copies the whole current group.
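One thing to watch with group-starting-with: the first item in the population always starts a group, even if it is not an h1. If your processor keeps the whitespace-only text node that sits before the h1 (this depends on the parser, the DTD and processor settings, so treat it as an assumption), that text node becomes its own group and the href evaluates to just ".inc". A minimal sketch that avoids this by stripping whitespace-only text nodes inside section:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Assumption: section has element-only content, so its
         whitespace-only text nodes can be dropped safely. -->
    <xsl:strip-space elements="section"/>

    <xsl:template match="section">
        <!-- Each group begins at an h1; the context item inside the
             loop is the first item of the group, i.e. that h1. -->
        <xsl:for-each-group select="node()" group-starting-with="h1">
            <xsl:result-document href="{@scriptlabel}.inc">
                <xsl:copy-of select="current-group()"/>
            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```

If the section can contain real content before its first h1, that content still forms a leading group without a scriptlabel, so this sketch only covers the whitespace case.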

Update: This version also works for a section without an h1 child, and for a group that does not start with an h1.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="section">
        <xsl:for-each-group select="*" group-adjacent="boolean(self::h1)">
            <xsl:if test="not(current-grouping-key())">
                <xsl:variable name="vMark" select="preceding-sibling::h1[1]"/>
                <xsl:result-document
                     href="{((..|$vMark)/@scriptlabel)[last()]}.inc">
                    <xsl:copy-of select="current-group()|$vMark"/>
                </xsl:result-document>
            </xsl:if>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>

With this input:

<document>
    <section charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
             scriptlabel="Gov-chairman-intro">
        <h1 charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
             scriptlabel="Gov-chairman-intro">chairman&#8217;s
             introduction</h1>
        <p charstyle="No Style" pagenum="56"
           parastyle="Gov&#8211;Head-B-CI" scriptlabel="">
            <strong charstyle="No Style" pagenum="56"
                    parastyle="Gov&#8211;Head-B-CI" scriptlabel=""
             >Lorem ipsum dolor sit amet, consectetur adipiscing elit.
              Morbi et leo purus. Maecenas at metus massa. Donec
              rutrum tortor ac enim tincidunt ut posuere purus
              aliquam.</strong>
        </p>
        <p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI"
           scriptlabel="">Lorem ipsum dolor sit amet, consectetur
           adipiscing elit. Morbi et leo purus. Maecenas at metus
           massa. Donec rutrum tortor ac enim tincidunt ut posuere
           purus aliquam.</p>
    </section>
    <section charstyle="No Style" pagenum="56" parastyle="Gov-Head-A"
             scriptlabel="Test-no-H1">
        <p charstyle="No Style" pagenum="56"
           parastyle="Gov&#8211;Head-B-CI" scriptlabel="">
            <strong charstyle="No Style" pagenum="56"
                    parastyle="Gov&#8211;Head-B-CI" scriptlabel=""
             >Lorem ipsum dolor sit amet, consectetur adipiscing elit.
              Morbi et leo purus. Maecenas at metus massa. Donec
              rutrum tortor ac enim tincidunt ut posuere purus
              aliquam.</strong>
        </p>
        <p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI"
           scriptlabel="">Lorem ipsum dolor sit amet, consectetur
           adipiscing elit. Morbi et leo purus. Maecenas at metus
           massa. Donec rutrum tortor ac enim tincidunt ut posuere
           purus aliquam.</p>
    </section>
</document>

it correctly serializes Gov-chairman-intro.inc:

<h1 charstyle="No Style" pagenum="56" parastyle="Gov-Head-A" scriptlabel="Gov-chairman-intro">chairman’s
             introduction</h1><p charstyle="No Style" pagenum="56" parastyle="Gov–Head-B-CI" scriptlabel=""><strong charstyle="No Style" pagenum="56" parastyle="Gov–Head-B-CI" scriptlabel="">Lorem ipsum dolor sit amet, consectetur adipiscing elit.
              Morbi et leo purus. Maecenas at metus massa. Donec
              rutrum tortor ac enim tincidunt ut posuere purus
              aliquam.</strong></p><p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI" scriptlabel="">Lorem ipsum dolor sit amet, consectetur
           adipiscing elit. Morbi et leo purus. Maecenas at metus
           massa. Donec rutrum tortor ac enim tincidunt ut posuere
           purus aliquam.</p>

And Test-no-H1.inc:

<p charstyle="No Style" pagenum="56" parastyle="Gov–Head-B-CI" scriptlabel=""><strong charstyle="No Style" pagenum="56" parastyle="Gov–Head-B-CI" scriptlabel="">Lorem ipsum dolor sit amet, consectetur adipiscing elit.
              Morbi et leo purus. Maecenas at metus massa. Donec
              rutrum tortor ac enim tincidunt ut posuere purus
              aliquam.</strong></p><p charstyle="No Style" pagenum="56" parastyle="Gov-Body-CI" scriptlabel="">Lorem ipsum dolor sit amet, consectetur
           adipiscing elit. Morbi et leo purus. Maecenas at metus
           massa. Donec rutrum tortor ac enim tincidunt ut posuere
           purus aliquam.</p>

Note: This groups adjacent children by the test "Am I the mark?" (that is, "am I an h1?"), then copies each non-mark group together with the mark that precedes it.
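The href expression ((..|$vMark)/@scriptlabel)[last()] is compact but easy to misread: the union is in document order, so the h1's scriptlabel comes last when an h1 exists, and otherwise only the section's scriptlabel is left. A sketch of the same stylesheet with that choice spelled out (the variable name vLabel is made up for illustration, it is not in the original):

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="section">
        <xsl:for-each-group select="*" group-adjacent="boolean(self::h1)">
            <!-- Only handle the groups of non-h1 elements. -->
            <xsl:if test="not(current-grouping-key())">
                <!-- Nearest h1 before this group, if there is one. -->
                <xsl:variable name="vMark" select="preceding-sibling::h1[1]"/>
                <!-- Use the h1's label when an h1 exists, otherwise
                     fall back to the parent section's label. -->
                <xsl:variable name="vLabel"
                     select="if ($vMark) then $vMark/@scriptlabel
                             else ../@scriptlabel"/>
                <xsl:result-document href="{$vLabel}.inc">
                    <xsl:copy-of select="$vMark"/>
                    <xsl:copy-of select="current-group()"/>
                </xsl:result-document>
            </xsl:if>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```

Like the original, this writes nothing for an h1 that has no non-h1 siblings after it.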


You're matching on "h1", so it's only putting the h1 in the result document.

Can you re-organize your data so that you have...

<section>
  <h1>Content 1</h1>
  <p>...</p>
  <p>...</p>
</section>
<section>
  <h1>Content 2</h1>
  <p>...</p>
  <p>...</p>
</section>

You can rename the section tag to whatever you want, so as not to break existing code. Then your XSLT will look like this:

<xsl:template match="section">
  <xsl:variable name="filename"
  select="concat(@scriptlabel,'.inc')" />
  <xsl:value-of select="$filename" />
  <xsl:result-document href="{$filename}">
      <xsl:copy-of select=" ./* " />
  </xsl:result-document>
</xsl:template>
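
If restructuring the input isn't possible, the h1 match can be kept as long as the template also pulls in the siblings that belong to that h1. The p elements are siblings of the h1, not children, which is why xsl:copy with apply-templates on "@*|node()" never reaches them. A minimal sketch, assuming XSLT 2.0 (already needed for xsl:result-document) and assuming every h1 has a non-empty scriptlabel; the variable name heading is only illustrative:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Process only the h1 children, so the built-in templates do not
       dump the text of the p elements into the main output. -->
  <xsl:template match="section">
    <xsl:apply-templates select="h1"/>
  </xsl:template>

  <xsl:template match="h1">
    <xsl:variable name="heading" select="."/>
    <xsl:result-document href="{@scriptlabel}.inc">
      <!-- The h1 itself. -->
      <xsl:copy-of select="."/>
      <!-- Every following sibling whose nearest preceding h1 is this
           one, i.e. everything up to the next h1. -->
      <xsl:copy-of
        select="following-sibling::*[preceding-sibling::h1[1] is $heading]"/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the section-based template, this produces no file for a section that has no h1 at all.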