开发者

XSLT processing recursion depth

开发者 https://www.devze.com 2023-02-20 00:31 出处:网络
First of let me state that I have no clue of XSLT at all. I was given a task to investigate开发者_运维知识库 some JVM dumps of a Java OutOfMemory exception that occurred during XSLT processing.

First of let me state that I have no clue of XSLT at all. I was given a task to investigate开发者_运维知识库 some JVM dumps of a Java OutOfMemory exception that occurred during XSLT processing.

I have found that the OutOfMemory occurred during recursive XSLT processing (we use XALAN).

What I found shocking was that the recursion was >100 000 calls deep.

What are the circumstances under which can a recursion this deep during XSLT processing be acceptable?


Note that the thread stack trace is about 300k lines long and filled with a variation of this until the moment the OutOfMemory occurs:

at org/apache/xalan/transformer/TransformerImpl.executeChildTemplates(Bytecode PC:150(Compiled Code)) at org/apache/xalan/templates/ElemElement.execute(Bytecode PC:352(Compiled Code)) at org/apache/xalan/transformer/TransformerImpl.executeChildTemplates(Bytecode PC:150(Compiled Code))


This can happen when processing a very long sequence with primitive recursion.

Imagine implementing the sum() function with a recursive named template:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:call-template name="sum">
   <xsl:with-param name="pSeq" select="/*/*"/>
  </xsl:call-template>
 </xsl:template>

 <xsl:template name="sum">
  <xsl:param name="pAccum" select="0"/>
  <xsl:param name="pSeq"/>

  <xsl:choose>
   <xsl:when test="not($pSeq)">
     <xsl:value-of select="$pAccum"/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:call-template name="sum">
     <xsl:with-param name="pAccum"
          select="$pAccum+$pSeq[1]"/>
     <xsl:with-param name="pSeq"
          select="$pSeq[position() >1]"/>
    </xsl:call-template>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

when applied on the following XML document:

<nums>
  <num>01</num>
  <num>02</num>
  <num>03</num>
  <num>04</num>
  <num>05</num>
  <num>06</num>
  <num>07</num>
  <num>08</num>
  <num>09</num>
  <num>10</num>
</nums>

the result is:

55

Now, imagine that nums has 1000000 (1M) num children. This would be a legitimate attempt to find the sum of one million numbers, however most XSLT processors typically crash at a recursion depth at or around 1000.

The solution:

  1. Use tail-recursion (a special kind of recursion where the recursive call is the last instruction in the template). Some XSLT processors recognize tail recursion and optimize it internally to iteration, so there is no recursion and no stack overflow.

  2. Use DVC-style recursion (Divide and Conquer). This works with all XSLT processors. The maximum recursion depth is log2(N) and is feasible for most practical purposes. For example, processing a sequence of 1M items requires a stack depth of only 19.

Here is a DVC implementation of the sum template:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:call-template name="sum">
   <xsl:with-param name="pSeq" select="/*/*"/>
  </xsl:call-template>
 </xsl:template>

 <xsl:template name="sum">
  <xsl:param name="pSeq"/>

  <xsl:variable name="vCnt" select="count($pSeq)"/>

  <xsl:choose>
   <xsl:when test="$vCnt = 0">
     <xsl:value-of select="0"/>
   </xsl:when>
   <xsl:when test="$vCnt = 1">
     <xsl:value-of select="$pSeq[1]"/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:variable name="vHalf" select=
     "floor($vCnt div 2)"/>

    <xsl:variable name="vSum1">
     <xsl:call-template name="sum">
      <xsl:with-param name="pSeq" select=
      "$pSeq[not(position() > $vHalf)]"/>
     </xsl:call-template>
    </xsl:variable>

    <xsl:variable name="vSum2">
     <xsl:call-template name="sum">
      <xsl:with-param name="pSeq" select=
      "$pSeq[position() > $vHalf]"/>
     </xsl:call-template>
    </xsl:variable>

    <xsl:value-of select="$vSum1+$vSum2"/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

Using this template to find the sum of one million numbers takes some time, but produces the correct result without crashing.


It's most likely a bug in the XSLT that's resulting in infinite recursion (where "infinite" is defined as "right up until you run out of memory"). Consider the following template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:apply-templates select="/"/>
    </xsl:template>
</xsl:stylesheet>

The only template in the document matches the root element and then calls apply-templates on itself, which starts a process that will never terminate.

0

精彩评论

暂无评论...
验证码 换一张
取 消