XML end element is read twice using XMLReader with PHP

Developer https://www.devze.com 2023-02-12 11:15 Source: Internet
I want to read an XML file using XMLReader, but END_ELEMENT is reported twice for each element during parsing.

<publications>
  <article id="Xu86oazdn">
    <title>Learning</title>
    <authors>
      <author>
        <firstname>Michel</firstname>
        <lastname>Browsky</lastname>
      </author>
    </authors>
  </article>
</publications>

This is the code that parses the author entries:

<?php
$xml = new XMLReader();
$xml->open("php://stdin");
$author = null;

while($xml->read()) {

  switch($xml->nodeType) {
    case XMLReader::ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("+" . $xml->name);
          break;
    }

    case XMLReader::END_ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("-" . $xml->name);
          break;
      }
    }
  }
?>

But strangely, the END_ELEMENT is called twice for each </author>, as shown by the echo messages:

+author
-author
-author

If I replace the echo with a call to $xml->readOuterXML(), the first END_ELEMENT outputs the following:

<author>
  <firstname>Michel</firstname>
  <lastname>Browsky</lastname>
</author>

And the second one is the following:

<author/>

What is wrong with my code? Am I using END_ELEMENT incorrectly? What is the right way to detect the end of an element?


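Add a break statement at the end of the first case (XMLReader::ELEMENT) of the outer switch on nodeType. Without it, execution falls through from the ELEMENT case into the END_ELEMENT case, so "-author" is printed once for the opening tag and once more for the closing tag: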

<?php
$xml = new XMLReader();
$xml->open("php://stdin");

while($xml->read()) {

  switch($xml->nodeType) {
    case XMLReader::ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("+" . $xml->name);
          break;
      }

      // This break was missing: without it, ELEMENT falls through into END_ELEMENT.
      break;

    case XMLReader::END_ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("-" . $xml->name);
          break;
      }
  }
}
?>

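Add another break at the end of the END_ELEMENT case as well, if only for symmetry. It must sit inside the outer switch, before its closing brace; placed after that brace it would exit the while loop instead:

    case XMLReader::END_ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("-" . $xml->name);
          break;
      }

      break;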
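
To see the fall-through in isolation, here is a minimal sketch that needs no XML input. The function name trace() and its $withBreak flag are made up for illustration; only the XMLReader constants come from the code above.

```php
<?php
// Hypothetical helper: returns what the outer switch would record for one node,
// with or without the break at the end of the ELEMENT case.
function trace(int $nodeType, bool $withBreak): string {
  $out = '';
  switch ($nodeType) {
    case XMLReader::ELEMENT:
      $out .= '+';
      if ($withBreak) {
        break; // leaves the switch, END_ELEMENT code is skipped
      }
      // no break: execution continues into the next case
    case XMLReader::END_ELEMENT:
      $out .= '-';
      break;
  }
  return $out;
}

echo trace(XMLReader::ELEMENT, false), "\n";     // +-
echo trace(XMLReader::ELEMENT, true), "\n";      // +
echo trace(XMLReader::END_ELEMENT, false), "\n"; // -
```

The first line of output is the behaviour from the question: an opening tag triggers both branches, and the real closing tag then triggers the second branch again.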

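The missing break was hard to spot because of the coding style: nested switch statements with inconsistent indentation. Simplify the code by moving the per-element logic into functions that you define yourself. For example: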

$xml = new XMLReader();
$xml->open("php://stdin");

while($xml->read()) {    
  switch($xml->nodeType) {
    case XMLReader::ELEMENT: {
      startElement( $xml->name );
      break;
    }

    case XMLReader::END_ELEMENT: {
      endElement( $xml->name );
      break;
    }
  }
}
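
A possible way to fill in those two handlers is sketched below. The bodies are assumptions, as are the extra $events parameter (used here instead of echo so the result is easy to inspect) and the parseEvents() wrapper; the input is read from a string with XMLReader::XML() rather than from php://stdin. The sketch also covers a case the question did not hit: a self-closing tag such as <author/> is reported as ELEMENT with isEmptyElement set, and no END_ELEMENT follows.

```php
<?php
// Hypothetical handlers: the names come from the snippet above, the bodies are assumptions.
function startElement(string $name, array &$events): void {
  if ($name === 'author') {
    $events[] = '+' . $name;
  }
}

function endElement(string $name, array &$events): void {
  if ($name === 'author') {
    $events[] = '-' . $name;
  }
}

// Hypothetical wrapper that runs the read loop over an XML string.
function parseEvents(string $source): array {
  $events = [];
  $xml = new XMLReader();
  $xml->XML($source); // parse from a string instead of php://stdin
  while ($xml->read()) {
    switch ($xml->nodeType) {
      case XMLReader::ELEMENT:
        startElement($xml->name, $events);
        // A self-closing tag never produces END_ELEMENT, so close it here.
        if ($xml->isEmptyElement) {
          endElement($xml->name, $events);
        }
        break;

      case XMLReader::END_ELEMENT:
        endElement($xml->name, $events);
        break;
    }
  }
  $xml->close();
  return $events;
}

$source = '<authors><author><firstname>Michel</firstname></author><author/></authors>';
echo implode(' ', parseEvents($source)), "\n"; // +author -author +author -author
```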

There are further simplifications you can make. PHP has an XML marshalling package, but you could also abstract the code into classes. Instances of those classes would then be able to read (or write) themselves from (or to) an XML file. For example:

$xml = new XMLReader();
$xml->open("php://stdin");

while($xml->read()) {    
  if( $xml->nodeType == XMLReader::ELEMENT && $xml->name == 'author' ) {
    $author = new Author();
    $author->marshall( $xml );
  }
}

This couples the details of how the object is stored with the object itself. Any time you change the Author object, you know you must change how it marshalls itself. You could abstract and extend these concepts even further using appropriate design patterns, XML schemas, and so forth.
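
As a rough sketch of what such a class could look like: only the name Author and the method marshall() come from the snippet above; the properties and the method body are assumptions based on the sample document in the question.

```php
<?php
// Hypothetical Author class; only the marshall() name comes from the text above.
class Author {
  public string $firstname = '';
  public string $lastname = '';

  // Expects $xml to be positioned on the opening <author> tag.
  public function marshall(XMLReader $xml): void {
    if ($xml->isEmptyElement) {
      return; // <author/> has no children to read
    }
    while ($xml->read()) {
      if ($xml->nodeType === XMLReader::END_ELEMENT && $xml->name === 'author') {
        return; // reached </author>
      }
      if ($xml->nodeType === XMLReader::ELEMENT) {
        switch ($xml->name) {
          case 'firstname':
            $this->firstname = $xml->readString();
            break;
          case 'lastname':
            $this->lastname = $xml->readString();
            break;
        }
      }
    }
  }
}

$xml = new XMLReader();
$xml->XML('<authors><author><firstname>Michel</firstname><lastname>Browsky</lastname></author></authors>');

$authors = [];
while ($xml->read()) {
  // Check the node type too, otherwise a closing </author> would also match.
  if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'author') {
    $author = new Author();
    $author->marshall($xml);
    $authors[] = $author;
  }
}
echo $authors[0]->firstname, ' ', $authors[0]->lastname, "\n"; // Michel Browsky
```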

Thus your final code might resemble:

$xml = new XMLReader();
$xml->open( "php://stdin" );
$publications = new Publications();
$publications->marshall( $xml );

The Publications object is responsible for reading the XML document and instantiating the appropriate classes whenever their associated XML tags appear:

while($xml->read()) {
  if( $xml->nodeType == XMLReader::ELEMENT && $xml->name == 'article' ) {
    $article = new Article();
    $article->marshall( $xml );
    $this->add( $article );
  }
}
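
Put together, the two classes might look roughly like this. The class and method names follow the text above; the properties, the handling of the id attribute and the decision to read only <title> are assumptions made to keep the sketch short.

```php
<?php
// Hypothetical classes: only the names Publications, Article, marshall() and add()
// come from the text above; everything else is an assumption.
class Article {
  public string $id = '';
  public string $title = '';

  // Expects $xml to be positioned on the opening <article> tag.
  public function marshall(XMLReader $xml): void {
    $this->id = (string) $xml->getAttribute('id');
    if ($xml->isEmptyElement) {
      return; // <article/> has no children to read
    }
    while ($xml->read()) {
      if ($xml->nodeType === XMLReader::END_ELEMENT && $xml->name === 'article') {
        return; // reached </article>
      }
      if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'title') {
        $this->title = $xml->readString();
      }
    }
  }
}

class Publications {
  /** @var Article[] */
  public array $articles = [];

  public function add(Article $article): void {
    $this->articles[] = $article;
  }

  // Walks the whole document and builds one Article per <article> element.
  public function marshall(XMLReader $xml): void {
    while ($xml->read()) {
      if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'article') {
        $article = new Article();
        $article->marshall($xml);
        $this->add($article);
      }
    }
  }
}

$xml = new XMLReader();
$xml->XML('<publications><article id="Xu86oazdn"><title>Learning</title></article></publications>');
$publications = new Publications();
$publications->marshall($xml);
echo $publications->articles[0]->id, ': ', $publications->articles[0]->title, "\n"; // Xu86oazdn: Learning
```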

Use a PHP marshalling framework to save yourself time and effort. Consider XML_Serializer:

  • http://pear.php.net/package/XML_Serializer