How can I find out the namespace of an element in PHP DOM?

Developer https://www.devze.com 2023-01-13 05:13 Source: web
This sounds like a pretty easy question to answer but I haven't been able to get it to work. I'm running PHP 5.2.6.

I have a DOM element (the root element) which, when I serialize it with saveXML(), is output with an xmlns attribute:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
...

However, I cannot find any way programmatically within PHP to see that namespace. I want to be able to check whether it exists and what it's set to.

Checking $document->documentElement->namespaceURI would be the obvious answer but that is empty (I've never actually been able to get that to be non-empty). What is generating that xmlns value in the output and how can I read it?

The only practical way I've been able to do this so far is a complete hack - by saving it as XML to a string using saveXML() then reading through that using regular expressions.

Edit:

This may be a peculiarity of loading the markup with loadHTML() rather than loadXML() and then printing it with saveXML(). When you do that, saveXML() appears to add an xmlns attribute even though there is no way to detect, using DOM methods, that this xmlns value is part of the document. Which I guess means that if I had a way of detecting whether the document had been loaded with loadHTML(), I could solve this a different way.


Like edorian already showed, getting the namespace works fine when the markup is loaded with loadXML. But you are right that this won't work for markup loaded with loadHTML:

$html = <<< XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:m="foo" lang="en">
    <body xmlns="foo">Bar</body>
</html>
XML;

$dom = new DOMDocument;
$dom->loadHTML($html);

var_dump($dom->documentElement->getAttribute("xmlns"));
var_dump($dom->documentElement->lookupNamespaceURI(NULL));
var_dump($dom->documentElement->namespaceURI);

will produce empty results. But you can use XPath

$xp = new DOMXPath($dom);
echo $xp->evaluate('string(@xmlns)');
// http://www.w3.org/1999/xhtml

and for body

echo $xp->evaluate('string(body/@xmlns)'); // foo

or with context node

$body = $dom->documentElement->childNodes->item(0);
echo $xp->evaluate('string(@xmlns)', $body);
// foo

My uneducated assumption is that, internally, an HTML document is different from a real XML document. libxml uses a different module to parse HTML, and the DOMDocument itself will have a different nodeType, as you can verify by doing

var_dump($dom->nodeType); // 13 with loadHTML, 9 with loadXML

with 13 being an XML_HTML_DOCUMENT_NODE.
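
Putting the two observations together, here is a minimal sketch of a lookup that works for either loader. The function name findDefaultNamespace is hypothetical (it is not part of the DOM extension), and the XPath fallback assumes, as described above, that loadHTML keeps xmlns as a plain attribute:

```php
<?php
// Hypothetical helper, not part of PHP's DOM extension.
// Returns the namespace URI that applies to $element, or null if none is found.
function findDefaultNamespace(DOMElement $element)
{
    // Documents parsed with loadXML() are namespace-aware,
    // so namespaceURI is populated directly.
    if ($element->namespaceURI !== null && $element->namespaceURI !== '') {
        return $element->namespaceURI;
    }

    // Assumption: with loadHTML() the xmlns declaration is only reachable
    // as an ordinary attribute through XPath.
    $xpath = new DOMXPath($element->ownerDocument);
    $value = $xpath->evaluate('string(@xmlns)', $element);

    return $value !== '' ? $value : null;
}

$markup = '<html xmlns="http://www.w3.org/1999/xhtml" lang="en"><body>Bar</body></html>';

$xml = new DOMDocument;
$xml->loadXML($markup);
var_dump(findDefaultNamespace($xml->documentElement));

libxml_use_internal_errors(true); // keep HTML parser warnings quiet
$html = new DOMDocument;
$html->loadHTML($markup);
var_dump(findDefaultNamespace($html->documentElement));
```

Trying namespaceURI first keeps the XML case on the regular DOM path, so the XPath query is only a fallback for HTML-parsed documents.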


With PHP 5.2.6 I've found two ways to do this:

<?php
$xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?'.
       '><html xmlns="http://www.w3.org/1999/xhtml" lang="en"></html>';
// Call loadXML() on an instance; calling it statically fails on newer PHP versions.
$x = new DOMDocument;
$x->loadXML($xml);
var_dump($x->documentElement->getAttribute("xmlns"));
var_dump($x->documentElement->lookupNamespaceURI(NULL));

prints

string(28) "http://www.w3.org/1999/xhtml"
string(28) "http://www.w3.org/1999/xhtml"

Hope that's what you asked for :)


Well, you can do so with a function like this:

function getNamespaces(DomNode $node, $recurse = false) {
    $namespaces = array();
    if ($node->namespaceURI) {
        $namespaces[] = $node->namespaceURI;
    }
    if ($node instanceof DomElement && $node->hasAttribute('xmlns')) {
        $namespaces[] = $xmlns = $node->getAttribute('xmlns');
        foreach ($node->attributes as $attr) {
            if ($attr->namespaceURI == $xmlns) {
                $namespaces[] = $attr->value;
            }
        }
    }
    if ($recurse && $node instanceof DomElement) {
        foreach ($node->childNodes as $child) {
            $namespaces = array_merge($namespaces, getNamespaces($child, true));
        }
    }
    return array_unique($namespaces);
}

So, you feed it a DOMElement, and it finds all related namespaces:

$xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <html xmlns="http://www.w3.org/1999/xhtml" 
         lang="en" 
         xmlns:foo="http://example.com/bar">
           <body>
                <h1>foo</h1>
                <foo:h2>bar</foo:h2>
           </body>
 </html>';
$dom = new DOMDocument;
$dom->loadXML($xml);
var_dump(getNamespaces($dom->documentElement, true));

Prints out:

array(2) {
  [0]=>
  string(28) "http://www.w3.org/1999/xhtml"
  [3]=>
  string(22) "http://example.com/bar"
}

Note that DomDocument will automatically strip out all unused namespaces...

As for why $dom->documentElement->namespaceURI is always null: as far as I can tell, it's because the markup was loaded with loadHTML(). The HTML parser does not process namespaces, so the xmlns attribute doesn't endow the html tag with a namespace (for purposes of DOM interaction). When the same markup is loaded with loadXML(), xmlns declares the default namespace and namespaceURI is populated. You can try doing a $dom->documentElement->removeAttribute('xmlns'), but I'm not 100% sure if it will work...
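
For comparison, here is a small sketch of what the namespace-related properties report when the markup goes through loadXML(). The markup is a shortened variant of the example above and the variable names are only illustrative:

```php
<?php
$dom = new DOMDocument;
$dom->loadXML(
    '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:foo="http://example.com/bar">'
    . '<body><foo:h2>bar</foo:h2></body></html>'
);

$root = $dom->documentElement;
// Look the prefixed element up by namespace URI and local name.
$h2 = $dom->getElementsByTagNameNS('http://example.com/bar', 'h2')->item(0);

var_dump($root->namespaceURI);              // string(28) "http://www.w3.org/1999/xhtml"
var_dump($root->lookupNamespaceURI('foo')); // string(22) "http://example.com/bar"
var_dump($h2->namespaceURI);                // string(22) "http://example.com/bar"
var_dump($h2->prefix);                      // string(3) "foo"
```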
