I am trying to digest content from an AP Webfeed, but what should be a very simple for-each loop is giving me fits. The feed is UTF-8 XML coming from AP, and I am using PHP's XSLTProcessor and SimpleXML.
The issue is that I cannot target the node I want to loop on. The root element is the feed itself, which carries some properties of the feed and then several 'entry' children (the articles). Each entry has child properties (like copyright) and then the actual NITF content (head and body).
It seems like I should be able to just do <xsl:for-each select="feed/entry">, but if I refer to 'feed' or 'entry' by name I get nothing. I can't even do <xsl:value-of select="feed/id" />.
Oddly, I can get //nitf/@version to return properly, but cannot get it through feed/entry/content/nitf/@version.
I am able to address some content with <xsl:for-each select="//nitf"> to get the article body or any descendants of the nitf node, but not higher elements (like //entry). The only way I can get to the content closer to the root is by nesting <xsl:for-each select="/*"> loops, starting with the root (feed) and drilling down, which just seems wrong.
If anyone can point me in the right direction, I'd REALLY appreciate it. It has been frustrating that something seemingly so simple has had me stuck for a while now.
Format is:
<feed>
<id></id>
<published></published>
<entry>
<copyright></copyright>
<content>
<nitf>
<head></head>
<body></body>
</nitf>
</content>
</entry>
<entry>
<content>
<nitf>
<head></head>
<body></body>
</nitf>
</content>
</entry>
</feed>
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<!-- this does loop through nitf -->
<xsl:for-each select="descendant::*/nitf">
<nitf_title></nitf_title>
</xsl:for-each>
<!-- I want to loop on these instead but this never loops -->
<xsl:for-each select="descendant::*/entry">
<entry_title></entry_title>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Sorry, I was trying to keep it short, so I mocked up the source feed. An actual example is below:
<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:apcm="http://ap.org/schemas/03/2005/apcm" xmlns:apnm="http://ap.org/schemas/03/2005/apnm" xmlns:georss="http://www.georss.org/georss">
<id>urn:publicid:ap.org:31998</id>
<title type="xhtml">
<apxh:div xmlns:apxh="http://www.w3.org/1999/xhtml">
<apxh:span>AP Online National News</apxh:span>
</apxh:div>
</title>
<apcm:Property Name="FeedProperties">
<apcm:Property Name="Entitlement" Id="urn:publicid:ap.org:product:31998" Value="AP Online National News" />
<apcm:Property Name="FeedSequencing">
<apcm:Property Name="sequenceNumber" Id="111835329" />
<apcm:Property Name="minDateTime" Value="2011-06-20T16:56:08.047Z" />
</apcm:Property>
</apcm:Property>
<updated>2011-06-20T16:56:08.047Z</updated>
<author>
<name>The Associated Press</name>
<uri>http://www.ap.org</uri>
</author>
<rights></rights>
<link rel="self" href="http://syndication.ap.org" />
<entry xmlns="http://www.w3.org/2005/Atom">
<id>urn:publicid:ap.org:badf779c9d5246b5acb21430ed2214fb</id>
<title>APFN-US--Gas Drilling-Chemicals</title>
<updated>2011-06-20T16:56:08.047Z</updated>
<published>2011-06-20T16:25:39Z</published>
<author>
<name>AP</name>
</author>
<rights>Copyright 2011</rights>
<content type="text/xml">
<nitf version="-//IPTC//DTD NITF 3.4//EN" change.date="October 18, 2006" change.time="19:30" xmlns="">
<head>
<docdata>
<doc-id regsrc="AP" />
<date.issue norm="20110620T162539Z" />
<ed-msg info="Eds: APNewsNow." />
<doc.rights owner="http://www.ap.org" agent="http://license.icopyright.net" type="none" />
<doc.copyright holder="AP" year="2011" />
</docdata>
</head>
<body>
<body.head>
<hedline>
<hl1 id="headline">Texas becomes 1st to require fracking disclosure</hl1>
<hl2 id="originalHeadline">Texas becomes 1st to require fracking disclosure</hl2>
</hedline>
<distributor>The Associated Press</distributor>
<dateline>
<location>HOUSTON</location>
</dateline>
</body.head>
<body.content>
<block id="Main">
<p>HOUSTON (AP) — Texas </p>
</block>
</body.content>
<body.end />
</body>
</nitf>
</content>
<apcm:ContentMetadata xmlns:apcm="http://ap.org/schemas/03/2005/apcm">
<apcm:DateLineLocation City="Houston" Country="USA" CountryArea="TX" CountryAreaName="Texas" CountryName="United States" />
<apcm:Priority Numeric="4" Legacy="r" />
<apcm:ConsumerReady>TRUE</apcm:ConsumerReady>
<apcm:DateLine>HOUSTON</apcm:DateLine>
</apcm:ContentMetadata>
</entry>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>urn:publicid:ap.org:57582781c3a841a2b9849231a4abdb63</id>
<title>US--Medicare-Prevention</title>
<updated>2011-06-20T16:54:57.963Z</updated>
<published>2011-06-20T16:54:43Z</published>
...
As pointed out in the comments above, your issue is related to the namespaces in the source document. For example, you're trying to match an element named "entry", but the actual element has the expanded name {http://www.w3.org/2005/Atom}entry.
You should rewrite your XPath to include a namespace prefix and map that prefix to the appropriate URI. As a result, "entry" becomes "atom:entry", and the stylesheet (typically its xsl:stylesheet element) declares the prefix as xmlns:atom="http://www.w3.org/2005/Atom".
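The same rule can be seen outside XSLT. Below is a minimal sketch in Python's standard xml.etree.ElementTree (not the PHP XSLTProcessor used in the question), run against a trimmed-down stand-in for the feed. The sample document and the prefix name "atom" are assumptions made for illustration; only the namespace URI comes from the real feed.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the AP feed (illustrative, not the real payload).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:publicid:ap.org:31998</id>
  <entry><title>APFN-US--Gas Drilling-Chemicals</title></entry>
  <entry><title>US--Medicare-Prevention</title></entry>
</feed>"""

root = ET.fromstring(FEED)  # root is the <feed> element

# An unprefixed name only matches elements in no namespace,
# so nothing is found even though two entries are present.
unqualified = root.findall("entry")

# Bind a prefix of our own choosing to the namespace URI used by the feed.
NS = {"atom": "http://www.w3.org/2005/Atom"}
qualified = root.findall("atom:entry", NS)

# Child elements of an entry are in the Atom namespace too.
titles = [e.findtext("atom:title", namespaces=NS) for e in qualified]

print(len(unqualified), len(qualified), titles)
```

The prefix itself is arbitrary; what has to match is the URI it is bound to.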
Contrary to the comments above, you do not actually need any namespace declaration to select the nitf element, because it is in no namespace (xmlns=""). In fact, you can select any nitf element in the document simply using //nitf. For instance:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="//nitf"/>
</xsl:template>
</xsl:stylesheet>
This will copy every nitf element to the output, no matter which namespace its parent belongs to.
The entry element, on the other hand, is in a different namespace, namely xmlns="http://www.w3.org/2005/Atom". To select it correctly you have to declare a namespace prefix in your stylesheet and use it accordingly. For instance:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:atom="http://www.w3.org/2005/Atom">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:for-each select="//nitf">
<!-- iterate on nitf children -->
</xsl:for-each>
<xsl:for-each select="//atom:entry">
<!-- iterate on entry children -->
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
This will iterate first over every nitf element (which is in no namespace), and then over every {http://www.w3.org/2005/Atom}entry element.
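The mix of prefixed and unprefixed names can be tried quickly outside the stylesheet. This is a rough sketch using Python's standard xml.etree.ElementTree rather than XSLT; the sample document is cut down from the feed in the question, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Prefix chosen for this example; the URI is the one declared on <feed>.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# Cut-down sample: Atom elements wrap a nitf subtree that resets
# the default namespace with xmlns="".
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>APFN-US--Gas Drilling-Chemicals</title>
    <content type="text/xml">
      <nitf xmlns="" version="-//IPTC//DTD NITF 3.4//EN">
        <body><body.head><hedline>
          <hl1 id="headline">Texas becomes 1st to require fracking disclosure</hl1>
        </hedline></body.head></body>
      </nitf>
    </content>
  </entry>
</feed>"""

root = ET.fromstring(FEED)

# nitf is in no namespace, so a plain name is enough.
nitf_versions = [n.get("version") for n in root.findall(".//nitf")]

# entry and title are Atom elements and need the prefix;
# hl1 sits inside the nitf subtree and is addressed without one.
headlines = []
for entry in root.findall("atom:entry", NS):
    title = entry.findtext("atom:title", namespaces=NS)
    hl1 = entry.findtext(".//hl1")
    headlines.append((title, hl1))

print(nitf_versions, headlines)
```

Looping over the entries and reaching into the nitf content from each one keeps the Atom metadata (title, rights, and so on) and the article body together, which is what the question was after.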
Why descendant::*/nitf works but descendant::*/entry does not
In the first XPath:
descendant::*/nitf
the * after the axis matches every descendant element, whatever its namespace, and the final step, nitf, is an unprefixed name, which in XPath 1.0 matches only elements in no namespace. Your nitf elements are in no namespace (xmlns=""), so they are found even though their parent, content, is in the http://www.w3.org/2005/Atom namespace.
The second XPath, descendant::*/entry, fails for the same reason: the unprefixed name entry looks for an entry element in no namespace, but every entry in the feed is in the Atom namespace. With the prefix declared as in the previous example, descendant::*/atom:entry (or simply //atom:entry) will match.
If you want to restrict the nitf match to children of Atom elements, you can write, after having declared the namespace prefix for http://www.w3.org/2005/Atom (as in the previous examples):
descendant::atom:*/nitf
or you can use the lower-level:
descendant::node()/nitf
Finally, the easiest way is (as shown in the first example above):
//nitf
Notice that these last two XPaths will select nitf elements that are children of elements in any namespace. So you should use them only when you know your input document well and are aware of what you are doing.
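The caveat about the looser expressions can be illustrated with a small sketch, again in Python's standard xml.etree.ElementTree rather than XSLT. The x:archive element and its urn:example:other namespace are hypothetical, added only so that a second nitf lives outside the Atom content element.

```python
import xml.etree.ElementTree as ET

# Prefix chosen for this example; the URI is the one declared on <feed>.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical document: one nitf under Atom content, and one under an
# element from an unrelated, made-up namespace.
DOC = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content><nitf xmlns="" version="in-atom-content"/></content>
  </entry>
  <x:archive xmlns:x="urn:example:other">
    <nitf xmlns="" version="in-other-namespace"/>
  </x:archive>
</feed>"""

root = ET.fromstring(DOC)

# Loose: any nitf anywhere below the root, whatever its parent is.
loose = [n.get("version") for n in root.findall(".//nitf")]

# Strict: only nitf elements whose parent is an Atom content element.
strict = [n.get("version") for n in root.findall(".//atom:content/nitf", NS)]

print(loose, strict)
```

Here the loose path picks up both nitf elements while the strict one returns only the one under atom:content, so the parent-qualified form is the safer choice when the shape of the input is not fully known.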